Part II: Chapter 2.1

Blobs (The Raw Bytes)

The most fundamental object in Git is the Blob. It stores the contents of a file and nothing else. Here is the secret: a Blob does not know its own filename.

Stripped Metadata

If you take an image named `puppy.jpg` and copy it to `dog.jpg`, a typical filesystem stores the same bytes twice. Git does not. A Blob is identified by a hash of its contents, so two files with exactly the same bytes produce the same hash, and Git stores one single Blob object for both. The filenames play no part. This deduplication applies automatically across your entire repository.
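To make the idea concrete, here is a minimal Python sketch of how a Blob's ID is derived, assuming a repository that uses Git's default SHA-1 object format. The function name `git_blob_id` and the placeholder image bytes are made up for illustration; the point is that the filename never enters the hash.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    # The filename is not part of the input at all.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical stand-in for the image bytes; nothing is read from disk.
image_bytes = b"\xff\xd8\xff\xe0 pretend JPEG data"

files = {"puppy.jpg": image_bytes, "dog.jpg": image_bytes}
ids = {name: git_blob_id(data) for name, data in files.items()}

for name, blob_id in ids.items():
    print(name, blob_id)

# Two filenames, but only one distinct blob ID.
print(len(set(ids.values())))  # prints 1
```

Because the ID depends only on the bytes, copying or renaming a file can never produce a second Blob.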

Working Directory (what you see): index.js, utils.js, style.css

↓ How Git saves it under the hood ↓

The Tree Object

Object Hash: b2xb357851f9eafaa9430c5e7b2f6efb357851

| Mode   | Type | Blob Hash Pointer | Filename  |
|--------|------|-------------------|-----------|
| 100644 | blob | b2x685a9...       | index.js  |
| 100644 | blob | b2x685a9...       | utils.js  |
| 100644 | blob | b2x1e2b2...       | style.css |

A Tree is basically a directory listing. Each entry maps a human-readable filename (plus a file mode) to the hash of the Blob that holds that file's contents.
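As a rough illustration, the listing can be modelled in Python as a list of entries that each pair a filename with a Blob ID. This is only a sketch: `TreeEntry` and `build_tree` are hypothetical names, the trailing newlines in the file contents are an assumption, and a real Git tree is a binary object that is itself hashed, which is not shown here.

```python
import hashlib
from typing import NamedTuple


class TreeEntry(NamedTuple):
    mode: str     # "100644" marks a regular, non-executable file
    type: str     # "blob" for file contents
    blob_id: str  # pointer to the blob holding the bytes
    name: str     # the filename lives here, not in the blob


def git_blob_id(content: bytes) -> str:
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def build_tree(working_dir: dict[str, bytes]) -> list[TreeEntry]:
    """Map each filename to the ID of the blob that holds its contents."""
    return [
        TreeEntry("100644", "blob", git_blob_id(data), name)
        for name, data in sorted(working_dir.items())
    ]


working_dir = {
    "index.js": b"console.log('hello');\n",
    "utils.js": b"console.log('hello');\n",
    "style.css": b"body { background: black; }\n",
}

tree = build_tree(working_dir)
for entry in tree:
    # Abbreviate the ID to 7 characters for display only.
    print(entry.mode, entry.type, entry.blob_id[:7], entry.name)
```

The entries for `index.js` and `utils.js` carry the same Blob ID, while `style.css` points somewhere else.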

The Blob Objects

BLOB OBJECT b2x685a9a8df9eafaa9430c5e7b2f6ef685a9a8d
console.log('hello');
DEDUPLICATED! Used by 2 files.

BLOB OBJECT b2x1e2b2510f9eafaa9430c5e7b2f6ef1e2b2510
body { background: black; }

Blobs strip away filenames. If you have 5,000 files with the exact same content, Git only stores ONE blob.
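The 5,000-file claim can be checked with a toy content-addressed store. The `ObjectStore` class below is a hypothetical in-memory stand-in, not Git's actual storage code; it only assumes that a Blob's key is the SHA-1 of a `blob <size>\0` header plus the content.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: the key is derived from the content."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def write_blob(self, content: bytes) -> str:
        header = b"blob %d\x00" % len(content)
        blob_id = hashlib.sha1(header + content).hexdigest()
        # Identical content lands on the same key, so nothing new is stored.
        self.objects.setdefault(blob_id, content)
        return blob_id


store = ObjectStore()
content = b"console.log('hello');\n"

# 5,000 hypothetical filenames, all with identical contents.
tree = {f"file_{i}.js": store.write_blob(content) for i in range(5000)}

print(len(tree))           # 5000 filename entries
print(len(store.objects))  # 1 blob
```

The filenames all survive in the tree-like mapping, but the store itself holds a single object.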

Try It Out

In the interactive simulation above, notice the two files: `index.js` and `utils.js`. They both have the exact same contents: `console.log('hello');`.

If you look at the BLOB OBJECTS section on the bottom right, you will see that Git only created ONE blob for both files! Git completely ignores your filename when saving file data. Hover over that blob to see which files are sharing it.

Testing Deduplication

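In the simulation, try deleting the word 'hello' from `index.js`. The moment the contents change, a completely new Blob object appears down below! The old Blob stays, because `utils.js` still points to it. (In a real repository, the new Blob is written when you stage the change with `git add`.)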
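The same experiment can be sketched outside the simulation. The snippet below assumes the SHA-1 blob hashing scheme and uses a hypothetical helper, `git_blob_id`; it hashes the contents before and after the edit and compares the resulting IDs.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


before = b"console.log('hello');"
after = before.replace(b"hello", b"")  # delete the word 'hello'

index_js_old = git_blob_id(before)
utils_js = git_blob_id(before)   # utils.js is left untouched
index_js_new = git_blob_id(after)

print(index_js_old == utils_js)  # True: both files shared one blob
print(index_js_new == utils_js)  # False: the edit needs a new blob
```

Even a small edit changes the hash, so the edited file can no longer share a Blob with the untouched one.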

Reading a Blob

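We can use Git plumbing commands to inspect a Blob if we know its hash. `git cat-file -p` prints an object's contents and `git cat-file -t` prints its type. Since blobs only contain data and no filenames, the output of `-p` is just the file's contents.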

Terminal
$ git cat-file -p b2e650d3fc3c04f9eafaa9430c5e7b2f6ef329f6
console.log('hello');
$ git cat-file -t b2e650d3fc3c04f9eafaa9430c5e7b2f6ef329f6
blob
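For a sense of what these commands read, here is a hedged Python sketch of Git's "loose object" layout: a zlib-compressed file containing a `<type> <size>\0` header followed by the content, stored under a directory named after the first two hex characters of the ID. The helpers `write_loose_blob` and `read_loose_object` are hypothetical, a temporary directory stands in for `.git/objects`, and packfiles (which real repositories also use) are ignored.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_blob(objects_dir: Path, content: bytes) -> str:
    """Store content the way a loose blob is laid out on disk."""
    raw = b"blob %d\x00" % len(content) + content
    object_id = hashlib.sha1(raw).hexdigest()
    path = objects_dir / object_id[:2] / object_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return object_id


def read_loose_object(objects_dir: Path, object_id: str) -> tuple[str, bytes]:
    """Return (type, content) for a loose object, given its ID."""
    path = objects_dir / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, body = raw.partition(b"\x00")
    obj_type, _size = header.decode().split(" ")
    return obj_type, body


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp)  # stands in for .git/objects
    blob_id = write_loose_blob(objects_dir, b"console.log('hello');\n")
    obj_type, body = read_loose_object(objects_dir, blob_id)

print(obj_type)               # roughly what `git cat-file -t` reports
print(body.decode(), end="")  # roughly what `git cat-file -p` reports
```

Nothing in the stored file records a filename: the header holds only the object type and size.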