Part II: Chapter 2.7

The Packfile (Compression)

Git is famous for being incredibly fast and lightweight. But if it saves an entirely new Blob every time you edit a file, wouldn't the repository eventually explode in size?

Garbage Collection

The answer is Yes. When you are writing code, Git optimizes for speed by saving loose objects (entire uncompressed blobs). If you save a 1MB file 50 times, your `.git` folder grows by 50MB.

But occasionally, Git automatically runs a command called git gc (Garbage Collect). This routine sweeps through all your objects, finds versions that look similar, and compresses them into a single binary file called a Packfile using Delta Compression.

Space Used: 622 KB

.git/objects/ (Loose Objects)

BLOBa1a1
Version 1 (Base)
124 KB
BLOBb2b2
Version 2 (Added Header)
125 KB
BLOBc3c3
Version 3 (Fixed Typo)
125 KB
BLOBd4d4
Version 4 (Added Footer)
128 KB
BLOBe5e5
Version 5 (Removed CSS)
120 KB

.git/objects/pack/

No Packfiles present.

Delta Compression

In the interactive simulation above, we have 5 versions of the exact same file. In their loose uncompressed state, they total 622 KB of storage.

When you click $ git gc, Git discovers that these 5 files are 99% identical. Instead of storing 5 full copies, it stores the largest most recent file as the Base Object. For the remaining 4 files, it simply calculates the mathematical difference ("The Delta") between them.

Reverse Engineering History

Unlike typical diffing where you start at Version 1 and apply diffs to reach Version 5, Git stores Version 5 intact, and stores the diffs to go backwards to Version 1! Why? Because you are overwhelmingly more likely to read recent code than 5-year-old code.

Loose vs Packed

If you look inside a fresh `.git/objects` folder, you will see your two-character hash folders containing loose objects. If you clone a repository from GitHub, you'll likely see almost nothing there—because GitHub sends repositories over the network pre-packed into the `pack/` directory to save bandwidth.

Terminal
$ls .git/objects/
info/ pack/ 1a/ 2b/ c3/
$git count-objects -v
count: 5
size: 622
in-pack: 0
$git gc
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (5/5), done.
$git count-objects -v
count: 0
size: 0
in-pack: 5