Chapter 1.6

Content-Addressable Storage

How do those crazy SHA-1 hashes actually get saved inside `.git/objects/`? Git looks up data in a fundamentally different way than a normal filesystem does.

In a normal filesystem, the "Key" used to find your data is the filename. In Git's database, the "Key" is the SHA-1 hash of the contents. Change the filename below, and watch how Git responds. Then, change the contents.

Your Text Editor (Working Directory)

Standard OS Filesystem

KEY: app.js
VALUE: console.log('hi');

Standard OS looks up files by their name.

Git Object Database

KEY (SHA-1): b2e650d3fc3c04f9eafaa9430c5e7b2f6ef329f6
VALUE: zlib_compress("blob 18\0console.log('hi');")
Storage location inside .git: .git/objects/b2/e650d3fc3c04f9eafaa9430c5e7b2f6ef329f6

Git ignores the filename entirely here! It looks up blobs strictly by their content's hash.
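
To make the key concrete, here is a minimal Python sketch of how a blob's hash can be computed without Git: SHA-1 over a small header (`blob`, a space, the byte length, a NUL byte) followed by the raw contents. The function name `git_blob_hash` is made up for this example. Notice that no filename appears anywhere in the input.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git would assign to a blob with these contents."""
    # Header: the object type, a space, the content length in bytes, a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    # The key is the hash of header + contents. The filename is never involved.
    return hashlib.sha1(header + content).hexdigest()


# Same contents always give the same key, whatever the file is called.
print(git_blob_hash(b"console.log('hi');"))
print(git_blob_hash(b"console.log('hi');") == git_blob_hash(b"console.log('hi');"))

# Changing even one byte of the contents gives a completely different key.
print(git_blob_hash(b"console.log('ho');"))
```

If you have Git installed, you can compare the first printed value with `printf "console.log('hi');" | git hash-object --stdin`.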

The Magic Trick

Did you notice that changing the Filename in the top-left box did absolutely nothing to the Git Object Database?

Git is completely blind to filenames at this level; names are recorded separately, in tree objects. When you stage a file with `git add`, Git hashes its contents. If an object with that hash already exists in `.git/objects/`, Git reuses it instead of writing a second copy. This is why renaming a folder with 5,000 files is nearly instant. Git doesn't rewrite 5,000 blobs. It only records a small new tree entry that says "Hey, those 5,000 blobs you already have mapped? They live in this new folder now."
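
The idea can be modeled in a few lines of Python. This is a toy sketch, not Git's real implementation: `objects` stands in for `.git/objects/`, `tree` stands in for the name-to-hash records Git keeps elsewhere, and the helper names `add` and `rename` are invented for the example.

```python
import hashlib

objects = {}  # hash -> contents (toy stand-in for .git/objects/)
tree = {}     # path -> hash    (toy stand-in for Git's name records)


def blob_hash(content: bytes) -> str:
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def add(path: str, content: bytes) -> str:
    key = blob_hash(content)
    # If the key already exists, the stored object is reused, not rewritten.
    objects.setdefault(key, content)
    tree[path] = key
    return key


def rename(old_path: str, new_path: str) -> None:
    # Only the name -> hash mapping changes; the object store is untouched.
    tree[new_path] = tree.pop(old_path)


add("src/app.js", b"console.log('hi');")
add("src/copy.js", b"console.log('hi');")  # identical contents, different name
rename("src/app.js", "lib/app.js")

print(len(objects))  # 1: two names, one stored object
print(sorted(tree))  # ['lib/app.js', 'src/copy.js']
```

Two paths with identical contents share a single stored object, and the rename never touches `objects` at all.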

Why split the hash into a 2-character folder?

In `.git/objects/`, you'll see a bunch of folders with 2-character names like `4b` or `e6`. Inside them, you'll see files named with the remaining 38 characters of the hash.

Why? Performance. Many filesystems and file tools slow down when a single folder contains 100,000 files. By using the first two hex characters of the hash as a directory name, Git spreads its loose objects across up to 16^2 = 2^8 = 256 folders, so each folder holds only about 1/256 of them. It's a simple and effective filesystem optimization.
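
The split itself is just string slicing. Here is a small Python sketch, with `loose_object_path` as a made-up helper name, that turns a 40-character hash into its storage path and works out the arithmetic for the 100,000-file case.

```python
def loose_object_path(sha1_hex: str) -> str:
    """Map a 40-character SHA-1 hex string to its loose-object path."""
    if len(sha1_hex) != 40:
        raise ValueError("expected a 40-character SHA-1 hex string")
    # First 2 characters name the folder, the other 38 name the file.
    return f".git/objects/{sha1_hex[:2]}/{sha1_hex[2:]}"


print(loose_object_path("b2e650d3fc3c04f9eafaa9430c5e7b2f6ef329f6"))

# Two hex characters give 16 * 16 possible folder names.
folders = 16 ** 2
print(folders)             # 256

# 100,000 objects spread evenly would average about 390 per folder.
print(100_000 // folders)  # 390
```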

Try It: Hashing manually

You can actually bypass `git commit` and use Git's plumbing commands directly to write any string you want into the `.git/objects/` database!

Terminal
$ echo "hello from the other side" | git hash-object -w --stdin
cc94870f2f3efd6837ca7c8ebcf4ecbf7dd7cd25
$ ls .git/objects/cc
94870f2f3efd6837ca7c8ebcf4ecbf7dd7cd25
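
For the curious, here is a rough Python sketch of what writing and reading a loose blob involves: build the header plus contents, hash it, zlib-compress it, and save it under the 2-character folder. It writes to a temporary directory rather than a real repository, the helper names `write_loose_object` and `read_loose_object` are invented, and details of real Git (temporary files, file permissions) are left out. Note that `echo` appends a newline, so the input bytes end in `\n`.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(objects_dir: str, content: bytes) -> str:
    """Store content as a blob under objects_dir and return its SHA-1."""
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    sha = hashlib.sha1(store).hexdigest()
    folder = os.path.join(objects_dir, sha[:2])
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, sha[2:])
    if not os.path.exists(path):  # same contents already stored: nothing to do
        with open(path, "wb") as f:
            f.write(zlib.compress(store))
    return sha


def read_loose_object(objects_dir: str, sha: str) -> bytes:
    """Look an object up by hash and return its contents without the header."""
    path = os.path.join(objects_dir, sha[:2], sha[2:])
    with open(path, "rb") as f:
        store = zlib.decompress(f.read())
    _header, _, content = store.partition(b"\0")
    return content


objects_dir = tempfile.mkdtemp()  # throwaway stand-in for .git/objects/
sha = write_loose_object(objects_dir, b"hello from the other side\n")
print(sha)
print(os.listdir(os.path.join(objects_dir, sha[:2])))
print(read_loose_object(objects_dir, sha))
```

In a real repository, `git cat-file -p <hash>` does the reading step for you.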