Chapter 1.6

Content-Addressable Storage

How do those crazy SHA-1 hashes actually get saved inside `.git/objects/`? The answer is that Git fundamentally changes how data is identified and looked up.

In a normal operating system, the "Key" to find your data is the filename. In Git's database, the "Key" is the SHA-1 hash of the contents (plus a small type-and-size header). Change the filename below, and watch how Git responds. Then, change the contents.
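To make the two kinds of "Key" concrete, here is a toy Python sketch. The names `os_fs`, `cas`, `content_key`, and `put` are hypothetical illustrations, not Git APIs, and the sketch is simplified: it hashes the raw bytes only, whereas real Git also hashes a short `blob <size>\0` header.

```python
import hashlib

# Standard OS: the key is the filename.
os_fs = {"app.js": b"console.log('hi');"}

# Toy content-addressed store (hypothetical): the key is a hash of the bytes.
# Simplification: real Git also hashes a "blob <size>\0" header first.
def content_key(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

cas = {}

def put(data: bytes) -> str:
    key = content_key(data)
    cas.setdefault(key, data)  # identical content is stored only once
    return key

key_before = put(os_fs["app.js"])

# Rename the file: the OS key changes...
os_fs["main.js"] = os_fs.pop("app.js")
# ...but the content key stays exactly the same.
key_after_rename = put(os_fs["main.js"])

# Edit the contents: now the content key changes.
key_after_edit = put(b"console.log('bye');")
```

Because `put` keys entries by hash, storing the renamed file a second time adds nothing new to `cas`; only the edit creates a second entry.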

Your Text Editor (Working Directory)

Filename: app.js
File Contents: console.log('hi');

Standard OS Filesystem

KEY (filename) | VALUE
app.js | console.log('hi');

Standard OS looks up files by their name.

Git Object Database

KEY (SHA-1) | VALUE
b2e650d3fc3c04f9eafaa9430c5e7b2f6ef329f6 | zlib_compress("blob 18\0console.log('hi');")

Storage location inside .git: `.git/objects/b2/e650d3fc3c04f9eafaa9430c5e7b2f6ef329f6`

Git ignores the filename entirely here! It looks up blobs strictly by their content's hash.
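Here is a minimal Python sketch of how the key, value, and storage path above could be computed, assuming Git's loose blob format `blob <size>\0<contents>`. The function names `blob_id`, `loose_blob_value`, and `loose_object_path` are illustrative, not Git APIs. Python's zlib output is also not guaranteed to be byte-identical to the compressed file Git writes; only the decompressed bytes are expected to match.

```python
import hashlib
import zlib

def blob_id(data: bytes) -> str:
    """SHA-1 over the blob header plus the raw bytes; the filename is never involved."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()

def loose_blob_value(data: bytes) -> bytes:
    """The zlib-compressed header plus contents, i.e. the VALUE column above."""
    return zlib.compress(b"blob %d\0" % len(data) + data)

def loose_object_path(oid: str) -> str:
    """First 2 hex characters become the directory, the remaining 38 the filename."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"

contents = b"console.log('hi');"  # 18 bytes, no trailing newline
oid = blob_id(contents)
print(oid)
print(loose_object_path(oid))
```

Note that `app.js` appears nowhere in this code: every output depends only on `contents`.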

The Magic Trick

Did you notice that changing the filename did absolutely nothing to the Git Object Database?

Git is completely blind to filenames at this level. When you save a file, Git hashes its contents. If that hash already exists in `.git/objects/`, there is nothing new to store, so Git simply skips it. This is why renaming a folder with 5,000 files is nearly free for Git's database. Git doesn't store 5,000 new blobs; it just writes a small new tree object (Git's directory listing, which maps names to hashes) that says, "Hey, those 5,000 blobs you already have mapped? They live in this new folder now."
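To see why a folder rename is cheap, here is a rough Python sketch of tree hashing. It assumes Git's binary tree entry format `<mode> <name>\0<20-byte id>`, with mode `100644` for regular files and `40000` for subdirectories, and Git's rule of sorting directory names as if they ended in `/`. The helper names `hash_object` and `tree_body` are hypothetical, and three files stand in for the 5,000.

```python
import hashlib

def hash_object(obj_type: str, body: bytes) -> str:
    """Git-style object id: SHA-1 over '<type> <size>' + NUL + body."""
    return hashlib.sha1(b"%s %d\0" % (obj_type.encode(), len(body)) + body).hexdigest()

def tree_body(entries: dict) -> bytes:
    """Serialize {name: (mode, hex_oid)} into a tree body (sketch of Git's format)."""
    def sort_key(name: str) -> bytes:
        mode, _ = entries[name]
        # Directories sort as if their name ended in "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for name in sorted(entries, key=sort_key):
        mode, oid = entries[name]
        body += b"%s %s\0" % (mode.encode(), name.encode()) + bytes.fromhex(oid)
    return body

# A folder of three files (stand-ins for the 5,000 blobs).
blobs = {f"file{i}.txt": hash_object("blob", f"data {i}\n".encode()) for i in range(3)}
src_tree = hash_object("tree", tree_body({n: ("100644", oid) for n, oid in blobs.items()}))

# Root listing before and after renaming the folder "src" -> "lib".
# The subtree id (and every blob id) is reused as-is; only the root tree is new.
root_before = hash_object("tree", tree_body({"src": ("40000", src_tree)}))
root_after = hash_object("tree", tree_body({"lib": ("40000", src_tree)}))

print(root_before != root_after)  # True: only the root listing changed
```

The design point is that names live only in tree objects. Renaming a folder rewrites one small listing in the parent tree, while the folder's own tree and all of its blobs keep their hashes.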

Why split the hash into a 2-character folder?

In `.git/objects/`, you'll see a bunch of folders with 2-character names like `4b` or `e6`. Inside them are files whose names are the remaining 38 characters of the hash.

Why? Performance. Many filesystems can slow down badly when a single folder contains 100,000 files. By using the first two hex characters of the hash as a directory name, Git spreads its loose objects across at most 256 (16 × 16 = 2^8) subdirectories, so each one holds only about 1/256 of the objects. It's a simple but brilliant filesystem optimization.
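A quick way to see the fan-out effect is to hash some made-up contents and bucket them by their 2-character prefix. This Python sketch uses arbitrary strings (`object 0`, `object 1`, ...) as stand-ins for real objects. Because SHA-1 output is close to uniform, the 10,000 ids should spread over the 256 possible prefixes, with each bucket near the average of about 39.

```python
import hashlib
from collections import Counter

# 10,000 made-up contents, hashed like any other data.
oids = [hashlib.sha1(f"object {i}".encode()).hexdigest() for i in range(10_000)]

# Bucket by the first two hex characters, the directory name Git would use.
buckets = Counter(oid[:2] for oid in oids)

print(len(buckets))  # directories used (never more than 16 * 16 = 256)
print(max(buckets.values()), len(oids) / 256)  # biggest directory vs. the average
```

Without the split, all 10,000 files would sit in one directory; with it, even the largest bucket stays small.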

Try It: Hashing manually

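You can actually bypass `git commit` and use Git's plumbing directly to hash any string you want into the `.git/objects/` database! Here `--stdin` tells `git hash-object` to read the content from standard input, and `-w` tells it to actually write the object into the database instead of only printing the hash.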

Terminal

$ echo "hello from the other side" | git hash-object -w --stdin

cc94870f2f3efd6837ca7c8ebcf4ecbf7dd7cd25

$ ls .git/objects/cc

94870f2f3efd6837ca7c8ebcf4ecbf7dd7cd25
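For comparison, here is a rough Python analogue of what `git hash-object -w --stdin` does for a blob. It is a simplified sketch, not Git's implementation. Real Git writes through a temporary file and marks objects read-only, while this version writes the compressed bytes directly. The function `write_loose_blob` is hypothetical, and a throwaway temporary directory stands in for `.git`. Since `echo` appends a newline, the data includes `\n`.

```python
import hashlib
import os
import tempfile
import zlib

def write_loose_blob(git_dir: str, data: bytes) -> str:
    """Hash a blob and store it as objects/<2 chars>/<38 chars> (simplified sketch)."""
    store = b"blob %d\0" % len(data) + data
    oid = hashlib.sha1(store).hexdigest()
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    if not os.path.exists(path):  # content already stored: nothing new to write
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(store))
    return oid

with tempfile.TemporaryDirectory() as tmp:
    # `echo` appends a newline, so include "\n" to mirror the terminal example.
    oid = write_loose_blob(tmp, b"hello from the other side\n")
    print(oid)
    print(os.listdir(os.path.join(tmp, "objects", oid[:2])))
```

Writing the same content twice returns the same id and leaves the directory unchanged, which is the deduplication described earlier in this chapter.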