Content-Addressable Storage
How do those crazy SHA-1 hashes actually get saved inside `.git/objects/`? Git fundamentally changes how it categorizes data.
In a normal operating system, the "Key" to find your data is the filename. In Git's database, the "Key" is the SHA-1 hash of the contents. Change the filename below, and watch how Git responds. Then, change the contents.
Your Text Editor (Working Directory)
Standard OS Filesystem
Standard OS looks up files by their name.
Git Object Database
Git ignores the filename entirely here! It looks up blobs strictly by their content's hash.
The Magic Trick
Did you notice that changing the Filename in the top-left box did absolutely nothing to the Git Object Database?
Git is completely blind to filenames at this level. When you add a file, Git hashes the content inside it. If that hash already exists in `.git/objects/`, Git has nothing new to store, so it does not write a second copy. Filenames live elsewhere, in tree objects that map each name to a blob hash. This is why renaming a folder with 5,000 files in Git is nearly instant. Git doesn't move 5,000 files. It just writes a tiny new tree entry to say "Hey, those 5,000 blobs you already have mapped? They live in this new folder now."
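To make that concrete, here is a minimal Python sketch of how a blob's ID can be computed. It assumes the standard loose-object layout, where Git hashes the header `blob <size>\0` followed by the raw content. The helper name `git_blob_hash` and the filenames are hypothetical, chosen only for illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 ID Git would give a blob holding `content`."""
    # Git hashes a small header plus the content; the filename is never included.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical working-directory files: two names share the same contents.
files = {
    "notes.txt": b"hello\n",
    "renamed-notes.txt": b"hello\n",
    "other.txt": b"hello world\n",
}

for name, content in files.items():
    print(f"{name:20} {git_blob_hash(content)}")
```

The first two files print the same ID because only the bytes inside them are hashed. The third differs because its contents differ.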
Why split the hash into a 2-character folder?
In `.git/objects/`, you'll see a bunch of folders with 2-character names like `4b` or `e6`. Inside them, you'll see files whose names are the remaining 38 characters of the hash.
Why? Performance. Many filesystems and tools slow down when a single folder contains 100,000 files. By using the first two hex characters of the hash as a directory name, Git spreads its loose objects across up to 256 (16^2) folders, so each folder holds roughly 1/256 of them. It's a simple but effective filesystem optimization trick.
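The split itself is just string slicing. The sketch below shows how a 40-character SHA-1 ID maps to a path under `.git/objects/`. The function name `loose_object_path` is hypothetical, and the sketch only covers loose objects, not packfiles.

```python
from pathlib import PurePosixPath


def loose_object_path(object_id: str) -> PurePosixPath:
    """Map a 40-character SHA-1 hex ID to its loose-object path."""
    if len(object_id) != 40:
        raise ValueError("expected a 40-character SHA-1 hex string")
    # First 2 hex characters name the folder, the remaining 38 name the file.
    return PurePosixPath(".git/objects") / object_id[:2] / object_id[2:]


path = loose_object_path("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
print(path)
print("possible folders:", 16 ** 2)
```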
Try It: Hashing manually
You can actually bypass `git commit` and use Git's plumbing directly (the `git hash-object` command) to hash any string you want into the `.git/objects/` database!
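If you want to see roughly what the plumbing does under the hood, the Python sketch below imitates `git hash-object -w --stdin` for a blob. It assumes the loose-object format (a zlib-compressed `blob <size>\0<content>` record) and writes into a temporary directory rather than a real repository. The helpers `write_blob` and `read_blob` are hypothetical names, not part of Git.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_blob(objects_dir: Path, content: bytes) -> str:
    """Store `content` as a loose blob and return its SHA-1 ID."""
    store = b"blob " + str(len(content)).encode("ascii") + b"\x00" + content
    object_id = hashlib.sha1(store).hexdigest()
    path = objects_dir / object_id[:2] / object_id[2:]
    if not path.exists():  # identical content is already stored; nothing to do
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(store))
    return object_id


def read_blob(objects_dir: Path, object_id: str) -> bytes:
    """Load a loose blob back and strip its header."""
    raw = zlib.decompress((objects_dir / object_id[:2] / object_id[2:]).read_bytes())
    _header, _sep, body = raw.partition(b"\x00")
    return body


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"  # stand-in for .git/objects
    first = write_blob(objects, b"test content\n")
    second = write_blob(objects, b"test content\n")  # same bytes, same object
    print(first, first == second)
    print(read_blob(objects, first))
```

Writing the same content twice yields one file on disk, which is the deduplication described earlier in this section.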