Chapter 1.5

SHA-1 Hashes

Those scary 40-character hex strings you see everywhere in Git? They are the cryptographic heart of the database. Let's demystify them.

The Avalanche Effect

`d3486ae9136e7856bc42212385ea797094475802`

Try it: Change a single letter or add a space to the input, and the entire 40-character hash changes unpredictably; on average, about half of its 160 bits flip. Because even a one-bit change produces a completely different hash, corrupted data no longer matches its recorded hash, so Git can detect corruption instead of letting it pass silently.
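If you want to see the avalanche effect outside the widget, here is a minimal sketch using Python's standard `hashlib`. The sample strings and the helper names `sha1_hex` and `differing_bits` are made up for illustration; they are not part of Git.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the 40-character SHA-1 hex digest of a string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many of the 160 bits differ between two SHA-1 digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

original = sha1_hex("Hello world")
changed = sha1_hex("Hello world ")  # same text with one trailing space

print(original)
print(changed)
print(differing_bits(original, changed), "of 160 bits differ")
```

For most inputs the count lands somewhere near 80, which is what you would expect if each bit flipped independently with probability one half.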

Why Git Uses Hashes

Git doesn't name files like `v1`, `v2`, or `FINAL`. Git runs the contents of a file (plus a short header) through SHA-1, which produces a 160-bit value written as a 40-character hex string, and uses that string as the ID of the stored object.
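To make the ID calculation concrete, the sketch below reproduces how Git derives the ID of a file's contents (a "blob") in a SHA-1 repository: it hashes the header `blob <size>`, a NUL byte, and then the raw bytes. The function name `git_blob_id` is ours, not a Git API.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with these bytes."""
    # Git prepends "blob <length in bytes>\0" before hashing the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"test content\n"))
```

If you have Git installed, running `git hash-object` on a file containing the same bytes should print the same ID.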

This architectural choice creates three massive advantages:

Absolute Integrity

You cannot change a file's contents or a commit's date, author, or message without changing its hash. Content altered by malicious code injection or hard drive corruption no longer matches its recorded hash, and Git reports the mismatch when it verifies its objects (for example, when you run `git fsck`).
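The integrity check boils down to "re-hash and compare". The following is a simplified sketch of that idea, not Git's actual object format: `object_id`, `verify`, and the sample commit text are hypothetical.

```python
import hashlib

def object_id(data: bytes) -> str:
    """Simplified stand-in for a Git object ID: SHA-1 of the raw bytes."""
    return hashlib.sha1(data).hexdigest()

def verify(stored_id: str, data: bytes) -> bool:
    """Re-hash the data and check it still matches the ID it was stored under."""
    return object_id(data) == stored_id

commit_text = b"author Ada <ada@example.com>\n\nFix login bug\n"
stored_id = object_id(commit_text)

# Simulate tampering: someone edits the message after the ID was recorded.
tampered = commit_text.replace(b"Fix", b"Add")

print(verify(stored_id, commit_text))  # True
print(verify(stored_id, tampered))     # False
```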

Built-in Deduplication

If you copy a 50 MB image 10 times in your project, Git doesn't store it 10 times. Identical contents always hash to the exact same ID, so the object is stored only once.
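Deduplication falls out naturally when the hash is the storage key. Here is a toy content-addressed store to illustrate; the `ContentStore` class is an assumption for this example and is far simpler than Git's real object database.

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: the SHA-1 of the data is its key."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # Identical data yields an identical key, so it is kept only once.
        self.objects.setdefault(key, data)
        return key

store = ContentStore()
image = b"\x89PNG" + bytes(1024)  # stand-in for a large image file

ids = [store.put(image) for _ in range(10)]

print(len(set(ids)))       # 1: all ten copies share one ID
print(len(store.objects))  # 1: only one copy is actually stored
```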

Lightning Fast Comparison

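Instead of comparing the code line-by-line across 10,000 files to find changes, Git just compares hashes. If two files (or two entire directories) have the same 40-character hash, their contents are identical and Git can skip them without reading a single line.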
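As a rough illustration, the sketch below finds what changed between two snapshots by comparing stored IDs only. The `snapshot` and `changed_paths` helpers and the file names are hypothetical. Note that computing each hash still reads the content once; the saving comes from comparing the recorded IDs afterward.

```python
import hashlib

def snapshot(files):
    """Map each path to the SHA-1 hex digest of its contents."""
    return {path: hashlib.sha1(data).hexdigest() for path, data in files.items()}

def changed_paths(old, new):
    """Return paths that were added, removed, or modified between snapshots."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))

before = snapshot({"app.py": b"print('v1')\n", "README": b"docs\n"})
after = snapshot({"app.py": b"print('v2')\n", "README": b"docs\n", "LICENSE": b"MIT\n"})

# README is skipped because its ID is unchanged.
print(changed_paths(before, after))  # ['LICENSE', 'app.py']
```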

Gotcha: 'Can two different files produce the same hash?'

This is called a Hash Collision. Mathematically? Yes, it is possible.

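Practically? For accidental collisions, the chances are vanishingly small. A collision only becomes likely once a repository holds roughly 2^80 (about 10^24) objects. If you had 5 million programmers each generating one new object per second, it would take around 9 billion years to reach even odds of a single collision, and the sun is expected to burn out before then. Deliberately engineered collisions are a separate matter: researchers produced one in 2017, and Git responded by adopting a SHA-1 implementation that detects such crafted inputs. For everyday work, you don't need to worry about it.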
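You can check this kind of estimate yourself with the standard birthday approximation, p ≈ 1 − exp(−n² / 2^(b+1)) for n random b-bit hashes. The sketch below assumes hashes behave like uniformly random 160-bit values and reuses the 5 million objects per second from the thought experiment; the function and constant names are ours.

```python
import math

HASH_BITS = 160                   # SHA-1 output size
OBJECTS_PER_SECOND = 5_000_000    # 5 million programmers, one object each per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def collision_probability(n_objects: float, bits: int = HASH_BITS) -> float:
    """Birthday approximation: chance of at least one collision among n hashes."""
    return -math.expm1(-(n_objects ** 2) / 2.0 ** (bits + 1))

def objects_for_probability(p: float, bits: int = HASH_BITS) -> float:
    """Invert the approximation: objects needed to reach collision probability p."""
    return math.sqrt(2.0 ** (bits + 1) * -math.log1p(-p))

n_half = objects_for_probability(0.5)
years = n_half / OBJECTS_PER_SECOND / SECONDS_PER_YEAR

print(f"objects needed for a 50% chance: {n_half:.2e}")
print(f"years at 5 million objects/second: {years:.2e}")
print(f"chance after one year: {collision_probability(OBJECTS_PER_SECOND * SECONDS_PER_YEAR):.1e}")
```

This only models accidental collisions between random-looking inputs; it says nothing about inputs crafted by an attacker.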