Inside Git: How It Works and the Role of the .git Folder

MERN Stack Developer

When you type git commit, have you ever wondered what actually happens behind the scenes? Most developers use Git daily without understanding its internal machinery. This article pulls back the curtain to reveal how Git works at a fundamental level, helping you develop a mental model that goes far beyond memorizing commands.

The .git Folder: Your Repository's Brain

Every Git repository has a .git folder. This hidden directory is the entire repository: it contains everything Git needs to reconstruct your project's complete history. Delete this folder, and your version control disappears. Copy it, and you've backed up every commit, branch, and tag (though not changes in your working files that you have not yet added or committed).

When you run git init, Git creates this folder with a specific structure. Think of it as a database that stores snapshots of your project over time, along with metadata about who made changes, when, and why.

What Lives Inside .git?

The .git folder contains several key components:

HEAD - This is a pointer that tells Git which branch you're currently working on. When you switch branches with git checkout, HEAD moves to point to the new branch. It's like a bookmark showing where you are in your project's history.

objects/ - This is the content database, the heart of Git's storage system. Every version of every file, every commit message, and every directory structure is stored here as an "object." Git compresses these objects and names them using cryptographic hashes.

refs/ - These are friendly names that point to specific commits. Your branches live here as files containing commit hashes. When you create a branch called feature-login, Git creates a file at refs/heads/feature-login that contains the hash of the commit that branch points to.

index - This is the staging area, stored as a binary file. When you run git add, Git updates this index to prepare for the next commit. Think of it as a draft of your next commit.

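config - Repository-specific configuration settings live here. This includes information about remote repositories, various Git behaviors, and, if you have set them for this repository, your name and email for commits (these usually live in your global configuration instead).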

hooks/ - This directory contains scripts that Git can execute automatically at certain points in the version control workflow. For example, you can run tests before allowing a commit.
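To make the relationship between HEAD and refs/ concrete, here is a minimal Python sketch that reads those files directly. The function name resolve_head is made up for this illustration, and the sketch assumes loose ref files only (it ignores .git/packed-refs and other details a real Git implementation handles).

```python
from pathlib import Path


def resolve_head(git_dir):
    """Return (branch_name_or_None, commit_hash_or_None) for a .git directory.

    Simplified: only looks at loose ref files and ignores .git/packed-refs.
    """
    head = (Path(git_dir) / "HEAD").read_text().strip()
    prefix = "ref: "
    if head.startswith(prefix):
        ref = head[len(prefix):]                  # e.g. "refs/heads/main"
        ref_file = Path(git_dir) / ref
        # A freshly initialised repository has no commits yet, so the
        # branch file may not exist.
        commit = ref_file.read_text().strip() if ref_file.exists() else None
        heads = "refs/heads/"
        branch = ref[len(heads):] if ref.startswith(heads) else ref
        return branch, commit
    # Otherwise HEAD holds a raw commit hash (detached HEAD).
    return None, head
```

Calling resolve_head(".git") inside a repository should return the current branch name and the hash that branch points to, which is roughly the lookup Git performs whenever it needs to know where you are.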

Understanding Git Objects: The Building Blocks

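Git doesn't store files the way you might expect. Instead, it uses a content-addressable storage system built on a handful of object types. Three of them (blobs, trees, and commits) do most of the work and are the focus here; a fourth type, the tag object, stores annotated tags. Understanding these objects is key to understanding how Git works.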

The Blob: Pure Content Storage

A blob (binary large object) is how Git stores file content. When you add a file to Git, it doesn't store the filename or any metadata—just the raw content of the file.

Here's what makes blobs interesting: Git identifies each blob by running a short header (the word blob, the content's size in bytes, and a null byte) followed by the content through the SHA-1 hash function. This produces a 40-character hexadecimal string that serves as the blob's unique identifier. If two files have identical content, even if they have different names or are in different directories, Git stores only one blob.

This is incredibly efficient. Imagine you have README.md and INSTALL.md with identical content. Git stores this content once, and both files point to the same blob. Change one character in either file, and Git creates a new blob—but the old one remains unchanged, preserving history.

The hash isn't arbitrary. It's calculated from the content itself, which means:

  • The same content always produces the same hash

  • Different content (even by a single byte) produces a completely different hash

  • You can verify content hasn't been corrupted by recalculating the hash
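The properties above can be reproduced in a few lines of Python. This is a sketch of how a blob's name is derived in a SHA-1 repository; blob_hash is a name chosen for this example, not a Git API.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Compute the name Git gives a blob in a SHA-1 repository.

    Git hashes a short header ("blob <size>" plus a null byte) followed by
    the raw content, not the content alone.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()
```

For an empty file, blob_hash(b"") returns e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same value Git reports for an empty blob. Note that the filename never enters the calculation, which is why two identically filled files share one blob.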

The Tree: Directory Structure Representation

While blobs store file content, trees store directory structure. A tree object is like a snapshot of a directory at a moment in time. It contains a list of entries, where each entry is either a blob (representing a file) or another tree (representing a subdirectory).

Each entry in a tree has several pieces of information:

  • The file mode (permissions, like whether it's executable)

  • The type (blob or tree), which Git derives from the mode rather than storing separately

  • The hash of the object it points to

  • The filename

Let's visualize a simple project structure:

project/
├── README.md
├── src/
│   ├── main.js
│   └── utils.js
└── package.json

Git represents this with a tree object for the root directory. This tree contains:

  • An entry pointing to a blob for README.md

  • An entry pointing to a tree for the src/ directory

  • An entry pointing to a blob for package.json

The src/ tree contains:

  • An entry pointing to a blob for main.js

  • An entry pointing to a blob for utils.js

Trees are also identified by SHA-1 hashes calculated from their content. If any file changes, or if any filename changes, the tree gets a new hash. This cascades upward—if a file deep in your directory structure changes, every tree from that point to the root gets a new hash.
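The cascading behavior is easier to see with a small sketch of how a tree's hash could be computed. The helper names object_hash and tree_hash are invented for this example, and the entry layout (mode, a space, the name, a null byte, then the 20 raw hash bytes) reflects the tree format of a SHA-1 repository.

```python
import hashlib


def object_hash(obj_type: str, body: bytes) -> str:
    """Hash an object body with Git's "<type> <size>" plus null-byte header."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_hash(entries):
    """Hash a tree from (mode, name, hex_hash) tuples.

    Example entry: ("100644", "README.md", "<40 hex characters>").
    Directories use mode "40000" and point to another tree's hash.
    """
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders directory names as if they ended with "/".
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    body = b""
    for mode, name, hex_hash in sorted(entries, key=sort_key):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(hex_hash)      # 20 raw bytes, not 40 hex characters
    return object_hash("tree", body)
```

Because the src/ tree's hash is one of the bytes-level inputs to the root tree, changing main.js produces a new blob hash, therefore a new src/ tree hash, and therefore a new root tree hash.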

The Commit: Tying It All Together

A commit object is a snapshot of your entire project at a specific point in time. It's the object you create when you run git commit.

A commit contains several pieces of information:

Tree hash - This points to the root tree object representing your project's complete directory structure at this moment. Following this hash lets you reconstruct every file and folder exactly as they were.

Parent commit(s) - Most commits have one parent—the commit that came before. Merge commits have multiple parents. The first commit in a repository has no parent.

Author information - Name and email of who wrote the changes, plus a timestamp.

Committer information - Usually the same as author, but can differ if someone else applied the changes (like in email-based workflows).

Commit message - Your description of what changed and why.

The commit itself is also stored as an object with a SHA-1 hash. This hash becomes the commit's unique identifier—the long string you see in git log.
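As a rough illustration, a commit object is just a small piece of text that gets hashed like any other object. The sketch below assembles that text from the fields listed above; commit_hash is a name invented for this example, and the identity strings are expected in the form "Name <email> <unix-timestamp> <timezone>".

```python
import hashlib


def commit_hash(tree, parents, author, committer, message):
    """Build the text of a commit object and return its SHA-1 name.

    tree and parents are 40-character hex hashes; author and committer look
    like "Jane Doe <jane@example.com> 1700000000 +0000" (a made-up identity).
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]    # zero, one, or several parents
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()
```

Since the timestamp and message are part of the hashed text, two commits with identical files but different messages or times get different hashes.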

How These Objects Connect: A Complete Picture

Let's trace through a concrete example. You have a project with this structure:

myapp/
├── index.html
└── styles.css

When you commit this, Git creates:

Two blobs:

  • One containing the content of index.html

  • One containing the content of styles.css

One tree representing the myapp/ directory, containing:

  • Entry: "index.html" → points to index.html blob

  • Entry: "styles.css" → points to styles.css blob

One commit containing:

  • Pointer to the root tree

  • Author: Your name and email

  • Date: When you made the commit

  • Message: "Initial commit"

Now you edit index.html. When you commit again, Git creates:

One new blob for the updated index.html content

One new tree for the myapp/ directory with:

  • Entry: "index.html" → points to NEW index.html blob

  • Entry: "styles.css" → points to SAME old styles.css blob

One new commit with:

  • Pointer to the new tree

  • Pointer to the previous commit as parent

  • Your author info and message

Notice that styles.css hasn't changed, so Git reuses the existing blob. Only the changed file gets a new blob. The tree must be new because it now points to a different index.html blob. The commit is new because it represents a new snapshot.

This is how Git efficiently stores history. Unchanged files don't create new blobs—they're referenced by new trees. Over time, you build a chain of commits, each pointing to its parent, creating your project's history.
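The two-commit walkthrough above can be simulated with an in-memory object store. This is only a sketch: ObjectStore and snapshot are hypothetical names, the directory is assumed to be flat, and the author identity and timestamp are fixed placeholder values.

```python
import hashlib


class ObjectStore:
    """In-memory stand-in for .git/objects: maps hash -> (type, body)."""

    def __init__(self):
        self.objects = {}

    def put(self, obj_type, body):
        header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
        digest = hashlib.sha1(header + body).hexdigest()
        self.objects[digest] = (obj_type, body)   # same content, same key
        return digest

    def count(self, obj_type):
        return sum(1 for t, _ in self.objects.values() if t == obj_type)


def snapshot(store, files, parent=None, message="snapshot\n"):
    """Store a flat directory (name -> bytes) and return the new commit hash."""
    tree = b""
    for name in sorted(files):
        blob = store.put("blob", files[name])
        tree += b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(blob)
    tree_id = store.put("tree", tree)
    lines = [f"tree {tree_id}"]
    if parent:
        lines.append(f"parent {parent}")
    # Placeholder identity and timestamp, purely for illustration.
    lines.append("author A U Thor <author@example.com> 0 +0000")
    lines.append("committer A U Thor <author@example.com> 0 +0000")
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    return store.put("commit", body)
```

Taking one snapshot of index.html and styles.css, then a second with only index.html edited, leaves the store holding three blobs, two trees, and two commits: the styles.css blob is written once and referenced twice.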

How Git Tracks Changes: It Doesn't

Here's a counterintuitive truth: Git doesn't track changes. It tracks snapshots.

Unlike some version control systems that store deltas (differences between versions), Git stores complete snapshots of your project at each commit. Every commit points to a complete tree structure representing your entire project at that moment.

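This might sound wasteful, but remember: Git only creates new blobs for changed content. Unchanged files are simply referenced by new tree objects. The result is an efficient system that stores complete snapshots without duplicating unchanged content. (As a separate storage optimization, Git can later pack objects into packfiles that delta-compress similar content, but that happens below the object model: every object still behaves as a complete snapshot.)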

When you run git diff, Git doesn't look up stored changes. Instead, it takes two snapshots (for example two commits, or your working directory and the index) and compares them on the fly. When you see lines marked with plus and minus signs, that's Git calculating the difference between two complete snapshots.
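Computing a diff on demand from two full snapshots might look like the following sketch. It uses Python's difflib rather than Git's own diff algorithm, and diff_snapshots is a name made up for this example; each snapshot is modeled as a dictionary from path to file text.

```python
import difflib


def diff_snapshots(old, new):
    """Compare two snapshots (dicts of path -> text) and return unified diff lines.

    Nothing is stored between versions: the diff is derived from the two
    complete snapshots each time it is requested.
    """
    out = []
    for path in sorted(set(old) | set(new)):
        before = old.get(path, "").splitlines(keepends=True)
        after = new.get(path, "").splitlines(keepends=True)
        if before != after:                      # unchanged files are skipped
            out.extend(difflib.unified_diff(before, after, f"a/{path}", f"b/{path}"))
    return out
```

In real Git the "unchanged files are skipped" check is even cheaper, because identical files have identical blob hashes and Git can skip them without reading their content.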

This snapshot-based approach provides several advantages:

Speed - Git can quickly show you any version of your project by following hashes. There's no need to replay a series of deltas from the beginning of time.

Integrity - Because everything is hashed, Git can verify that nothing has been corrupted. If a single bit flips in a file, the hash won't match, and Git knows something is wrong.

Branching - Creating a branch is just creating a pointer to a commit. It's instant and uses almost no space because you're not copying files—you're creating a new reference.

Merging - Git can find the common ancestor of two branches by following parent pointers, then compare three snapshots to perform the merge.
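Finding a common ancestor by following parent pointers can be sketched as a graph search. This is a simplified illustration, not Git's actual merge-base algorithm; the parents dictionary (commit -> list of parent commits) and the function names are assumptions for this example.

```python
from collections import deque


def ancestors(parents, start):
    """All commits reachable from start (inclusive) by following parent pointers."""
    seen, queue = set(), deque([start])
    while queue:
        commit = queue.popleft()
        if commit not in seen:
            seen.add(commit)
            queue.extend(parents.get(commit, []))
    return seen


def merge_base(parents, a, b):
    """Return the first commit found walking back from b that a can also reach."""
    reachable_from_a = ancestors(parents, a)
    seen, queue = set(), deque([b])
    while queue:
        commit = queue.popleft()
        if commit in reachable_from_a:
            return commit
        if commit not in seen:
            seen.add(commit)
            queue.extend(parents.get(commit, []))
    return None                                  # unrelated histories
```

With a history where C and E both descend from B, merge_base(parents, "C", "E") returns "B", and a three-way merge would then compare the snapshots at B, C, and E.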

What Happens During git add?

When you run git add filename, Git performs several operations that prepare your changes for committing:

First, Git computes the SHA-1 hash of the file's content, prefixed with a short header that records the object type and size. This hash becomes the name of the object that will store this content.

Second, Git compresses the file content using zlib compression. This reduces storage space significantly, especially for text files.

Third, Git stores this compressed content in the objects directory. The path is constructed from the hash: the first two characters become a subdirectory name, and the remaining 38 characters become the filename. So a hash like a1b2c3d4... creates the file .git/objects/a1/b2c3d4...

Fourth, Git updates the index (staging area). The index is a binary file that lists all files that will be in the next commit, along with their hashes. Think of it as a draft tree object.

Here's what's important: running git add creates a blob in the object database immediately. The file content is now safely stored in .git/objects/, even though you haven't committed yet. If you accidentally delete your working file, you can recover it from this blob.

The index now knows about this file and its hash. When you run git add on multiple files, you're building up a list of files and their hashes in the index. This is your staged snapshot.
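The four steps can be mimicked in Python as a rough sketch. Here stage_file is a hypothetical helper, and the index is modeled as a plain dictionary (path -> blob hash) rather than the real binary .git/index file, which stores more metadata per entry.

```python
import hashlib
import zlib
from pathlib import Path


def stage_file(git_dir, index, path, content):
    """Mimic the steps of `git add` for one file and return the blob hash."""
    # 1. Hash the header plus the content.
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    digest = hashlib.sha1(store).hexdigest()
    # 2. Compress with zlib.
    compressed = zlib.compress(store)
    # 3. Write to objects/<first 2 characters>/<remaining 38 characters>.
    obj_path = Path(git_dir) / "objects" / digest[:2] / digest[2:]
    if not obj_path.exists():                    # identical content is stored once
        obj_path.parent.mkdir(parents=True, exist_ok=True)
        obj_path.write_bytes(compressed)
    # 4. Record the path and its hash in the staging area.
    index[path] = digest
    return digest
```

After the call, the object file exists on disk even though no commit has been made, which is why staged content can be recovered if the working file is lost.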

What Happens During git commit?

When you run git commit, Git takes the contents of the index and creates the permanent snapshot we discussed earlier. Here's the step-by-step process:

First, Git creates tree objects from the index. It reads the staged paths recorded there (not the files in your working directory) and creates a tree object for each directory. Each tree lists the files and subdirectories it contains, along with their corresponding blob or tree hashes. This builds up a complete representation of your project's structure.

Second, Git creates a commit object. This object contains:

  • The hash of the root tree (your project's complete structure)

  • The hash of the parent commit (from where HEAD points)

  • Your name and email (from git config)

  • A timestamp

  • Your commit message

Third, Git computes a SHA-1 hash for this commit object. This becomes the commit's unique identifier—that long string you see everywhere in Git.

Fourth, Git stores this commit object in the object database, just like blobs and trees.

Fifth, Git updates the branch reference. If you're on the main branch, Git updates the file .git/refs/heads/main to contain your new commit's hash. This moves the branch forward to point to your new commit.

Sixth, because HEAD points to the branch (usually), and the branch now points to the new commit, HEAD indirectly points to your new commit. This is how Git knows where you are in the project history.

After committing, the index and the new commit describe the same snapshot. If you staged all of your changes before committing, your working directory matches too, so git status reports no changes (assuming you haven't modified files since committing).
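The commit sequence can be sketched in Python with dictionaries standing in for the object database and the refs. All names here (put, write_tree, commit) are invented for this illustration, every file is assumed to have mode 100644, and the identity string is supplied by the caller instead of being read from git config.

```python
import hashlib


def put(store, obj_type, body):
    """Store an object in a dict standing in for .git/objects; return its hash."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    digest = hashlib.sha1(header + body).hexdigest()
    store[digest] = (obj_type, body)
    return digest


def write_tree(store, index):
    """Build nested tree objects from a flat index (path -> blob hash)."""
    files, subdirs = {}, {}
    for path, blob in index.items():
        head, _, rest = path.partition("/")
        if rest:
            subdirs.setdefault(head, {})[rest] = blob   # belongs to a subdirectory
        else:
            files[head] = blob
    entries = [("100644", name, blob) for name, blob in files.items()]
    entries += [("40000", name, write_tree(store, sub)) for name, sub in subdirs.items()]
    # Git sorts directory names as if they ended with "/".
    entries.sort(key=lambda e: (e[1] + "/" if e[0] == "40000" else e[1]).encode("utf-8"))
    body = b"".join(
        mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0" + bytes.fromhex(digest)
        for mode, name, digest in entries
    )
    return put(store, "tree", body)


def commit(store, refs, head_ref, index, identity, message):
    """Mimic `git commit`: trees from the index, a commit object, then move the branch."""
    tree = write_tree(store, index)
    lines = [f"tree {tree}"]
    parent = refs.get(head_ref)                  # None for the very first commit
    if parent:
        lines.append(f"parent {parent}")
    lines += [f"author {identity}", f"committer {identity}"]
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    commit_id = put(store, "commit", body)
    refs[head_ref] = commit_id                   # the branch now points at the new commit
    return commit_id
```

Notice that write_tree never touches the filesystem: everything it needs is already recorded in the index, and the branch reference is only updated after the commit object has been stored.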

The Power of Hashing: Integrity Guaranteed

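Git's use of SHA-1 hashes for everything isn't just about storage; it's also an integrity feature. Every object in Git is identified by a hash of its content, and commits include hashes of their parent commits and tree objects. This creates a cryptographic chain of integrity. (SHA-1 is no longer considered collision-resistant, so recent Git versions use a collision-detecting SHA-1 implementation and can optionally create repositories that use SHA-256.)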

If any bit of data changes anywhere in your repository—a single character in a file, a byte in a commit message, or even a timestamp—the hash changes. Because commits contain parent hashes, and trees contain blob hashes, changing any historical data would break the chain. You'd have to recalculate hashes for every subsequent object.

This means Git can detect corruption. If an object gets corrupted on disk, Git can notice when you run git fsck (and in many cases when it next reads that object), because it recalculates the hash and compares it to the expected value.

It also means history is tamper-evident. You can't change a commit message from three months ago without changing that commit's hash, which changes all descendant commits' hashes, which makes it obvious that history was rewritten.

This is why commit hashes are useful as identifiers. When you share a commit hash with a colleague, you're not just sharing a pointer—you're sharing a cryptographic fingerprint that guarantees they're looking at exactly the same content you are.
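Both ideas, corruption detection and tamper evidence, can be sketched in a few lines. The function names are invented for this example, and history_hashes is deliberately simplified: every commit points at the empty tree and omits the author and committer lines a real commit would carry.

```python
import hashlib
import zlib


def verify_object(expected_hash, compressed):
    """Return True if a zlib-compressed loose object still matches its name."""
    try:
        raw = zlib.decompress(compressed)
    except zlib.error:
        return False                             # damaged beyond decompression
    return hashlib.sha1(raw).hexdigest() == expected_hash


def history_hashes(messages):
    """Commit hashes for a linear history; each commit embeds its parent's hash."""
    empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
    hashes, parent = [], None
    for message in messages:
        lines = [f"tree {empty_tree}"]
        if parent:
            lines.append(f"parent {parent}")
        body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
        header = b"commit " + str(len(body)).encode("ascii") + b"\0"
        parent = hashlib.sha1(header + body).hexdigest()
        hashes.append(parent)
    return hashes
```

Changing the second message in a three-commit history leaves the first hash untouched but changes the second and third, which is exactly why rewritten history is visible to anyone holding the old hashes.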

Branches: Just Movable Pointers

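Now that you understand objects and commits, branches become almost trivial to understand. A branch is simply a file in .git/refs/heads/ that contains a commit hash. (To save space, Git sometimes consolidates many of these files into a single .git/packed-refs file, but the idea is the same: a name mapped to a hash.)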

When you create a branch with git branch feature, Git creates a file at .git/refs/heads/feature and writes the current commit's hash into it. That's it. No files are copied, no complicated setup—just a 40-character hash written to a file.

When you make commits on a branch, Git simply updates that file with the new commit hash. The branch pointer moves forward automatically with each commit.

This explains why branching in Git is so fast and lightweight compared to older version control systems. You're not copying your entire project—you're creating a tiny text file with a hash in it.

HEAD determines which branch is active. When HEAD points to a branch (which is the normal case), commits move that branch forward. When HEAD points directly to a commit hash instead of a branch (detached HEAD state), commits don't move any branch. They remain reachable only through HEAD and the reflog, and can eventually be garbage-collected, unless you create a new branch to capture them.
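Since a branch is only a small file, creating one and moving it forward can be sketched with ordinary file writes. The helper names create_branch and record_commit are made up for this illustration, and the sketch works on loose ref files only, without the locking, reflog, and packed-refs handling that real Git performs.

```python
from pathlib import Path


def create_branch(git_dir, name, commit_hash):
    """Create a branch by writing a commit hash to refs/heads/<name>."""
    ref = Path(git_dir) / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_text(commit_hash + "\n")


def record_commit(git_dir, new_commit_hash):
    """Advance whatever HEAD points at once a new commit object exists."""
    head_file = Path(git_dir) / "HEAD"
    head = head_file.read_text().strip()
    prefix = "ref: "
    if head.startswith(prefix):
        # Normal case: HEAD names a branch, so that branch file moves forward.
        (Path(git_dir) / head[len(prefix):]).write_text(new_commit_hash + "\n")
    else:
        # Detached HEAD: only HEAD itself changes; no branch moves.
        head_file.write_text(new_commit_hash + "\n")
```

In the detached case nothing under refs/heads/ is touched, so the new commit has no branch name attached to it until you call something like create_branch to give it one.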