
Git Version Control Architecture

Interactive architecture map of Git's internal design — the content-addressable object store, DAG commit model, packfiles, transport protocols, and merge strategies.

Open Source · Created: 2005 by Linus Torvalds · Language: C · Updated: Mar 2026
01

Architectural Overview

Git is a distributed version control system built on a content-addressable filesystem. Every piece of data is identified by its SHA-1 hash, forming an immutable object store with mutable references layered on top. The architecture separates cleanly into plumbing (low-level data operations) and porcelain (user-facing commands).

4 Object Types · SHA-1 Content Addressing · DAG Commit Model · 3 Working Layers · 6 Merge Strategies

Git Three-Layer Architecture
graph TD
    subgraph WorkTree["Working Tree"]
        WT["Files on disk<br/>(editable workspace)"]
    end
    subgraph StagingArea["Index / Staging Area"]
        IDX["Binary cache: .git/index<br/>(proposed next commit)"]
    end
    subgraph Repository["Object Database"]
        ODB["Content-addressable store<br/>.git/objects/"]
        REFS["References<br/>.git/refs/"]
    end
    WT -->|"git add"| IDX
    IDX -->|"git commit"| ODB
    ODB -->|"git checkout"| WT
    REFS -->|"points to"| ODB
    style WT fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style IDX fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
    style ODB fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
    style REFS fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1

02

Content-Addressable Object Store

Git is fundamentally a content-addressable filesystem — a key-value store where every piece of data is identified by its SHA-1 hash. All objects live in .git/objects/, stored with the first 2 hex characters as a subdirectory and the remaining 38 as the filename. Every object is zlib-compressed on disk.

Four Object Types & Relationships
graph TD
    subgraph Objects["Git Object Types"]
        COMMIT["Commit<br/>tree + parents + metadata"]
        TAG["Annotated Tag<br/>target + tagger + message"]
        TREE["Tree<br/>directory listing of entries"]
        BLOB["Blob<br/>raw file content (no name)"]
    end
    TAG -->|"points to"| COMMIT
    COMMIT -->|"references"| TREE
    COMMIT -->|"parent(s)"| COMMIT
    TREE -->|"entry: blob SHA-1"| BLOB
    TREE -->|"entry: subtree SHA-1"| TREE
    style COMMIT fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style TAG fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
    style TREE fill:#8B6914,stroke:#C4A96A,color:#F5F0E1
    style BLOB fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1

Object Format

Each object is hashed as "<type> <bytesize>\0<content>". The SHA-1 of this string becomes the object's address. Identical content always produces the same hash, enabling automatic deduplication.
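
The hashing rule and on-disk layout described above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function names are illustrative and are not part of Git.

```python
import hashlib
import zlib


def object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over '<type> <bytesize>' + NUL + content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(oid: str) -> str:
    """First 2 hex characters name the subdirectory, the remaining 38 the file."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"


def loose_object_bytes(obj_type: str, content: bytes) -> bytes:
    """Header plus content, zlib-compressed, as a loose object is stored."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return zlib.compress(header + content)


empty_blob = object_id("blob", b"")
print(empty_blob)                     # ID of the empty blob
print(loose_object_path(empty_blob))  # where it would live on disk
```

Because the ID depends only on type and content, two files with identical bytes map to the same path, which is where the deduplication comes from.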

blob

Blob Object

Stores raw file content with no filename or metadata. The same content in two different files is stored once.

tree

Tree Object

A directory listing: entries of <mode> <name> pointing to blob or subtree SHA-1 hashes.

commit

Commit Object

Points to a tree (snapshot), parent commit(s), author/committer identity, and the commit message.

tag

Annotated Tag

Points to a target object with tagger identity, message, and optional GPG signature.

Tree Entry Modes

Mode | Type | Description
100644 | Normal file | Standard read/write file permissions
100755 | Executable | File with execute permission set
120000 | Symlink | Symbolic link (blob contains target path)
040000 | Subdirectory | Subtree reference to another tree object
160000 | Gitlink | Submodule commit reference

Plumbing Commands

git hash-object -w creates a blob, git cat-file -p/-t inspects any object, git update-index writes to the index, git write-tree converts the index to a tree, and git commit-tree creates a commit object.
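
To make the tree object concrete, here is a rough Python sketch of the binary body that a tree holds. The byte layout used (mode, space, name, NUL, then 20 raw ID bytes) comes from Git's object format rather than from the text above, and the helper names are hypothetical.

```python
def encode_tree(entries):
    """Serialize (mode, name, hex_id) entries as '<mode> <name>' NUL <20 raw bytes>."""
    body = b""
    # Real Git sorts names as if directories ended in '/'; plain byte order
    # is used here, which gives the same result for a flat list of files.
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode()):
        # Modes are written without a leading zero, so 040000 appears as 40000.
        body += mode.lstrip("0").encode() + b" " + name.encode()
        body += b"\x00" + bytes.fromhex(hex_id)
    return body


def decode_tree(body):
    """Inverse of encode_tree: walk the body entry by entry."""
    entries, pos = [], 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        hex_id = body[nul + 1:nul + 21].hex()
        entries.append((mode, name, hex_id))
        pos = nul + 21
    return entries


EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
body = encode_tree([("100644", "b.txt", EMPTY_BLOB), ("100755", "a.sh", EMPTY_BLOB)])
print(decode_tree(body))
```

Filenames live only in the tree, never in the blob, which is why a rename leaves the blob ID unchanged.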

03

DAG Commit Model

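The commit history forms a Directed Acyclic Graph where nodes are commit objects and edges are parent pointers. Edges flow from child to parent (newer to older). Because a commit's hash covers the hashes of its parents, a parent must exist before its child can be hashed, so creating a cycle would require a hash collision. In practice, no commit can be its own ancestor.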

DAG with Branches & Merge
graph RL
    subgraph History["Commit History (DAG)"]
        C1["C1<br/>initial commit"]
        C2["C2"]
        C3["C3"]
        C4["C4<br/>(feature branch)"]
        C5["C5<br/>(feature branch)"]
        C6["C6<br/>(main branch)"]
        M["M<br/>merge commit"]
    end
    subgraph Pointers["Mutable References"]
        MAIN["main"]
        FEAT["feature"]
        HEAD["HEAD"]
    end
    C2 -->|"parent"| C1
    C3 -->|"parent"| C2
    C4 -->|"parent"| C3
    C5 -->|"parent"| C4
    C6 -->|"parent"| C3
    M -->|"parent 1"| C6
    M -->|"parent 2"| C5
    MAIN -->|"refs/heads/main"| M
    FEAT -->|"refs/heads/feature"| C5
    HEAD -->|"ref: refs/heads/main"| MAIN
    style C1 fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
    style C2 fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
    style C3 fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
    style C4 fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style C5 fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style C6 fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
    style M fill:#7A2032,stroke:#9A3A4A,color:#F5F0E1
    style MAIN fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
    style FEAT fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
    style HEAD fill:#8B6914,stroke:#D4B458,color:#F5F0E1

Initial Commits

Zero parents. Root nodes of the DAG. Most repositories have exactly one.

Normal Commits

Exactly one parent. The most common type, forming a linear chain of snapshots.

Merge Commits

Two or more parents, recording the union of divergent histories into one lineage.

Append-Only Semantics

The DAG is append-only — existing nodes are never mutated. Commands like git commit --amend and git rebase create new nodes and move references, leaving old nodes unreachable until garbage collection removes them.
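
The append-only behaviour follows from how commit IDs are derived. The sketch below builds commit objects in Python and hashes them the way Git hashes any object. The identity string and messages are made-up placeholders, and `commit_id` is an illustrative helper, not a Git API.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of the empty tree


def commit_id(tree, parents, message,
              ident="A U Thor <author@example.com> 0 +0000"):
    """Hash a commit body: tree line, parent lines, identities, blank line, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    body = "\n".join(lines).encode() + b"\n"
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


c1 = commit_id(EMPTY_TREE, [], "initial commit")
c2 = commit_id(EMPTY_TREE, [c1], "second")
c3 = commit_id(EMPTY_TREE, [c2], "third")

# "Amending" c2 cannot edit it in place: a different message is a different ID.
c2_amended = commit_id(EMPTY_TREE, [c1], "second (amended)")
# Any descendant must be recreated too, because its parent line changed.
c3_rebased = commit_id(EMPTY_TREE, [c2_amended], "third")

print(c2 != c2_amended, c3 != c3_rebased)
```

After such a rewrite the branch ref is moved to the new commits; the originals stay in the object store, unreferenced, until garbage collection.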

04

Refs, Branches & HEAD

References are human-readable names stored as plain text files containing a single SHA-1 hash. A branch is a 41-byte file (40-char hash + newline) in .git/refs/heads/. Creating a branch means writing a new file; deleting means removing it.

Reference Directory Layout
graph TD
    subgraph GitDir[".git/"]
        HEAD_F["HEAD<br/>ref: refs/heads/main"]
        PACKED["packed-refs"]
    end
    subgraph Refs[".git/refs/"]
        HEADS["heads/<br/>local branches"]
        TAGS["tags/<br/>tag references"]
        REMOTES["remotes/<br/>remote-tracking refs"]
    end
    subgraph Branches["Branch Files"]
        MAIN_R["main<br/>(41 bytes: SHA-1)"]
        FEAT_R["feature<br/>(41 bytes: SHA-1)"]
    end
    subgraph RemoteRefs["Remote Tracking"]
        ORIGIN["origin/main"]
        ORIGIN_F["origin/feature"]
    end
    HEAD_F -->|"symbolic ref"| HEADS
    HEADS --> MAIN_R
    HEADS --> FEAT_R
    REMOTES --> ORIGIN
    REMOTES --> ORIGIN_F
    style HEAD_F fill:#8B6914,stroke:#D4B458,color:#F5F0E1
    style PACKED fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
    style HEADS fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style TAGS fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
    style REMOTES fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
    style MAIN_R fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style FEAT_R fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style ORIGIN fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
    style ORIGIN_F fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1

HEAD States

symbolic

Normal HEAD

Contains ref: refs/heads/main — a symbolic reference pointing to the current branch. Commits advance this branch.

detached

Detached HEAD

Contains a raw SHA-1 hash. New commits are referenced only by HEAD, so they become unreachable once HEAD moves elsewhere unless a branch or tag is created to retain them.

Tag Types

Type | Storage | Features
Lightweight | Plain ref file pointing to commit SHA-1 | Name only, no metadata
Annotated | Ref file pointing to tag object | Tagger, date, message, optional GPG signature

Packed Refs

git gc consolidates individual ref files into .git/packed-refs (one line per ref). Git checks loose refs first, then falls back to packed-refs. The reflog in .git/logs/ records every ref change with entries expiring after 90 days (reachable) or 30 days (unreachable).
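
The lookup order described above (loose file first, then packed-refs, following symbolic refs such as HEAD) can be sketched as follows. This is a simplified illustration in Python, not Git's actual resolver: the function names are hypothetical, and the demo writes a throwaway directory rather than touching a real repository.

```python
import os
import tempfile


def lookup_packed(git_dir, name):
    """Scan packed-refs for '<id> <refname>' lines."""
    path = os.path.join(git_dir, "packed-refs")
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        for line in f:
            if line.startswith(("#", "^")):  # header comment / peeled tag line
                continue
            oid, _, ref = line.strip().partition(" ")
            if ref == name:
                return oid
    return None


def resolve_ref(git_dir, name, max_depth=5):
    """Resolve a ref to an object ID, preferring loose files over packed-refs."""
    for _ in range(max_depth):
        loose = os.path.join(git_dir, name)
        if not os.path.isfile(loose):
            return lookup_packed(git_dir, name)
        with open(loose) as f:
            value = f.read().strip()
        if not value.startswith("ref: "):
            return value
        name = value[len("ref: "):]  # symbolic ref: follow it
    raise ValueError("symbolic ref chain too deep")


with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    files = {
        "HEAD": "ref: refs/heads/main\n",
        "refs/heads/main": "a" * 40 + "\n",
        "packed-refs": "# pack-refs with: peeled fully-peeled sorted\n"
        + "b" * 40 + " refs/heads/main\n"
        + "c" * 40 + " refs/heads/feature\n",
    }
    for rel, text in files.items():
        with open(os.path.join(git_dir, rel), "w") as f:
            f.write(text)
    head_id = resolve_ref(git_dir, "HEAD")
    feature_id = resolve_ref(git_dir, "refs/heads/feature")

print(head_id, feature_id)
```

In the demo the loose `refs/heads/main` shadows the stale packed entry, while `refs/heads/feature` exists only in packed-refs.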

05

Index (Staging Area) Internals

The index (.git/index) is a binary file that acts as a cache between the working tree and the repository. It represents the "next commit" — the proposed snapshot. The DIRC header identifies the file format.

Index Binary Format
graph TD
    subgraph Header["Header (12 bytes)"]
        SIG["Signature: DIRC"]
        VER["Version: 2, 3, or 4"]
        CNT["Entry Count: 32-bit"]
    end

    subgraph Entries["Cache Entries (sorted by path)"]
        E1["Entry: ctime + mtime + dev + ino<br/>+ mode + uid + gid + size<br/>+ SHA-1 + flags + path"]
        E2["Entry: (same structure)"]
        EN["...more entries..."]
    end
    subgraph Extensions["Optional Extensions"]
        TREE["TREE: cached tree objects"]
        REUC["REUC: resolve undo data"]
        UNTR["UNTR: untracked file cache"]
        FSMN["FSMN: fsmonitor data"]
    end
    subgraph Trailer["Trailer"]
        CSUM["SHA-1 checksum of all above"]
    end
    Header --> Entries --> Extensions --> Trailer
    style SIG fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
    style VER fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
    style CNT fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
    style E1 fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style E2 fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style EN fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style TREE fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
    style REUC fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
    style UNTR fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
    style FSMN fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
    style CSUM fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1

Merge Conflict Stages

The 2-bit stage field in each index entry tracks merge conflicts. During a conflicted merge, up to three versions of a file coexist in the index:

Stage | Meaning | Source
0 | Normal (resolved) | Clean merge or manually resolved
1 | Base | Common ancestor version
2 | Ours | Current branch version
3 | Theirs | Incoming branch version

Index Versions

v2

Version 2

Basic format with no extended flags. Paths padded to 8-byte alignment after NUL terminator.

v3

Version 3

Adds extended flags: skip-worktree for sparse checkout, intent-to-add for git add -N.

v4

Version 4

Path prefix compression (stores diff from previous entry's path). No padding needed.
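
As a small illustration of the format covered in this section, the Python sketch below reads the 12-byte index header and extracts the 2-bit stage from an entry's 16-bit flags field. The position of the stage bits (bits 12 and 13) is taken from Git's index format documentation rather than from the text above, and the function names are illustrative.

```python
import struct


def parse_index_header(data: bytes):
    """Return (version, entry_count) from the first 12 bytes of .git/index."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    if version not in (2, 3, 4):
        raise ValueError(f"unsupported index version {version}")
    return version, count


def stage_of(flags: int) -> int:
    """Extract the merge stage (0-3) from an entry's 16-bit flags field."""
    return (flags >> 12) & 0x3


# A synthetic header: version 2, three entries.
header = struct.pack(">4sII", b"DIRC", 2, 3)
print(parse_index_header(header))
# Flags with stage 2 ("ours") and a path length of 5 in the low 12 bits.
print(stage_of(0x2000 | 5))
```

Run against a real repository, `parse_index_header(open(".git/index", "rb").read())` should report the same entry count as `git ls-files --stage | wc -l`.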

06

Packfiles & Delta Compression

Git stores objects in two forms: loose objects (individual zlib-compressed files) and packfiles (consolidated binary archives with delta compression). Packfiles typically achieve 50%+ space reduction through delta encoding and deduplication.

Packfile Binary Structure
graph LR
    subgraph Pack["Packfile (.pack)"]
        HDR["Header<br/>PACK + version + count"]
        OBJ1["Object 1<br/>(full blob)"]
        OBJ2["Object 2<br/>(OFS_DELTA)"]
        OBJ3["Object 3<br/>(full tree)"]
        OBJ4["Object 4<br/>(REF_DELTA)"]
        TRAIL["Trailer<br/>SHA-1 checksum"]
    end
    subgraph Index["Pack Index (.idx)"]
        FAN["Fanout Table<br/>256 entries"]
        SIDS["Sorted Object IDs"]
        OFFS["Offset Table"]
    end
    FAN -->|"bucket lookup"| SIDS
    SIDS -->|"binary search"| OFFS
    OFFS -->|"seek to"| OBJ1
    style HDR fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
    style OBJ1 fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style OBJ2 fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
    style OBJ3 fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style OBJ4 fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
    style TRAIL fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
    style FAN fill:#8B6914,stroke:#C4A96A,color:#F5F0E1
    style SIDS fill:#8B6914,stroke:#C4A96A,color:#F5F0E1
    style OFFS fill:#8B6914,stroke:#C4A96A,color:#F5F0E1

Object Entry Types

Type ID | Name | Description
1 | OBJ_COMMIT | Full commit object stored verbatim
2 | OBJ_TREE | Full tree object stored verbatim
3 | OBJ_BLOB | Full blob object stored verbatim
4 | OBJ_TAG | Full tag object stored verbatim
6 | OFS_DELTA | Delta with negative byte offset to base in same pack
7 | REF_DELTA | Delta with 20-byte SHA-1 reference to base object

Delta Compression Strategy

Objects are sorted by type, filename, then size. A sliding window (default --window=10) compares each object against nearby objects. Delta depth is capped at 50 to prevent long decompression chains. Newer versions are stored intact; older versions as deltas — optimizing access to recent content.
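
For readers who want to see where the type IDs in this section appear on disk, the following Python sketch parses a packfile header and the variable-length header that precedes each object entry. The bit layout (continuation bit, 3-bit type, 4 size bits, then 7 size bits per following byte) comes from Git's pack format documentation, not from the text above; the helper names are illustrative.

```python
import struct

TYPE_NAMES = {1: "OBJ_COMMIT", 2: "OBJ_TREE", 3: "OBJ_BLOB",
              4: "OBJ_TAG", 6: "OFS_DELTA", 7: "REF_DELTA"}


def parse_pack_header(data: bytes):
    """Return (version, object_count) from the 12-byte pack header."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a packfile")
    return version, count


def parse_entry_header(data: bytes, pos: int = 0):
    """Return (type_name, inflated_size, next_pos) for the entry at pos."""
    byte = data[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x7   # bits 4-6 of the first byte
    size = byte & 0x0F             # low 4 bits start the size
    shift = 4
    while byte & 0x80:             # high bit set: another size byte follows
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return TYPE_NAMES.get(obj_type, "UNKNOWN"), size, pos


print(parse_pack_header(struct.pack(">4sII", b"PACK", 2, 4)))
print(parse_entry_header(bytes([0x35])))        # blob, 5 bytes
print(parse_entry_header(bytes([0x9C, 0x12])))  # commit, 300 bytes
```

The size recorded here is the inflated size of the entry's data; for the two delta types it is the size of the delta instructions, not of the reconstructed object.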

07

Merge Strategies

Git supports multiple merge strategy backends selected via git merge -s <strategy>. The core mechanism is a three-way merge: find the merge base, compare each branch against it, and resolve differences.
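
The three-way rule can be sketched at whole-file granularity. This Python example is a deliberate simplification: real merge backends also merge within files line by line and detect renames. Trees are modelled as plain dicts from path to content, and `three_way_merge` is a hypothetical helper.

```python
def three_way_merge(base, ours, theirs):
    """Merge two path->content dicts against their common ancestor.

    A missing path (None) means the file does not exist on that side.
    Returns (merged_tree, conflicting_paths).
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (or neither changed)
            result = o
        elif o == b:      # only theirs changed: take theirs
            result = t
        elif t == b:      # only ours changed: take ours
            result = o
        else:             # both changed, differently
            conflicts.append(path)
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"a": "1", "b": "1", "c": "1"}
ours = {"a": "2", "b": "1", "c": "x"}
theirs = {"a": "1", "b": "3", "c": "y", "d": "new"}
merged, conflicts = three_way_merge(base, ours, theirs)
print(merged, conflicts)
```

Here `a` takes our change, `b` takes theirs, `d` is added from theirs, and `c` is reported as a conflict because both sides changed it differently.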

Three-Way Merge Algorithm
graph TD
    subgraph Find["Step 1: Find Common Ancestor"]
        BASE["Merge Base<br/>(common ancestor)"]
    end
    subgraph Compare["Step 2: Compare Both Sides"]
        OURS["Ours (current branch)<br/>diff vs base"]
        THEIRS["Theirs (incoming)<br/>diff vs base"]
    end
    subgraph Resolve["Step 3: Resolve"]
        AUTO["One side changed<br/>= take that change"]
        MERGE_OK["Both changed, different lines<br/>= auto-merge"]
        CONFLICT["Both changed, same lines<br/>= CONFLICT"]
    end
    BASE --> OURS
    BASE --> THEIRS
    OURS --> AUTO
    OURS --> MERGE_OK
    OURS --> CONFLICT
    THEIRS --> AUTO
    THEIRS --> MERGE_OK
    THEIRS --> CONFLICT
    style BASE fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
    style OURS fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style THEIRS fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
    style AUTO fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style MERGE_OK fill:#8B6914,stroke:#C4A96A,color:#F5F0E1
    style CONFLICT fill:#7A2032,stroke:#9A3A4A,color:#F5F0E1

Available Strategies

Strategy | Heads | Key Features
ort | 2 | Default (v2.34+). Rename detection, histogram diff, virtual merge base
recursive | 2 | Legacy. Synonym for ort since v2.50
resolve | 2 | Simpler three-way merge, no rename detection
octopus | 2+ | Merges N branches; refuses on manual conflicts
ours | any | Result is always current branch; ignores all other changes
subtree | 2 | Adjusts tree structure to match subtree layout

Virtual Merge Base

When multiple common ancestors exist (criss-cross merge), the ort/recursive algorithm recursively merges the common ancestors to create a synthetic merge base. This avoids ambiguity and reduces mismerges compared to arbitrarily picking one ancestor.

08

Transport Protocols

Git's smart protocol uses two process pairs for communication: fetch-pack/upload-pack for fetching and send-pack/receive-pack for pushing. Data is framed in pkt-line format with 4-hex-digit length prefixes.
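
The pkt-line framing is simple enough to sketch directly. In the Python example below, the length prefix counts its own four bytes, and `0000` is the flush packet that ends a section. Both details come from Git's protocol documentation, and the function names are illustrative.

```python
FLUSH = b"0000"
MAX_PKT = 65520  # maximum total packet length, prefix included


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload with a 4-hex-digit length that includes the prefix."""
    length = len(payload) + 4
    if length > MAX_PKT:
        raise ValueError("payload too large for one pkt-line")
    return b"%04x" % length + payload


def read_pkt_lines(stream: bytes):
    """Split a byte stream into payloads; a flush packet is returned as None."""
    pos, packets = 0, []
    while pos < len(stream):
        length = int(stream[pos:pos + 4].decode("ascii"), 16)
        if length == 0:
            packets.append(None)
            pos += 4
        elif length < 4:
            # 0001-0003 are reserved for special packets; not handled here.
            raise ValueError(f"unsupported special packet {length:04x}")
        else:
            packets.append(stream[pos + 4:pos + length])
            pos += length
    return packets


wire = pkt_line(b"a\n") + pkt_line(b"foobar\n") + FLUSH
print(wire)
print(read_pkt_lines(wire))
```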

Smart Protocol Fetch Flow
graph LR
    subgraph Client["Client"]
        FP["fetch-pack"]
    end

    subgraph Network["Wire Protocol (pkt-line)"]
        REF_DISC["1. Reference Discovery<br/>(server sends all refs)"]
        WANT["2. Want/Have Negotiation<br/>(client: want SHA, have SHA)"]
        ACK["3. ACK/NAK Responses"]
        PACK_TX["4. Packfile Transfer<br/>(minimal pack streamed)"]
    end

    subgraph Server["Server"]
        UP["upload-pack"]
    end

    FP --> REF_DISC
    REF_DISC --> UP
    UP --> WANT
    WANT --> FP
    FP --> ACK
    ACK --> UP
    UP --> PACK_TX
    PACK_TX --> FP
    style FP fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style UP fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
    style REF_DISC fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
    style WANT fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
    style ACK fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
    style PACK_TX fill:#5C4A28,stroke:#A8884A,color:#F5F0E1

Transport Layers

Transport | URL Format | Auth | Features
SSH | git@host:repo.git | SSH keys | Read/write, encrypted
Smart HTTP | https://host/repo.git | HTTP auth | Read/write, firewall-friendly
git:// | git://host/repo.git | None | Read-only (typically), fastest raw speed
Dumb HTTP | http://host/repo.git | HTTP auth | Read-only, deprecated

Push Protocol

Client sends lines of <old-sha1> <new-sha1> <refname> (all-zeros on left = new ref; all-zeros on right = delete ref), followed by a packfile. Server runs receive-pack, optionally triggers pre-receive and update hooks, then updates refs.
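
A receiving side could classify each update command along these lines. This is a minimal Python sketch, assuming 40-character SHA-1 IDs; `classify_update` is a hypothetical helper, and the capability list that the real protocol appends after a NUL byte on the first command is simply discarded.

```python
ZERO_ID = "0" * 40


def classify_update(line: str):
    """Return (action, refname) for one '<old> <new> <refname>' command."""
    line = line.split("\x00", 1)[0].rstrip("\n")  # drop capabilities, newline
    old, new, refname = line.split(" ", 2)
    if old == ZERO_ID and new == ZERO_ID:
        raise ValueError("old and new IDs cannot both be zero")
    if old == ZERO_ID:
        action = "create"
    elif new == ZERO_ID:
        action = "delete"
    else:
        action = "update"
    return action, refname


commands = [
    f"{ZERO_ID} {'a' * 40} refs/heads/topic\x00report-status",
    f"{'a' * 40} {'b' * 40} refs/heads/main",
    f"{'b' * 40} {ZERO_ID} refs/heads/old",
]
print([classify_update(c) for c in commands])
```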

09

Hooks System

Hooks are executable scripts in .git/hooks/ that fire on specific Git events. Pre-hooks can abort operations (non-zero exit = cancel), while post-hooks are notifications only. Hooks are local-only and not transferred during clone.
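
As an illustration of the exit-code contract, here is a sketch of a commit-msg hook written in Python. The 72-character subject limit is a made-up team convention, not a Git rule, and the function names are hypothetical. Git passes the path of the message file as the hook's first argument.

```python
import sys

MAX_SUBJECT = 72  # hypothetical convention for this example


def check_message(text: str):
    """Return a list of problems found in a commit message."""
    # Lines starting with '#' are comments in the default message template.
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    subject = lines[0] if lines else ""
    problems = []
    if not subject.strip():
        problems.append("empty subject line")
    if len(subject) > MAX_SUBJECT:
        problems.append(f"subject longer than {MAX_SUBJECT} characters")
    return problems


def main(argv):
    """Hook entry point: a non-zero return value cancels the commit."""
    with open(argv[1], encoding="utf-8") as f:
        problems = check_message(f.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0


# An installed hook (.git/hooks/commit-msg, marked executable) would end with:
#     sys.exit(main(sys.argv))
print(check_message("Fix off-by-one in pack index lookup\n"))
```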

Commit Workflow Hooks
graph LR
    subgraph Commit["git commit Lifecycle"]
        PRE["pre-commit<br/>lint, test, check"]
        PREP["prepare-commit-msg<br/>template injection"]
        MSG["commit-msg<br/>validate format"]
        POST["post-commit<br/>notifications"]
    end
    START["User runs<br/>git commit"] --> PRE
    PRE -->|"exit 0"| PREP
    PRE -->|"exit 1"| ABORT1["ABORT"]
    PREP -->|"exit 0"| MSG
    MSG -->|"exit 0"| POST
    MSG -->|"exit 1"| ABORT2["ABORT"]
    POST --> DONE["Commit<br/>complete"]
    style START fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
    style PRE fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style PREP fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
    style MSG fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
    style POST fill:#8B6914,stroke:#C4A96A,color:#F5F0E1
    style DONE fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style ABORT1 fill:#7A2032,stroke:#9A3A4A,color:#F5F0E1
    style ABORT2 fill:#7A2032,stroke:#9A3A4A,color:#F5F0E1

Server-Side Hooks

Server-side hooks cannot be bypassed by clients with --no-verify. They provide the critical enforcement layer for shared repositories.

Hook | Trigger | Scope | Can Reject?
pre-receive | On push arrival | All refs at once | Yes (entire push)
update | Per-branch during push | Single ref | Yes (just that ref)
post-receive | After push completes | All refs | No (notification)

Other Client-Side Hooks

guard

pre-rebase

Fires before rebase starts. Can prevent rebasing of already-pushed commits.

guard

pre-push

Fires during push before object transfer. Can validate refs or run test suites.

notify

post-checkout

Fires after git checkout. Useful for setting up environment or regenerating docs.

notify

post-merge

Fires after git merge. Can restore file permissions or sync external files.

10

Garbage Collection & Pruning

git gc performs four operations: packing loose objects into packfiles, consolidating multiple packfiles, packing loose refs, and removing unreachable objects older than the grace period.
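
The last of these operations amounts to a mark-and-sweep pass with a grace period. The Python sketch below models it on a toy object graph. The two-week default mirrors Git's default prune expiry; the object names, timestamps, and helper functions are invented for illustration.

```python
from datetime import datetime, timedelta


def reachable(roots, links):
    """Mark phase: every object reachable from the roots via outgoing links."""
    seen, stack = set(), list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(links.get(oid, ()))
    return seen


def prunable(objects, roots, links, now, grace=timedelta(weeks=2)):
    """Sweep phase: unreachable objects whose age exceeds the grace period."""
    live = reachable(roots, links)
    return sorted(oid for oid, mtime in objects.items()
                  if oid not in live and now - mtime > grace)


now = datetime(2026, 3, 1)
links = {"M": ["C2"], "C2": ["C1"], "old-amend": ["C1"], "fresh-amend": ["C1"]}
objects = {
    "C1": now - timedelta(days=90),
    "C2": now - timedelta(days=60),
    "M": now - timedelta(days=30),
    "old-amend": now - timedelta(days=21),   # unreachable, past grace period
    "fresh-amend": now - timedelta(days=1),  # unreachable, still protected
}
print(prunable(objects, roots=["M"], links=links, now=now))
```

The grace period protects objects that a concurrent command has written but not yet attached to a ref.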

GC Reachability & Pruning Flow
graph TD
    subgraph Roots["Reachability Roots"]
        BR["Branch refs"]
        TG["Tag refs"]
        RL["Reflog entries"]
        IX["Index"]
    end

    subgraph Analysis["Reachability Walk"]
        WALK["Traverse DAG from roots<br/>mark all reachable objects"]
    end

    subgraph Result["Classification"]
        REACH["Reachable Objects<br/>(kept, repacked)"]
        DANGLE["Dangling Objects<br/>(unreachable)"]
    end

    subgraph Prune["Pruning"]
        GRACE["Grace Period<br/>(default: 2 weeks)"]
        DELETE["Removed from disk"]
    end

    Roots --> WALK
    WALK --> REACH
    WALK --> DANGLE
    DANGLE --> GRACE
    GRACE --> DELETE
    style BR fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style TG fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
    style RL fill:#8B6914,stroke:#C4A96A,color:#F5F0E1
    style IX fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
    style WALK fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
    style REACH fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
    style DANGLE fill:#7A2032,stroke:#9A3A4A,color:#F5F0E1
    style GRACE fill:#8B6914,stroke:#C4A96A,color:#F5F0E1
    style DELETE fill:#7A2032,stroke:#9A3A4A,color:#F5F0E1

Auto-GC Thresholds

gc.auto = 6700

When loose object count exceeds this threshold, git gc --auto triggers packing into a packfile.

Configurable via git config

gc.autopacklimit = 50

When packfile count exceeds this threshold, packfiles are consolidated into a single pack.

Configurable via git config

Reflog Expiry

Reachable entries: 90 days. Unreachable entries: 30 days. Configured via gc.reflogExpire and gc.reflogExpireUnreachable, respectively.

Data Recovery

Use git reflog to find recently orphaned commits via HEAD history, git fsck --full to detect dangling objects, and git branch <name> <sha1> (for example, git branch recover <sha1>) to re-attach a dangling commit before GC removes it.

11

Architectural Design Decisions

Git's architecture reflects several deliberate trade-offs made by Linus Torvalds in 2005. These decisions optimize for the Linux kernel's development workflow — large-scale, distributed, and integrity-critical.

Content-Addressable Storage

Identical content is automatically deduplicated across the entire repository history. Integrity verification is built-in — any bit-flip changes the hash and is immediately detectable.

Immutable Objects + Mutable Refs

The object store is append-only (safe for concurrent access, trivially cacheable), while refs provide the mutable "view" layer. This separation enables lock-free reads and simple distributed replication.

Snapshot-Based Commits

Each commit stores a complete tree pointer, not a changeset. Checkout is O(tree-size) rather than O(history-length). Diffing any two commits compares their trees directly, so the cost depends on how much the trees differ, not on how many commits separate them. Deltas are purely a storage optimization.

The Index Layer

The staging area enables partial commits, conflict resolution via stage slots, and performance optimization through stat caching that avoids re-hashing unchanged files.

SHA-1 Triple Duty

The hash serves as content address, integrity check, and deduplication key simultaneously. Git is migrating to SHA-256 for stronger cryptographic guarantees.

Distributed by Default

By default, every clone is a full repository with complete history (shallow and partial clones are opt-in exceptions). No single point of failure. Operations are local-first: commits, branches, diffs, and log all work offline.

12

Glossary

blob: Binary Large Object (file content)
DAG: Directed Acyclic Graph
DIRC: Directory Cache (index signature)
FSMN: File System Monitor extension
HEAD: Pointer to current branch or commit
IEOT: Index Entry Offset Table
MIDX: Multi-Pack-Index
OFS_DELTA: Offset Delta (packfile entry type)
ort: Ostensibly Recursive's Twin (merge strategy)
pkt-line: Packet-line wire format
REF_DELTA: Reference Delta (packfile entry type)
refs: References (branch/tag pointers)
REUC: Resolve Undo Cache
SHA-1: Secure Hash Algorithm 1 (160-bit)
SHA-256: Secure Hash Algorithm 256-bit (successor)
TREE: Cache Tree (index extension)
UNTR: Untracked Cache (index extension)
zlib: Compression library (DEFLATE algorithm)