Interactive architecture map of Git's internal design — the content-addressable object store, DAG commit model, packfiles, transport protocols, and merge strategies.
Git is a distributed version control system built on a content-addressable filesystem. Every piece of data is identified by its SHA-1 hash, forming an immutable object store with mutable references layered on top. The architecture separates cleanly into plumbing (low-level data operations) and porcelain (user-facing commands).
graph TD
subgraph WorkTree["Working Tree"]
WT["Files on disk
(editable workspace)"]
end
subgraph StagingArea["Index / Staging Area"]
IDX["Binary cache: .git/index
(proposed next commit)"]
end
subgraph Repository["Object Database"]
ODB["Content-addressable store
.git/objects/"]
REFS["References
.git/refs/"]
end
WT -->|"git add"| IDX
IDX -->|"git commit"| ODB
ODB -->|"git checkout"| WT
REFS -->|"points to"| ODB
style WT fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style IDX fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
style ODB fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
style REFS fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
Git is fundamentally a content-addressable filesystem — a key-value store where every piece of data is identified by its SHA-1 hash. All objects live in .git/objects/, stored with the first 2 hex characters as a subdirectory and the remaining 38 as the filename. Every object is zlib-compressed on disk.
graph TD
subgraph Objects["Git Object Types"]
COMMIT["Commit
tree + parents + metadata"]
TAG["Annotated Tag
target + tagger + message"]
TREE["Tree
directory listing of entries"]
BLOB["Blob
raw file content (no name)"]
end
TAG -->|"points to"| COMMIT
COMMIT -->|"references"| TREE
COMMIT -->|"parent(s)"| COMMIT
TREE -->|"entry: blob SHA-1"| BLOB
TREE -->|"entry: subtree SHA-1"| TREE
style COMMIT fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style TAG fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
style TREE fill:#8B6914,stroke:#C4A96A,color:#F5F0E1
style BLOB fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
Each object is hashed as "<type> <bytesize>\0<content>". The SHA-1 of this string becomes the object's address. Identical content always produces the same hash, enabling automatic deduplication.
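The hashing rule and the on-disk layout described above can be reproduced in a few lines. The following is a minimal sketch, assuming SHA-1 object IDs and loose (unpacked) storage; the function names are illustrative and are not Git's own API.

```python
import hashlib
import os
import zlib


def frame_object(obj_type: str, content: bytes) -> bytes:
    """Prefix content with the "<type> <bytesize>\\0" header."""
    return f"{obj_type} {len(content)}".encode() + b"\0" + content


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 hex digest of the framed object."""
    return hashlib.sha1(frame_object(obj_type, content)).hexdigest()


def write_loose_object(objects_dir: str, obj_type: str, content: bytes) -> str:
    """Store a zlib-compressed loose object and return its object ID."""
    oid = git_object_id(obj_type, content)
    # First 2 hex characters name the subdirectory, the remaining 38 the file.
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(frame_object(obj_type, content)))
    return oid


def read_loose_object(objects_dir: str, oid: str):
    """Read a loose object back and return (type, content)."""
    with open(os.path.join(objects_dir, oid[:2], oid[2:]), "rb") as f:
        framed = zlib.decompress(f.read())
    header, _, content = framed.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content
```

Writing the same content twice yields the same path, which is how deduplication falls out of the addressing scheme without any extra bookkeeping.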
Blob: Stores raw file content with no filename or metadata. The same content in two different files is stored once.
Tree: A directory listing: entries of <mode> <name> pointing to blob or subtree SHA-1 hashes.
Commit: Points to a tree (snapshot), parent commit(s), author/committer identity, and the commit message.
Annotated tag: Points to a target object with tagger identity, message, and optional GPG signature.
| Mode | Type | Description |
|---|---|---|
| 100644 | Normal file | Standard read/write file permissions |
| 100755 | Executable | File with execute permission set |
| 120000 | Symlink | Symbolic link (blob contains target path) |
| 040000 | Subdirectory | Subtree reference to another tree object |
| 160000 | Gitlink | Submodule commit reference |
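To make the tree format concrete, here is a small sketch of how entries could be serialized and hashed. It assumes SHA-1 object IDs; `serialize_tree` and `tree_id` are hypothetical helper names. Note that inside the object the subdirectory mode is written without its leading zero (`40000`).

```python
import hashlib


def serialize_tree(entries):
    """Serialize (mode, name, hex_oid) tuples into a tree object's content."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders subdirectories as if their name ended with "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_oid in sorted(entries, key=sort_key):
        # Each entry: "<mode> <name>", a NUL byte, then the raw 20-byte hash.
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_oid)
    return body


def tree_id(entries):
    """Hash the serialized tree with the usual "tree <size>\\0" header."""
    body = serialize_tree(entries)
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()
```

Because entries hold only a mode, a name, and a hash, renaming a file changes the tree but leaves the blob untouched.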
git hash-object -w creates a blob, git cat-file -p/-t inspects any object, git update-index writes to the index, git write-tree converts the index to a tree, and git commit-tree creates a commit object.
The commit history forms a Directed Acyclic Graph where nodes are commit objects and edges are parent pointers. Edges flow from child to parent (newer to older). Because a commit's hash covers the hashes of its parents, every parent must exist before its child, so a cycle would require a hash collision; in practice no commit can be its own ancestor.
graph RL
subgraph History["Commit History (DAG)"]
C1["C1
initial commit"]
C2["C2"]
C3["C3"]
C4["C4
(feature branch)"]
C5["C5
(feature branch)"]
C6["C6
(main branch)"]
M["M
merge commit"]
end
subgraph Pointers["Mutable References"]
MAIN["main"]
FEAT["feature"]
HEAD["HEAD"]
end
C2 -->|"parent"| C1
C3 -->|"parent"| C2
C4 -->|"parent"| C3
C5 -->|"parent"| C4
C6 -->|"parent"| C3
M -->|"parent 1"| C6
M -->|"parent 2"| C5
MAIN -->|"refs/heads/main"| M
FEAT -->|"refs/heads/feature"| C5
HEAD -->|"ref: refs/heads/main"| MAIN
style C1 fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
style C2 fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
style C3 fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
style C4 fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style C5 fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style C6 fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
style M fill:#7A2032,stroke:#9A3A4A,color:#F5F0E1
style MAIN fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
style FEAT fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
style HEAD fill:#8B6914,stroke:#D4B458,color:#F5F0E1
Root commit: Zero parents. Root nodes of the DAG. Most repositories have exactly one.
Normal commit: Exactly one parent. The most common type, forming a linear chain of snapshots.
Merge commit: Two or more parents, recording the union of divergent histories into one lineage.
The DAG is append-only — existing nodes are never mutated. Commands like git commit --amend and git rebase create new nodes and move references, leaving old nodes unreachable until garbage collection removes them.
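The append-only property follows from how commit IDs are computed. The sketch below builds a simplified commit body and hashes it; the identity string and timestamp are placeholder assumptions, and `commit_id` is an illustrative helper rather than a Git API.

```python
import hashlib

# Placeholder identity and timestamp, used only for illustration.
IDENT = "A U Thor <author@example.com> 1700000000 +0000"


def commit_id(tree, parents, message, ident=IDENT):
    """Hash a commit body: tree line, parent lines, identities, blank line, message."""
    header = [f"tree {tree}"]
    header += [f"parent {p}" for p in parents]
    header += [f"author {ident}", f"committer {ident}"]
    body = ("\n".join(header) + "\n\n" + message + "\n").encode()
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()
```

Since the parent IDs are part of the hashed bytes, amending a commit yields a new ID, and every descendant replayed on top of it gets a new ID as well. The original objects are left in place, merely unreferenced.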
References are human-readable names stored as plain text files containing a single SHA-1 hash. A branch is a 41-byte file (40-char hash + newline) in .git/refs/heads/. Creating a branch means writing a new file; deleting means removing it.
graph TD
subgraph GitDir[".git/"]
HEAD_F["HEAD
ref: refs/heads/main"]
PACKED["packed-refs"]
end
subgraph Refs[".git/refs/"]
HEADS["heads/
local branches"]
TAGS["tags/
tag references"]
REMOTES["remotes/
remote-tracking refs"]
end
subgraph Branches["Branch Files"]
MAIN_R["main
(41 bytes: SHA-1)"]
FEAT_R["feature
(41 bytes: SHA-1)"]
end
subgraph RemoteRefs["Remote Tracking"]
ORIGIN["origin/main"]
ORIGIN_F["origin/feature"]
end
HEAD_F -->|"symbolic ref"| HEADS
HEADS --> MAIN_R
HEADS --> FEAT_R
REMOTES --> ORIGIN
REMOTES --> ORIGIN_F
style HEAD_F fill:#8B6914,stroke:#D4B458,color:#F5F0E1
style PACKED fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
style HEADS fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style TAGS fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
style REMOTES fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
style MAIN_R fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style FEAT_R fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style ORIGIN fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
style ORIGIN_F fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
Attached HEAD: Contains ref: refs/heads/main, a symbolic reference pointing to the current branch. Commits advance this branch.
Detached HEAD: Contains a raw SHA-1 hash. New commits are reachable only through HEAD and its reflog, so they become unreachable once HEAD moves unless a branch is created to retain them.
| Type | Storage | Features |
|---|---|---|
| Lightweight | Plain ref file pointing to commit SHA-1 | Name only, no metadata |
| Annotated | Ref file pointing to tag object | Tagger, date, message, optional GPG signature |
git gc consolidates individual ref files into .git/packed-refs (one line per ref). Git checks loose refs first, then falls back to packed-refs. The reflog in .git/logs/ records every ref change with entries expiring after 90 days (reachable) or 30 days (unreachable).
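The lookup order described above (symbolic ref, then loose ref, then packed-refs) can be sketched as follows. This is a simplified illustration that reads the files directly; `resolve_ref` and `lookup_packed_ref` are hypothetical names, and real Git handles more cases (worktrees, alternative ref backends).

```python
import os
from typing import Optional


def lookup_packed_ref(git_dir: str, name: str) -> Optional[str]:
    """Scan packed-refs for a line of the form "<hash> <refname>"."""
    path = os.path.join(git_dir, "packed-refs")
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip the header comment and "^<hash>" peeled-tag lines.
            if not line or line.startswith(("#", "^")):
                continue
            oid, _, ref = line.partition(" ")
            if ref == name:
                return oid
    return None


def resolve_ref(git_dir: str, name: str = "HEAD", max_depth: int = 5) -> Optional[str]:
    """Follow symbolic refs to a hash, checking loose refs before packed-refs."""
    for _ in range(max_depth):
        path = os.path.join(git_dir, name)
        if not os.path.isfile(path):
            return lookup_packed_ref(git_dir, name)
        with open(path) as f:
            value = f.read().strip()
        if value.startswith("ref: "):
            name = value[len("ref: "):]  # symbolic ref: keep following
            continue
        return value  # raw hash: a branch file, or a detached HEAD
    return None
```

A detached HEAD needs no special case here: the first read returns a raw hash and the loop ends immediately.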
The index (.git/index) is a binary file that acts as a cache between the working tree and the repository. It represents the "next commit" — the proposed snapshot. The DIRC header identifies the file format.
graph TD
subgraph Header["Header (12 bytes)"]
SIG["Signature: DIRC"]
VER["Version: 2, 3, or 4"]
CNT["Entry Count: 32-bit"]
end
subgraph Entries["Cache Entries (sorted by path)"]
E1["Entry: ctime + mtime + dev + ino
+ mode + uid + gid + size
+ SHA-1 + flags + path"]
E2["Entry: (same structure)"]
EN["...more entries..."]
end
subgraph Extensions["Optional Extensions"]
TREE["TREE: cached tree objects"]
REUC["REUC: resolve undo data"]
UNTR["UNTR: untracked file cache"]
FSMN["FSMN: fsmonitor data"]
end
subgraph Trailer["Trailer"]
CSUM["SHA-1 checksum of all above"]
end
Header --> Entries --> Extensions --> Trailer
style SIG fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
style VER fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
style CNT fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
style E1 fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style E2 fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style EN fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style TREE fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
style REUC fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
style UNTR fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
style FSMN fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
style CSUM fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
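As a rough illustration of the layout in the diagram, the 12-byte header and the trailing checksum can be read with `struct` and `hashlib`. This sketch assumes a SHA-1 repository and does not parse the entries or extensions; the function names are illustrative.

```python
import hashlib
import struct


def parse_index_header(data: bytes):
    """Return (version, entry_count) from the first 12 bytes of an index file."""
    # Big-endian: 4-byte signature, 32-bit version, 32-bit entry count.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    if version not in (2, 3, 4):
        raise ValueError(f"unsupported index version {version}")
    return version, count


def verify_index_checksum(data: bytes) -> bool:
    """Check that the last 20 bytes are the SHA-1 of everything before them."""
    return hashlib.sha1(data[:-20]).digest() == data[-20:]
```

On a real repository this could be applied to the bytes of `.git/index`, for example `parse_index_header(open(".git/index", "rb").read())`.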
The 2-bit stage field in each index entry tracks merge conflicts. During a conflicted merge, up to three versions of a file coexist in the index:
| Stage | Meaning | Source |
|---|---|---|
| 0 | Normal (resolved) | Clean merge or manually resolved |
| 1 | Base | Common ancestor version |
| 2 | Ours | Current branch version |
| 3 | Theirs | Incoming branch version |
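The stage number lives in the 16-bit flags field of each index entry. A minimal sketch of decoding that field follows, assuming the version 2 layout (1 bit assume-valid, 1 bit extended, 2 bits stage, 12 bits name length); `decode_flags` is an illustrative name.

```python
STAGE_NAMES = {0: "normal", 1: "base", 2: "ours", 3: "theirs"}


def decode_flags(flags: int) -> dict:
    """Split the 16-bit flags field of an index entry into its parts."""
    stage = (flags >> 12) & 0x3
    return {
        "assume_valid": bool(flags & 0x8000),
        "extended": bool(flags & 0x4000),
        "stage": stage,
        "stage_name": STAGE_NAMES[stage],
        "name_length": flags & 0x0FFF,
    }
```

During a conflict the same path appears in up to three entries that differ only in this stage value, which is why the index can hold base, ours, and theirs side by side.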
Version 2: Basic format with no extended flags. Paths padded to 8-byte alignment after NUL terminator.
Version 3: Adds extended flags: skip-worktree for sparse checkout, intent-to-add for git add -N.
Version 4: Path prefix compression (stores diff from previous entry's path). No padding needed.
Git stores objects in two forms: loose objects (individual zlib-compressed files) and packfiles (consolidated binary archives with delta compression). Packfiles typically achieve 50%+ space reduction through delta encoding and deduplication.
graph LR
subgraph Pack["Packfile (.pack)"]
HDR["Header
PACK + version + count"]
OBJ1["Object 1
(full blob)"]
OBJ2["Object 2
(OFS_DELTA)"]
OBJ3["Object 3
(full tree)"]
OBJ4["Object 4
(REF_DELTA)"]
TRAIL["Trailer
SHA-1 checksum"]
end
subgraph Index["Pack Index (.idx)"]
FAN["Fanout Table
256 entries"]
SIDS["Sorted Object IDs"]
OFFS["Offset Table"]
end
FAN -->|"bucket lookup"| SIDS
SIDS -->|"binary search"| OFFS
OFFS -->|"seek to"| OBJ1
style HDR fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
style OBJ1 fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style OBJ2 fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
style OBJ3 fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style OBJ4 fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
style TRAIL fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
style FAN fill:#8B6914,stroke:#C4A96A,color:#F5F0E1
style SIDS fill:#8B6914,stroke:#C4A96A,color:#F5F0E1
style OFFS fill:#8B6914,stroke:#C4A96A,color:#F5F0E1
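The index lookup in the diagram (fanout bucket, then binary search, then offset) can be sketched over in-memory lists. This is an illustration of the idea only and does not parse a real .idx file; `build_fanout` and `find_offset` are hypothetical names.

```python
from bisect import bisect_left
from typing import List, Optional


def build_fanout(sorted_ids: List[bytes]) -> List[int]:
    """fanout[b] = number of object IDs whose first byte is <= b."""
    fanout = [0] * 256
    for oid in sorted_ids:
        fanout[oid[0]] += 1
    for i in range(1, 256):
        fanout[i] += fanout[i - 1]
    return fanout


def find_offset(fanout: List[int], sorted_ids: List[bytes],
                offsets: List[int], oid: bytes) -> Optional[int]:
    """Return the pack offset for oid, or None if it is not in this pack."""
    first = oid[0]
    lo = fanout[first - 1] if first > 0 else 0  # start of the bucket
    hi = fanout[first]                          # one past the end of the bucket
    i = bisect_left(sorted_ids, oid, lo, hi)    # binary search inside the bucket
    if i < hi and sorted_ids[i] == oid:
        return offsets[i]
    return None
```

The fanout table narrows the search to objects sharing the first byte, so the binary search runs over roughly 1/256 of the IDs.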
| Type ID | Name | Description |
|---|---|---|
| 1 | OBJ_COMMIT | Full commit object stored verbatim |
| 2 | OBJ_TREE | Full tree object stored verbatim |
| 3 | OBJ_BLOB | Full blob object stored verbatim |
| 4 | OBJ_TAG | Full tag object stored verbatim |
| 6 | OFS_DELTA | Delta with negative byte offset to base in same pack |
| 7 | REF_DELTA | Delta with 20-byte SHA-1 reference to base object |
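The type IDs above are carried in a variable-length header that precedes each object in the pack. The following sketch decodes the 12-byte pack header and one object header; it stops before the compressed data and the delta base reference, and the function names are illustrative.

```python
import struct

TYPE_NAMES = {1: "OBJ_COMMIT", 2: "OBJ_TREE", 3: "OBJ_BLOB",
              4: "OBJ_TAG", 6: "OFS_DELTA", 7: "REF_DELTA"}


def parse_pack_header(data: bytes):
    """Return (version, object_count) from the first 12 bytes of a packfile."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a packfile")
    return version, count


def parse_object_header(data: bytes, pos: int = 0):
    """Decode (type_name, uncompressed_size, next_pos) for the object at pos."""
    byte = data[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x7   # bits 6-4 of the first byte: type ID
    size = byte & 0x0F             # bits 3-0: lowest 4 bits of the size
    shift = 4
    while byte & 0x80:             # high bit set: another size byte follows
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return TYPE_NAMES[obj_type], size, pos
```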
Objects are sorted by type, filename, then size. A sliding window (default --window=10) compares each object against nearby objects. Delta depth is capped at 50 to prevent long decompression chains. Newer versions are stored intact; older versions as deltas — optimizing access to recent content.
Git supports multiple merge strategy backends selected via git merge -s <strategy>. The core mechanism is a three-way merge: find the merge base, compare each branch against it, and resolve differences.
graph TD
subgraph Find["Step 1: Find Common Ancestor"]
BASE["Merge Base
(common ancestor)"]
end
subgraph Compare["Step 2: Compare Both Sides"]
OURS["Ours (current branch)
diff vs base"]
THEIRS["Theirs (incoming)
diff vs base"]
end
subgraph Resolve["Step 3: Resolve"]
AUTO["One side changed
= take that change"]
MERGE_OK["Both changed, different lines
= auto-merge"]
CONFLICT["Both changed, same lines
= CONFLICT"]
end
BASE --> OURS
BASE --> THEIRS
OURS --> AUTO
OURS --> MERGE_OK
OURS --> CONFLICT
THEIRS --> AUTO
THEIRS --> MERGE_OK
THEIRS --> CONFLICT
style BASE fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
style OURS fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style THEIRS fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
style AUTO fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style MERGE_OK fill:#8B6914,stroke:#C4A96A,color:#F5F0E1
style CONFLICT fill:#7A2032,stroke:#9A3A4A,color:#F5F0E1
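The resolution rules in the diagram can be sketched at whole-file granularity, treating each tree as a mapping from path to blob ID. Real merges also combine changes line by line within a file, which this simplified illustration does not attempt; `three_way` and `merge_trees` are hypothetical names.

```python
def three_way(base, ours, theirs):
    """Return (result, is_conflict) for one path; None means the file is absent."""
    if ours == theirs:
        return ours, False     # both sides agree (or neither changed)
    if ours == base:
        return theirs, False   # only theirs changed: take their version
    if theirs == base:
        return ours, False     # only ours changed: keep our version
    return None, True          # both changed differently: conflict


def merge_trees(base: dict, ours: dict, theirs: dict):
    """Merge path -> blob-ID mappings; return (merged, conflicted_paths)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        result, conflict = three_way(base.get(path), ours.get(path), theirs.get(path))
        if conflict:
            conflicts.append(path)
        elif result is not None:
            merged[path] = result
    return merged, conflicts
```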
| Strategy | Heads | Key Features |
|---|---|---|
| ort | 2 | Default (v2.34+). Rename detection, histogram diff, virtual merge base |
| recursive | 2 | Legacy. Synonym for ort since v2.50 |
| resolve | 2 | Simpler three-way merge, no rename detection |
| octopus | 2+ | Merges N branches; refuses on manual conflicts |
| ours | any | Result is always current branch; ignores all other changes |
| subtree | 2 | Adjusts tree structure to match subtree layout |
When multiple common ancestors exist (criss-cross merge), the ort/recursive algorithm recursively merges the common ancestors to create a synthetic merge base. This avoids ambiguity and reduces mismerges compared to arbitrarily picking one ancestor.
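To see where multiple merge bases come from, the sketch below computes the best common ancestors over a toy graph stored as a mapping from commit to parent list. It is a naive illustration of the definition, not the algorithm Git uses; `ancestors` and `merge_bases` are hypothetical names.

```python
def ancestors(parents: dict, commit: str) -> set:
    """Return the commit and everything reachable through parent pointers."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def merge_bases(parents: dict, a: str, b: str) -> set:
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }
```

In a criss-cross history this returns two commits, and that is the situation in which the recursive approach merges the bases into a synthetic one first.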
Git's smart protocol uses two process pairs for communication: fetch-pack/upload-pack for fetching and send-pack/receive-pack for pushing. Data is framed in pkt-line format with 4-hex-digit length prefixes.
graph LR
subgraph Client["Client"]
FP["fetch-pack"]
end
subgraph Network["Wire Protocol (pkt-line)"]
REF_DISC["1. Reference Discovery
(server sends all refs)"]
WANT["2. Want/Have Negotiation
(client: want SHA, have SHA)"]
ACK["3. ACK/NAK Responses"]
PACK_TX["4. Packfile Transfer
(minimal pack streamed)"]
end
subgraph Server["Server"]
UP["upload-pack"]
end
FP --> REF_DISC
REF_DISC --> UP
UP --> WANT
WANT --> FP
FP --> ACK
ACK --> UP
UP --> PACK_TX
PACK_TX --> FP
style FP fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style UP fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
style REF_DISC fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
style WANT fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
style ACK fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
style PACK_TX fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
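The pkt-line framing used throughout this exchange is simple enough to sketch directly. The 4-hex-digit prefix counts itself plus the payload, and `0000` is the flush packet. The function names below are illustrative, and special packets are reported as `None` for brevity.

```python
FLUSH_PKT = b"0000"
MAX_PKT_LEN = 65520  # 4-byte length prefix + up to 65516 bytes of payload


def encode_pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its total length as 4 lowercase hex digits."""
    length = len(payload) + 4
    if length > MAX_PKT_LEN:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % length + payload


def decode_pkt_lines(stream: bytes):
    """Yield each payload in order; yield None for flush and other special packets."""
    pos = 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length < 4:
            # 0000 is flush; other values below 4 are protocol v2 special packets.
            yield None
            pos += 4
        else:
            yield stream[pos + 4:pos + length]
            pos += length
```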
| Transport | URL Format | Auth | Features |
|---|---|---|---|
| SSH | git@host:repo.git | SSH keys | Read/write, encrypted |
| Smart HTTP | https://host/repo.git | HTTP auth | Read/write, firewall-friendly |
| git:// | git://host/repo.git | None | Read-only (typically), fastest raw speed |
| Dumb HTTP | http://host/repo.git | HTTP auth | Read-only, deprecated |
Client sends lines of <old-sha1> <new-sha1> <refname> (all-zeros on left = new ref; all-zeros on right = delete ref), followed by a packfile. Server runs receive-pack, optionally triggers pre-receive and update hooks, then updates refs.
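A receiving side could classify each command line as follows. This is a small sketch assuming 40-character SHA-1 hex IDs; `classify_update` is a hypothetical helper. On the wire, the first command may carry a capability list after a NUL byte, which is stripped here.

```python
ZERO_ID = "0" * 40


def classify_update(line: str):
    """Return (action, refname) for one "<old> <new> <refname>" command."""
    line = line.rstrip("\n").split("\0", 1)[0]  # drop capabilities, if any
    old, new, refname = line.split(" ", 2)
    if old == ZERO_ID and new == ZERO_ID:
        raise ValueError("old and new IDs cannot both be zero")
    if old == ZERO_ID:
        return "create", refname   # all-zeros on the left: new ref
    if new == ZERO_ID:
        return "delete", refname   # all-zeros on the right: delete ref
    return "update", refname
```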
Hooks are executable scripts in .git/hooks/ that fire on specific Git events. Pre-hooks can abort operations (non-zero exit = cancel), while post-hooks are notifications only. Hooks are local-only and not transferred during clone.
graph LR
subgraph Commit["git commit Lifecycle"]
PRE["pre-commit
lint, test, check"]
PREP["prepare-commit-msg
template injection"]
MSG["commit-msg
validate format"]
POST["post-commit
notifications"]
end
START["User runs
git commit"] --> PRE
PRE -->|"exit 0"| PREP
PRE -->|"exit 1"| ABORT1["ABORT"]
PREP -->|"exit 0"| MSG
MSG -->|"exit 0"| POST
MSG -->|"exit 1"| ABORT2["ABORT"]
POST --> DONE["Commit
complete"]
style START fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
style PRE fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style PREP fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
style MSG fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
style POST fill:#8B6914,stroke:#C4A96A,color:#F5F0E1
style DONE fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style ABORT1 fill:#7A2032,stroke:#9A3A4A,color:#F5F0E1
style ABORT2 fill:#7A2032,stroke:#9A3A4A,color:#F5F0E1
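As an example of the commit-msg step, a hook can be any executable that receives the path of the message file as its first argument and exits non-zero to abort. The sketch below, saved as .git/hooks/commit-msg and made executable, enforces a hypothetical policy (a non-empty subject of at most 72 characters, followed by a blank line); the limit and the rules are assumptions for illustration.

```python
#!/usr/bin/env python3
import sys

MAX_SUBJECT = 72  # hypothetical project policy, not a Git requirement


def check_message(text: str) -> list:
    """Return a list of problems found in a commit message (empty if none)."""
    # Lines starting with "#" are comments from the editor template.
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    subject = lines[0] if lines else ""
    problems = []
    if not subject.strip():
        problems.append("empty subject line")
    elif len(subject) > MAX_SUBJECT:
        problems.append(f"subject longer than {MAX_SUBJECT} characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line must be blank")
    return problems


def main(argv) -> int:
    with open(argv[1], encoding="utf-8") as f:
        problems = check_message(f.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0  # non-zero exit aborts the commit


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```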
Server-side hooks cannot be bypassed by clients with --no-verify. They provide the critical enforcement layer for shared repositories.
| Hook | Trigger | Scope | Can Reject? |
|---|---|---|---|
| pre-receive | On push arrival | All refs at once | Yes (entire push) |
| update | Per-branch during push | Single ref | Yes (just that ref) |
| post-receive | After push completes | All refs | No (notification) |
pre-rebase: Fires before rebase starts. Can prevent rebasing of already-pushed commits.
pre-push: Fires during push before object transfer. Can validate refs or run test suites.
post-checkout: Fires after git checkout. Useful for setting up environment or regenerating docs.
post-merge: Fires after git merge. Can restore file permissions or sync external files.
git gc performs four operations: packing loose objects into packfiles, consolidating multiple packfiles, packing loose refs, and removing unreachable objects older than the grace period.
graph TD
subgraph Roots["Reachability Roots"]
BR["Branch refs"]
TG["Tag refs"]
RL["Reflog entries"]
IX["Index"]
end
subgraph Analysis["Reachability Walk"]
WALK["Traverse DAG from roots
mark all reachable objects"]
end
subgraph Result["Classification"]
REACH["Reachable Objects
(kept, repacked)"]
DANGLE["Dangling Objects
(unreachable)"]
end
subgraph Prune["Pruning"]
GRACE["Grace Period
(default: 2 weeks)"]
DELETE["Removed from disk"]
end
Roots --> WALK
WALK --> REACH
WALK --> DANGLE
DANGLE --> GRACE
GRACE --> DELETE
style BR fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style TG fill:#6B4C8A,stroke:#9A7CB8,color:#F5F0E1
style RL fill:#8B6914,stroke:#C4A96A,color:#F5F0E1
style IX fill:#3A5C8A,stroke:#6B8EBF,color:#F5F0E1
style WALK fill:#5C4A28,stroke:#A8884A,color:#F5F0E1
style REACH fill:#2D5A3D,stroke:#4A9D63,color:#F5F0E1
style DANGLE fill:#7A2032,stroke:#9A3A4A,color:#F5F0E1
style GRACE fill:#8B6914,stroke:#C4A96A,color:#F5F0E1
style DELETE fill:#7A2032,stroke:#9A3A4A,color:#F5F0E1
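The walk-then-prune flow in the diagram amounts to a mark-and-sweep pass with an age check. The sketch below models the object database as a mapping from object ID to the IDs it references, with a modification time per object; all names are illustrative and the 2-week default mirrors the grace period shown above.

```python
TWO_WEEKS = 14 * 24 * 3600  # grace period in seconds


def reachable(objects: dict, roots) -> set:
    """Mark every object reachable from the roots (refs, reflog entries, index)."""
    seen, stack = set(), list(roots)
    while stack:
        oid = stack.pop()
        if oid in objects and oid not in seen:
            seen.add(oid)
            stack.extend(objects[oid])
    return seen


def prune_candidates(objects: dict, roots, mtimes: dict, now: float,
                     grace: float = TWO_WEEKS) -> set:
    """Return unreachable objects that are older than the grace period."""
    live = reachable(objects, roots)
    return {
        oid for oid in objects
        if oid not in live and now - mtimes[oid] > grace
    }
```

The grace period protects objects that were just written but not yet referenced, for example by a commit still in progress.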
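gc.auto (default 6700): When the loose object count exceeds this threshold, git gc --auto packs the loose objects into a packfile.
gc.autoPackLimit (default 50): When the packfile count exceeds this threshold, git gc --auto consolidates the packfiles into a single pack.
Reflog expiry: Reachable entries expire after 90 days (gc.reflogExpire); unreachable entries expire after 30 days (gc.reflogExpireUnreachable).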
Use git reflog to find recently orphaned commits via HEAD history and git fsck --full to detect dangling objects. To re-attach a dangling commit before GC removes it, create a branch that points at it with git branch <name> <sha1>, for example git branch recover <sha1>.
Git's architecture reflects several deliberate trade-offs made by Linus Torvalds in 2005. These decisions optimize for the Linux kernel's development workflow — large-scale, distributed, and integrity-critical.
Identical content is automatically deduplicated across the entire repository history. Integrity verification is built-in — any bit-flip changes the hash and is immediately detectable.
The object store is append-only (safe for concurrent access, trivially cacheable), while refs provide the mutable "view" layer. This separation enables lock-free reads and simple distributed replication.
Each commit stores a complete tree pointer, not a changeset. Checkout is O(tree-size) rather than O(history-length). Diffing any two commits compares their trees directly, so the cost depends on how much the trees differ, not on how many commits lie between them. Deltas are purely a storage optimization.
The staging area enables partial commits, conflict resolution via stage slots, and performance optimization through stat caching that avoids re-hashing unchanged files.
The hash serves as content address, integrity check, and deduplication key simultaneously. Git is migrating to SHA-256 for stronger cryptographic guarantees.
Every clone is a full repository with complete history. No single point of failure. Operations are local-first — commits, branches, diffs, and log all work offline.