Things That You Don’t Know About Git – Part 1

Posted in Tools

Most of us use git on a daily basis, but not many of us ever take a break to think about how it works. We often use GUI clients or built-in IDE tools without understanding the underlying concepts. It’s especially common for mobile developers, because our flow usually doesn’t require too much terminal usage, unlike web or backend development.

Recently, I started learning more about git, and I found really interesting things that I didn’t know before. I think it’s worth understanding how git works under the hood. After all, it’s one of the most important tools in our daily work. That’s why I decided to share my findings.

Understanding git internals will help you avoid common mistakes, solve git problems more easily, and even improve your automation scripts and CI workflows.

What Is a Commit?


You could think that a commit is just a diff between the current state and the previous commit. It’d be a reasonable assumption, because we always see diffs when browsing status, commits, or PRs, but it’s not true.

A commit is a full snapshot of the project at a certain point in time. It contains the state of all files in the project, as well as metadata such as the author, date, and commit message.

You could think that it’s inefficient to store the full snapshot of the project every time you make a commit. But git is smart enough to point to the same unchanged files and only store new versions of changed files. In other words, git reuses the same file in all commits until it changes.
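To make the reuse concrete, here is a minimal Python sketch of a content-addressed store. It is a toy model under simplifying assumptions, not git’s real implementation: the `store` dictionary and the `snapshot` helper are names made up for this illustration.

```python
import hashlib

store = {}  # hypothetical in-memory object store: hash -> file content


def snapshot(files):
    """Record a full snapshot and return a {path: hash} mapping."""
    result = {}
    for path, content in files.items():
        key = hashlib.sha1(content).hexdigest()
        store.setdefault(key, content)  # identical content is stored only once
        result[path] = key
    return result


first = snapshot({"README.md": b"hello\n", "main.swift": b"print(1)\n"})
second = snapshot({"README.md": b"hello\n", "main.swift": b"print(2)\n"})

# Both snapshots list every file, but README.md points at the same object.
print(first["README.md"] == second["README.md"])  # True
print(len(store))  # 3 objects back 4 file entries
```

Each snapshot is complete on its own, yet only the changed file costs extra storage.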

This approach makes git efficient in terms of both storage and speed. Storing only diffs would be slow, because git would have to walk through every commit since the beginning of the project and rebuild the final state just to check out a branch or a commit.

Everything Is an Object


The core of git is a key-value store. It uses a hash function (SHA-1 by default) to create an identifier for each object. An object can be a commit, a tree, a blob, or an annotated tag. The hash is calculated from the object’s type, size, and content, so if the content changes, the hash changes too. Objects are stored compressed with zlib, so they take up less space.
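The identifier can be reproduced outside git. Below is a minimal sketch, assuming the default SHA-1 object format, where the hash covers a short header (the word `blob`, the size in bytes, and a NUL byte) followed by the file content. The function name `git_blob_hash` is made up for this example.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The hash of an empty file is the same in every repository, which is why the same value shows up next to the empty TestFile.swift in the tree listing later in this article.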

All objects are stored in a directory structure that is based on the first two characters of the hash. For example, if the hash is e3b0c44298..., the object will be stored in .git/objects/e3/b0c44298.... This way, git can avoid storing too many files in a single directory (which was historically a problem).

> tree .git/objects
.git/objects
├── 00
│   ├── 6bd7632bed930c6b862e28a366316f8c82762e
│   └── 23a5340637983755c8c19c18cc2c2bb1dbda1c
├── 03
│   └── 1b7982ad2ea86f14deac38cc578a0763a214de
├── 09
│   └── 85bcf5a7a87b26b5ea56162b2c829775e9ce8e
├── 2f
│   └── 7f870b7ab575dda9045a852eed8980ef03c1ee
├── 3e
│   └── d3281738852bb8f10e992d3d2a62bc5495f799
...

As you can see, there is no meaningful structure for objects. Whatever git needs to store, it just creates a new object and refers to it by its hash. This is a very efficient way to store data, because it allows git to avoid storing redundant information and maintaining complex data structures.

Please note that if you just cloned a repository, the objects will be packed into packfiles under .git/objects/pack, so you won’t find them as individual files. Git does this to save space and speed up the cloning process. Usually objects created before cloning will remain packed and all new local operations will create new objects in the unpacked format.

You can also create your own object by running:

git hash-object -w <file>

However, this object will be stored only locally, because it’s not referenced by any commit and it won’t be pushed to the remote repository.
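The following Python sketch imitates what that command does for a loose object: build the header, hash it together with the content, compress the result with zlib, and write it under a directory named after the first two characters of the hash. It is an approximation under stated assumptions (SHA-1 format, blobs only); `write_loose_object` and `read_loose_object` are hypothetical helper names, and a temporary directory stands in for .git/objects.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, content: bytes) -> str:
    """Store content as a blob, roughly like `git hash-object -w`."""
    data = b"blob " + str(len(content)).encode() + b"\0" + content
    object_hash = hashlib.sha1(data).hexdigest()
    path = objects_dir / object_hash[:2] / object_hash[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(data))
    return object_hash


def read_loose_object(objects_dir: Path, object_hash: str) -> bytes:
    """Decompress a loose object and return its content without the header."""
    path = objects_dir / object_hash[:2] / object_hash[2:]
    data = zlib.decompress(path.read_bytes())
    _header, _, content = data.partition(b"\0")
    return content


# A throwaway directory stands in for .git/objects here.
with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp)
    object_hash = write_loose_object(objects_dir, b"hello\n")
    print(object_hash)
    print(read_loose_object(objects_dir, object_hash))  # b'hello\n'
```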

Large Files


Understanding the basics allows us to immediately spot the problem with large files. Changing even a single byte forces git to store a new version of the whole file.

If the file size is 200 MB, each change will increase the size of the repository by 200 MB. You could think that git will compress the file, but the problem is that big files like images, videos, etc. are already compressed. So git won’t be able to compress them further.
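A quick experiment illustrates the point. The sketch below compresses two inputs of the same size with zlib: repetitive, source-like text and random bytes. The random bytes are only a stand-in for already-compressed media, which is an assumption of this example rather than a measurement of real images or videos.

```python
import os
import zlib

text = b"let value = compute()\n" * 5000  # repetitive source-like data
noise = os.urandom(len(text))             # stands in for compressed media

for name, data in (("text", text), ("noise", noise)):
    ratio = len(zlib.compress(data)) / len(data)
    print(f"{name}: compressed to {ratio:.0%} of the original size")
```

The text shrinks to a tiny fraction of its size, while the random data stays at roughly its full size.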

That’s why it’s important to avoid committing large files to your repository. There is also an alternative way to store large files in git called git LFS, which I’ll cover in the next article.

Commit Structure


So let’s now see what’s really under the hood of a commit. We first need to find a commit hash. You can do that by running:

git log --oneline

Let’s take the first commit hash from the output. Now, we can confirm that this commit actually exists as a file in the .git/objects directory.

The file is compressed, so to see the content we need to run:

git cat-file -p <hash>

Sample output:

tree 0985bcf5a7a87b26b5ea56162b2c829775e9ce8e
parent 5173872595d7830507cf5077814866e9d0a1624a
author Wojciech Kulik <3128467+wojciech-kulik@users.noreply.github.com> 1745497256 +0200
committer Wojciech Kulik <3128467+wojciech-kulik@users.noreply.github.com> 1745497705 +0200

Commit message

As you can see, all the metadata fields that we usually see in a commit are there. But the actual snapshot is hidden within the tree object, which contains a list of all files in the top-level directory including their hashes.
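The layout is simple enough to parse by hand: header lines, a blank line, then the message. Here is a small Python sketch that splits the sample output into fields. The `parse_commit` helper is a made-up name, and the sketch does not handle multi-line headers such as the signature of a signed commit.

```python
def parse_commit(text: str) -> dict:
    """Split the output of `git cat-file -p <commit>` into headers and message."""
    head, _, message = text.partition("\n\n")
    commit = {"parents": [], "message": message.strip()}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # a merge commit lists several
        else:
            commit[key] = value
    return commit


sample = """tree 0985bcf5a7a87b26b5ea56162b2c829775e9ce8e
parent 5173872595d7830507cf5077814866e9d0a1624a
author Wojciech Kulik <3128467+wojciech-kulik@users.noreply.github.com> 1745497256 +0200
committer Wojciech Kulik <3128467+wojciech-kulik@users.noreply.github.com> 1745497705 +0200

Commit message
"""

commit = parse_commit(sample)
print(commit["tree"])     # 0985bcf5a7a87b26b5ea56162b2c829775e9ce8e
print(commit["parents"])  # ['5173872595d7830507cf5077814866e9d0a1624a']
print(commit["message"])  # Commit message
```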

Tree

So let’s see the content of the tree object.

 > git cat-file -p 0985bcf5a7a87b26b5ea56162b2c829775e9ce8e

100644 blob 0023a5340637983755c8c19c18cc2c2bb1dbda1c    .gitignore
100644 blob aa8d3067ff49badf837cef01097f048ad0b05725    Package.swift
040000 tree 6257b843979d77b7b16208390424afae061da612    Sources
040000 tree a35607eea1904dd6297ed9b09ac951db69acdd61    Tests
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    TestFile.swift

It contains only the first level of the tree. To see nested directories, we can call the command recursively on tree objects or we can just use:

git ls-tree -r <hash>

To print the contents of a specific file, just run git cat-file -p <hash> for the blob object.

You may also notice that the first column of the tree object contains the file mode. Git doesn’t track full Unix permissions, only whether a file is executable: 100644 is a regular file, 100755 is an executable one, and 040000 marks a directory.
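What git cat-file -p prints is a formatted view. The stored tree body is binary, and the sketch below decodes it, assuming the SHA-1 layout in which each entry is `<mode> <name>`, a NUL byte, and 20 raw hash bytes. The `parse_tree` helper is a hypothetical name, and the input is built by hand from two entries of the listing above rather than read from a real repository.

```python
def parse_tree(raw: bytes) -> list:
    """Decode the body of a tree object into (mode, name, hash) tuples."""
    entries = []
    pos = 0
    while pos < len(raw):
        space = raw.index(b" ", pos)
        nul = raw.index(b"\0", space)
        mode = raw[pos:space].decode()
        name = raw[space + 1:nul].decode()
        object_hash = raw[nul + 1:nul + 21].hex()  # 20 raw SHA-1 bytes
        entries.append((mode, name, object_hash))
        pos = nul + 21
    return entries


# Hand-built body; directories are stored as "40000" without the leading zero.
raw = (
    b"100644 .gitignore\0"
    + bytes.fromhex("0023a5340637983755c8c19c18cc2c2bb1dbda1c")
    + b"40000 Sources\0"
    + bytes.fromhex("6257b843979d77b7b16208390424afae061da612")
)
for mode, name, object_hash in parse_tree(raw):
    kind = "tree" if mode == "40000" else "blob"  # simplification: ignores submodules
    print(mode.zfill(6), kind, object_hash, name)
```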

Parent

It’s also important to understand the parent field of the commit, which contains the hash of the previous commit. This is how git recreates the history of the project: commits form a linked list (more precisely, a directed acyclic graph, because of merges). A branch therefore doesn’t need its own copy of the history. It only needs the hash of its last commit, and git traverses the parents from there.

A regular commit has only one parent, the very first commit has none, and a merge commit has two or more.
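Walking the history is then a plain graph traversal. The Python sketch below uses a hypothetical in-memory history: the abbreviated hashes are borrowed from the log shown later in this article, and the linear parent relationships are assumed for illustration.

```python
# Hypothetical history: commit hash -> list of parent hashes.
commits = {
    "7fe827d": [],
    "5e301f0": ["7fe827d"],
    "e11da42": ["5e301f0"],
    "fcc5020": ["e11da42"],
    "5173872": ["fcc5020"],
}


def history(tip: str) -> list:
    """Return every commit reachable from tip, starting at the tip."""
    seen, order, queue = set(), [], [tip]
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        queue.extend(commits[current])  # a merge commit adds several parents
    return order


print(history("e11da42"))  # ['e11da42', '5e301f0', '7fe827d']
```

Only the starting hash is needed; everything else is discovered by following parents.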

What Is a Branch?


We often think of a branch as a separate copy of the project or repository, but that’s not true. A branch is just a named pointer to a specific commit, nothing more, nothing less. You can see your main branch by running:

 > cat .git/refs/heads/main

e11da4246586478a83dba70edcbe40e1d359e823

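When you commit, pull, rebase, etc., git just moves the pointer by changing the hash in the file. You can also check the .git/refs directory to see other pointers such as tags and remote branches. Note that git may move refs into a single .git/packed-refs file, so a branch doesn’t always have its own file.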

What Is HEAD?


There is also a special pointer called HEAD. You can see it by running:

 > cat .git/HEAD

ref: refs/heads/main

It just points to the place where you are currently checked out. If you are on a branch, it will point to the branch name. If you check out a commit, it will point to the commit hash – this state is called a detached HEAD.

Understanding HEAD matters because many git commands are expressed relative to it. For example, HEAD~1 means the commit before the current one, and git reset --hard HEAD discards uncommitted changes.
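Resolving HEAD by hand takes two file reads. The sketch below builds a fake .git directory in a temporary folder that mirrors the two files shown earlier, then follows the pointer. `resolve_head` is a hypothetical helper, and the sketch assumes the branch exists as a loose file, so it ignores .git/packed-refs.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> tuple:
    """Return (ref name or None, commit hash) for the current checkout."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]                     # e.g. refs/heads/main
        commit = (git_dir / ref).read_text().strip()  # branch file holds the hash
        return ref, commit
    return None, head                                 # detached HEAD: raw hash


# Fake .git directory that mirrors the files shown in this article.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "main").write_text(
        "e11da4246586478a83dba70edcbe40e1d359e823\n"
    )
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    print(resolve_head(git_dir))
    # ('refs/heads/main', 'e11da4246586478a83dba70edcbe40e1d359e823')
```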

What Is FETCH_HEAD?


When you run git fetch, git writes a special file called FETCH_HEAD in the .git directory. It records the commits at the tips of the branches that were just fetched from the remote repository, and git pull uses it to know what to merge.

You can use it to see the history of the fetched branch without merging it:

git log --oneline FETCH_HEAD

or you can see how your current checkout differs from what was fetched by running:

git diff HEAD FETCH_HEAD
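The file itself is plain text with one line per fetched ref. The sketch below parses it, assuming the tab-separated layout written by current git versions: the commit hash, an optional not-for-merge marker, and a description. The `parse_fetch_head` helper and the sample lines, including the URL, are made up for illustration.

```python
def parse_fetch_head(text: str) -> list:
    """Parse FETCH_HEAD content into one dictionary per fetched ref."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        commit, flag, description = line.split("\t", 2)
        entries.append({
            "commit": commit,
            "for_merge": flag != "not-for-merge",
            "description": description,
        })
    return entries


# Made-up sample; the URL is a placeholder.
sample = (
    "e11da4246586478a83dba70edcbe40e1d359e823\t\t"
    "branch 'main' of https://example.com/repo\n"
    "5173872595d7830507cf5077814866e9d0a1624a\tnot-for-merge\t"
    "branch 'branch' of https://example.com/repo\n"
)
for entry in parse_fetch_head(sample):
    print(entry["commit"][:7], entry["for_merge"], entry["description"])
```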

Creating a Branch


When you create a branch, git creates a new file in .git/refs/heads containing the hash of the current commit. If you also switch to it (git switch -c or git checkout -b), git updates the HEAD file to point to the new branch.

Perhaps you imagine branches as separate copies, but they are just pointers that move when you commit. A new commit updates the file of the current branch. HEAD keeps pointing at that branch, so it follows along automatically.

So let’s assume that we created a new branch and made a few commits. Now, we want to merge it back to the main branch. Our history looks like this:

 > git log --oneline

5173872 (HEAD -> branch) commit2
fcc5020 commit1
e11da42 (main) commit3
5e301f0 commit2
7fe827d commit1

Knowing the internals, you could guess that theoretically, we could just move the pointer of the main branch to the last commit of our branch. And that’s exactly what git does when you perform a fast-forward merge. To be specific, the content of .git/refs/heads/main will be updated to the hash of the last commit.
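The fast-forward rule can be sketched in a few lines of Python. This is a toy model of the log above, not git’s implementation: the `parents` and `refs` dictionaries and both helper functions are hypothetical, and the parent relationships are assumed to be linear.

```python
# Hypothetical repository state modeled on the log above.
parents = {
    "7fe827d": [],
    "5e301f0": ["7fe827d"],
    "e11da42": ["5e301f0"],
    "fcc5020": ["e11da42"],
    "5173872": ["fcc5020"],
}
refs = {"main": "e11da42", "branch": "5173872"}


def is_ancestor(ancestor: str, commit: str) -> bool:
    """True if ancestor can be reached from commit by following parents."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def fast_forward(target: str, source: str) -> bool:
    """Move target to source's commit if no merge commit is needed."""
    if is_ancestor(refs[target], refs[source]):
        refs[target] = refs[source]  # the whole "merge" is one pointer update
        return True
    return False


print(fast_forward("main", "branch"))  # True
print(refs["main"])                    # 5173872
```

If main had commits that are not part of the branch history, `is_ancestor` would return False and a real merge or a rebase would be required.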

Of course, this is not always possible. If there are new commits in the main branch, we need to merge both branches by either creating a merge commit or rebasing the branch.

As we discussed before, there is no separate history for branches. Git just traverses the linked list of commits, starting from the last commit stored in .git/refs/heads/<branch>.

Deleting a Branch


Now that you know that a branch is just a pointer to a commit, you can understand why deleting a branch is so fast. When you delete a branch, git just removes the file from .git/refs/heads.

This way, the branch is no longer referenced, but the commits are still there in .git/objects. You can still access them by using the commit hash. This is important, because it allows you to recover deleted branches.

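However, unreferenced objects don’t live forever. Git’s garbage collection, which also runs automatically from time to time, eventually removes them. By default it keeps recently created objects and anything still recorded in the reflog, so you usually have time to recover. Running git gc --prune=now skips the grace period for unreachable objects.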
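Which commits become candidates for removal follows from reachability. Below is a deliberately simplified Python sketch with a hypothetical three-commit history. It ignores the reflog and grace periods, which in a real repository keep such commits alive for a while.

```python
# Hypothetical history: commit hash -> list of parent hashes.
parents = {
    "e11da42": [],
    "fcc5020": ["e11da42"],
    "5173872": ["fcc5020"],
}
refs = {"main": "e11da42", "branch": "5173872"}


def reachable(tips) -> set:
    """Collect every commit that can be reached from the given tips."""
    found, stack = set(), list(tips)
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


del refs["branch"]  # deleting the branch removes only the pointer
unreferenced = set(parents) - reachable(refs.values())
print(sorted(unreferenced))  # ['5173872', 'fcc5020']
```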

Bonus: Pull Requests


A pull request is a feature of GitHub, Bitbucket, and other platforms, not of git itself. On GitHub, the related commit hash is stored remotely under refs/pull (other platforms may use different ref names). It isn’t fetched automatically, but you can fetch it manually by running:

git fetch origin refs/pull/<PR-id>/head:pr-<PR-id>

This will map the remote PR head to a local branch called pr-<PR-id>. You can then check it out and see the changes locally.

You can also list all remote refs by running:

git ls-remote

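This way, you will see all branches, pull requests, and tags. You can even track all pull requests by adding a second fetch refspec to your config. The --add flag matters here: without it, the command would replace the default refspec that fetches your branches.

git config --add remote.origin.fetch "+refs/pull/*:refs/remotes/origin/pull/*"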
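The refspec in that command is a mapping from remote ref names to local ones, with * acting as a wildcard. The following Python sketch shows the idea for a single wildcard. `apply_refspec` is a hypothetical helper, not part of git, and the pull request number 42 is made up.

```python
def apply_refspec(refspec: str, remote_ref: str):
    """Return the local ref a remote ref maps to, or None if it doesn't match."""
    source, _, destination = refspec.lstrip("+").partition(":")
    prefix, star, suffix = source.partition("*")
    if not star:  # no wildcard: only an exact match counts
        return destination if remote_ref == source else None
    if len(remote_ref) < len(prefix) + len(suffix):
        return None
    if not (remote_ref.startswith(prefix) and remote_ref.endswith(suffix)):
        return None
    matched = remote_ref[len(prefix):len(remote_ref) - len(suffix)]
    return destination.replace("*", matched, 1)


spec = "+refs/pull/*:refs/remotes/origin/pull/*"
print(apply_refspec(spec, "refs/pull/42/head"))  # refs/remotes/origin/pull/42/head
print(apply_refspec(spec, "refs/heads/main"))    # None
```

The leading + only tells git to update the local ref even when the change is not a fast-forward, so the sketch strips it before matching.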

Of course, there is also a more convenient way to check out a pull request: use the gh CLI tool and run:

gh pr checkout <PR-id>

But it’s good to understand how it works under the hood.
