When the .git folder suddenly becomes a storage burden
A few months ago, my team’s CI/CD system suddenly threw a No space left on device error in the middle of the night. After a quick check of the Jenkins server, I found the hard drive was 100% full. Surprisingly, the culprit wasn’t log files or Docker images, but the .git folder of a long-standing project that had ballooned to nearly 5GB.
Every time a developer ran git fetch, years of accumulated junk data continued to replicate. If you find your project’s clone speed is crawling like a snail or your .git size is unusually large, it’s time for a “deep clean” using Git GC and Git Prune.
Why does the .git folder bloat excessively?
Git stores your entire change history, not just the current code. There are three main reasons why a repository can become bulky:
- Loose Objects: Every time you run git add, Git writes a new zlib-compressed object file. Over thousands of commits, these small individual files multiply rapidly and slow down the system.
- Dangling Commits: Operations like rebase or commit --amend often leave behind old commits. These remain in the database but don’t belong to any branch.
- Reflog Data: The reflog records every movement of a ref so you can recover from accidental deletions. By default, Git keeps these entries for 30 days (unreachable commits) to 90 days (reachable ones).
In my team’s case, members had accidentally committed heavy binary files like .jar and raw images. Even though later commits deleted those files, their contents still quietly existed in the Git history, preventing the repo size from shrinking.
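To see which files are responsible, it can help to list the largest blobs anywhere in the history. The following is a minimal sketch: the function name largest_blobs is hypothetical, and it only covers objects reachable from some ref.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref,
# one per line as "<object id> <size in bytes> <path>".
largest_blobs() {
    local n="${1:-10}"
    # rev-list emits "<id> <path>"; cat-file adds type and size for each id.
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -k2,2nr |
        head -n "$n"
}

# Usage inside a repository:
#   largest_blobs 20
```

Files that show up here but no longer exist in the working tree are the ones that were deleted in a later commit yet still take up space.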
Step 1: Measuring Repository Health
First, check how much redundant data exists using the command:
git count-objects -vH
Pay attention to the count (number of loose objects) and size-pack. If count reaches tens of thousands, it’s a clear sign your repo is extremely cluttered.
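If you want to automate this check, a small wrapper can read the count line and compare it to a limit. This is only a sketch: check_loose_objects is a hypothetical name, and the default threshold of 10000 is an illustrative assumption, not a Git setting.

```shell
# Hypothetical helper: reads `git count-objects -v` output on stdin and
# warns when the number of loose objects exceeds a threshold.
check_loose_objects() {
    local threshold="${1:-10000}"
    local count
    # The "count:" line holds the number of loose objects.
    count=$(awk '$1 == "count:" { print $2 }')
    if [ -z "$count" ]; then
        echo "could not read loose object count" >&2
        return 2
    fi
    if [ "$count" -gt "$threshold" ]; then
        echo "WARN: $count loose objects (threshold $threshold)"
        return 1
    fi
    echo "OK: $count loose objects (threshold $threshold)"
}

# Usage inside a repository:
#   git count-objects -v | check_loose_objects 10000
```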
Step 2: Compressing Data with Git GC (Garbage Collection)
The git gc command is like a specialized garbage truck. It collects loose objects into packfiles, packs refs, expires old reflog entries, and removes unreferenced objects once they are older than the grace period (two weeks by default).
For the deepest optimization, I usually use the --aggressive flag:
git gc --aggressive --prune=now
The --aggressive flag makes Git recompute deltas (differences between files) with a deeper search to maximize compression, at the cost of a much longer run. However, be careful with --prune=now. It deletes unreachable objects immediately instead of waiting out the two-week grace period, so anything that no branch, tag, or reflog entry points to is gone for good. Commits still recorded in the reflog count as referenced and survive this step.
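Since pruning cannot be undone, a dry run first may be worthwhile. The sketch below summarizes what git prune would delete, grouped by object type. The function name preview_prune is hypothetical, and the listing covers loose objects only.

```shell
# Hypothetical helper: summarize the loose objects that pruning would
# delete right now, as "<type> <count>" lines. Nothing is removed.
preview_prune() {
    # --dry-run prints "<object id> <type>" for each prunable object.
    git prune --dry-run |
        awk '{ n[$2]++ } END { for (t in n) print t, n[t] }' |
        sort
}

# Usage inside a repository:
#   preview_prune
```

An empty result means there is nothing for the prune step to remove at the moment.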
Step 3: Thorough Cleaning with Git Prune
Sometimes git gc still can’t delete everything because it’s hindered by the reflog protection mechanism. To truly get rid of old data, I often use the following aggressive “purge” command set:
# Force reflog entries to expire immediately
git reflog expire --expire=now --all
# Remove objects that are no longer referenced
git prune --verbose --progress
# Pack everything into a single packfile
git repack -ad
The results were surprising. After running this combo on the 5GB project mentioned above, the .git folder size dropped to less than 800MB. The execution speed of git status was also noticeably faster.
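To check the effect on your own repository, you can measure the Git directory before and after a cleanup command. This is a rough sketch: both function names are hypothetical, and du reports disk usage in kilobytes, which differs slightly from the logical sizes Git prints.

```shell
# Hypothetical helper: disk usage of the current repository's Git
# directory, in kilobytes.
git_dir_kb() {
    du -sk "$(git rev-parse --git-dir)" | cut -f1
}

# Hypothetical helper: run any command and report the size change.
with_size_report() {
    local before after
    before=$(git_dir_kb)
    "$@" || return
    after=$(git_dir_kb)
    echo "before=${before}KB after=${after}KB saved=$((before - after))KB"
}

# Usage inside a repository:
#   with_size_report git gc --aggressive
```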
Strategies to Prevent Repo Bloat Early On
Cleaning up is just a temporary fix. To maintain a clean repo, I apply two hard rules for the team:
1. Strict Control of Binary Files
Absolutely do not commit build output or dependency directories (node_modules, target), log files, or large binary files to Git. If you accidentally commit them, removing them from history is complex and usually requires specialized tools like BFG Repo-Cleaner.
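One way to enforce this rule is a pre-commit hook that refuses oversized files. The following is a sketch under stated assumptions: large_staged_files is a hypothetical helper, the 5 MB limit is an arbitrary example, and file names containing unusual characters (which Git quotes in its output) are not handled.

```shell
# Hypothetical helper: list staged files (added or modified) whose staged
# content is larger than the given number of bytes.
large_staged_files() {
    local limit="$1"
    git diff --cached --name-only --diff-filter=AM |
    while IFS= read -r path; do
        # ":<path>" names the staged version of the file in the index.
        size=$(git cat-file -s ":$path" 2>/dev/null) || continue
        if [ "$size" -gt "$limit" ]; then
            echo "$path ($size bytes)"
        fi
    done
}

# Possible use in .git/hooks/pre-commit (limit is an example value):
#   offenders=$(large_staged_files 5242880)
#   if [ -n "$offenders" ]; then
#       echo "Refusing to commit large files:" >&2
#       echo "$offenders" >&2
#       exit 1
#   fi
```

Hooks live in each clone and are not shared automatically, so the team still needs a way to install the script on every machine.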
2. Periodic Maintenance on CI/CD
Instead of waiting for the hard drive to turn red, I add a maintenance script that runs on weekends for large projects:
#!/bin/bash
git gc --prune=today --quiet
git remote prune origin
The git remote prune origin command is extremely useful. It deletes local remote-tracking references (such as origin/old-feature) whose branches no longer exist on the server, keeping your team’s branch list tidy.
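Before letting a scheduled job delete references, you may want to see what it would remove. This sketch relies on the --dry-run flag of git remote prune. The function name stale_remote_branches is hypothetical, and the sed pattern assumes Git's English output format.

```shell
# Hypothetical helper: list remote-tracking branches that
# `git remote prune` would delete, without deleting them.
stale_remote_branches() {
    local remote="${1:-origin}"
    # LC_ALL=C keeps the message text in English so the pattern matches.
    LC_ALL=C git remote prune --dry-run "$remote" |
        sed -n 's/^ \* \[would prune\] //p'
}

# Usage inside a repository:
#   stale_remote_branches origin
```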
Conclusion
Managing a Git repository is like cleaning your house. Doing it periodically keeps everything light and smooth. If your repo is showing signs of sluggishness, try git gc --aggressive right away. You’ll be surprised by the amount of space you save!
If you’ve ever had trouble with heavy files accidentally committed to Git, share your experience in the comments!
