Git Sparse Checkout: The Ultimate Trick for Handling Gigabyte-Scale Monorepos

Git tutorial - IT technology blog

The Nightmare of Endless Git Clones

Have you ever sat waiting for 20 minutes just to fix a single line of CSS in a massive Monorepo project? It’s a terrible feeling. You only need to update a README file, but you’re forced to download all 5-10 GB of source code. Meanwhile, the SSD on your “standard issue” MacBook is constantly flashing red due to lack of space.

When I first started working on a project with over 50 microservices, I found myself in this frustrating situation. The repo contained code for every department, while I was responsible for exactly two Backend services. Every time I pulled code, my computer’s cooling fan sounded like a jet engine. After many frustrating moments, I discovered Sparse Checkout, a powerful Git feature that solves this problem.

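Sparse Checkout lets you populate your working directory with only the directories you need. On its own it just hides the rest; combined with a partial clone (Step 1 of the workflow in this post), the file contents for everything else are never downloaded to your machine. Note that this isn’t just cloning the whole repo and then manually deleting folders!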

What is Sparse Checkout?

In practical terms, Sparse Checkout helps you define a “whitelist” of folders you want to appear. Anything not on this list vanishes from your working directory, but nothing is deleted: those files are still tracked in the repository’s history and remain safe on the server (like GitHub or GitLab).

Since Git version 2.25, we have the dedicated git sparse-checkout command. This new approach is much simpler than the old one, which meant setting core.sparseCheckout and editing the .git/info/sparse-checkout file by hand.
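Before going further, it is worth confirming that your Git is new enough. The snippet below is a small sketch that compares the installed version against 2.25; the variable names are my own, and it assumes a sort that supports -V (GNU coreutils and recent macOS both do).

```shell
#!/usr/bin/env bash
# Compare the installed Git version against 2.25, the first release
# that ships the `git sparse-checkout` command.
required="2.25.0"
installed="$(git --version | awk '{print $3}')"   # third word of "git version X.Y.Z"

# sort -V orders version strings; if the lowest of the two is the
# required one, the installed Git is new enough.
lowest="$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n 1)"

if [ "$lowest" = "$required" ]; then
  echo "Git $installed supports git sparse-checkout"
else
  echo "Git $installed is too old; upgrade to $required or newer"
fi
```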

There are two modes you should know about:

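  • Cone mode (the default since Git 2.37; on older versions pass --cone): Only allows selecting whole directories. Files sitting directly in the repo root and in the parent directories of your selection are always included. This mode is fast, stable, and highly recommended.
  • Non-cone mode: Allows gitignore-style patterns (globs, not regular expressions) to filter individual files. This is more flexible but slower on large repos and prone to configuration errors.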
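To see the difference between the two modes, here is a minimal sketch that builds a throwaway repo and applies each mode in turn. The folder layout, file names, and the list_files helper are all made up for illustration; only the git sparse-checkout commands are the real thing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a tiny throwaway repo (hypothetical layout) to try both modes on.
repo="$(mktemp -d)/modes-demo"
git init -q -b main "$repo"
cd "$repo"
mkdir -p apps/api-gateway apps/web docs
echo 'package main' > apps/api-gateway/main.go
echo 'console.log("web")' > apps/web/index.js
echo '# Guide' > docs/guide.md
echo 'not really an image' > docs/logo.png
echo '# Monorepo' > README.md
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial layout"

# Helper: list the files present in the working directory, ignoring .git.
list_files() { find . -path ./.git -prune -o -type f -print | sort; }

# Cone mode: whole directories only (root-level files always come along).
git sparse-checkout init --cone
git sparse-checkout set apps/api-gateway
cone_files="$(list_files)"
echo "cone mode:"
echo "$cone_files"

# Non-cone mode: gitignore-style patterns, here "Markdown files directly in docs/".
git sparse-checkout init --no-cone
git sparse-checkout set '/docs/*.md'
pattern_files="$(list_files)"
echo "non-cone mode:"
echo "$pattern_files"
```

In cone mode the listing should contain the api-gateway folder plus the root README; in non-cone mode only the Markdown file under docs/ survives, which is something cone mode cannot express.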

Workflow for Partial Directory Cloning (Pro Tips)

Don’t clone everything and then enable sparse-checkout. You’ll still waste bandwidth downloading “junk.” Instead, apply this optimized 4-step process.

Step 1: Initialize the Repo with Blobless Clone

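Instead of a standard git clone, use the --filter and --no-checkout flags. With --filter=blob:none, Git downloads the commit history and directory trees but skips file contents (blobs), fetching them later only when they are needed. The --no-checkout flag stops Git from populating the working directory right away.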

# Replace the URL with your actual repo
git clone --filter=blob:none --no-checkout https://github.com/example/monorepo-khong-lo.git
cd monorepo-khong-lo

At this point, the directory will look empty (it contains only the hidden .git folder). Don’t panic; we’re on the right track.
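If you want proof that the clone really is blobless, you can ask Git what it is still missing. The sketch below does this against a local stand-in repo so it runs without network access; the layout and variable names are hypothetical, and the uploadpack.allowFilter setting is only there because a plain local repo does not serve filtered clones unless told to.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"

# Local stand-in for the remote monorepo (hypothetical layout).
git init -q -b main "$work/origin"
(
  cd "$work/origin"
  mkdir -p apps/api-gateway apps/web
  echo 'package main' > apps/api-gateway/main.go
  echo 'console.log("web")' > apps/web/index.js
  git add .
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial layout"
  # Allow this local repo to serve filtered (partial) clones.
  git config uploadpack.allowFilter true
)

# Step 1 from the article, pointed at the stand-in.
git clone -q --filter=blob:none --no-checkout "file://$work/origin" "$work/clone"
cd "$work/clone"

# The remote is recorded as a "promisor": Git may ask it for objects later.
promisor="$(git config remote.origin.promisor)"
filter="$(git config remote.origin.partialclonefilter)"
echo "promisor=$promisor filter=$filter"

# Commits and trees are local; objects not yet downloaded are printed with a "?" prefix.
missing="$(git rev-list --objects --all --missing=print | grep -c '^?' || true)"
tracked="$(git ls-tree -r --name-only HEAD | wc -l | tr -d ' ')"
echo "files in HEAD: $tracked, blobs not yet downloaded: $missing"
```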

Step 2: Activate Efficiency Mode

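Enable the sparse-checkout feature with Cone mode for maximum performance. (Recent Git releases mark init as deprecated in favor of passing --cone to git sparse-checkout set, but the command below still works.)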

git sparse-checkout init --cone

Step 3: Select Working Directories

Suppose you only need to work with apps/api-gateway and libs/shared-auth. Instruct Git as follows:

git sparse-checkout set apps/api-gateway libs/shared-auth

Git records this list in .git/info/sparse-checkout. Nothing is downloaded yet, because no branch has been checked out.

Step 4: Checkout Data

Finally, check out the branch you need. This is the moment Git downloads file contents, and only for the directories you selected:

git checkout main

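The result is striking: your project directory now only shows the 2 folders you selected, plus any files sitting directly in the repo root or in their parent directories. In my case, the project size dropped from 5 GB to about 150 MB, and day-to-day Git commands have far fewer files to scan.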
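The four steps can be rehearsed end to end without touching a real server. What follows is a sketch under stated assumptions: the origin is a local throwaway repo with an invented layout, and the two uploadpack settings exist only so that a plain local repo can serve filtered clones and on-demand blob requests.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"

# Local stand-in for the remote monorepo (hypothetical layout).
git init -q -b main "$work/origin"
(
  cd "$work/origin"
  mkdir -p apps/api-gateway apps/web libs/shared-auth
  echo 'package main' > apps/api-gateway/main.go
  echo 'console.log("web")' > apps/web/index.js
  echo 'package auth' > libs/shared-auth/auth.go
  echo '# Monorepo' > README.md
  git add .
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial layout"
  # Let this local repo serve filtered clones and on-demand blob requests.
  git config uploadpack.allowFilter true
  git config uploadpack.allowAnySHA1InWant true
)

# Steps 1-4 from the article.
git clone -q --filter=blob:none --no-checkout "file://$work/origin" "$work/clone"
cd "$work/clone"
git sparse-checkout init --cone
git sparse-checkout set apps/api-gateway libs/shared-auth
git checkout -q main

# What actually landed in the working directory.
checked_out="$(find . -path ./.git -prune -o -type f -print | sort)"
echo "$checked_out"

# Blobs outside the selected folders were never downloaded.
still_missing="$(git rev-list --objects --all --missing=print | grep -c '^?' || true)"
echo "blobs still on the server only: $still_missing"
```

The apps/web folder should be absent from the listing, and its file content should still count as missing, which is exactly the disk and bandwidth saving this workflow is after.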

Real-world Experience After 6 Months of Implementation

Applying Sparse Checkout in a corporate environment brings great results, but keep these 3 important points in mind.

1. Always Combine with Partial Clone

If you use sparse-checkout on a fully cloned repo, you only solve the “visual clutter” problem: the .git folder still holds the contents of every file. To truly save disk space, you must use --filter=blob:none in Step 1.

2. Smart Dependency Management

This is the most common mistake. A service in a Monorepo often depends on common or utils folders. Without them, your IDE will light up with errors, and the code won’t build.

Pro tip: Check your package.json or go.mod files for required local packages. Then use the add command to include those folders:

git sparse-checkout add common/utils
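A practical wrinkle: folders you have not checked out are invisible in your file explorer, so it helps to ask Git what exists before adding anything. The sketch below uses a throwaway repo with a made-up layout (common/utils and common/logging are hypothetical) to show discovery with git ls-tree followed by add and list.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo with a hypothetical layout.
repo="$(mktemp -d)/deps-demo"
git init -q -b main "$repo"
cd "$repo"
mkdir -p apps/api-gateway common/utils common/logging
echo 'package main' > apps/api-gateway/main.go
echo 'package utils' > common/utils/strings.go
echo 'package logging' > common/logging/log.go
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial layout"

# Start with just the service.
git sparse-checkout init --cone
git sparse-checkout set apps/api-gateway

# Folders that are not checked out can still be discovered from the tree objects.
available="$(git ls-tree -d --name-only HEAD common/)"
echo "available under common/:"
echo "$available"

# Add the dependency without dropping the existing selection.
git sparse-checkout add common/utils
selected="$(git sparse-checkout list)"
echo "currently selected:"
echo "$selected"
```

Unlike set, which replaces the whole list, add keeps what you already selected, so it is the safer command once your checkout is in use.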

3. Optimize CI/CD Pipelines

I’ve tried applying this technique to Jenkins and GitHub Actions. Pipeline execution time dropped from 4 minutes to 45 seconds. Runners no longer have to pull thousands of unnecessary files over the network, significantly saving bandwidth costs.
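For pipelines, it can be handy to wrap the workflow in one function that a Jenkins shell step or a GitHub Actions run step calls. This is a rough sketch, not the exact script I used: sparse_clone is a hypothetical helper name, and the demo call at the bottom targets a local stand-in repo so the example runs anywhere.

```shell
#!/usr/bin/env bash
set -euo pipefail

# sparse_clone URL DEST BRANCH DIR...   (hypothetical helper)
# Blobless clone of URL into DEST, limited to the given directories.
sparse_clone() {
  local url="$1" dest="$2" branch="$3"
  shift 3
  git clone -q --filter=blob:none --no-checkout "$url" "$dest"
  git -C "$dest" sparse-checkout init --cone
  git -C "$dest" sparse-checkout set "$@"
  git -C "$dest" checkout -q "$branch"
}

# Local stand-in for the real remote (hypothetical layout).
work="$(mktemp -d)"
git init -q -b main "$work/origin"
(
  cd "$work/origin"
  mkdir -p apps/api-gateway apps/web
  echo 'package main' > apps/api-gateway/main.go
  echo 'console.log("web")' > apps/web/index.js
  git add .
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial layout"
  # Let this local repo serve filtered clones and on-demand blob requests.
  git config uploadpack.allowFilter true
  git config uploadpack.allowAnySHA1InWant true
)

# In a pipeline, the URL and directory list would come from job configuration.
sparse_clone "file://$work/origin" "$work/ci-workspace" main apps/api-gateway
ls "$work/ci-workspace/apps"
```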

How to Revert to Full State

If you need to perform a global search or refactor the entire project, you can disable this feature at any time:

git sparse-checkout disable

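This command restores every file to your working directory. In a blobless clone, Git has to download the missing file contents for the current branch at that point, so expect a wait and the extra disk usage. You can narrow things down again at any time with git sparse-checkout set.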
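Here is a small sketch of the round trip on a throwaway repo (layout invented for illustration). It counts index entries carrying the skip-worktree flag, shown as S by git ls-files -t, before and after disabling.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo with a hypothetical layout.
repo="$(mktemp -d)/disable-demo"
git init -q -b main "$repo"
cd "$repo"
mkdir -p apps/api-gateway apps/web
echo 'package main' > apps/api-gateway/main.go
echo 'console.log("web")' > apps/web/index.js
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial layout"

# Narrow the working directory to one service.
git sparse-checkout init --cone
git sparse-checkout set apps/api-gateway
hidden_before="$(git ls-files -t | grep -c '^S' || true)"   # S = skip-worktree
echo "hidden files while sparse: $hidden_before"

# Bring everything back for a repo-wide search or refactor.
git sparse-checkout disable
hidden_after="$(git ls-files -t | grep -c '^S' || true)"
echo "hidden files after disable: $hidden_after"
```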

Conclusion

Git Sparse Checkout is more than just a technical trick. It’s a modern workflow mindset that helps you focus on what matters most. By eliminating thousands of irrelevant files, your computer runs faster, and your mind stays less distracted.

If your team is struggling with a Monorepo, try implementing this immediately. Your hard drive and bandwidth will surely thank you!
