Git Submodule vs Git Subtree: Don’t Let Shared Libraries Become Your Project’s ‘Black Hole’

Git tutorial - IT technology blog

The 2 AM Phone Call and the ‘Copy-Paste’ Trap

A phone ringing at 2 AM rarely brings good news. On the other end was my Lead, sounding panicked: “The server is down, there’s a logic error in the payment module, check it immediately!”

After 30 minutes of investigation, I discovered a bitter truth. A colleague had copy-pasted the payment-utils library from an old project into this one and, to meet a deadline, tweaked 10 lines of its logic. Later, the original library received a bug fix in another project, but the copy in this project never did, so it kept running the flawed version. That divergence between the two copies is what froze the system. The price of ‘conveniently’ copying code was an all-nighter spent fixing bugs in a panic.

This is a classic scenario when a team lacks a process for managing reusable components. Instead of manual copy-pasting, Git provides two heavy-duty weapons: Git Submodule and Git Subtree. Each tool has its own pros and cons. Without understanding their nature, it’s easy to “shoot yourself in the foot”.

1. Git Submodule: A ‘Pointer’ to a Specific Commit

Imagine a Submodule like a Windows shortcut. The main repo doesn’t actually contain the library’s code. Instead, it stores the library’s path and URL in a .gitmodules file, plus the ID (SHA) of the specific commit you are using.

Practical Implementation

Suppose you have a web-app project and want to integrate the auth-service library from another repo:

# Add submodule to the project
git submodule add https://github.com/user/auth-service.git lib/auth-service

# Git will create a .gitmodules file to track it
git status
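
To see what that ‘pointer’ really looks like, here is a minimal sketch that builds two throwaway repos in a temp directory and inspects what the main repo recorded. The local auth-service repo, the auth.sh file, and the demo identity are stand-ins I made up for illustration. The -c protocol.file.allow=always setting is only there because the demo clones from a local path, which recent Git versions block for submodules by default; you would not need it with an https URL.

```shell
set -eu
# Throwaway identity so commits work on a machine with no Git config (assumption)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical library repo standing in for auth-service
git init -q -b main "$work/auth-service"
echo 'login() { :; }' > "$work/auth-service/auth.sh"
git -C "$work/auth-service" add auth.sh
git -C "$work/auth-service" commit -q -m "Initial auth-service"

# Hypothetical main project
git init -q -b main "$work/web-app"
cd "$work/web-app"
git commit -q --allow-empty -m "Initial web-app commit"

# Same command as in the article, pointed at the local stand-in repo
git -c protocol.file.allow=always submodule --quiet add "$work/auth-service" lib/auth-service
git commit -q -m "Add auth-service as a submodule"

# What the main repo actually stores for lib/auth-service
entry_mode=$(git ls-tree HEAD lib/auth-service | awk '{print $1}')
pinned_sha=$(git ls-tree HEAD lib/auth-service | awk '{print $3}')
recorded_path=$(git config -f .gitmodules submodule.lib/auth-service.path)

echo "mode: $entry_mode"
echo "pinned commit: $pinned_sha"
echo "path in .gitmodules: $recorded_path"
```

Mode 160000 is how Git marks a submodule entry: the main repo’s tree holds only the commit SHA, not the library’s files.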

There is a catch: when a colleague clones the main repo with a plain git clone, the lib/auth-service folder will be empty. To pull the library code, they must run one more command (or clone with git clone --recurse-submodules in the first place):

git submodule update --init --recursive
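
A quick way to convince yourself is to reproduce the ‘empty folder’ surprise locally. The sketch below is only an illustration: the repos, the auth.sh file, and the colleague directory are hypothetical, and -c protocol.file.allow=always is needed solely because everything lives on the local disk.

```shell
set -eu
# Throwaway identity for the demo commits (assumption)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical library and main project, wired together with a submodule
git init -q -b main "$work/auth-service"
echo 'login() { :; }' > "$work/auth-service/auth.sh"
git -C "$work/auth-service" add auth.sh
git -C "$work/auth-service" commit -q -m "Initial auth-service"

git init -q -b main "$work/web-app"
git -C "$work/web-app" commit -q --allow-empty -m "Initial web-app commit"
git -C "$work/web-app" -c protocol.file.allow=always \
    submodule --quiet add "$work/auth-service" lib/auth-service
git -C "$work/web-app" commit -q -m "Add auth-service as a submodule"

# The colleague clones the main repo the usual way
git clone -q "$work/web-app" "$work/colleague"
files_before=$(ls -A "$work/colleague/lib/auth-service" | wc -l)

# The command from the article fills the folder in
git -C "$work/colleague" -c protocol.file.allow=always \
    submodule --quiet update --init --recursive
files_after=$(ls -A "$work/colleague/lib/auth-service" | wc -l)

echo "entries before init: $files_before"
echo "entries after init:  $files_after"
# One-step alternative against a real remote: git clone --recurse-submodules <url>
```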

Pros and Cons

  • Pros: The main repo remains lightweight because it doesn’t store the library’s commit history. You can pin the library to a specific version, ensuring app stability even if the original library changes.
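  • Cons: Tedious for teamwork. Every change to the library takes two commits and two pushes: one inside the submodule and one in the main repo to move the pointer. If someone pushes the main repo but forgets to push the submodule, the main repo points at a commit that does not exist on the library’s remote, and the CI/CD pipeline fails as soon as it tries to check out the submodule.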
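
Git can catch the ‘forgot to push the submodule’ mistake before it reaches CI. The sketch below uses git push --recurse-submodules=check, which refuses to push the main repo while it references submodule commits that are on no remote. Everything else here is a made-up local setup for illustration: the bare repos play the role of the remotes, and auth.txt is a placeholder file.

```shell
set -eu
# Throwaway identity for the demo commits (assumption)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical "remotes": bare repos on the local disk
git init -q --bare -b main "$work/auth-service.git"
git init -q --bare -b main "$work/web-app.git"

# Seed the library remote with one commit
git init -q -b main "$work/seed"
echo v1 > "$work/seed/auth.txt"
git -C "$work/seed" add auth.txt
git -C "$work/seed" commit -q -m "auth-service v1"
git -C "$work/seed" push -q "$work/auth-service.git" main

# Main project with the library as a submodule
git init -q -b main "$work/web-app"
cd "$work/web-app"
git commit -q --allow-empty -m "Initial web-app commit"
git -c protocol.file.allow=always submodule --quiet add "$work/auth-service.git" lib/auth-service
git commit -q -m "Add auth-service as a submodule"
git remote add origin "$work/web-app.git"
git push -q origin main

# Fix something inside the submodule, commit it, but do NOT push it
echo v2 > lib/auth-service/auth.txt
git -C lib/auth-service commit -q -am "Fix bug in auth-service"
git add lib/auth-service
git commit -q -m "Point web-app at the fixed auth-service"

# The check mode refuses to push the dangling pointer
if git push -q --recurse-submodules=check origin main 2>/dev/null; then
    first_push=accepted
else
    first_push=refused
fi
echo "push with unpushed submodule commit: $first_push"

# Push the submodule first, then the main repo goes through
git -C lib/auth-service push -q origin HEAD:main
git push -q --recurse-submodules=check origin main
remote_head=$(git -C "$work/web-app.git" rev-parse main)
```

If the team agrees on it, git config push.recurseSubmodules check makes this the default for a repo, so nobody has to remember the flag.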

2. Git Subtree: ‘Integrated but Distinct’

Unlike Submodule, Subtree actually copies the library’s code and, unless you squash it, its full commit history directly into the main repo. It’s like building a small house entirely within the grounds of your villa.

Execution Commands

To add a library using Subtree, I usually use the command:

git subtree add --prefix lib/auth-service https://github.com/user/auth-service.git main --squash

The --squash option is key here. It condenses hundreds of library commits into a single squashed commit, which Git then merges into your branch. This keeps the main project’s git log clean and easy to follow.
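
Here is a small sketch of what --squash does to the log, under a few assumptions: git subtree is installed (it ships with most Git packages but is technically a contrib command), and the local auth-service repo with its five ‘library change’ commits is a stand-in I invented for the demo.

```shell
set -eu
# Throwaway identity for the demo commits (assumption)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical library with several commits of history
git init -q -b main "$work/auth-service"
for n in 1 2 3 4 5; do
    echo "version $n" > "$work/auth-service/auth.txt"
    git -C "$work/auth-service" add auth.txt
    git -C "$work/auth-service" commit -q -m "library change $n"
done
lib_commits=$(git -C "$work/auth-service" rev-list --count HEAD)

# Hypothetical main project
git init -q -b main "$work/web-app"
cd "$work/web-app"
git commit -q --allow-empty -m "Initial web-app commit"

# Same shape as the article's command, pointed at the local stand-in repo
git subtree add --prefix=lib/auth-service --squash "$work/auth-service" main

main_commits=$(git rev-list --count HEAD)
echo "library commits:  $lib_commits"
echo "main repo commits: $main_commits"
git log --oneline
```

The library’s files are all there under lib/auth-service, but none of the individual ‘library change’ commits show up in the main repo’s log.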

Why is Subtree Popular?

  • Team-friendly: No one needs to learn special Git commands. They just clone the repo and have all the code ready to run.
  • Centralized management: You can edit the library code directly within the main repo. Later, you can push back to the original repo if you want to contribute a patch.
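  • Cons: The main repo grows because it now stores the library’s files and, without --squash, its full commit history as well. Additionally, subtree pull commands are long and hard to remember, since you must repeat the prefix, the repository, and the branch every time.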
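
One way to tame the long commands is to register the library as a named remote once, so later pulls don’t need the full URL. This is just a sketch of that habit, not the only option. The remote name auth-service, the local library repo, and auth.txt are hypothetical, and I pass -m so the merge never stops to open an editor.

```shell
set -eu
# Throwaway identity for the demo commits (assumption)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical library repo
git init -q -b main "$work/auth-service"
echo v1 > "$work/auth-service/auth.txt"
git -C "$work/auth-service" add auth.txt
git -C "$work/auth-service" commit -q -m "auth-service v1"

# Hypothetical main project
git init -q -b main "$work/web-app"
cd "$work/web-app"
git commit -q --allow-empty -m "Initial web-app commit"

# Name the library once instead of retyping its URL
git remote add auth-service "$work/auth-service"
git subtree add --prefix=lib/auth-service --squash auth-service main

# Meanwhile, the library gets a fix upstream
echo v2 > "$work/auth-service/auth.txt"
git -C "$work/auth-service" commit -q -am "auth-service v2"

# Pull the update into the subtree; keep --squash consistent with the add
git subtree pull --prefix=lib/auth-service --squash \
    -m "Update auth-service to latest main" auth-service main

pulled_version=$(cat lib/auth-service/auth.txt)
echo "lib/auth-service is now at: $pulled_version"
```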

Comparison: When to Choose the Sword, When to Choose the Bow?

Based on hands-on experience with large projects, here is a comparison table to help you decide quickly:

Criteria        | Git Submodule                             | Git Subtree
----------------|-------------------------------------------|------------------------------------------
Repo structure  | Stores only a link (commit SHA)           | Stores code + history directly
Team experience | Complex, prone to “missing commit” errors | Simple, like a normal folder
Editability     | Must switch to the sub-repo               | Edit directly in place
Use cases       | 3rd-party libraries, rarely edited        | Internal modules, frequent updates needed

Painful Lessons from Code ‘Disappearing’

Working with advanced Git features inevitably leads to mistakes. I once made a quick fix in a submodule and used git push --force to clean up the history. It overwrote the commits my colleague had just pushed to the library’s repo. From then on, I have followed one rule: never use --force unless you know exactly which repository and branch you are rewriting, and what is already on the remote. A submodule is its own repository with its own remote, so it is easy to lose track of where you are.
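
A safer habit, which I did not know back then, is git push --force-with-lease: it only overwrites the remote branch if it still points where your last fetch saw it. The sketch below replays my mistake with that flag. The alice and bob clones, the bare auth-service.git remote, and auth.txt are all invented for the demo.

```shell
set -eu
# Throwaway identity for the demo commits (assumption)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical shared remote for the library
git init -q --bare -b main "$work/auth-service.git"

# "alice" (me) creates the first commit and pushes it
git init -q -b main "$work/alice"
echo v1 > "$work/alice/auth.txt"
git -C "$work/alice" add auth.txt
git -C "$work/alice" commit -q -m "auth-service v1"
git -C "$work/alice" remote add origin "$work/auth-service.git"
git -C "$work/alice" push -q -u origin main

# "bob" (the colleague) pushes important work in the meantime
git clone -q "$work/auth-service.git" "$work/bob"
echo v2 > "$work/bob/auth.txt"
git -C "$work/bob" commit -q -am "Colleague's important fix"
git -C "$work/bob" push -q origin main
bob_tip=$(git -C "$work/bob" rev-parse HEAD)

# alice rewrites her local history without fetching first
git -C "$work/alice" commit -q --amend -m "auth-service v1 (reworded)"

# --force would wipe out bob's commit; --force-with-lease notices the remote moved
if git -C "$work/alice" push -q --force-with-lease origin main 2>/dev/null; then
    result=overwritten
else
    result=rejected
fi
remote_tip=$(git -C "$work/auth-service.git" rev-parse main)
echo "force-with-lease push was: $result"
```

The push is rejected and the colleague’s commit stays on the remote, which is exactly the outcome I wish I had gotten.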

If the project uses npm, composer, or pip, prioritize these package managers for third-party libraries. Use Submodule or Subtree only when you need to manage modules developed by your own team that need to be shared across multiple projects.

Standard Workflow for Using Subtree with Internal Modules:

  1. Add: Place the module in a dedicated directory (the commands in this article use lib/auth-service).
  2. Edit: If a bug is found, fix it directly within the project you are working on.
  3. Push: Use git subtree push to update the fix back to the original repo for other projects to use.
# Example of pushing code from subtree back to the original repo
git subtree push --prefix=lib/auth-service https://github.com/user/auth-service.git main
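
The three steps above can be rehearsed end to end on a local machine before you try them on a real repo. This is a sketch under assumptions: git subtree is installed, the bare auth-service.git repo plays the role of the original repo, and auth.txt and the commit messages are placeholders.

```shell
set -eu
# Throwaway identity for the demo commits (assumption)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical "original repo" for the shared module (bare, so it accepts pushes)
git init -q --bare -b main "$work/auth-service.git"
git init -q -b main "$work/seed"
echo v1 > "$work/seed/auth.txt"
git -C "$work/seed" add auth.txt
git -C "$work/seed" commit -q -m "auth-service v1"
git -C "$work/seed" push -q "$work/auth-service.git" main

# Hypothetical project that consumes the module
git init -q -b main "$work/web-app"
cd "$work/web-app"
git commit -q --allow-empty -m "Initial web-app commit"

# 1. Add: bring the module in as a subtree
git subtree add --prefix=lib/auth-service --squash "$work/auth-service.git" main

# 2. Edit: fix the bug right where you found it
echo v2-fixed > lib/auth-service/auth.txt
git commit -q -am "Fix bug in auth-service from web-app"

# 3. Push: send only the lib/auth-service changes back to the original repo
git subtree push --prefix=lib/auth-service "$work/auth-service.git" main

upstream_content=$(git -C "$work/auth-service.git" show main:auth.txt)
echo "original repo now has: $upstream_content"
```

Other projects that use the module can then pick up the fix with their own git subtree pull.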

Conclusion

Code management isn’t just about add, commit, and push. As a project grows, decoupling components makes the system easier to maintain. Don’t wait for a 2 AM phone call to clean up a messy copy-paste job. Choose a smart management solution today!
