How to Package Code Changes with git diff and git archive for Incremental Deployment

Git tutorial - IT technology blog

Background: When Full Deployment Becomes a Nightmare

Early in my career, I managed an “ancient” PHP web project weighing over 5GB. The source tree was full of old images and junk libraries that no one dared to delete. Every code update via FTP or rsync was a nerve-wracking experience because of the long waiting times.

The pain point was that even when I had edited only two lines in the auth.py file, the system still had to scan tens of thousands of files to compare and upload. Once, the connection dropped midway and the site went down for 15 minutes. That was when I realized incremental deployment was the key to solving this problem.

While Docker and CI/CD are common today, pushing the entire source code can still be too costly—especially when bandwidth between build and deployment servers is limited to a few Mbps. Combining git diff and git archive allows you to create extremely lightweight update packages containing only what has actually changed.

Essential Tools: Two “Sidekicks” Built into Git

Instead of installing complex software, we will leverage two basic Git commands.

1. git diff: Listing Files for Update

Normally we use git diff to see how code has changed. However, for packaging, we only need the filenames.

# List changed files between the current commit and the one before it
git diff --name-only HEAD~1 HEAD

In practice, I often compare two versions, such as from v1.0.2 to v1.1.0. This command returns a list of file paths, acting as a “shopping list” for the next step.
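The tag-to-tag comparison can be wrapped in a tiny helper. This is only a sketch: the function name list_changed is made up for illustration, and the example call assumes tags named v1.0.2 and v1.1.0 exist in your repository.

```shell
# Hypothetical helper: print the paths that differ between two refs.
# Usage: list_changed <old-ref> [new-ref]   (new-ref defaults to HEAD)
list_changed() {
    local old_ref="$1"
    local new_ref="${2:-HEAD}"
    git diff --name-only "$old_ref" "$new_ref"
}

# Example call (assumes both tags exist):
# list_changed v1.0.2 v1.1.0
```

If you also want to see whether each path was added, modified, or deleted, swapping --name-only for --name-status prints a status letter in front of every path.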

2. git archive: The Professional Packaging Engine

The git archive command helps compress files in a commit into zip or tar formats. Its biggest advantage is that it automatically preserves the directory structure and completely ignores untracked junk files.

# Package the entire current source code into update.zip
git archive -o update.zip HEAD
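For the tar side, a variation I find handy is a gzipped tarball whose contents sit under one top-level folder. The sketch below uses git archive's --format and --prefix options; the helper name archive_ref and the folder name in the example are placeholders, not something from a real project.

```shell
# Hypothetical helper: archive a ref as a gzipped tarball under one top-level folder.
# Usage: archive_ref <ref> <prefix-dir> <output.tar.gz>
archive_ref() {
    local ref="$1"
    local prefix="$2"
    local output="$3"
    # --prefix needs the trailing slash to be treated as a directory
    git archive --format=tar.gz --prefix="${prefix}/" -o "$output" "$ref"
}

# Example call: every path inside the tarball starts with myapp/
# archive_ref HEAD myapp update.tar.gz
```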

Combining Them to Automate the Process

Our goal is to force git archive to only package the files identified by git diff.

One-liner for Immediate Efficiency

You can nest these two commands using shell command substitution, $(...). For example, to capture all changes from commit abc1234 to the latest version:

git archive -o changes.zip HEAD $(git diff --name-only abc1234 HEAD)

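But be careful. If the list contains deleted files, the command fails immediately because git archive cannot find those paths in the commit it is packaging. Two more traps: if the diff is empty, git archive receives no paths and packages the entire tree, and because the substitution is unquoted, any file name containing spaces is split into several broken arguments.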

Filtering File Status with --diff-filter

In my experience, you should always add the --diff-filter flag. We only care about files that were Added (A), Copied (C), Modified (M), or Renamed (R). Deleted files (D) should be handled separately by a deletion script on the server.

# Only include files that currently exist
git archive -o patch_v2.zip HEAD $(git diff --name-only --diff-filter=ACMR v1.0 HEAD)

This approach ensures the patch_v2.zip package is always clean and ready to be extracted over the old code.
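One weakness of the $(...) form is that file names containing spaces get split apart. A possible workaround, sketched here under the assumption that your xargs supports -0 and -r (GNU xargs does), is to pass the paths NUL-separated. The helper name make_patch is invented for this example.

```shell
# Hypothetical helper: build an incremental archive that tolerates spaces in paths.
# Usage: make_patch <output-file> <old-ref> [new-ref]   (new-ref defaults to HEAD)
make_patch() {
    local output="$1"
    local old_ref="$2"
    local new_ref="${3:-HEAD}"
    # -z makes git emit NUL-separated paths; xargs -0 reads them back intact.
    # -r skips git archive entirely when the diff is empty, so an empty list
    # never turns into a full-tree archive.
    git diff -z --name-only --diff-filter=ACMR "$old_ref" "$new_ref" |
        xargs -0 -r git archive -o "$output" "$new_ref"
}

# Example call:
# make_patch patch_v2.zip v1.0 HEAD
```

Keep in mind that xargs may split an extremely long path list across several git archive runs, each overwriting the previous output, so this form is best suited to small patches.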

Optimization with a Bash Script

To boost productivity and save the team from remembering long commands, I often write a simple bundle.sh script:

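#!/bin/bash
# Usage: ./bundle.sh <old-commit> [new-commit]

OLD_COMMIT=$1
NEW_COMMIT=${2:-HEAD}
OUTPUT="deploy_$(date +%Y%m%d_%H%M%S).zip"

if [ -z "$OLD_COMMIT" ]; then
    echo "Error: Old commit hash is required!" >&2
    exit 1
fi

# Stop early if either ref does not resolve to a commit
for ref in "$OLD_COMMIT" "$NEW_COMMIT"; do
    if ! git rev-parse --verify --quiet "${ref}^{commit}" >/dev/null; then
        echo "Error: '$ref' is not a valid commit." >&2
        exit 1
    fi
done

# Read NUL-separated paths into an array so names with spaces stay intact
FILES=()
while IFS= read -r -d '' file; do
    FILES+=("$file")
done < <(git diff -z --name-only --diff-filter=ACMR "$OLD_COMMIT" "$NEW_COMMIT")

if [ "${#FILES[@]}" -eq 0 ]; then
    echo "No changes detected."
    exit 0
fi

git archive -o "$OUTPUT" "$NEW_COMMIT" "${FILES[@]}" || exit 1
echo "Package created: $OUTPUT with ${#FILES[@]} files."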

I once ran this script in GitLab CI to generate deployment artifacts automatically. It reduced the file size sent to the server from 200MB to less than 1MB for each small bug fix.

Quality Control Before Deployment

Don’t rush to push the file to the server immediately. Take 5 seconds to double-check your package.

  • Inspect the zip content: Run unzip -l changes.zip to see the file list. Ensure sensitive config files like .env are not included.
  • Handling deleted files: The zip package only handles overwrites or additions. To delete old files on the server, I usually export a list of deleted files to a text file: git diff --name-only --diff-filter=D $OLD $NEW > deleted.txt.
  • Version tracking: Always record the deployed commit hash in a VERSION file on the server. Next time, the script will read this file to know where to start the comparison, avoiding incorrect or missing deployments.
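The last two points can be combined into a small server-side step that runs after the zip has been extracted. This is a rough sketch, not a hardened tool: the helper name finish_deploy and its argument order are my own invention, and it assumes deleted.txt holds one plain path per line, relative to the project root, from a source you trust.

```shell
# Hypothetical server-side helper, run after extracting the package.
# Usage: finish_deploy <deleted.txt> <commit-hash> <target-dir>
finish_deploy() {
    local deleted_list="$1"
    local commit="$2"
    local target="$3"

    # Remove every path listed in deleted.txt (one path per line)
    while IFS= read -r path; do
        if [ -n "$path" ]; then
            rm -f -- "$target/$path"
        fi
    done < "$deleted_list"

    # Record the deployed commit so the next run knows where to start
    printf '%s\n' "$commit" > "$target/VERSION"
}

# Next deployment: read the recorded hash and use it as the old commit, e.g.
# OLD=$(cat /path/to/site/VERSION)
```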

This technique is extremely useful for low-spec VPS or Shared Hosting environments. I once shortened the deployment time of a legacy project from 15 minutes to under 30 seconds with just this small change.
