Optimizing Docker Images with Multi-stage Builds: A Guide to Reducing Size and Enhancing Security

Docker tutorial - IT technology blog

Introduction: When Docker Images Get ‘Bloated’

When I first started with Docker, I thought I could just throw everything into a single Dockerfile and be done with it. I’d install all sorts of compilers, development libraries, debugging tools… all in one image. Then, after pushing the image to the registry, I’d break out in a sweat looking at its size – hundreds of MBs, sometimes even GBs. That’s when I started looking into optimization. Multi-stage build was the ‘savior’ I found, and now, after more than 6 months of deploying it in production, I find it incredibly useful.

Multi-stage build not only significantly reduces image size but also enhances application security. This technique allows you to separate the build environment (where many tools are needed) from the runtime environment (where only the application needs to run). The result is a lightweight final image, containing only what’s truly necessary to run the application.

Why Multi-stage Builds Are Essential

  • Reduced Image Size: This is the most prominent benefit. Smaller images lead to faster push/pull operations, significantly saving storage space and bandwidth.
  • Enhanced Security: By removing build tools, unnecessary libraries, and even source code from the final image, you drastically reduce the potential attack surface.
  • Optimized Caching: Separate stages enable Docker to utilize caching more effectively, thereby speeding up subsequent builds.
  • Cleaner Dockerfiles: Clearly separating steps makes Dockerfiles much easier to read and manage.

Without further ado, let’s dive straight into how you can apply Multi-stage Builds immediately.

Quick Start: Optimize Your Image in 5 Minutes

To begin, let’s walk through a practical example. I’ll use a simple Go application because Go compiles into a static binary, which is perfect for clearly illustrating the difference in image size.

1. Prepare the Go Application

Create a new directory, for example, my_go_app, and create a main.go file with the following content:

package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "Hello from Multi-stage Docker Build!")
	})

	fmt.Println("Server starting on port 8080...")
	// Exit with a clear error if the server fails to start
	log.Fatal(http.ListenAndServe(":8080", nil))
}

2. Traditional Dockerfile (Single-stage)

Create a Dockerfile.single file in the same directory:

# Use the full golang image to build and run
FROM golang:1.22

WORKDIR /app

# Copy all source code into the image
COPY . .

# Initialize go mod (if not already) and download dependencies
RUN go mod init example.com/myapp || true
RUN go mod tidy

# Compile the application
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/myapp .

# Expose port
EXPOSE 8080

# Run the application
CMD ["/app/myapp"]

Build and check the image size:

docker build -t my-go-app-single -f Dockerfile.single .
docker images | grep my-go-app-single

You’ll notice that the image size is quite large, potentially several hundred MBs.

3. Dockerfile with Multi-stage Build

Create a Dockerfile.multi file in the same directory:

# Stage 1: Build environment
FROM golang:1.22 AS builder

WORKDIR /app

COPY . .

RUN go mod init example.com/myapp || true
RUN go mod tidy

# Compile the application
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/myapp .

# Stage 2: Lightweight runtime environment
FROM alpine:latest

WORKDIR /app

# Only copy the compiled binary from the 'builder' stage
COPY --from=builder /app/myapp .

EXPOSE 8080

CMD ["./myapp"]

Build and check the image size:

docker build -t my-go-app-multi -f Dockerfile.multi .
docker images | grep my-go-app-multi

You’ll see that the image my-go-app-multi is significantly smaller, just a few MBs! That’s the power of Multi-stage Build.

To run the application:

docker run -p 8080:8080 my-go-app-multi

Open your browser and navigate to http://localhost:8080 to test it.
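Because the binary is compiled with CGO_ENABLED=0, it is fully static, so you can shrink the runtime stage even further by basing it on the empty scratch image instead of alpine. A sketch of that alternative second stage:

```dockerfile
# Stage 2 (alternative): completely empty base image for a fully static binary
FROM scratch

WORKDIR /app

# Only copy the compiled binary from the 'builder' stage
COPY --from=builder /app/myapp .

EXPOSE 8080

CMD ["./myapp"]
```

The trade-off is that scratch contains no shell or package manager at all, so you cannot exec into the running container to debug it, and you would need to copy in CA certificates yourself if the application made outbound TLS calls.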

Detailed Explanation: How Multi-stage Build Works

Multi-stage Build works by defining multiple stages within a single Dockerfile. Each stage starts with a FROM instruction and can be named using AS <stage_name>. The key point is that you can copy artifacts (e.g., compiled files, configurations) from a previous stage to a later one. To do this, we use the COPY --from=<stage_name> instruction.

It’s important to note that only the final stage produces the complete Docker image. All other intermediate stages exist only during the build process and are not stored in the final image. This allows us to eliminate everything unnecessary for runtime, such as compilers, SDKs, development libraries, and build caches.

Basic Structure of Multi-stage Build

# Stage 1: Build
FROM some_build_image AS builder
WORKDIR /app
COPY . .
RUN build_command

# Stage 2: Test (optional)
FROM some_test_image AS tester
WORKDIR /app
COPY --from=builder /app/build_output .
RUN test_command

# Stage 3: Final runtime
FROM some_runtime_image
WORKDIR /app
COPY --from=builder /app/build_output .
EXPOSE port
CMD ["./app"]

As you can see, we can define multiple different stages. The tester stage can take output from the builder to run tests. Then, the final stage will only retrieve what’s necessary from the builder (or tester, if applicable) to create the runtime image.
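Worth knowing: COPY --from is not limited to named stages in the same Dockerfile. You can also reference a stage by its zero-based index, or copy files straight out of an arbitrary external image:

```dockerfile
# Reference a stage by index (0 = the first FROM) instead of by name
COPY --from=0 /app/build_output .

# Copy a file directly from an external image on the registry
COPY --from=nginx:latest /etc/nginx/nginx.conf /etc/nginx/nginx.conf
```

Named stages are still preferable in most cases, since indexes silently shift when you add or reorder stages.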

Specific Benefits

  • Ultra-small Image Size: Multi-stage Build significantly reduces storage space on the registry and shortens image pull/push times. Especially for Go, Rust, or C/C++ applications, image sizes can drop from hundreds of MBs to just a few MBs.
  • Enhanced Security: The final image contains only the application and minimal runtime libraries. There are no compilers, development tools, or original source code. This reduces the potential attack surface, making it harder for malicious actors to exploit vulnerabilities or extract source code.
  • Simplified Dockerfiles: Although they might appear longer, each stage has a clear purpose. This makes Dockerfiles much easier to read and maintain.
  • Effective Caching: If only the source code changes, Docker can reuse the cache from the dependency installation steps in the build stage, thus accelerating the build process.

Advanced: Further Optimization with Multi-stage Build

Now that you’ve mastered the basics, it’s time to explore some advanced techniques to optimize Multi-stage Builds even further.

1. Utilizing Multiple Build Stages

Sometimes, your application might have various types of dependencies or complex build steps. In such cases, you can create multiple build stages to separate them. For example: one stage dedicated to installing NPM dependencies, another for building the frontend, and a third for building the backend.

# Stage 1: Install Node.js dependencies
FROM node:18-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

# Stage 2: Build frontend (e.g., with React/Angular/Vue)
FROM node:18-alpine AS frontend_builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build:frontend

# Stage 3: Build backend (e.g., with NestJS/Express)
FROM node:18-alpine AS backend_builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build:backend

# Stage 4: Final runtime image
FROM node:18-alpine
WORKDIR /app

# Copy built frontend and backend
COPY --from=frontend_builder /app/dist/frontend ./dist/frontend
COPY --from=backend_builder /app/dist/backend ./dist/backend

# Install only production dependencies for the backend
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

EXPOSE 3000
CMD ["node", "./dist/backend/main.js"]

2. Leveraging Build Arguments (ARG)

You can pass variables into the build process using ARG. This is particularly useful when you want to change the version of a tool or library during the build without modifying the Dockerfile.

ARG NODE_VERSION=18-alpine

# Stage 1: Dependencies
FROM node:${NODE_VERSION} AS deps
# ... (dependency installation steps)

# Stage 2: Final runtime
FROM node:${NODE_VERSION}
# ... (application copy and run steps)

When building, you can override the default value:

docker build --build-arg NODE_VERSION=20-alpine -t my-app .
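One gotcha worth noting: an ARG declared before the first FROM is only in scope for the FROM lines themselves. To use its value inside a stage (for example, in a RUN instruction), you must redeclare it after that stage’s FROM:

```dockerfile
ARG NODE_VERSION=18-alpine

FROM node:${NODE_VERSION} AS deps
# Redeclare to bring the value into this stage's scope
ARG NODE_VERSION
RUN echo "Building with Node base: ${NODE_VERSION}"
```

Without the second ARG line, ${NODE_VERSION} would expand to an empty string inside the stage.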

3. Optimizing Caching

Docker caches layers based on the order of instructions in the Dockerfile. To optimize Multi-stage Builds, place less frequently changing instructions first:

  • Dependencies before source code: Always copy package.json/go.mod and run dependency installation commands before copying the entire source code. If only the source code changes, Docker will reuse the dependency layer.
  • Consolidate related RUN commands: If your Dockerfile has many small RUN commands that are closely related and tend to change together, consider combining them into one. This reduces the number of unnecessary layers and avoids invalidating the cache piece by piece.
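Applied to the Go example from the quick start, the dependencies-before-source rule would look like this (assuming the project has committed go.mod and go.sum files, rather than relying on the go mod init fallback used above):

```dockerfile
FROM golang:1.22 AS builder
WORKDIR /app

# Copy only the module files first, so this layer stays cached
# until go.mod or go.sum actually changes
COPY go.mod go.sum ./
RUN go mod download

# Source changes invalidate the cache only from this point on
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/myapp .
```

With this ordering, editing main.go and rebuilding reuses the cached go mod download layer, which is usually the slowest step.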

Practical Tips & Personal Experience

After numerous “headaches” deploying Docker in a production environment, I’ve distilled some “hard-earned” lessons related to Multi-stage Build that I want to share with you:

  • Choose the Right Base Image:
    • For build stages: It’s advisable to use more complete images (e.g., golang:1.22, node:18) as they typically include the necessary tools for compilation.
    • For runtime stages: Always prioritize ultra-small images like alpine (e.g., alpine:latest, node:18-alpine) or scratch. scratch is an empty image, only suitable for completely static binaries like Go’s. Note that alpine ships musl libc rather than glibc; if your application requires glibc or other heavier libraries, a slim Debian image (e.g., debian:bookworm-slim) or a distroless base is a better fit.
  • Copy Only What You Need Between Stages: Intermediate stages are discarded entirely once the build finishes, so the final image size depends only on what you COPY --from them. Copy specific artifacts (the binary, the dist folder) rather than entire working directories, so that caches, temporary files, and build tools never make it into the runtime image.
  • Check File Permissions: When using COPY --from, ownership and permissions may not carry over the way the runtime stage expects, especially across different base images. Remember to reset them if necessary (e.g., RUN chmod +x /app/myapp).
  • Utilize .dockerignore: This is an extremely important file that many newcomers often overlook. .dockerignore helps exclude unnecessary files and directories – such as local node_modules, .git, .env, dist – from the Docker build context. This not only speeds up the build process but also prevents copying irrelevant items into the final image.
  • Don’t Forget EXPOSE and CMD/ENTRYPOINT: Ensure your final stage correctly declares the port your application will listen on (`EXPOSE`) and the command to launch the application (`CMD` or `ENTRYPOINT`).
  • Debug Multi-stage Build: If you encounter an error in an intermediate stage, you can build up to that stage and inspect it. For example, to debug the `builder` stage:
    docker build --target builder -t my-app-builder-debug .
    docker run -it my-app-builder-debug bash
    

    This allows you to enter the container of that specific stage to check files, run commands, and understand the cause of the error.
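As a starting point, a .dockerignore for the Node.js project above might look like the following (adjust it to your own layout; the entries here are typical examples, not a fixed list):

```
# Dependencies and build output - rebuilt inside the image
node_modules
dist

# Version control and local secrets
.git
.env

# Logs and editor files
*.log
.vscode
```

The syntax follows the same patterns as .gitignore, but the two files serve different purposes, so maintain them separately.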

In summary, Multi-stage Build is an incredibly powerful technique and an indispensable best practice when working with Docker at production scale. This technique not only helps you create lean, fast images but also significantly enhances the security of your application. Don’t hesitate to try applying it in your projects today!
