Optimizing Docker Image Size: Practical Experience for Faster, Lighter Applications

Docker tutorial - IT technology blog

Why is Docker Image Size Optimization So Important?

When I first started working with Docker, like many others I focused mainly on getting the application to run inside a container. It wasn’t until I deployed to production and worked with dozens, sometimes hundreds, of images that I truly grasped the importance of optimizing Docker image size.

An oversized Docker image not only consumes excessive storage but also prolongs build times, push/pull times to the registry, and most importantly, slows down application deployment. In a production environment, every second of waiting can impact user experience or operational costs. Reducing image size means saving network and disk resources, and significantly accelerating CI/CD.

Common Methods for Optimizing Docker Image Size

While searching for solutions to “slim down” Docker images, I experimented and gathered several key methods. Each approach has its own advantages and disadvantages, suitable for specific situations.

1. Use Compact Base Images

This is one of the simplest and most effective ways. Instead of choosing “full-featured” base images like ubuntu or debian, we can switch to lighter versions.

  • Alpine Linux: Known for its ultra-small size (just a few MB).
  • -slim or -buster-slim variants: Slimmed-down variants of the official images, with fewer preinstalled packages (e.g., python:3.9-slim-buster).
  • Distroless images: Provide almost exclusively the libraries needed to run an application, without a shell or package manager, making them extremely secure and compact.
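
As a quick illustration, changing nothing but the base image already shrinks the result considerably. A minimal sketch (the app layout is assumed; sizes vary by release, so treat them as rough orders of magnitude):

```dockerfile
# python:3.9 is Debian-based and close to 1 GB once pulled;
# python:3.9-slim is roughly 100–150 MB; python:3.9-alpine is smaller still.
FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]
```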

2. Multi-stage Builds

This is the technique I value most and use frequently. The idea is to use multiple FROM statements within a single Dockerfile. The initial stage (build stage) will contain all the necessary tools and libraries for compiling or packaging the application. The subsequent stage (runtime stage) then only copies the artifacts created in the previous stage into a cleaner, more compact base image.
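
The pattern is easiest to see with a compiled language, where the payoff is largest. A minimal sketch (Go is used purely for illustration; the file names are assumptions):

```dockerfile
# Stage 1: the full Go toolchain (hundreds of MB) is needed only to compile
FROM golang:1.21 AS build
WORKDIR /src
COPY main.go .
RUN CGO_ENABLED=0 go build -o /bin/app main.go

# Stage 2: the final image contains nothing but the static binary
FROM scratch
COPY --from=build /bin/app /app
ENTRYPOINT ["/app"]
```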

3. Optimize Layers

Each RUN, COPY, and ADD instruction in a Dockerfile creates a new layer. Docker caches these layers, but if we’re not careful, unnecessary layers can cause the image to swell. Optimization methods include:

  • Combine RUN commands: Instead of multiple consecutive RUN commands, combine them using && to create a single layer, reducing the number of layers and improving cache utilization.
  • Use .dockerignore: Similar to .gitignore, this file helps Docker ignore unnecessary files and directories (e.g., local node_modules, .git, __pycache__) when building an image.
  • Delete temporary files and cache: After installing packages, immediately remove unnecessary cache files or dependencies. For example, with apt use rm -rf /var/lib/apt/lists/*, and with pip use --no-cache-dir.
  • Only COPY what’s necessary: Avoid COPY . . if you only need a few specific files.
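
One detail worth emphasizing: deleting files in a later RUN instruction does not shrink the image, because the files still exist in the earlier layer. The cleanup has to happen in the same layer that created the files. A sketch of the difference:

```dockerfile
# BAD: three layers; the apt lists live on inside the middle layer
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*   # too late, the data is already committed

# GOOD: one layer; the cache is removed before the layer is written
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
```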

4. Use Docker BuildKit

BuildKit is a newer, more efficient image builder, integrated into Docker since version 18.09 and the default builder since Docker Engine 23.0. It brings many improvements, such as better caching, parallel execution of build stages, and other advanced features that help optimize both the build process and the image size. On older versions, I activate BuildKit by setting the environment variable DOCKER_BUILDKIT=1.
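
Beyond the environment variable, BuildKit unlocks Dockerfile features such as cache mounts, which keep the pip cache out of the image while still reusing it between builds. A sketch (requires the dockerfile:1 syntax directive; the base image and paths follow the Python examples later in this post):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
# The cache mount exists only at build time and never lands in a layer
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
```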

Analysis of Each Method’s Pros and Cons

1. Using Compact Base Images

  • Pros: Highly effective, easy to implement, significantly reduces image size from the outset.
  • Cons: Ultra-small base images like Alpine may lack certain necessary C/C++ tools or libraries (like glibc), requiring additional installation, which can sometimes be quite complex for newcomers.
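
For example, building Python packages with C extensions on Alpine typically means installing a compiler against musl first (the package names below are the usual apk ones, but check what your dependencies actually need):

```dockerfile
FROM python:3.9-alpine
# musl-dev provides the libc headers these builds expect from glibc on Debian
RUN apk add --no-cache gcc musl-dev libffi-dev
```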

2. Multi-stage Builds

  • Pros: The most powerful method for eliminating build-time-only dependencies, creating extremely lightweight runtime images. Helps clearly separate development and application runtime environments. Improves security as the final image does not contain potentially vulnerable build tools.
  • Cons: Dockerfiles become slightly more complex, requiring a clear understanding of the application’s build stages.

3. Optimizing Layers

  • Pros: Provides detailed control over what goes into the image. Helps minimize waste and unnecessary files.
  • Cons: Easily overlooked without good discipline. Sometimes combining commands can make the Dockerfile harder to read if not organized properly.

4. Using Docker BuildKit

  • Pros: Significantly speeds up builds, improves caching capabilities, and supports features like external cache and secret mounting.
  • Cons: Requires manual activation (if not already default). For older projects, switching to BuildKit might require some adjustments.

Which Method is Most Suitable?

Through practical experience, I’ve found that there’s no single “silver bullet.” The most effective approach is to combine the methods above. For most applications, I typically apply the following strategy:

  1. Always prioritize Multi-stage Builds: This is the foundation for compact and secure images.
  2. Use compact Base Images in the runtime stage: Combine Multi-stage builds with base images like -slim or Alpine (if the application doesn’t have complex dependencies).
  3. Optimize Layers and clean up files: Thoroughly apply .dockerignore and always clean up the cache immediately after package installation.
  4. Activate BuildKit: To speed up builds and leverage advanced features.

Detailed Implementation Guide

Let’s look at a practical example for a Python application to see the difference:

Suboptimal Dockerfile (for comparison)

A simple, easy-to-write Dockerfile, but it will create a rather large image:

# SUBOPTIMAL: Build and Run in the same stage, using a large base image
FROM python:3.9

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 8000

CMD ["python", "app.py"]

The image created from this Dockerfile will include all build tools (compiler, headers…) and temporary pip files, significantly increasing its size.

Optimized Dockerfile with Multi-stage Build and -slim Base Image

This is the method I typically use for Python applications in production:

# Stage 1: Build environment
FROM python:3.9-slim-buster AS builder

# Set up working directory
WORKDIR /app

# Copy requirements file and install dependencies
# Use --no-cache-dir so pip doesn't store cache, helping reduce layer size
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application source code (after dependencies are installed)
COPY . .

# Stage 2: Runtime environment
# Contains only what's necessary to run the application
FROM python:3.9-slim-buster

WORKDIR /app

# Copy installed dependencies and source code from 'builder' stage
# (packages that install console scripts may also need /usr/local/bin copied)
COPY --from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
COPY --from=builder /app /app

EXPOSE 8000

# Command to start the application
CMD ["python", "app.py"]

In the example above, the builder stage installs Python packages. Subsequently, the runtime stage only copies the installed packages and application source code into a clean python:3.9-slim-buster image. The final image size will be significantly smaller.

Using .dockerignore

Create a file named .dockerignore at the same level as your Dockerfile. This file’s content will list the files/directories Docker should ignore when building:


# Ignore Python virtual environment
venv/

# Ignore .git directory
.git/

# Ignore cache files and compiled files
__pycache__/
*.pyc
*.egg-info/

# Ignore environment configuration files
.env
.DS_Store

# Ignore other files and directories not needed for runtime
docs/
tests/

This ensures that unnecessary files are not included in the build context, helping to reduce image size and speed up builds.

Combine RUN commands and clean up cache

When installing system packages, combine apt-get commands and clean up the cache immediately within the same RUN instruction:


# Combine commands and clean up cache immediately
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential \
        curl \
        git && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get clean

The commands rm -rf /var/lib/apt/lists/* and apt-get clean are crucial for removing package manager metadata, significantly reducing layer size.

Experience and Advice from Production Reality

Optimizing Docker images is a continuous process. After 6 months of deploying applications in production with Docker, I realized that monitoring and fine-tuning Dockerfiles are essential.

A point I was particularly pleased with during this process was when I transitioned my entire stack from Docker Compose v1 to v2, and the process was quite smooth. With improvements in performance and syntax, Docker Compose v2 (now a Docker CLI plugin) has helped me manage multi-container services more efficiently, especially when working with optimized images. The startup and update speeds of services have noticeably improved.

Remember, a small image isn’t just about disk space. It’s also about security (fewer components, fewer potential vulnerabilities) and performance (faster builds, quicker deployments, less resource consumption). Treat your Dockerfile as a critical part of your application’s source code and dedicate time to optimizing it.

I hope these practical insights will help you feel more confident in optimizing your Docker images!
