NVIDIA Container Toolkit: How to Run GPUs in Docker (Hands-on from Local to Production)

Docker tutorial - IT technology blog

Quick Start: Enabling GPU for Docker in 5 Minutes

Just received a brand new Ubuntu server for model training? Don’t get bogged down in dry theory. Follow these 3 steps to let Docker truly harness the power of your graphics card.

1. Check Drivers on the Host Machine

Before touching Docker, the host machine must recognize the hardware. Run the following command:

nvidia-smi

If you see the GPU stats table and CUDA version, you’re 50% done. If the system reports command not found, install the driver with sudo apt install nvidia-driver-535 (or a newer release) before proceeding.
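If you want to grab just the driver version for a script instead of eyeballing the table, nvidia-smi supports query flags; a quick sketch (requires a working driver, so run it on the GPU host):

```shell
# Print only the driver version and GPU name, no table
nvidia-smi --query-gpu=driver_version,name --format=csv,noheader
```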

2. Install NVIDIA Container Toolkit

NVIDIA has bundled everything in their official repo. Just run these commands to add the source and install:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

3. Configure Docker Runtime and Test

In this step, we need to register the nvidia runtime with the Docker engine. Instead of manually editing JSON files, use the official command:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
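To confirm that the nvidia runtime actually got registered, you can ask the Docker engine directly rather than trusting the command succeeded; a quick check (exact output varies by Docker version):

```shell
# The runtime list should now include "nvidia"
docker info --format '{{json .Runtimes}}'

# nvidia-ctk writes its changes here; inspect them if something looks off
cat /etc/docker/daemon.json
```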

Now it’s time to verify the results. Run a sample NVIDIA container to check:

docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

See the GPU stats table appearing inside the container? Congratulations, your system is ready for heavy-duty AI workloads.

Why do you need NVIDIA Container Toolkit?

By nature, Docker isolates resources. A default container is completely “blind” to that thousand-dollar graphics card plugged into your motherboard.

GPUs are not like CPUs. They need matching user-space driver libraries to talk to the hardware. If you try to bake those drivers directly into the image, it bloats by several GB. Worse, you get catastrophic conflicts every time the driver on the host machine is updated.

NVIDIA Container Toolkit acts as a smart bridge. Instead of overwriting, it simply maps the necessary library files and binaries from the host machine into the container at runtime. This approach keeps images lightweight while ensuring high flexibility.
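You can watch this bridge in action: libcuda is not shipped in the image, yet it appears inside a running container because the toolkit injects it from the host at startup. A quick probe, using the same test image as above:

```shell
# libcuda.so is absent from the image but present at runtime,
# mapped in from the host by the toolkit
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 \
  sh -c 'ldconfig -p | grep libcuda'
```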

Fixing “Heartbreaking” Errors at 2 AM

Let me tell you about a painful lesson. I once deployed an image processing microservice for an e-commerce project. It ran smoothly locally on CPU, but in production with a GPU, the container kept crashing with CUDA error: unknown error.

After 2 hours of exhausting debugging, I discovered a very basic cause: The host driver version was too old for the CUDA Toolkit requirements in the image.

Remember this golden rule: the host driver must support a CUDA version newer than or equal to the one the container requires. You can run a CUDA 11 image on a host whose driver supports CUDA 12, but the reverse will definitely fail.
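The golden rule boils down to a version comparison, which you can automate in a pre-deploy check. A minimal sketch using sort -V (the function name cuda_compatible is my own, not part of any NVIDIA tooling):

```shell
# cuda_compatible HOST_CUDA CONTAINER_CUDA
# Succeeds (exit 0) only if the host driver's CUDA support is >= the
# CUDA version the container image requires.
cuda_compatible() {
  host="$1"; container="$2"
  # sort -V orders version strings numerically; if the container's
  # version sorts first (or is equal), the host is new enough.
  [ "$(printf '%s\n' "$container" "$host" | sort -V | head -n1)" = "$container" ]
}

cuda_compatible 12.2 11.8 && echo "CUDA 11.8 image OK on a 12.2-capable driver"
cuda_compatible 11.4 12.0 || echo "CUDA 12.0 image will fail on an 11.4 driver"
```

Wire this into your CI by feeding it the output of nvidia-smi on the target host, and you catch the mismatch before 2 AM instead of after.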

Tips for GPU Partitioning

You don’t always want a single container to “swallow” all resources, especially when running multiple small models simultaneously.

  • Specify a specific GPU: Use --gpus '"device=0"' to use only card 0.
  • Resource control: Combine with --memory and --cpus flags to prevent a container from hanging the entire server.
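Putting both bullets together, here is a sketch that pins a container to GPU 0 while capping its RAM and CPU share (limits are illustrative, tune them for your workload):

```shell
# Only GPU 0 is visible inside; at most 8 GB RAM and 4 CPUs
docker run --rm --gpus '"device=0"' \
  --memory=8g --cpus=4 \
  nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
```

The odd-looking '"device=0"' quoting is deliberate: the inner double quotes keep Docker from misparsing the device list in your shell.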

Using with Docker Compose (Production Standard)

In production environments, manually typing docker run commands is taboo. We use Docker Compose for management. However, the GPU syntax in Compose is very prone to indentation errors.

Here is a standard docker-compose.yml configuration that I frequently use:

services:
  ai-engine:
    image: vllm/vllm-openai:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Note: This deploy.resources configuration only works reliably from Docker Compose V2 onwards. If you are still using V1, consider upgrading immediately to avoid unnecessary minor bugs.
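If you want to pin the service to a specific card rather than grabbing "any 1 GPU", Compose also accepts device_ids in place of count (the two are mutually exclusive). A variant of the same config:

```yaml
services:
  ai-engine:
    image: vllm/vllm-openai:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]
```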

Optimizing Images: Don’t Let Your Hard Drive “Cry for Help”

A common mistake is using the nvidia/cuda:devel image for production. The devel version contains everything from compilers to header files, weighing in at 3-4GB.

If you only need to run code, choose the right version:

  • base: Ultra-lightweight (~150MB), contains only minimal CUDA libraries.
  • runtime: Suitable for running most common AI applications.
  • devel: Intended only for the code building stage.

My experience is to always use Multi-stage builds. Build your code in the devel environment, then copy the executable to the base image. This technique reduces image size from 4GB to just a few hundred MBs, speeding up deployment significantly.
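A minimal sketch of that multi-stage pattern, assuming a hypothetical CUDA source file main.cu (the file and binary names are placeholders for your own project):

```dockerfile
# Stage 1: compile in the heavy devel image (nvcc and headers available)
FROM nvidia/cuda:12.0.0-devel-ubuntu22.04 AS build
WORKDIR /src
COPY main.cu .
RUN nvcc -O2 -o app main.cu

# Stage 2: ship only the binary on the slim base image
FROM nvidia/cuda:12.0.0-base-ubuntu22.04
COPY --from=build /src/app /usr/local/bin/app
ENTRYPOINT ["app"]
```

This works because nvcc links the CUDA runtime statically by default; if your application needs shared CUDA libraries (cuBLAS, cuDNN, etc.), swap the final stage to the runtime image instead of base.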

Conclusion

Installing NVIDIA Container Toolkit is the first step into the professional AI/ML world. Always prioritize driver version synchronization and image size optimization from the start. If the server reports a GPU error, reuse the “sacred” nvidia-smi command inside the container to determine if the issue is with infrastructure or code. Wishing you peaceful nights of sleep instead of staying up all night debugging GPUs!
