Docker Volumes: Persistent Data Storage That Survives Container Restarts

Docker tutorial - IT technology blog

2 AM, PostgreSQL database wiped clean — an expensive lesson about volumes

That day I was redeploying a PostgreSQL container on the production server, ran docker rm -f postgres and spun it back up. Opened the app — blank. Three months of data, gone.

The reason was painfully simple: I was storing data inside the container, not outside. Container deleted — data goes with it. From that point on, I never deploy a database without checking its volume first.

Why storing data inside a container is dangerous

Docker containers have a filesystem layer called the writable layer — everything you write inside a container (database files, uploads, logs…) lives there. The problem: this layer is tied to the container’s lifecycle. Container dies — data dies with it.

  • docker stop → data is still there (the container still exists; only its processes stop)
  • docker rm → data is gone permanently
  • Deploying a new version (rebuilding the image, recreating the container) → data is gone
  • Server crash, Docker daemon restart → container ephemeral state can be lost

Docker has 3 ways to store data outside a container: volumes, bind mounts, and tmpfs. For production, volumes are the standard choice — Docker manages them internally, independent of the host directory structure.
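As a quick illustration of the three options (image names and paths here are placeholders of my choosing), they differ only in the flag you pass to docker run:

```shell
# Named volume: Docker manages the storage under /var/lib/docker/volumes
docker run -d --name db -v mydata:/var/lib/postgresql/data postgres:16

# Bind mount: a host directory is mapped directly into the container
docker run -d --name web -v /srv/site/conf:/etc/nginx/conf.d:ro nginx

# tmpfs: kept in RAM only, gone when the container stops (Linux hosts)
docker run -d --name cache --tmpfs /tmp:rw,size=64m redis:7-alpine
```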

Core concepts before you start

Volumes are Docker built-ins — no extra installation needed. But there are 2 types, and the difference matters more than you’d think:

Named volumes vs Anonymous volumes

Anonymous volume: Docker auto-generates a random name like b3c2e1f9a8d7..., nearly impossible to track, easy to forget, and silently eats up disk space.

# Anonymous volume — DO NOT use in production
docker run -v /var/lib/postgresql/data postgres

Named volume: you give it a meaningful name, easy to manage, and it exists independently of any container.

# Create a named volume
docker volume create postgres_data

# Inspect it
docker volume ls
docker volume inspect postgres_data

The output of inspect shows you where the data actually lives on the host — typically /var/lib/docker/volumes/postgres_data/_data.
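For reference, the inspect output looks roughly like this (timestamps and labels will differ on your machine):

```json
[
    {
        "CreatedAt": "2024-01-01T02:00:00Z",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/postgres_data/_data",
        "Name": "postgres_data",
        "Options": null,
        "Scope": "local"
    }
]
```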

Real-world production configurations

1. PostgreSQL with a named volume

# Run PostgreSQL with a named volume
docker run -d \
  --name postgres \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -e POSTGRES_DB=myapp \
  -v postgres_data:/var/lib/postgresql/data \
  -p 5432:5432 \
  postgres:16

# Test: create some data
docker exec -it postgres psql -U postgres -d myapp -c \
  "CREATE TABLE test (id serial, name text); INSERT INTO test (name) VALUES ('hello');"

# Remove the container
docker rm -f postgres

# Recreate it — data is still there
docker run -d \
  --name postgres \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -e POSTGRES_DB=myapp \
  -v postgres_data:/var/lib/postgresql/data \
  -p 5432:5432 \
  postgres:16

docker exec -it postgres psql -U postgres -d myapp -c "SELECT * FROM test;"
# Result: data is still there

2. Volumes in Docker Compose

My production stack includes an app server, PostgreSQL, Redis, and a volume for user uploads. This is the Compose config after refining it through several real-world incidents:

services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: myapp
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes
    restart: unless-stopped

  app:
    image: myapp:latest
    volumes:
      - uploads:/app/public/uploads
    depends_on:
      - postgres
      - redis

volumes:
  postgres_data:         # Docker creates this automatically if it doesn't exist
  redis_data:
  uploads:

Pay attention to the volumes: block at the bottom of the file — this is where all named volumes are declared. If you reference a volume in a service without declaring it here, Compose will throw an error immediately.
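If you'd rather manage a volume yourself (created once with docker volume create, perhaps on a different driver), you can mark it as external in that same block — Compose will then refuse to start if the volume doesn't already exist, instead of silently creating an empty one:

```yaml
volumes:
  postgres_data:
    external: true   # must already exist; Compose errors out otherwise
```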

3. Bind mounts — development use only

Bind mounts map a host directory directly into the container. Useful for hot reload during development, or when you need to read a config file from the host:

# Bind mount: map the current directory to /app and run from there
docker run -v "$(pwd)":/app -w /app node:20 npm run dev

# In Compose — development config
services:
  app:
    image: myapp:latest
    volumes:
      - ./src:/app/src          # bind mount for dev
      - uploads:/app/uploads    # named volume for data

In production, using bind mounts for databases or user data creates unnecessary headaches — permission issues are complex, and if the host path changes the container breaks immediately.
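The most common symptom of those permission issues is files in the bind-mounted directory ending up owned by root on the host. One sketch of a workaround during development is running the container as your own user (the image and command here are just examples):

```shell
# Run the container as the calling user so files written to the bind mount
# on the host are owned by you, not by root
docker run --rm \
  --user "$(id -u):$(id -g)" \
  -v "$(pwd)":/app -w /app \
  node:20 npm install
```

Note that some images expect to run as root (or as a specific baked-in user), so this flag doesn't work everywhere.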

4. Volume drivers — storing data off the host

Running multi-server setups or need automated cloud backups? Volume drivers let you store data on NFS, S3, or cloud storage. Example with NFS:

docker volume create \
  --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.100,rw \
  --opt device=:/srv/nfs/mydata \
  nfs_data
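The same NFS volume can be declared directly in Compose via driver_opts (the address and export path below mirror the CLI example and are, of course, placeholders):

```yaml
volumes:
  nfs_data:
    driver: local
    driver_opts:
      type: nfs
      o: addr=192.168.1.100,rw
      device: ":/srv/nfs/mydata"
```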

For a small team on a single VPS, local named volumes are more than enough. Just back up each critical volume regularly; keep in mind that a raw copy of /var/lib/docker/volumes/ taken while a database is actively writing can produce an inconsistent snapshot, so prefer the tar-based backup shown below (or a database-level dump).

Inspection & Monitoring

List volumes and check disk usage

# List all volumes
docker volume ls

# Inspect a specific volume
docker volume inspect postgres_data

# Check size (requires root)
du -sh /var/lib/docker/volumes/postgres_data/_data

# Or use docker system df for an overview
docker system df -v

Find which containers are using a volume

# View volume mounts for a running container
docker inspect postgres --format='{{json .Mounts}}' | python3 -m json.tool
# List every container (running or stopped) that uses a given volume
docker ps -a --filter volume=postgres_data

Backup and restore a volume

After the 2 AM incident, this is the script I added to a daily cronjob — simple, but it’s saved me at least twice:

# Backup a volume to a tar.gz file
docker run --rm \
  -v postgres_data:/data:ro \
  -v "$(pwd)"/backups:/backup \
  alpine tar czf /backup/postgres_$(date +%Y%m%d_%H%M%S).tar.gz -C /data .

# Restore from backup (stop any container using the volume first)
docker run --rm \
  -v postgres_data:/data \
  -v "$(pwd)"/backups:/backup \
  alpine tar xzf /backup/postgres_20240101_020000.tar.gz -C /data

Cleaning up unused volumes

# List volumes not used by any container
docker volume ls -f dangling=true

# Remove all dangling volumes — CAUTION: this cannot be undone
docker volume prune

# Remove a specific volume (it must not be attached to any container,
# even a stopped one; remove the container first)
docker volume rm volume_name

Note: docker system prune does NOT remove volumes unless you add the --volumes flag. This is an intentional Docker design decision to protect your data — do not add that flag to automated cleanup scripts in production.

Monitoring disk usage

PostgreSQL WAL logs, Redis RDB snapshots — these silently eat up disk if you’re not watching. I added this check to my monitoring script, alerting when usage exceeds 80%:

#!/bin/bash
# Alert if volumes are using more than 80% of disk
DISK_USAGE=$(df /var/lib/docker/volumes | tail -1 | awk '{print $5}' | tr -d '%')
if [ "$DISK_USAGE" -gt 80 ]; then
  echo "ALERT: Docker volumes disk usage at ${DISK_USAGE}%"
  docker system df -v
fi

Production checklist

  • All databases (PostgreSQL, MySQL, MongoDB, Redis) must use named volumes
  • User uploads, generated files — named volumes
  • Backup cronjob running at least once per day for each critical volume
  • Test restores at least once per month — an untested backup is no backup at all
  • Never run docker system prune --volumes in production unless you’ve already backed up
  • Document volume names and which containers use them — you’ll forget everything in 3 months

Ever since that 2 AM lesson, every time I set up a new stack I run docker inspect to confirm data is going into the volume, not the writable layer. Those thirty seconds have saved me from at least a few midnight debugging sessions.
