The first time I used Docker Compose on a real project, I made a lot of basic mistakes that are embarrassing to look back on — I’d finish a deploy and the site would be down for an hour because I forgot to drain traffic before restarting containers. When the team asked “why is it down?”, I had no good answer. That was the starting point that pushed me to go deep on advanced Docker Swarm: controlled Rolling Updates, Placement Constraints, and Docker Config.
This article is for people who already know Swarm basics and want to configure it for real production — not a demo lab, but a system running 24/7 with actual users.
Context: Why Do We Need This Extra Layer of Configuration?
Swarm’s defaults are decent, but the moment you move to production, you’ll run into a handful of common problems:
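- Default rolling updates cause downtime: Swarm's default update order is stop-first, so it stops the old replica before starting the new one. The replica being replaced serves nothing during that window, and for a single-replica service no instance is serving requests at all, so users see a 502.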
- Uncontrolled container placement: A heavy database might get scheduled on a low-RAM node, or all your API replicas might pile onto a single node that then goes down.
- Config and secrets in environment variables: These can easily leak through docker inspect, log aggregation, or ps aux on the host.
- No automatic rollback: You deploy, discover a bug, and have to handle it manually while users are seeing real errors.
Three features — Placement Constraints, Docker Config/Secret, and Rolling Updates with order: start-first — directly solve each of these problems.
Setup: Labeling Your Nodes
Labels are the foundation of Placement Constraints. Before writing your stack file, you need to assign labels to each node based on its role and hardware characteristics. This is a step most tutorials skip, which leaves readers confused about why constraints aren’t working:
# Assign role labels to worker nodes
docker node update --label-add role=worker node-1
docker node update --label-add role=worker node-2
# Assign storage type — important for databases
docker node update --label-add storage=ssd node-1
docker node update --label-add storage=hdd node-2
# Assign an availability zone label if the cluster spans multiple zones
docker node update --label-add zone=az-1 node-1
docker node update --label-add zone=az-2 node-2
# Verify labels were assigned correctly
docker node inspect node-1 --format '{{json .Spec.Labels}}'
docker node ls --format 'table {{.Hostname}}\t{{.Status}}\t{{.ManagerStatus}}'
After assigning labels, double-check with docker node ls -q | xargs docker node inspect --format '{{.Description.Hostname}}: {{.Spec.Labels}}' to make sure all nodes have their labels before deploying the stack.
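Eyeballing that output gets tedious once you have more than a handful of nodes. Here is a minimal sketch that reads the same "hostname: map[key:value ...]" lines and reports any node missing a required label key. missing_labels is a hypothetical helper name, and it assumes the default Go map formatting that {{.Spec.Labels}} prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "hostname: map[key:value ...]" lines, the format printed by
#   docker node ls -q | xargs docker node inspect --format '{{.Description.Hostname}}: {{.Spec.Labels}}'
# and report every node that lacks one of the required label keys.
# Returns 0 if every node has every key, 1 otherwise.
missing_labels() {
  local key line host labels rc=0
  while IFS= read -r line; do
    [ -z "$line" ] && continue
    host="${line%%:*}"       # text before the first colon
    labels="${line#*: }"     # text after the first ": ", e.g. map[role:worker zone:az-1]
    for key in "$@"; do
      case "$labels" in
        *"[$key:"* | *" $key:"*) ;;   # key present as the first entry or a later entry
        *) echo "$host: missing $key"; rc=1 ;;
      esac
    done
  done
  return "$rc"
}
```

Pipe the inspect command from the previous paragraph into missing_labels role storage zone before every stack deploy. Manager nodes you deliberately left unlabeled will show up in the report too, which is expected.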
Detailed Configuration
Docker Config and Secret — Managing Configuration the Right Way
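Instead of passing config through environment variables, use Docker Config for static configuration files (nginx.conf, app.yaml, etc.) and Docker Secret for sensitive data (passwords, API keys). Secrets are encrypted at rest in the managers' Raft log and are mounted into containers on an in-memory filesystem. Configs are not encrypted at rest and are mounted directly into the container's filesystem, so keep anything sensitive out of them: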
# Create config from a file
docker config create nginx_conf ./nginx.conf
docker config create app_settings ./app.yaml
# Create secret from a file (recommended over stdin to avoid saving to shell history)
docker secret create db_password ./db_password.txt
docker secret create jwt_secret ./jwt_secret.txt
# Verify
docker config ls
docker secret ls
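The secret file needs some care of its own: Swarm stores its bytes exactly, so a trailing newline from an editor becomes part of the password, and a world-readable file on disk defeats the purpose. A minimal sketch, assuming /dev/urandom and base64 are available; new_secret_file is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a random secret to a file that only the current user can read.
# umask 077 is set inside a subshell, so it only affects the file created here.
# Note: if the file already exists, its existing permissions are kept.
new_secret_file() {
  local path="$1"
  (
    umask 077
    # 32 random bytes, base64-encoded (44 characters); tr strips the trailing newline
    # so the secret value does not end with "\n".
    head -c 32 /dev/urandom | base64 | tr -d '\n' > "$path"
  )
}
```

Usage: new_secret_file ./db_password.txt, then docker secret create db_password ./db_password.txt, then rm ./db_password.txt once the secret exists in the cluster.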
Declare and mount them into containers in your stack file:
configs:
  nginx_conf:
    external: true
  app_settings:
    external: true
secrets:
  db_password:
    external: true
services:
  nginx:
    image: nginx:1.25-alpine
    configs:
      - source: nginx_conf
        target: /etc/nginx/nginx.conf
        mode: 0440 # Read-only for owner and group, not others
  api:
    image: myapp/api:latest
    configs:
      - source: app_settings
        target: /app/config/settings.yaml
    secrets:
      - source: db_password
        target: db_password
        mode: 0400 # Owner read-only; secrets should be stricter than configs
        # The file is owned by uid 0 unless you set uid/gid here, so an app running
        # as a non-root user needs a matching uid or it cannot read the secret
Secrets are mounted at /run/secrets/<secret_name> inside the container. Your app reads from this file instead of an environment variable — this is far more secure because the secret never appears in the process environment and won’t be exposed through docker inspect.
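If your image uses a shell entrypoint, reading the file is only a few lines. This is a minimal sketch: read_secret and SECRETS_DIR are hypothetical names, and the only Swarm-specific part is the /run/secrets default.

```shell
#!/usr/bin/env bash
# Hypothetical helper for shell entrypoints: print a Swarm secret's value from its file.
# SECRETS_DIR defaults to Swarm's mount point; override it only for local testing.
SECRETS_DIR="${SECRETS_DIR:-/run/secrets}"

read_secret() {
  local path="$SECRETS_DIR/$1"
  if [ ! -r "$path" ]; then
    echo "read_secret: $1 not readable at $path" >&2
    return 1
  fi
  cat "$path"
}
```

Usage: db_password="$(read_secret db_password)" || exit 1. Keep the value in a plain shell variable and don't export it, otherwise it ends up back in the process environment you were trying to keep it out of. The command substitution also strips any trailing newline from the file.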
Placement Constraints — Routing Workloads to the Right Nodes
With the labels assigned above, you can now precisely control which containers run on which nodes. constraints are hard rules (must be satisfied), while preferences are soft rules (satisfied when possible):
services:
  api:
    deploy:
      replicas: 4
      placement:
        constraints:
          - node.role == worker # API does not run on manager nodes
          - node.labels.role == worker # AND the node carries the custom role label; both must hold
        preferences:
          - spread: node.labels.zone # Spread evenly across AZs, avoid concentration
  database:
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.storage == ssd # DB only runs on nodes with SSD
          - node.role == worker # Not on manager nodes
  redis:
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.zone == az-1 # Pin Redis to a specific zone if needed
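Before deploying, it helps to know which nodes can actually satisfy a constraint: if none do, Swarm leaves the task in the Pending state instead of placing it somewhere else. A minimal sketch that reads the same node-label listing as the setup check and prints nodes matching one node.labels.key == value rule; eligible_nodes is a hypothetical helper and only handles simple equality.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given "hostname: map[key:value ...]" lines (the format printed by
#   docker node inspect --format '{{.Description.Hostname}}: {{.Spec.Labels}}'),
# print the nodes that satisfy a single "node.labels.<key> == <value>" constraint.
eligible_nodes() {
  local key="$1" value="$2" line host labels
  while IFS= read -r line; do
    [ -z "$line" ] && continue
    host="${line%%:*}"
    labels="${line#*: }"
    case "$labels" in
      # key:value may be the first, a middle, the last, or the only map entry
      *"[$key:$value "* | *" $key:$value "* | *" $key:$value]"* | *"[$key:$value]"*)
        echo "$host" ;;
    esac
  done
}
```

For the database above, eligible_nodes storage ssd should print at least one worker; an empty result means the service will sit in Pending.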
Zero-Downtime Rolling Updates — Detailed Configuration
This is the section most commonly misconfigured. The key parameter is order: start-first — Swarm starts the new replica, waits for the healthcheck to pass, and only then stops the old replica. This is the opposite of the default stop-first behavior, which causes downtime.
But order: start-first only works correctly when paired with a properly configured healthcheck:
services:
  api:
    image: myapp/api:${VERSION:-latest}
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:3000/health || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s # Give the app 30s to start up before health evaluation begins
    deploy:
      replicas: 4
      update_config:
        parallelism: 1 # Update 1 replica at a time; conservative but safe
        delay: 15s # Wait 15s between update batches
        order: start-first # START the new replica FIRST, STOP the old one AFTER
        failure_action: rollback # Automatically roll back everything if the update fails
        monitor: 30s # Monitor for 30s after each update to catch delayed failures
        max_failure_ratio: 0.3 # Allow up to 30% replica failures before triggering rollback
      rollback_config:
        parallelism: 0 # 0 = roll back all replicas simultaneously
        delay: 0s # No delay during rollback; speed matters here
        failure_action: continue # Continue rolling back even if errors occur
        # Combined with parallelism: 0, stop-first stops every replica at once,
        # so expect a brief outage during rollback
        order: stop-first # During rollback: stop new version first, restore old after
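start_period in the healthcheck is the parameter I spent the most time tuning. If your app needs 20 seconds to connect to the database, load config, and warm up its cache, set start_period: 25s to give yourself a buffer. Without it, every probe that fails while the app is still booting counts toward retries, so once startup takes longer than roughly retries × interval (about 30 seconds with the settings above), Swarm marks the task unhealthy and replaces it before it ever becomes ready. The replacement hits the same wall, and the rolling update never completes (with failure_action: rollback, it rolls back instead).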
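To pick start_period from a measurement instead of a guess, you can time how long a fresh container takes before its health endpoint answers. A minimal sketch: time_until_success is a hypothetical helper, and it counts whole seconds of sleep between attempts, so the result is approximate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a probe command once per second until it succeeds and
# print roughly how many seconds that took. Gives up after max_secs.
time_until_success() {
  local max_secs="$1"; shift
  local elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$max_secs" ]; then
      echo "gave up after ${max_secs}s" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "$elapsed"
}
```

Usage, assuming the app listens on port 3000 as in the healthcheck and can reach its dependencies when run standalone: docker run -d --rm -p 3000:3000 --name api-startup-test myapp/api:v2.1.0, then immediately time_until_success 120 wget -qO- http://localhost:3000/health. Add a few seconds of buffer to the result.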
Testing and Monitoring
Deploy the Stack and Watch the Rolling Update
# Deploy the stack for the first time
docker stack deploy -c docker-stack.yml myapp
# List all services in the stack
docker stack services myapp
# Update the API to a new version — rolling update runs automatically
VERSION=v2.1.0 docker stack deploy -c docker-stack.yml myapp
# Watch the rolling update in real-time
# Look for: new replica in Running state BEFORE old replica reaches Shutdown
watch -n2 'docker service ps myapp_api --format "table {{.Name}}\t{{.Node}}\t{{.CurrentState}}\t{{.DesiredState}}"'
When the update is working correctly, you’ll see a moment where replicas+1 tasks exist simultaneously: the new replica in Running state while the old replica hasn’t yet transitioned to Shutdown. That’s your proof that zero-downtime is actually working.
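Watching is fine interactively, but in a deploy script you usually want to block until Swarm reports the update as finished. A minimal sketch that polls the service's UpdateStatus.State field; wait_for_update, update_state, and POLL_INTERVAL are hypothetical names. One caveat: immediately after docker stack deploy, the state from the previous update may still be reported for a moment, so give it a few seconds before polling.

```shell
#!/usr/bin/env bash
# Print the service's current update state, or nothing if it has never been updated.
update_state() {
  docker service inspect --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}' "$1"
}

# Hypothetical helper: poll until the update finishes.
# Returns 0 on "completed" (or no update recorded), 1 on rollback, pause, error, or timeout.
wait_for_update() {
  local service="$1" max_polls="${2:-120}" state="" i
  for ((i = 0; i < max_polls; i++)); do
    state="$(update_state "$service")" || { echo "$service: inspect failed" >&2; return 1; }
    case "$state" in
      completed | "") echo "$service: update completed"; return 0 ;;
      rollback_completed) echo "$service: update was rolled back" >&2; return 1 ;;
      paused | rollback_paused) echo "$service: update $state, needs attention" >&2; return 1 ;;
    esac
    sleep "${POLL_INTERVAL:-5}"   # still "updating" or "rollback_started"
  done
  echo "$service: timed out, last state: ${state:-unknown}" >&2
  return 1
}
```

Usage: VERSION=v2.1.0 docker stack deploy -c docker-stack.yml myapp && sleep 5 && wait_for_update myapp_api. A non-zero exit lets CI mark the deploy as failed when Swarm rolled it back.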
Rollback and Placement Verification
# Roll back a service to the previous version (for manual intervention)
docker service rollback myapp_api
# Verify the database is running on the correct SSD node
docker service ps myapp_database --format 'table {{.Name}}\t{{.Node}}\t{{.CurrentState}}'
# View the distribution of API replicas across nodes
docker service ps myapp_api --filter 'desired-state=running'
# Aggregate logs from all replicas of a service
docker service logs -f --tail 100 myapp_api
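The raw docker service ps output shows the distribution, but a count per node is quicker to read and easy to turn into a check. A minimal sketch; replicas_per_node and max_per_node_ok are hypothetical helper names that read one node name per line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read one node name per line, e.g. from
#   docker service ps myapp_api --filter 'desired-state=running' --format '{{.Node}}'
# replicas_per_node prints "<count> <node>", busiest node first.
replicas_per_node() {
  sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# max_per_node_ok fails if any node runs more than the given number of replicas.
max_per_node_ok() {
  local max="$1"
  replicas_per_node | awk -v max="$max" '$1 > max { bad = 1 } END { exit (bad ? 1 : 0) }'
}
```

With 4 API replicas spread over two zones, max_per_node_ok 2 is a reasonable threshold. Run it when no update is in progress, since a start-first update briefly adds an extra task on some node.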
Monitoring Resource Usage
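# Resource usage for the service's containers on the current node
# (docker ps only sees local containers, so run this on each node that hosts a replica)
docker stats $(docker ps --filter 'name=myapp_api' -q)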
# View configured resource limits
docker service inspect myapp_api --pretty | grep -A 8 Resources
# Health status of all tasks
docker service ps myapp_api --format 'table {{.Name}}\t{{.Node}}\t{{.CurrentState}}\t{{.Error}}'
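If you want an alert rather than a live dashboard, the one-shot form of docker stats can be filtered for containers close to their memory limit. A minimal sketch; over_mem_threshold is a hypothetical helper, and MemPerc is relative to the container's memory limit (or to host memory when no limit is set). Like the docker ps call above, it only covers the current node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "<container> <mem%>" lines, e.g. from
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' $(docker ps -q --filter 'name=myapp_api')
# and print the containers using more than the given percentage of memory.
over_mem_threshold() {
  awk -v limit="$1" '{
    pct = $2
    sub(/%$/, "", pct)          # "91.20%" -> "91.20"
    if (pct + 0 > limit + 0) print $1, $2
  }'
}
```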
Once everything is set up, I like to run a “dry” rolling update: retag the same image with a new version number, push the new tag to your registry so every node can pull it, trigger the update, and watch docker service ps. If the new replica reaches Running before the old one reaches Shutdown, zero-downtime deployment is working as intended.
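That rehearsal is three commands, so it is worth scripting. A minimal sketch: run, rehearse_rollout, and PRINT_ONLY are hypothetical names, it assumes the current tag is already available locally (docker pull it first if not), and PRINT_ONLY=1 echoes the commands instead of executing them so you can review what will happen.

```shell
#!/usr/bin/env bash
# With PRINT_ONLY=1, print each command instead of executing it.
run() {
  if [ "${PRINT_ONLY:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: retag an image that is already deployed, push the new tag so every
# node can pull it, and redeploy the stack with VERSION pointing at the new tag.
rehearse_rollout() {
  local repo="$1" from_tag="$2" to_tag="$3" stack_file="$4" stack="$5"
  run docker tag "$repo:$from_tag" "$repo:$to_tag" || return 1
  run docker push "$repo:$to_tag" || return 1
  run env VERSION="$to_tag" docker stack deploy -c "$stack_file" "$stack"
}
```

Usage: rehearse_rollout myapp/api v2.1.0 v2.1.0-rehearsal docker-stack.yml myapp. Because the image reference changes, Swarm treats it as a real update even though the code is identical.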
These three things — Placement Constraints to route workloads to the right nodes, Docker Config/Secret to secure your configuration, and Rolling Updates with order: start-first for uninterrupted deployments — are what you need from day one when taking Swarm to production. None of it is complicated, but missing any one of the three will eventually cause an incident. I learned that the hard way.
