The Problem: Docker Shows Green but the App is ‘Dead’
Docker reports the container status as Up, but accessing the website results in a 502 Bad Gateway error. This is an extremely frustrating situation. You check the logs and find that the application has been frozen for a long time.
The issue is that Docker only monitors the main process. If the app is starved of memory, hits a deadlock, or loses its DB connection while the process keeps running, Docker still reports the container as Up. The system then falls into a “vegetative state”: not completely dead, but unable to function.
In my first project, I was overconfident and only used restart: always. When server overload caused the Database to respond slowly, the Node.js app hung on connections, yet the container still showed a bright green status. Customers complained constantly while I remained convinced the system was stable. To handle this properly, you need the duo: Restart Policies and Healthcheck.
1. Self-healing with Restart Policies
Restart Policies help containers stand back up after a power failure or a crash. Docker provides four main options:
- no: The default. Docker watches the container die and does nothing.
- always: Restarts the container whenever it stops, whatever the exit code. It also comes back with the Docker daemon after a server reboot, even if you had stopped it manually beforehand.
- on-failure: Only restarts if the exit code is non-zero. Suitable for data processing jobs that need to finish and then stop.
- unless-stopped: Similar to always, but with a plus point. If you proactively use the docker stop command, it will stay down until you manually start it again.
The configuration in docker-compose.yml is very clean:
services:
  web-app:
    image: nginx:1.25-alpine
    restart: unless-stopped
I usually prefer unless-stopped. It helps the app come back after server maintenance while avoiding the annoyance of containers automatically restarting when I intentionally shut them down for debugging.
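The four policies can be summed up as a small decision function. The sketch below is a simplified model of the rules listed above, following Docker's documented behavior; it is not Docker's implementation, and the name `shouldRestart` and the event shape are assumptions made for this illustration.

```javascript
// Simplified model of Docker restart policies (illustrative only).
// event.type is either:
//   "exit"           - the main process ended on its own with event.exitCode
//   "daemon-restart" - the Docker daemon came back (e.g. after a reboot);
//                      event.manuallyStopped tells whether `docker stop` was used before
function shouldRestart(policy, event) {
  switch (policy) {
    case "no":
      return false;
    case "always":
      // Any exit code, and also after a daemon restart.
      return true;
    case "on-failure":
      // Only a non-zero exit triggers a restart; a daemon restart does not.
      return event.type === "exit" && event.exitCode !== 0;
    case "unless-stopped":
      // Like "always", except a manual stop is remembered across daemon restarts.
      return !(event.type === "daemon-restart" && event.manuallyStopped);
    default:
      throw new Error(`unknown restart policy: ${policy}`);
  }
}

const reboot = { type: "daemon-restart", manuallyStopped: true };
console.log(shouldRestart("always", reboot));         // true
console.log(shouldRestart("unless-stopped", reboot)); // false
console.log(shouldRestart("on-failure", { type: "exit", exitCode: 0 })); // false
```

Note that none of the branches looks at whether the application is actually responding: every policy reacts to the process stopping, which is why a hung app is never restarted by a restart policy alone.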
2. Healthcheck: A Private “Doctor” for Your Container
While Restart Policies only know if a container is alive or dead, Healthcheck knows if the application is working effectively. It’s like periodically sending a signal to ask: “Hey, are you still responsive?”
Key Parameters You Need to Master
- test: The check command (usually using curl or pg_isready).
- interval: Check frequency (e.g., once every 30 seconds).
- timeout: How long to wait for a response before considering the check a failure.
- retries: Number of consecutive failures (e.g., 3 times) before labeling it unhealthy.
- start_period: Wait time for the app to boot. A Java Spring Boot app might take 45 seconds to start; give it time to prepare before starting the inspection.
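To see how retries and start_period interact, here is a minimal simulation of the status a container would report after a series of checks. It is a sketch of the documented behavior, not Docker's code; the function `simulateHealth` and the `{ at, ok }` check objects (seconds since container start, check result) are hypothetical names chosen for this example.

```javascript
// Illustrative model of how Docker derives a health status from check results.
// checks: array of { at: secondsSinceStart, ok: boolean }, in chronological order.
function simulateHealth(checks, { retries = 3, startPeriod = 0 } = {}) {
  let status = "starting";
  let consecutiveFailures = 0;
  let started = false; // set by the first successful check

  for (const { at, ok } of checks) {
    if (ok) {
      status = "healthy";
      consecutiveFailures = 0;
      started = true;
      continue;
    }
    // Failures inside the start period are ignored until the first success.
    const inGracePeriod = !started && at < startPeriod;
    if (inGracePeriod) continue;

    consecutiveFailures += 1;
    if (consecutiveFailures >= retries) status = "unhealthy";
  }
  return status;
}

const options = { retries: 3, startPeriod: 40 };
const history = [
  { at: 30, ok: false }, // still booting: does not count
  { at: 60, ok: true },
  { at: 90, ok: false },
  { at: 120, ok: false },
];
console.log(simulateHealth(history.slice(0, 1), options)); // starting
console.log(simulateHealth(history, options));             // healthy
console.log(simulateHealth([...history, { at: 150, ok: false }], options)); // unhealthy
```

With a 30-second interval and three retries, a frozen app is therefore flagged roughly a minute and a half after its last good check, which is the trade-off to weigh when tuning these values.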
3. Practical Configuration for a Node.js Application
Suppose you have an application running on port 3000. Don’t just hope for the best; force Docker to check it.
services:
  my-api:
    image: node-app:v1
    restart: always
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
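In this configuration, curl -f exits with a non-zero code when the /health endpoint answers with an HTTP error such as 500, and Docker counts the check as failed if no answer arrives within the 10-second timeout (curl has to exist inside the image for this to work). Thanks to start_period: 40s, failed checks during the first 40 seconds do not count toward retries, so the app has time to finish loading libraries before a failure can mark it unhealthy. Keep in mind that on a standalone Docker Engine the unhealthy status is only reported, in docker ps and docker inspect. The restart policy reacts to the main process exiting, not to a failed healthcheck, so something still has to act on that status: an orchestrator such as Docker Swarm, or an app that exits on its own when it detects it is stuck.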
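The healthcheck is only as good as the /health endpoint behind it. Below is a minimal sketch of such an endpoint using Node's built-in http module. The `state.dbConnected` flag and the function names are assumptions for this example; a real app would update the flag from its database client's connect and error events.

```javascript
const http = require("node:http");

// Hypothetical app state: flipped elsewhere by the DB client's connect/error events.
const state = { dbConnected: true };

// Pure function so the health logic can be tested without opening a socket.
function buildHealthResponse(appState) {
  const healthy = appState.dbConnected === true;
  return {
    statusCode: healthy ? 200 : 503, // curl -f treats 503 as a failed check
    body: JSON.stringify({ status: healthy ? "ok" : "degraded" }),
  };
}

function createHealthServer(appState) {
  return http.createServer((req, res) => {
    if (req.method === "GET" && req.url === "/health") {
      const { statusCode, body } = buildHealthResponse(appState);
      res.writeHead(statusCode, { "Content-Type": "application/json" });
      res.end(body);
      return;
    }
    res.writeHead(404);
    res.end();
  });
}

console.log(buildHealthResponse(state).statusCode);                  // 200
console.log(buildHealthResponse({ dbConnected: false }).statusCode); // 503
```

Calling `createHealthServer(state).listen(3000)` would expose the endpoint on the port used in the Compose file. Returning 503 when a dependency is down is one possible design; whether a lost DB connection should fail the check depends on how the app is expected to recover.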
Embedding Healthcheck into the Dockerfile
The best way is to package this mechanism into the image so that every environment is protected:
FROM node:18-alpine
RUN apk add --no-cache curl
# ... setup app ...
HEALTHCHECK --interval=1m --timeout=3s --retries=3 \
    CMD curl -f http://localhost:3000/ || exit 1
CMD ["npm", "start"]
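Since the image already ships Node.js, the check can also be written in JavaScript instead of installing curl. The following is a sketch under that assumption; `probe` and `isPassingStatus` are hypothetical helper names, and the pass/fail rule simply mirrors curl -f (any status of 400 or above fails).

```javascript
const http = require("node:http");

// Same rule as `curl -f`: HTTP 400 and above counts as a failure.
function isPassingStatus(statusCode) {
  return statusCode >= 200 && statusCode < 400;
}

// Resolves to true when the URL answers in time with a passing status,
// and to false on an error status, a connection error, or a timeout.
function probe(url, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const req = http.get(url, { timeout: timeoutMs }, (res) => {
      res.resume(); // discard the body so the socket is released
      resolve(isPassingStatus(res.statusCode));
    });
    // The "timeout" event does not abort the request by itself.
    req.on("timeout", () => req.destroy(new Error("healthcheck timed out")));
    req.on("error", () => resolve(false));
  });
}
```

Saved as a file such as healthcheck.js that ends with `probe("http://localhost:3000/").then((ok) => process.exit(ok ? 0 : 1));`, this could replace the curl command with `HEALTHCHECK CMD ["node", "healthcheck.js"]`, assuming the file is copied into the image's working directory.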
4. Smooth Coordination Between Services
A common mistake is the app starting faster than the Database, leading to connection errors from the start. Instead of using complex wait scripts, use depends_on combined with a health condition:
services:
  db:
    image: postgres:15
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d mydb"]
      interval: 10s
  app:
    image: my-node-app
    depends_on:
      db:
        condition: service_healthy
Now, Compose starts the app container only after the db healthcheck reports healthy, meaning Postgres is actually ready to accept connections. This approach is much more professional and reliable than blindly using sleep 10.
5. Resource Considerations: Don’t Over-Check
Healthchecks aren’t free. Each run consumes a small amount of CPU and RAM. If you set interval: 1s for 20 containers, the server will waste resources just checking itself.
A reasonable number is usually 30 seconds to 1 minute for standard services. Prioritize lightweight check commands and avoid heavy SQL queries just to see if the DB is alive.
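The cost difference is easy to quantify. This small calculation uses the numbers from the example above; `checksPerMinute` is a helper written for this illustration only.

```javascript
// Number of healthcheck commands the host launches per minute.
function checksPerMinute(containers, intervalSeconds) {
  if (intervalSeconds <= 0) {
    throw new RangeError("intervalSeconds must be positive");
  }
  return containers * (60 / intervalSeconds);
}

console.log(checksPerMinute(20, 1));  // 1200
console.log(checksPerMinute(20, 30)); // 40
```

Going from a 1-second to a 30-second interval cuts the number of spawned check processes by a factor of 30, at the price of detecting a hung app later.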
Conclusion
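Combining Restart Policies and Healthchecks gives you the foundation of a self-healing system: crashed containers come back on their own, and hung ones are flagged unhealthy instead of hiding behind an Up status. That means far fewer 2 AM wake-ups just to type docker restart.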
Three rules of thumb:
- Use unless-stopped for most web services.
- Always include a start_period so the app isn’t marked unhealthy before it has finished starting.
- Use service_healthy to manage the startup order of dependent services.
Apply these techniques immediately to make your applications more resilient and stable.
