The 2 AM Wake-Up Call
About three years ago, I got paged at 2 AM: the entire shopping cart and session system for an e-commerce platform had gone down. The culprit? The Redis master got killed by the OOM killer, and nothing was there to automatically recover it. The dev team had to SSH in, manually promote a replica to master, update the application config, and restart services — nearly 40 minutes of downtime.
After that incident, I started taking Redis Sentinel seriously. This article is written from real deployment experience — not copy-pasted from the docs.
Why Redis Standalone Isn’t Enough for Production
Redis replication (master-replica) solves read scale-out and backup, but it does not handle failover automatically. When the master dies:
- Replicas know the master is down but won’t promote themselves
- The application keeps connecting to the old IP → cascading errors
- Manual intervention is required: promote a replica, update the connection string
A quick comparison: MongoDB has replica sets that self-elect a primary, MySQL 8 has Group Replication, PostgreSQL uses Patroni or repmgr. Redis needs a similar mechanism — and Sentinel is exactly that.
Redis Cluster is another option, but it requires data sharding and significantly more complex setup. For most systems (session storage, caching, pub/sub), Sentinel is more than sufficient — and far easier to debug when things go wrong.
How Redis Sentinel Works
Sentinel is a separate process — not a Redis server — that runs alongside Redis. Under normal conditions it just watches, without interfering. When you run multiple Sentinels together, they form a quorum to make failover decisions in a distributed fashion.
The failover flow when a master dies:
- Each Sentinel pings the master once per second
- When one Sentinel gets no response within down-after-milliseconds → it marks the master as SDOWN (subjectively down)
- That Sentinel asks the others — if enough agree to reach quorum → the status is elevated to ODOWN (objectively down)
- The Sentinels elect a leader among themselves
- The Sentinel leader selects the most suitable replica to promote as the new master
- All remaining replicas are reconfigured to replicate from the new master
- Clients are notified via a pub/sub channel
Run an odd number of Sentinels, typically 3. With only 2, a network partition or a single Sentinel failure leaves the survivor unable to win the leader election (which requires a majority of all known Sentinels, not just the quorum), so failover never triggers.
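To make the distinction concrete, here is a minimal sketch (not the actual Sentinel algorithm) of the two separate checks: quorum governs when ODOWN is declared, while the failover itself needs a strict majority of all Sentinels to elect a leader.

```python
# Sketch only: quorum controls the ODOWN declaration; leader election
# (which authorizes the failover) needs a majority of all known Sentinels.
def can_declare_odown(agreeing: int, quorum: int) -> bool:
    return agreeing >= quorum

def can_elect_leader(reachable: int, total: int) -> bool:
    return reachable > total // 2

# 3 Sentinels, quorum 2: losing one node still allows failover
assert can_declare_odown(agreeing=2, quorum=2)
assert can_elect_leader(reachable=2, total=3)

# 2 Sentinels: a partition leaves each side with 1 of 2 -- no majority,
# so failover never starts even if quorum were set to 1
assert not can_elect_leader(reachable=1, total=2)
```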
Preparing the Environment: 3 Servers, 1 Sentinel Cluster
The setup I actually use in production:
- 192.168.1.10 — Redis Master + Sentinel 1
- 192.168.1.11 — Redis Replica 1 + Sentinel 2
- 192.168.1.12 — Redis Replica 2 + Sentinel 3
Install Redis on all 3 servers (Ubuntu 22.04):
sudo apt update && sudo apt install -y redis-server
sudo systemctl enable redis-server
Configuring Redis Master and Replicas
On the master server (192.168.1.10), edit /etc/redis/redis.conf:
bind 0.0.0.0
protected-mode no
port 6379
requirepass "your_strong_password"
masterauth "your_strong_password"
On the 2 replica servers (192.168.1.11 and 192.168.1.12), add the replication directive:
bind 0.0.0.0
protected-mode no
port 6379
replicaof 192.168.1.10 6379
requirepass "your_strong_password"
masterauth "your_strong_password"
Restart Redis on all 3 servers:
sudo systemctl restart redis-server
# Verify replication is working
redis-cli -a your_strong_password info replication
The expected output on the master will show role:master and connected_slaves:2.
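If you want to check this from a script rather than by eye, the INFO output is just key:value lines and parses easily. A small stdlib-only sketch (the sample output below is illustrative, not captured from a live node):

```python
def parse_info(raw: str) -> dict:
    """Parse the key:value lines of a redis-cli INFO section."""
    fields = {}
    for line in raw.splitlines():
        line = line.strip()
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            fields[key] = value
    return fields

# Illustrative sample of what the master should report
sample = """# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.11,port=6379,state=online,offset=4201,lag=0
slave1:ip=192.168.1.12,port=6379,state=online,offset=4201,lag=1
"""

info = parse_info(sample)
assert info["role"] == "master"
assert int(info["connected_slaves"]) == 2
```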
Configuring Redis Sentinel
Create the Sentinel config file on all 3 servers. Important note: Sentinel will write to this file during failover, so the process needs write permission:
sudo nano /etc/redis/sentinel.conf
File contents — identical across all 3 servers, just change sentinel announce-ip to each server’s own IP:
port 26379
daemonize yes
logfile /var/log/redis/sentinel.log
pidfile /var/run/redis/redis-sentinel.pid
# Name the cluster after your project for clarity
sentinel monitor mymaster 192.168.1.10 6379 2
sentinel auth-pass mymaster your_strong_password
# How long (ms) without a response before marking as down
sentinel down-after-milliseconds mymaster 5000
# Number of replicas to sync simultaneously after failover (keep at 1)
sentinel parallel-syncs mymaster 1
# Failover timeout (ms)
sentinel failover-timeout mymaster 10000
# Declare the actual IP of this server — required in NAT/Docker environments
sentinel announce-ip 192.168.1.10
sentinel announce-port 26379
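Since the file is identical on all 3 servers apart from announce-ip, it is easy to generate per host instead of hand-editing three copies. A sketch of that templating (values mirror the config above):

```python
# Sketch: render a per-host sentinel.conf, varying only announce-ip.
TEMPLATE = """port 26379
daemonize yes
logfile /var/log/redis/sentinel.log
pidfile /var/run/redis/redis-sentinel.pid
sentinel monitor mymaster 192.168.1.10 6379 2
sentinel auth-pass mymaster your_strong_password
sentinel down-after-milliseconds mymaster 5000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 10000
sentinel announce-ip {ip}
sentinel announce-port 26379
"""

def render_sentinel_conf(ip: str) -> str:
    return TEMPLATE.format(ip=ip)

conf = render_sentinel_conf("192.168.1.11")
assert "sentinel announce-ip 192.168.1.11" in conf
```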
Start Sentinel:
sudo redis-sentinel /etc/redis/sentinel.conf
# Or use systemd (recommended for production)
sudo nano /etc/systemd/system/redis-sentinel.service
[Unit]
Description=Redis Sentinel
After=network.target
[Service]
Type=forking
ExecStart=/usr/bin/redis-sentinel /etc/redis/sentinel.conf
PIDFile=/var/run/redis/redis-sentinel.pid
Restart=always
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable --now redis-sentinel
Verifying Sentinel Is Working
# Connect to Sentinel
redis-cli -p 26379
# View cluster information
127.0.0.1:26379> SENTINEL masters
127.0.0.1:26379> SENTINEL replicas mymaster
127.0.0.1:26379> SENTINEL sentinels mymaster
# Get the current master address (what your application should query)
127.0.0.1:26379> SENTINEL get-master-addr-by-name mymaster
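The get-master-addr-by-name reply comes back as two lines, the IP and then the port. If you script against it, a tiny parsing helper avoids off-by-one mistakes (a sketch, stdlib only):

```python
def parse_master_addr(reply: str) -> tuple[str, int]:
    """SENTINEL get-master-addr-by-name replies with IP and port on two lines."""
    host, port = reply.split()
    return host, int(port)

assert parse_master_addr("192.168.1.10\n6379") == ("192.168.1.10", 6379)
```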
Testing Failover for Real
I always do this immediately after setup — never put anything into production without testing failover at least once:
# Kill Redis on the master server
sudo systemctl stop redis-server
# Watch the Sentinel log on another server
tail -f /var/log/redis/sentinel.log
# After ~10-15 seconds, check the new master
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
If everything is configured correctly, the Sentinel log will record the entire election and promotion process. The new master will be either 192.168.1.11 or 192.168.1.12: Sentinel picks the replica with the lowest replica-priority, breaking ties by the highest replication offset, then the smallest run ID.
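That promotion ordering can be sketched as a sort key (simplified; the real Sentinel also excludes replicas that are disconnected or too stale):

```python
# Simplified promotion ordering: lowest replica-priority wins, then the
# highest replication offset, then the lexicographically smallest run ID.
def promotion_key(replica: dict):
    return (replica["priority"], -replica["offset"], replica["runid"])

# Illustrative candidates, not real node state
replicas = [
    {"ip": "192.168.1.11", "priority": 100, "offset": 5000, "runid": "aaa"},
    {"ip": "192.168.1.12", "priority": 100, "offset": 5200, "runid": "bbb"},
]

best = min(replicas, key=promotion_key)
assert best["ip"] == "192.168.1.12"  # higher offset wins at equal priority
```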
Connecting Your Application Through Sentinel
This is where many people get confused: your application does not connect directly to the Redis master. The client asks Sentinel “where is the current master?”, then connects there. When failover happens, the client automatically discovers the new master — no application restart needed.
Python with redis-py:
from redis.sentinel import Sentinel
sentinel = Sentinel(
[
('192.168.1.10', 26379),
('192.168.1.11', 26379),
('192.168.1.12', 26379),
],
socket_timeout=0.5,
password='your_strong_password'
)
# Get master connection (writes)
master = sentinel.master_for('mymaster', socket_timeout=0.5)
# Get replica connection (reads)
replica = sentinel.slave_for('mymaster', socket_timeout=0.5)
master.set('key', 'value')
print(replica.get('key'))
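One caveat: during the few seconds a failover takes, writes through the master connection can raise a connection error before the client rediscovers the new master. A small stdlib retry wrapper (a sketch; the exact exception your client raises may differ, redis-py uses its own ConnectionError subclass) smooths over that window:

```python
import time

def with_retry(fn, retries=3, delay=0.5, exceptions=(ConnectionError,)):
    """Retry a call a few times while Sentinel finishes the failover."""
    for attempt in range(retries):
        try:
            return fn()
        except exceptions:
            if attempt == retries - 1:
                raise
            time.sleep(delay)

# Usage with the redis-py client above:
# with_retry(lambda: master.set('key', 'value'))
```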
Node.js with ioredis:
const Redis = require('ioredis');
const redis = new Redis({
sentinels: [
{ host: '192.168.1.10', port: 26379 },
{ host: '192.168.1.11', port: 26379 },
{ host: '192.168.1.12', port: 26379 },
],
name: 'mymaster',
password: 'your_strong_password',
sentinelPassword: 'your_strong_password',
});
Pitfalls I’ve Run Into
Some real-world gotchas from operating Sentinel in production:
- Forgetting to set masterauth: After failover, the newly promoted master works fine, but the remaining replicas can’t replicate because auth fails. The symptom is subtle: replication lag spikes, and data between nodes starts to diverge. Always set both requirepass and masterauth on every node.
- Sentinel not announcing its IP: In NAT or Docker environments, Sentinel reports the wrong internal IP — clients can’t connect after failover. Always set sentinel announce-ip to the IP that clients can actually reach.
- sentinel.conf file permissions locked down: Sentinel needs to write to its config file during failover. If the file is owned by root but Sentinel runs as the redis user, failover silently fails — no clear error, just timeouts that eventually self-resolve.
- down-after-milliseconds set too low: In cloud environments, network latency can spike to 500ms–1s. A value of 1000ms can trigger false-positive failovers. I typically use 5000ms in production.
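A cheap way to catch the announce-ip class of problem early is to ask every Sentinel who the master is and flag disagreement. A pure-function sketch of that check (the addresses below are illustrative):

```python
def masters_agree(views: dict) -> bool:
    """views maps a sentinel address to the (master_ip, port) it reports."""
    return len(set(views.values())) == 1

# Illustrative data: one Sentinel still holds a stale view after failover
views = {
    "192.168.1.10:26379": ("192.168.1.11", 6379),
    "192.168.1.11:26379": ("192.168.1.11", 6379),
    "192.168.1.12:26379": ("192.168.1.10", 6379),  # stale -- worth alerting
}
assert not masters_agree(views)
```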
Monitoring Sentinel in Production
Integrate Prometheus via redis_exporter — it supports Sentinel mode out of the box. If you don’t have a Prometheus stack yet, a periodic bash check script is perfectly adequate:
#!/bin/bash
MASTER=$(redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster | head -1)
echo "Current master: $MASTER"
# Alert if master cannot be determined
if [ -z "$MASTER" ]; then
echo "ALERT: Cannot determine Redis master!" | mail -s "Redis Sentinel Alert" [email protected]
fi
Redis Sentinel doesn’t solve every problem — for large datasets that genuinely need sharding, Redis Cluster is the next step. But for 90% of typical use cases (sessions, caching, queues), Sentinel is powerful enough, simple enough to debug at midnight, and most importantly: it means you won’t be getting that 2 AM phone call anymore.

