Redis Monitoring: Don’t Wait for RAM to Exhaust Before Taking Action! (Prometheus + Redis Exporter) – ITFROMZERO

A Familiar Scene: Manual SSH Checks or Automated Monitoring?

When I first started working, every time the system slowed down or the app reported cache errors, the first thing I did was frantically SSH into the server. I would type redis-cli info and strain my eyes reading a long wall of text to see how much RAM was left. It felt very reactive. Now, I just open a Grafana dashboard while sipping coffee to see exactly how the Singapore Redis node is “breathing.” The difference lies in automated monitoring.

Broadly speaking, you’ll usually fall into three scenarios when checking Redis health:

Option 1: Manual Check: Using redis-cli info. This method is fast and requires no installation. However, it’s just a snapshot. You can’t know if there was a RAM spike at 2 AM.
Option 2: Using SaaS (Datadog, New Relic): Very high-end, just a few clicks and you’re done. But the cost is very “steep.” If you have 10-20 Redis nodes, the monthly bill will make your boss frown.
Option 3: Prometheus + Redis Exporter: This is the current gold standard. It’s free, stores long-term historical data, and is extremely flexible.
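If you do stay with the manual option for a while, you can at least stop reading the wall of text by eye. The output of redis-cli info is a list of key:value lines grouped under "# Section" headers, so a few lines of Python can turn it into a dictionary. This is only a minimal sketch: the function name parse_redis_info and the sample values are made up for illustration.

```python
def parse_redis_info(raw: str) -> dict:
    """Parse the text printed by `redis-cli info` into a flat dict of strings."""
    fields = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only; values may contain colons themselves.
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


# Illustrative sample, not real server output.
sample = """# Memory
used_memory:1048576
maxmemory:4194304

# Stats
keyspace_hits:900
keyspace_misses:100
"""

info = parse_redis_info(sample)
print(info["used_memory"], info["maxmemory"])
```

It is still a snapshot, of course. It just makes the snapshot easier to read or to log.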

Every Option Has Its Price

To help you choose the right tool, let’s compare the actual pros and cons of each:

1. Manual Check via CLI

Pros: Available on every server.
Cons: No trend charts. Easy to miss transient spikes (micro-bursts).

2. SaaS Monitoring

Pros: Beautiful interface, excellent support.
Cons: Expensive. Data sent externally might violate security policies of banks or large projects.

3. Prometheus + Redis Exporter

Pros: Full control over your data. Integrates well with Alertmanager to send alerts via Telegram as soon as something happens.
Cons: Requires initial setup effort.

Why I Always Prioritize Redis Exporter

If you’re running a Production environment, don’t hesitate: choose Prometheus. Redis is an in-memory database, so it’s extremely sensitive to RAM. A logic bug that lets memory grow unchecked can freeze the entire service in seconds. Redis Exporter acts as a “translator.” It reads raw metrics from Redis and exposes them in a format that Prometheus can scrape and graph.
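To make the “translator” idea concrete, here is a toy sketch of the kind of conversion involved: numeric fields go in, and lines in the Prometheus text format (metric name, a space, then the value) come out. This is not how Redis Exporter is actually implemented, and the function name and the naive redis_ prefix are my own assumptions. The real Exporter uses its own metric names, such as redis_memory_used_bytes.

```python
def to_prometheus_text(info: dict, prefix: str = "redis_") -> str:
    """Render numeric INFO fields as Prometheus text-format gauge lines."""
    lines = []
    for key, value in sorted(info.items()):
        try:
            float(value)  # only numeric fields can become metrics
        except ValueError:
            continue  # e.g. a version string such as "7.2.4" is skipped
        name = f"{prefix}{key}"
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Illustrative input, as a parsed INFO dictionary might look.
text = to_prometheus_text({"used_memory": "1048576", "redis_version": "7.2.4"})
print(text)
```

Prometheus then fetches text like this over HTTP on every scrape and stores each value with a timestamp, which is where the history comes from.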

Practical Implementation Guide

I assume you already have Prometheus set up. If not, check out the basic Prometheus installation guide on the blog. Here are the steps to get the data flowing.

Step 1: Install Redis Exporter with Docker

Using Docker is the fastest and cleanest way. You can run the Exporter right next to the Redis node for easy management.

# Run Redis Exporter connecting to Redis (assuming IP is 192.168.1.100)
docker run -d \
  --name redis_exporter \
  -p 9121:9121 \
  oliver006/redis_exporter:latest \
  --redis.addr=redis://192.168.1.100:6379

Note: Always set a password for Redis in real environments. In that case, pass it to the Exporter with the --redis.password flag:

docker run -d \
  --name redis_exporter \
  -p 9121:9121 \
  oliver006/redis_exporter:latest \
  --redis.addr=redis://192.168.1.100:6379 \
  --redis.password=SuperHardPassword2024

Step 2: Configure Prometheus Scrape

Once the Exporter is running, it listens on port 9121 and serves its metrics at /metrics. You need to tell Prometheus to scrape this address periodically. Open prometheus.yml and add:

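scrape_configs:
  # If prometheus.yml already has a scrape_configs key, add only the job below to it.
  - job_name: 'redis_monitoring'
    static_configs:
      # Host and port where the Exporter listens (not the Redis port 6379)
      - targets: ['192.168.1.105:9121']
    relabel_configs:
      # Replace the default host:port instance label with a readable name
      - source_labels: [__address__]
        target_label: instance
        replacement: 'Redis-Core-Production'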

Restart Prometheus. Open the web interface and run the query redis_up. If the result is 1, the Exporter can reach Redis and Prometheus is scraping it: you’ve succeeded!
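If you would rather script this check than click through the UI, Prometheus also answers instant queries over its HTTP API at /api/v1/query. The sketch below is only a rough illustration: the helper names are mine, the staging instance in the sample is invented, and the base URL in the usage note assumes Prometheus on its default port 9090.

```python
import json
import urllib.parse
import urllib.request


def targets_down(api_response: dict) -> list:
    """Return the instance labels whose redis_up sample is not 1."""
    down = []
    for sample in api_response["data"]["result"]:
        _timestamp, value = sample["value"]  # value arrives as a string
        if float(value) != 1.0:
            down.append(sample["metric"].get("instance", "unknown"))
    return down


def query_prometheus(base_url: str, expr: str) -> dict:
    """Run an instant query against the Prometheus HTTP API."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


# Offline sample in the shape of a successful instant-query response.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "Redis-Core-Production"}, "value": [1700000000, "1"]},
            {"metric": {"instance": "redis-staging"}, "value": [1700000000, "0"]},
        ],
    },
}

print(targets_down(sample_response))
```

Against a live server you would call targets_down(query_prometheus("http://localhost:9090", "redis_up")) instead of using the sample.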

3 “Life-or-Death” Metrics You Must Pay Attention To

Don’t get lost in hundreds of charts. Focus on three numbers that decide the fate of your system:

1. Memory Usage

This is the most critical metric. Once used_memory reaches maxmemory, Redis starts evicting keys according to the eviction policy, or refuses new writes if the policy is noeviction.

Metric: redis_memory_used_bytes
Pro-tip: Set an Alert at 80% to give yourself time to upgrade RAM or optimize your code.
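The 80% rule is just a ratio of used memory to the configured limit. Here is a small sketch of that logic. The function names are hypothetical, and the PromQL in the comment assumes the Exporter also exposes the limit as redis_memory_max_bytes, so check the metric names on your own /metrics page before relying on it.

```python
# Assumed PromQL equivalent for an alert rule:
#   redis_memory_used_bytes / redis_memory_max_bytes > 0.8


def memory_usage_ratio(used_bytes: float, max_bytes: float):
    """Return used/max, or None when maxmemory is 0 (no limit configured)."""
    if max_bytes <= 0:
        return None
    return used_bytes / max_bytes


def should_alert(used_bytes: float, max_bytes: float, threshold: float = 0.80) -> bool:
    """True when usage has reached the alert threshold."""
    ratio = memory_usage_ratio(used_bytes, max_bytes)
    return ratio is not None and ratio >= threshold


GIB = 1024 ** 3
print(should_alert(3.4 * GIB, 4 * GIB))  # 85% of a 4 GiB limit
print(should_alert(2.0 * GIB, 4 * GIB))  # 50% of a 4 GiB limit
```

Note the special case: when maxmemory is 0, Redis has no limit of its own, so the ratio is meaningless and you would compare against the host's RAM instead.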

2. Cache Hit Rate

This metric measures cache efficiency. If the Hit Rate is below 50%, it means the application is fetching data from the main database too often. This defeats the purpose of using Redis.

Formula: hits / (hits + misses)
Target: A stable system usually has a Hit Rate above 80%.
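The formula above is easy to get subtly wrong, because the hit and miss counters in Redis are cumulative since the server started. A ratio over the lifetime totals can hide a cache that went cold an hour ago. The sketch below shows both the plain formula and a windowed variant computed from two snapshots. The function names and sample numbers are made up for illustration.

```python
def cache_hit_rate(hits: int, misses: int):
    """hits / (hits + misses), or None when there were no lookups at all."""
    total = hits + misses
    if total == 0:
        return None
    return hits / total


def windowed_hit_rate(prev: tuple, curr: tuple):
    """Hit rate between two (hits, misses) counter snapshots."""
    return cache_hit_rate(curr[0] - prev[0], curr[1] - prev[1])


# Lifetime totals look healthy...
print(cache_hit_rate(900, 100))
# ...while the most recent window (100 hits, 100 misses) does not.
print(windowed_hit_rate((900, 100), (1000, 200)))
```

In Prometheus, the windowed version corresponds to applying rate() to the hit and miss counters before dividing.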

3. Latency

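Redis is famous for sub-1ms response times. If latency spikes to 10ms – 50ms, check it immediately. Someone might be running the KEYS * command on Production. That is a fatal mistake: KEYS walks the entire keyspace and blocks Redis’s single-threaded command execution while it runs. Use SCAN instead, which iterates in small batches.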
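One way to put a number on latency is the average time per command over a window: the increase in total command time divided by the increase in command count. The sketch below does that arithmetic on two counter snapshots. I am assuming the Exporter exposes these as redis_commands_duration_seconds_total and redis_commands_total, so verify the names on your own /metrics page. The function name and sample values are illustrative only.

```python
def avg_command_latency_ms(prev_seconds: float, prev_calls: int,
                           curr_seconds: float, curr_calls: int):
    """Average milliseconds per command between two counter snapshots."""
    calls = curr_calls - prev_calls
    if calls <= 0:
        return None  # no commands in the window, or the counters were reset
    return (curr_seconds - prev_seconds) * 1000 / calls


# 2.5 seconds of command time spread over 1000 commands in the window.
latency = avg_command_latency_ms(10.0, 4000, 12.5, 5000)
print(latency)
```

Keep in mind this is time spent executing commands inside Redis. It does not include network round trips, so the latency your application sees will be somewhat higher.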

Final Thoughts from the Field

A small tip: Don’t waste time building a Grafana Dashboard from scratch. Use Dashboard ID: 11835, a community template built for Redis Exporter metrics. Just import this ID into Grafana, and you’ll have a professional interface immediately.

Monitoring isn’t just about knowing when the system is down. The ultimate goal is knowing when it’s about to crash so you can intervene before customers start complaining. Happy (and stress-free) Redis administration!