Real-world Story: When the Database is Struggling to Keep Up
Six months ago, the system I managed kept freezing whenever the Marketing department ran campaigns. The AWS Dashboard showed the RDS PostgreSQL CPU consistently hitting 95%. At that time, upgrading the instance (Vertical Scaling) cost nearly $400/month more but didn’t solve the root cause. After analysis, I found that 80% of queries were repetitive SELECT statements. This is when Redis stepped in as a “shield” for the Database.
Many often view Redis as just a simple temporary storage. However, without a clear caching strategy, you’ll soon run into trouble with stale data or Cache Stampede scenarios that crash the system during peak hours.
Quick Deployment with Docker: Don’t Forget Resource Limits
I always prioritize using Docker to ensure consistent environments from local machines to servers. Using the Alpine-based image keeps it lightweight and quick to start.
# Run Redis with a 512MB RAM limit to protect the server
docker run --name redis-itfromzero \
-d -p 6379:6379 \
redis:alpine \
redis-server --maxmemory 512mb --maxmemory-policy allkeys-lru
Important Note: The allkeys-lru parameter is a lifesaver when memory is full. It helps Redis automatically delete the least recently used keys instead of returning an OOM (Out Of Memory) error that disrupts the application.
Cache-Aside Strategy: Simple but Requires Precision
This is the most common pattern. The flow: the app checks Redis first; on a "Cache Hit" it returns immediately (typically 2-5ms). On a "Cache Miss," the app queries the DB and stores the result back in Redis for next time.
Implementation Example with Python
import redis
import json

r = redis.Redis(host='localhost', port=6379, db=0)

def get_product_detail(product_id):
    cache_key = f"product:{product_id}"

    # 1. Check cache
    cached_data = r.get(cache_key)
    if cached_data:
        return json.loads(cached_data)

    # 2. Cache miss: query the DB (simulating ~200ms; `db` is your DB client)
    #    Use a parameterized query, never string interpolation (SQL injection risk)
    product_from_db = db.query("SELECT * FROM products WHERE id = %s", (product_id,))

    # 3. Set cache with a 1-hour TTL
    r.setex(cache_key, 3600, json.dumps(product_from_db))
    return product_from_db
My painful experience: Never forget to invalidate the cache when DB data changes. Once, I forgot the r.delete(cache_key) command during a price update, causing customers to see the promotional price while the checkout still reflected the old one.
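The invalidation step can be sketched as follows. This is a minimal illustration, assuming a `db` client with an `execute` method and a redis-py-style `cache` with `delete`; the names are hypothetical:

```python
def update_product_price(db, cache, product_id, new_price):
    # 1. Write to the source of truth first
    db.execute("UPDATE products SET price = %s WHERE id = %s",
               (new_price, product_id))
    # 2. Invalidate the cached copy so the next read repopulates it fresh
    cache.delete(f"product:{product_id}")
```

Keeping the DB write and the cache delete in one function makes it much harder to forget the invalidation during a rushed price update.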
Write-Through: When Data Needs Absolute Consistency
Unlike Cache-Aside, Write-Through writes data to both Redis and the DB in the same operation. This keeps the cache up to date on every write, closing the stale-read window that Cache-Aside leaves open.
I often apply this to sensitive information like wallet balances or user sessions. Although write latency increases slightly because it waits for confirmation from both sides, you gain peace of mind regarding data accuracy.
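A minimal sketch of the write path, again assuming a `db` client with `execute` and a redis-py-style `cache` with `setex` (illustrative names, not a specific library API):

```python
import json

def set_wallet_balance(db, cache, user_id, balance, ttl=3600):
    # Write-through: persist to the DB, then refresh the cache in the same call
    db.execute("UPDATE wallets SET balance = %s WHERE user_id = %s",
               (balance, user_id))
    cache.setex(f"wallet:{user_id}", ttl, json.dumps({"balance": balance}))
```

Every successful write leaves the cache and DB in agreement, at the cost of one extra round trip on each update.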
The Art of TTL Management and the Jitter Technique
Setting TTL (Time To Live) is a balancing act. Based on production experience, I divide it into three main groups:
- Low-volatility data (Categories, configurations): TTL from 24h to 48h.
- Frequently updated content (Products, news): TTL 1h to 6h.
- Hot data (Inventory levels): Extremely short TTL (30s – 2m) or no cache.
To avoid Cache Avalanche (where many keys expire at once, crashing the DB), use the Jitter technique. Instead of a hard-coded 3600s, use 3600 + random(0, 300) to spread out expiration times.
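The jitter calculation is a one-liner; a small helper (illustrative, using Python's standard `random` module) makes it reusable across cache writes:

```python
import random

BASE_TTL = 3600    # 1 hour base expiry
JITTER_MAX = 300   # up to 5 extra minutes

def ttl_with_jitter(base=BASE_TTL, jitter=JITTER_MAX):
    # Spread expirations over [base, base + jitter] so keys written
    # in the same burst do not all expire in the same second
    return base + random.randint(0, jitter)
```

Pass the result to `setex` instead of a fixed 3600, e.g. `r.setex(cache_key, ttl_with_jitter(), payload)`.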
Effective Monitoring: Don’t Let Your Cache Run Blind
How do you know if your strategy is working? Use the redis-cli info stats command to check keyspace_hits and keyspace_misses.
The Cache Hit Ratio should be above 80%. If it’s too low, you’re wasting Redis resources on rarely accessed data. Also, be careful with the monitor command in production; it can reduce Redis performance by 30-50% just for logging.
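The ratio is easy to compute programmatically: in redis-py, `r.info('stats')` returns a dict containing `keyspace_hits` and `keyspace_misses`. A small helper (a sketch; the alert threshold is my own convention):

```python
def cache_hit_ratio(stats):
    # stats: dict as returned by r.info('stats') in redis-py
    hits = stats.get("keyspace_hits", 0)
    misses = stats.get("keyspace_misses", 0)
    total = hits + misses
    return hits / total if total else 0.0
```

Wiring this into a periodic health check lets you alert when the ratio drifts below the 80% target instead of discovering it during an incident.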
The result after optimization? My RDS CPU dropped from 95% to a stable 15-20%. More importantly, the user experience is much smoother, with critical pages loading in under 100ms.

