Redis Distributed Lock in Microservices: Preventing Race Conditions in Concurrent Processing

Development tutorial - IT technology blog

Last week, our team hit a nasty incident: the order system produced duplicates, and the same order caused stock to be decremented twice. After tracing the logs, we found that two service instances had processed the same request less than 50 ms apart, each reading the inventory count before the other had written its update to the DB. A classic race condition in a microservices environment.

The solution we ultimately chose is a Redis distributed lock: simple, effective, and it has been running stably ever since. This post documents the implementation for the team's future reference.

What Does a Race Condition Look Like in Microservices?

Picture this: you have an API endpoint /api/orders/create and a load balancer distributing requests across 3 instances. A customer double-clicks the “Place Order” button — two requests hit two different instances almost simultaneously. Both start executing the inventory check logic at the exact same moment:

# Both Instance A and Instance B run this code almost simultaneously
stock = db.query("SELECT stock FROM products WHERE id = ?", product_id)
if stock > 0:
    db.execute("UPDATE products SET stock = stock - 1 WHERE id = ?", product_id)
    create_order(product_id, user_id)

Instance A reads stock = 1, Instance B also reads stock = 1. Both see “in stock”, both create an order. Result: stock = -1 and the customer receives 2 order confirmations for a single item.
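
To see the interleaving concretely, here is a minimal sketch that replays it with two threads standing in for Instance A and Instance B. Everything in it is a stand-in invented for illustration: the dict plays the products row, the list plays the orders table, and the barrier forces the unlucky timing so the result is reproducible.

```python
import threading


def simulate_unlocked_orders(initial_stock: int = 1, instances: int = 2):
    """Replay the check-then-update race with threads standing in for instances."""
    stock = {"value": initial_stock}   # stands in for products.stock
    orders = []                        # stands in for the orders table
    all_have_read = threading.Barrier(instances)
    db_write = threading.Lock()        # each individual UPDATE is atomic, as in a real DB

    def handle_request(instance_name: str) -> None:
        observed = stock["value"]      # SELECT stock ...
        all_have_read.wait()           # force the bad timing: every instance reads first
        if observed > 0:               # decision is based on a stale read
            with db_write:
                stock["value"] -= 1    # UPDATE ... SET stock = stock - 1
                orders.append(instance_name)

    threads = [
        threading.Thread(target=handle_request, args=(f"instance-{i}",))
        for i in range(instances)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stock["value"], orders


final_stock, created_orders = simulate_unlocked_orders()
print(final_stock, len(created_orders))  # -1 2
```

Note that each write is individually atomic here, just as in a real database. The bug is the gap between the read and the write, not the write itself.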

When every writer goes through one database, transactions plus row locking handle this just fine. But once the critical section spans multiple services and instances, or touches resources that no single database transaction covers, you need a locking mechanism at a higher layer. That's where a distributed lock comes in.
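
For comparison, the database-level protection can be sketched as follows. This is an illustration under assumptions, not our production code: it uses sqlite3 from the standard library only so that it runs anywhere, the table layout is made up, and it swaps in a conditional UPDATE (check and write in one statement) rather than row locking via SELECT ... FOR UPDATE, which SQLite does not offer.

```python
import sqlite3


def try_reserve(conn: sqlite3.Connection, product_id: str) -> bool:
    """Decrement stock only if some is left; the check and the write are one statement."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE products SET stock = stock - 1 WHERE id = ? AND stock > 0",
            (product_id,),
        )
    return cur.rowcount == 1  # 1 row changed means the reservation succeeded


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id TEXT PRIMARY KEY, stock INTEGER NOT NULL)")
conn.execute("INSERT INTO products VALUES ('product-123', 1)")
conn.commit()

first = try_reserve(conn, "product-123")
second = try_reserve(conn, "product-123")
remaining = conn.execute(
    "SELECT stock FROM products WHERE id = 'product-123'"
).fetchone()[0]
print(first, second, remaining)  # True False 0
```

This only protects what lives inside that one statement. As soon as the critical section includes calls to other services, it no longer helps, which is the case the rest of this post addresses.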

How Does Redis Distributed Lock Work?

Before processing a critical operation, a service must “acquire” a lock key in Redis. If acquired, it proceeds and releases when done. If it can’t acquire — because another instance holds it — it waits or retries. Simple in concept, but the elegance lies in Redis guaranteeing this acquisition step is completely atomic.

Redis provides the SET NX PX command to accomplish this:

# SET key value NX PX milliseconds
# NX = only set if the key does not exist
# PX = TTL in milliseconds

SET order:lock:product-123 "unique-token-abc" NX PX 5000
# Returns OK if the lock was acquired
# Returns nil if the key already exists (another service holds the lock)

This command is atomic — even if 100 instances call it simultaneously, exactly 1 will succeed in setting the value. This is the critical difference from a check-then-set approach using two separate commands: the gap between those two commands is precisely where race conditions slip through.
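
The "exactly 1 of 100" claim can be illustrated without a Redis server. The sketch below uses a hypothetical in-memory stand-in (InMemorySetNx is not part of any library) whose set() mimics the return values of redis-py's set(..., nx=True): True on success, None when the key already exists. The internal threading.Lock models the fact that Redis executes one command at a time.

```python
import threading
import uuid


class InMemorySetNx:
    """Hypothetical stand-in for Redis: a single atomic set-if-absent operation."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()  # models Redis running one command at a time

    def set(self, key, value, nx=False, px=None):
        # px is accepted for signature parity but ignored: no expiry in this sketch
        with self._guard:
            if nx and key in self._data:
                return None             # mirrors the nil reply
            self._data[key] = value
            return True                 # mirrors the OK reply


def race_for_lock(store, key, contenders=100):
    """Let many threads try to take the same lock at once; return the winning tokens."""
    winners = []
    start_together = threading.Barrier(contenders)

    def contend():
        token = str(uuid.uuid4())
        start_together.wait()           # release all contenders at the same moment
        if store.set(key, token, nx=True, px=5000):
            winners.append(token)

    threads = [threading.Thread(target=contend) for _ in range(contenders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners


winners = race_for_lock(InMemorySetNx(), "order:lock:product-123")
print(len(winners))  # 1
```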

TTL is not optional — it’s what prevents the system from deadlocking permanently. If a service crashes immediately after acquiring a lock without releasing it, Redis automatically deletes the lock after the preset duration. Without TTL, a dead service could paralyze the entire pipeline.
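
The crash-recovery behaviour can be sketched the same way. ExpiringLockStore below is a hypothetical in-memory model of SET key value NX PX ttl, written with an injectable clock so that the expiry can be demonstrated without actually waiting five seconds. It is not how Redis implements expiry, only a model of the observable effect.

```python
class ExpiringLockStore:
    """Hypothetical in-memory model of SET key value NX PX ttl with a fake clock."""

    def __init__(self, clock):
        self._clock = clock    # callable returning the current time in milliseconds
        self._entries = {}     # key -> (value, expires_at_ms)

    def set_nx_px(self, key, value, ttl_ms):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return False       # key exists and has not expired: lock is held
        self._entries[key] = (value, now + ttl_ms)
        return True


now_ms = [0]  # mutable fake clock
store = ExpiringLockStore(clock=lambda: now_ms[0])

# Instance A acquires the lock, then crashes without releasing it.
held_by_a = store.set_nx_px("order:lock:product-123", "token-a", 5000)
# Instance B is refused while the TTL is still running.
blocked = store.set_nx_px("order:lock:product-123", "token-b", 5000)
# Once the TTL has elapsed, B can acquire the lock without any manual cleanup.
now_ms[0] = 5001
recovered = store.set_nx_px("order:lock:product-123", "token-b", 5000)
print(held_by_a, blocked, recovered)  # True False True
```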

Implementing Redis Distributed Lock with Python

Installation

pip install redis

RedisLock Class

import redis
import uuid
import time

class RedisLock:
    def __init__(self, redis_client: redis.Redis, lock_key: str, ttl: int = 5000):
        self.redis = redis_client
        self.lock_key = lock_key
        self.ttl = ttl  # milliseconds
        self.token = str(uuid.uuid4())  # unique token per lock instance

    def acquire(self, retry: int = 3, retry_delay: float = 0.1) -> bool:
        for attempt in range(retry):
            result = self.redis.set(
                self.lock_key,
                self.token,
                nx=True,
                px=self.ttl
            )
            if result:
                return True
            if attempt < retry - 1:
                time.sleep(retry_delay)
        return False

    def release(self):
        # Lua script ensures we only release our own lock
        lua_script = """
        if redis.call("get", KEYS[1]) == ARGV[1] then
            return redis.call("del", KEYS[1])
        else
            return 0
        end
        """
        self.redis.eval(lua_script, 1, self.lock_key, self.token)

    def __enter__(self):
        if not self.acquire():
            raise RuntimeError(f"Failed to acquire lock: {self.lock_key}")
        return self

    def __exit__(self, *args):
        self.release()

We use a Lua script to release the lock instead of calling DEL directly, for a specific reason: if the lock happens to expire right as you’re about to delete it, and another instance has already acquired a new lock, you must not delete theirs. The script checks the token before deleting — atomic, with no gap for race conditions to slip through.
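
The scenario the script guards against can be traced step by step. The sketch below mirrors the Lua logic in plain Python over a dict; it is only a model for reading along, because in Redis the atomicity comes from the script running as a single command, which a Python function over a dict does not give you.

```python
def release_if_owner(store: dict, key: str, token: str) -> int:
    """Python mirror of the Lua script: delete only when the stored token matches."""
    if store.get(key) == token:
        del store[key]
        return 1
    return 0


key = "order:lock:product-123"
store = {}

store[key] = "token-a"   # Instance A acquires the lock
del store[key]           # A's TTL expires while A is still working
store[key] = "token-b"   # Instance B acquires the now-free lock

released_by_a = release_if_owner(store, key, "token-a")  # A finishes late and tries to release
still_held_by_b = store.get(key) == "token-b"
released_by_b = release_if_owner(store, key, "token-b")  # B releases its own lock
print(released_by_a, still_held_by_b, released_by_b)  # 0 True 1
```

A plain DEL at A's release step would have removed B's lock. Note also that release() in the class above discards the script's return value; returning it instead would let callers notice that their lock had already expired before they finished.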

Applying to Order Processing

import redis

redis_client = redis.Redis(host='localhost', port=6379, decode_responses=True)

def create_order(product_id: str, user_id: str):
    lock_key = f"order:lock:{product_id}"

    with RedisLock(redis_client, lock_key, ttl=10000) as lock:
        # Only one service can run this block at a time
        stock = db.query("SELECT stock FROM products WHERE id = ?", product_id)

        if stock <= 0:
            raise Exception("Out of stock")

        db.execute(
            "UPDATE products SET stock = stock - 1 WHERE id = ?",
            product_id
        )
        order_id = db.insert("INSERT INTO orders ...")
        return order_id
    # Lock is automatically released when exiting the with block

Handling Lock Acquisition Failure

from fastapi import HTTPException

def create_order_endpoint(product_id: str, user_id: str):
    lock_key = f"order:lock:{product_id}"
    lock = RedisLock(redis_client, lock_key, ttl=10000)

    if not lock.acquire(retry=5, retry_delay=0.2):
        raise HTTPException(
            status_code=429,
            detail="System is busy processing another request, please try again later"
        )

    try:
        result = process_order(product_id, user_id)
        return result
    finally:
        lock.release()  # Always release regardless of exceptions
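
With a fixed retry_delay, instances that fail at the same moment also retry at the same moment. One possible refinement, offered as a sketch rather than something we have measured, is exponential backoff with random jitter. The helper name acquire_with_backoff and its parameters are assumptions made up for this example; it takes any zero-argument callable, so it could wrap a single attempt such as lambda: lock.acquire(retry=1).

```python
import random
import time
from typing import Callable


def acquire_with_backoff(
    try_acquire: Callable[[], bool],
    attempts: int = 5,
    base_delay: float = 0.05,
    max_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call try_acquire until it succeeds, sleeping a random, growing delay between tries."""
    for attempt in range(attempts):
        if try_acquire():
            return True
        if attempt < attempts - 1:
            # The ceiling doubles each round; the actual delay is random below it
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    return False


# Demo with a stubbed lock that succeeds on the third attempt; sleeps are recorded, not executed.
outcomes = iter([False, False, True])
delays = []
acquired = acquire_with_backoff(lambda: next(outcomes), sleep=delays.append)
print(acquired, len(delays))  # True 2
```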

Key Considerations for Real-World Usage

1. Setting the Right TTL

TTL too short → lock expires before processing completes → race condition returns.
TTL too long → if a service crashes, other services are stuck waiting until the TTL expires.

The formula I use: TTL = estimated processing time × 3, minimum 3 seconds. Not sure about processing time? Log the duration for 1–2 days, take the p99 value, then multiply by 3. Avoid setting it by gut feeling.
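
The rule of thumb above is easy to turn into a small helper. This is a sketch: the function name recommend_ttl_ms is made up, the sample durations are invented, and the p99 uses the nearest-rank method, which is one of several reasonable percentile definitions.

```python
def recommend_ttl_ms(durations_ms, multiplier=3, floor_ms=3000):
    """Return p99 (nearest-rank) of observed durations times a multiplier, with a floor."""
    if not durations_ms:
        raise ValueError("need at least one observed duration")
    ordered = sorted(durations_ms)
    rank = (99 * len(ordered) + 99) // 100   # ceil(0.99 * n) in integer arithmetic, 1-based
    p99 = ordered[rank - 1]
    return max(floor_ms, int(p99 * multiplier))


# Invented samples: mostly fast requests with a couple of slow outliers
fast_samples = [120] * 98 + [900, 2500]
print(recommend_ttl_ms(fast_samples))   # 3000  (p99 = 900 ms, 2700 ms is below the 3 s floor)

slow_samples = [2000] * 100
print(recommend_ttl_ms(slow_samples))   # 6000  (p99 = 2000 ms, times 3)
```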

2. Choosing a Sufficiently Granular Lock Key

A lock key that’s too broad creates unnecessary bottlenecks — with 1,000 different products but only 1 global lock key, you can only process exactly 1 order at a time:

# Too broad — blocks all orders
lock_key = "order:lock:global"

# Just right — only blocks orders for a specific product
lock_key = f"order:lock:product:{product_id}"

# More granular if business logic allows
lock_key = f"order:lock:product:{product_id}:warehouse:{warehouse_id}"
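
Building keys in one place keeps the granularity consistent across services. A small sketch, where order_lock_key is a hypothetical helper and the warehouse ID is an invented example value:

```python
def order_lock_key(product_id, warehouse_id=None):
    """Build the lock key at product level, or product plus warehouse when given."""
    key = f"order:lock:product:{product_id}"
    if warehouse_id is not None:
        key += f":warehouse:{warehouse_id}"
    return key


print(order_lock_key("123"))                        # order:lock:product:123
print(order_lock_key("123", warehouse_id="wh-01"))  # order:lock:product:123:warehouse:wh-01
```

Orders for different products produce different keys and therefore never wait on each other.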

3. Combining with an Idempotency Key

A distributed lock serializes concurrent requests, but it won't prevent duplicates when a client retries after a timeout: the retry arrives after the lock has been released and is processed as a brand-new request. The two mechanisms complement each other, so use both if your system truly requires idempotency.
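
A minimal sketch of the idempotency side, assuming the client sends the same key with every retry of one logical request. The class IdempotentOrders and its in-memory dict are hypothetical; a real service would persist the keys in Redis or the database, usually with an expiry.

```python
import uuid


class IdempotentOrders:
    """Hypothetical in-memory sketch of idempotency-key handling."""

    def __init__(self):
        self._results = {}   # idempotency key -> order id already produced
        self.orders = []     # stands in for the orders table

    def create_order(self, idempotency_key, product_id, user_id):
        if idempotency_key in self._results:
            # Replay of a request we already handled: return the original result
            return self._results[idempotency_key]
        order_id = str(uuid.uuid4())
        self.orders.append((order_id, product_id, user_id))
        self._results[idempotency_key] = order_id
        return order_id


service = IdempotentOrders()
first_id = service.create_order("req-001", "product-123", "user-9")
retry_id = service.create_order("req-001", "product-123", "user-9")  # client retry after timeout
print(first_id == retry_id, len(service.orders))  # True 1
```

The lookup and the store in create_order are two separate steps, so on their own they have the same check-then-set gap discussed earlier. Running them inside the distributed lock closes that gap, which is why the two mechanisms work well together.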

4. Monitoring Locks in Redis

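# List active lock keys (--scan iterates incrementally; KEYS would block Redis on a large keyspace)
redis-cli --scan --pattern "order:lock:*"

# Check remaining TTL of a lock (TTL reports seconds; use PTTL for milliseconds)
redis-cli TTL "order:lock:product-123"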

# Get the value (token) of a lock
redis-cli GET "order:lock:product-123"
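
The same inspection can be done from Python. The sketch below assumes a redis-py client created with decode_responses=True, as in the earlier example, and uses its scan_iter, get, and pttl methods; the function name snapshot_locks is made up. It takes the client as a parameter, so nothing here connects to Redis until you call it.

```python
def snapshot_locks(client, pattern="order:lock:*"):
    """Return a list of (key, token, remaining_ms) for lock keys matching pattern."""
    snapshot = []
    # scan_iter walks the keyspace incrementally instead of blocking like KEYS
    for key in client.scan_iter(match=pattern, count=100):
        token = client.get(key)
        remaining_ms = client.pttl(key)   # -2: key no longer exists, -1: key has no expiry
        if token is None or remaining_ms == -2:
            continue                      # lock was released or expired during the scan
        snapshot.append((key, token, remaining_ms))
    return snapshot
```

A remaining_ms of -1 in the result deserves attention: it means a lock key exists without a TTL, which is exactly the permanent-deadlock situation described earlier.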

When debugging, I often format and inspect JSON payloads before setting them in Redis using toolcraft.app/en/tools/developer/json-formatter — much more convenient than installing an extension, no account needed, just open and use.

Conclusion

After the duplicate order incident last week, we’ve been using Redis Distributed Lock with no recurrence of the issue. Not because it’s a perfect solution, but because it’s simple enough to understand, reliable enough for production, and requires no additional infrastructure if Redis is already in your stack.

The biggest lesson: don’t assume database transactions are enough when multiple services run in parallel. Concurrency in a distributed environment is fundamentally different from single-server — you need to think about it at multiple levels. Redis Lock is the fastest way to plug one of the most common vulnerabilities.

Using Node.js? Check out the redlock library. In Go, go-redis exposes the same building blocks (SET with NX and PX, plus Lua script evaluation), so the same acquire-and-release pattern carries over with different syntax.
