Distributed Locks With Redis: An Honest Look At Redlock And When You Don't Need It

The team needs to ensure that exactly one worker processes a job at a time. Somebody recommends Redlock, somebody else cites Martin Kleppmann’s critique, and the conversation derails into a debate about clock skew and Byzantine failures. Two weeks later they are still arguing and the feature is unshipped.

Distributed locks are one of those topics where the surface answer (“use Redlock”) is worse than the honest answer (“it depends, and most teams don’t need what Redlock provides”). This post is the practical version: when a simple SET NX EX is enough, when Redlock buys you something, and when you actually need a real consensus system. With code for each.

What you are actually trying to prevent

Be specific about the failure mode:

1. Two workers do the same expensive thing, e.g., both bill the customer.
2. Two workers race on a shared resource and corrupt it.
3. A “cron-like” job runs twice when one instance is enough.

For (3), a simple lock is fine — the cost of two cron runs is “one extra report email,” not “the customer was charged twice.” For (1) and (2), the lock alone is not a correctness boundary; you also need idempotency on the protected operation. Locks make collisions less likely; idempotency makes them harmless.

This is the non-obvious thing nobody tells you up front: a distributed lock is a probabilistic improvement, not a correctness guarantee, unless you pair it with idempotency on the protected work.
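
A minimal sketch of what “idempotency on the protected work” can look like, assuming each job carries a stable key. The names `IdempotencyStore` and `runOnce` are hypothetical, and the in-memory Map only stands in for a durable store (for example, a table with a unique constraint on the key) where check-and-record would have to be atomic.

```typescript
// Hypothetical sketch: an in-memory stand-in for a durable idempotency store.
class IdempotencyStore<T> {
  private results = new Map<string, T>();

  // Runs `work` only if `key` has no recorded result; otherwise returns
  // the recorded result without running the work again.
  runOnce(key: string, work: () => T): T {
    if (this.results.has(key)) {
      return this.results.get(key) as T;
    }
    const result = work();
    this.results.set(key, result);
    return result;
  }
}

// Usage: two workers that both slipped past the lock still charge only once.
let charges = 0;
const store = new IdempotencyStore<string>();
const chargeJob = (): string => {
  charges += 1;
  return `receipt-${charges}`;
};
const first = store.runOnce('charge:order-1001', chargeJob);
const second = store.runOnce('charge:order-1001', chargeJob);
```

With this in place, a lock failure costs a wasted lookup rather than a duplicate charge.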

The simple version: SET NX EX

For 90% of “I need a lock” use cases, this is enough:

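import * as crypto from 'node:crypto'; // explicit import; not every Node version exposes a global crypto
import { createClient } from 'redis';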
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

async function withLock<T>(
  key: string,
  ttlSec: number,
  fn: () => Promise<T>,
): Promise<T | null> {
  const token = crypto.randomUUID();
  const acquired = await redis.set(key, token, { NX: true, EX: ttlSec });
  if (acquired !== 'OK') return null; // someone else has it

  try {
    return await fn();
  } finally {
    // Release only if we still hold it — avoids deleting someone else's lock.
    const release = `
      if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('del', KEYS[1])
      else
        return 0
      end`;
    await redis.eval(release, { keys: [key], arguments: [token] });
  }
}

// Usage
const result = await withLock('lock:billing:user-42', 30, async () => {
  await chargeCustomer(42);
});

The pieces:

SET key value NX EX ttl — atomic: set only if not exists, with expiration. Either acquires the lock or returns null.
EX ttl — the lock expires automatically. Worker dies → lock releases. No orphan locks.
The release Lua script — only deletes the key if the value matches our token. Prevents you from accidentally deleting a lock acquired by a later operation after yours expired.

This is the implementation the Redis docs themselves recommend for simple cases.
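
To see why the token check in the release script matters, here is a small in-memory model of the two operations. `FakeLockStore` is a hypothetical teaching aid, not a Redis client, and time is passed in explicitly so the scenario is deterministic.

```typescript
// Hypothetical in-memory model of SET key value NX EX ttl and the
// compare-and-delete release script.
type Entry = { token: string; expiresAtMs: number };

class FakeLockStore {
  private entries = new Map<string, Entry>();

  // Like SET NX EX: succeeds only if the key is absent or already expired.
  acquire(key: string, token: string, ttlMs: number, nowMs: number): boolean {
    const current = this.entries.get(key);
    if (current && current.expiresAtMs > nowMs) return false;
    this.entries.set(key, { token, expiresAtMs: nowMs + ttlMs });
    return true;
  }

  // Like the Lua script: delete only if the live entry carries our token.
  release(key: string, token: string, nowMs: number): boolean {
    const current = this.entries.get(key);
    if (!current || current.expiresAtMs <= nowMs || current.token !== token) {
      return false;
    }
    this.entries.delete(key);
    return true;
  }

  // Returns the token of the current holder, or null if the key is free.
  holder(key: string, nowMs: number): string | null {
    const current = this.entries.get(key);
    return current && current.expiresAtMs > nowMs ? current.token : null;
  }
}

const locks = new FakeLockStore();
const aAcquired = locks.acquire('lock:job', 'token-A', 30000, 0);     // A holds it
const bBlocked = locks.acquire('lock:job', 'token-B', 30000, 10000);  // B is refused
const bAcquired = locks.acquire('lock:job', 'token-B', 30000, 31000); // A's TTL has lapsed
const aReleased = locks.release('lock:job', 'token-A', 35000);        // A's late release does nothing
const holderAfter = locks.holder('lock:job', 35000);                  // B still holds the lock
```

Without the token comparison, A's late release would have deleted B's lock and let a third worker in.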

Why this is not “safe” against all failures

The simple lock has known holes. The ones that matter:

Lock TTL elapses while you’re still working. Worker A acquires for 30s. Work takes 35s. At 30s, the lock expires; Worker B acquires it; both A and B run concurrently for 5 seconds. Bad if the work is non-idempotent.

Redis primary fails over. Replication is asynchronous, so the primary can acknowledge the lock and then crash before the write reaches the replica. The replica is promoted to primary without the lock key, and a second worker acquires the same lock.

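A process pauses for a long time (GC or swap on the client, swap or a slow fsync on Redis). If the client stalls past the TTL, the lock expires in Redis while the client still believes it holds it; if Redis stalls, acquire and release calls time out and the client cannot tell whether it holds the lock.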

For failure mode (3), the cron-like job, none of these matter: the worst case is “the report sends twice,” which is fine. For (1), billing, they matter a lot, but the right fix is idempotency on the billing operation, not a fancier lock.

What Redlock adds

Redlock is an algorithm that runs against multiple independent Redis instances (typically 5). To acquire the lock, you must succeed on a majority (3 of 5). To release, you delete from all of them.

1. Get current time T1.
2. SET NX on each of N Redis instances in parallel, with a small per-instance timeout.
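3. Treat the lock as acquired only if at least a majority succeeded and the elapsed time T2 - T1 < (TTL - some safety margin); otherwise release on every instance.
4. The lock is held with an effective TTL of (TTL - (T2 - T1)).
5. To release, run the compare-and-delete script on all instances, including those where the SET appeared to fail.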
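
The decision in steps 3 and 4 can be sketched as a pure function. This is only the arithmetic, not a Redlock client: `evaluateQuorum` and its field names are hypothetical, and the safety margin is left as a parameter because its value is an assumption you have to choose.

```typescript
// Hypothetical helper: decides whether a Redlock-style acquisition succeeded.
interface QuorumInput {
  instanceCount: number;  // N independent Redis instances
  successes: number;      // how many SET NX calls returned OK
  ttlMs: number;          // requested lock TTL
  startedAtMs: number;    // T1
  finishedAtMs: number;   // T2
  safetyMarginMs: number; // assumed allowance for clock drift and latency
}

interface QuorumResult {
  acquired: boolean;
  validityMs: number; // effective TTL left to do the work, 0 if not acquired
}

function evaluateQuorum(input: QuorumInput): QuorumResult {
  const majority = Math.floor(input.instanceCount / 2) + 1;
  const elapsedMs = input.finishedAtMs - input.startedAtMs;
  // Step 3: a majority must succeed, and quickly enough to leave usable time.
  const acquired =
    input.successes >= majority &&
    elapsedMs < input.ttlMs - input.safetyMarginMs;
  // Step 4: the effective TTL is what remains after the acquisition round.
  return { acquired, validityMs: acquired ? input.ttlMs - elapsedMs : 0 };
}

const base = { instanceCount: 5, ttlMs: 30000, startedAtMs: 0, safetyMarginMs: 100 };
const fast = evaluateQuorum({ ...base, successes: 3, finishedAtMs: 120 });
const minority = evaluateQuorum({ ...base, successes: 2, finishedAtMs: 120 });
const slow = evaluateQuorum({ ...base, successes: 3, finishedAtMs: 29950 });
```

Here `fast` is acquired with 29,880 ms of validity, while `minority` and `slow` are not acquired and should trigger a release on every instance.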

This handles two of the failures of single-Redis locks: independent failover (one Redis going down doesn’t lose the lock), and a single Redis pause (others still respond).

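The reasonable critique (Kleppmann) is that Redlock rests on timing assumptions (bounded clock drift, bounded pauses) and issues no fencing token, so it still doesn’t help when the TTL expires mid-work. The defense (antirez) is, in short, that Redlock is for use cases where idempotency is impossible, but that is rare in practice.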

Library: redlock for Node. Don’t roll your own.

When a Postgres lock is the right answer

For correctness-critical locks (don’t double-charge, don’t double-deliver), use Postgres advisory locks:

-- Acquire (transactional — released on commit/rollback).
SELECT pg_try_advisory_xact_lock(hashtext('lock:billing:user-42'));

-- Or session-level:
SELECT pg_advisory_lock(hashtext('lock:billing:user-42'));
SELECT pg_advisory_unlock(hashtext('lock:billing:user-42'));

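pg_try_advisory_xact_lock returns true or false depending on whether the lock was acquired, without waiting. The lock is held until the transaction ends; other transactions that call it with the same key get false in the meantime. Note that hashtext produces a 32-bit hash, so two different lock names can occasionally collide and share a lock.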

Why Postgres locks are actually safer:

Single source of truth. No quorum, no clock skew.
Tied to a transaction. If your work is in the same transaction, the lock and the work commit atomically.
No TTL surprises. Lock is held until commit/rollback, period.

The cost: you need a Postgres connection, which is more expensive than a Redis connection. For low-frequency locks (cron jobs, billing operations), this is fine.
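
From application code, the transactional variant might be wrapped like this. It is a sketch under assumptions: `withAdvisoryXactLock` is a hypothetical helper, and `Queryable` is a minimal interface shaped after the `client.query(text, values)` call in node-postgres, so any client with that shape should fit. The work must run on the same connection for the lock and the work to share a transaction.

```typescript
// Minimal structural type, modeled on node-postgres's client.query(text, values).
interface Queryable {
  query(
    text: string,
    params?: unknown[],
  ): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Hypothetical helper: runs `work` inside a transaction that holds the
// advisory lock, or returns null if another transaction already holds it.
async function withAdvisoryXactLock<T>(
  client: Queryable,
  lockName: string,
  work: (tx: Queryable) => Promise<T>,
): Promise<T | null> {
  await client.query('BEGIN');
  try {
    const res = await client.query(
      'SELECT pg_try_advisory_xact_lock(hashtext($1)) AS locked',
      [lockName],
    );
    if (res.rows[0]?.locked !== true) {
      await client.query('ROLLBACK');
      return null; // someone else has it
    }
    const result = await work(client);
    await client.query('COMMIT'); // commit releases the lock
    return result;
  } catch (err) {
    await client.query('ROLLBACK'); // rollback releases it too
    throw err;
  }
}
```

Because the lock lives and dies with the transaction, there is no release script and no TTL to tune.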

When ZooKeeper / etcd is the right answer

For locks that need to span services, survive node failures and network partitions, and provide explicit ordering guarantees, you want a real consensus system: ZooKeeper, etcd, Consul.

Examples:

Leader election in a distributed system.
“Only one instance of this service should be running” guarantees.
Coordinating long-running workflows where TTL-based locks are too brittle.

These come with operational cost — you have to run ZooKeeper or etcd, which is non-trivial. But for the use cases that demand them, there is no Redis-based shortcut.

The decision tree

1. Is the protected operation idempotent, or does it have a separate dedup mechanism? Yes → simple SET NX EX is fine. Reach for a lock as a probabilistic optimization, not a correctness boundary.
2. Is the operation non-idempotent, and does it live in Postgres anyway? Use pg_try_advisory_xact_lock. Lock + work in one transaction = correct.
3. Is the operation non-idempotent and not in Postgres? Reconsider whether you can make it idempotent; usually you can. Idempotency keys, fingerprinting, “create-or-update” with a unique constraint.
4. Truly distributed leader election or workflow coordination? etcd / ZooKeeper / a real consensus system. Don’t fight this one with Redis.

For 95% of teams, the answer is (1) or (2). The “Redlock or not” debate matters for (3), and (3) is the case where idempotency is the real fix.

Lock contention is a code smell

If you find yourself adding more and more locks to coordinate things, that is usually a sign the data model is wrong. Some patterns that eliminate locks:

Per-key serialization. Instead of locking, route work for a key to the same worker. Kafka partitioning by key, sharded queues.
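
As a rough illustration of routing by key, the function below maps a key to a worker index with a simple FNV-1a-style hash over the string's UTF-16 code units. `workerFor` is a hypothetical name, and this is not the partitioner any particular queue uses; the only property that matters is that the same key always lands on the same worker.

```typescript
// Hypothetical router: deterministically maps a key to one of
// `workerCount` workers so that work for a key is never processed in parallel.
function workerFor(key: string, workerCount: number): number {
  if (!Number.isInteger(workerCount) || workerCount <= 0) {
    throw new RangeError('workerCount must be a positive integer');
  }
  let hash = 0x811c9dc5; // FNV offset basis (32-bit)
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return (hash >>> 0) % workerCount; // force unsigned before taking the modulo
}

const workerA = workerFor('user-42', 8);
const workerB = workerFor('user-42', 8); // same key, same worker
```

Changing `workerCount` reshuffles most keys, so resizing needs a drain or a consistent-hashing scheme.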

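Compare-and-swap on the data. UPDATE foo SET x = $1 WHERE id = $2 AND x = $3, passing the new value, the id, and the value you read earlier. If another writer got there first, the update matches zero rows and you retry or give up. Postgres handles the concurrency; no lock needed.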
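
A sketch of the compare-and-swap call from application code, assuming a client whose `query` result exposes `rowCount` the way node-postgres does. `CasClient` and `compareAndSwap` are hypothetical names, and the table `foo` with columns `id` and `x` is the placeholder schema from the paragraph above.

```typescript
// Minimal structural type, modeled on node-postgres's query result.
interface CasClient {
  query(text: string, params?: unknown[]): Promise<{ rowCount: number | null }>;
}

// Hypothetical helper: writes `newX` only if the row still holds `oldX`.
// Returns true if this caller won the race, false if someone else changed x.
async function compareAndSwap(
  client: CasClient,
  id: number,
  oldX: number,
  newX: number,
): Promise<boolean> {
  const res = await client.query(
    'UPDATE foo SET x = $1 WHERE id = $2 AND x = $3',
    [newX, id, oldX],
  );
  return res.rowCount === 1;
}
```

A false result means the caller should re-read the row and decide whether to retry.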

Outbox pattern. Write the pending side effect to an outbox table in the same transaction as the state change, and let a worker dispatch from that table. Avoids the “did I already do this” question.

A well-designed system has very few locks. If your design has many, the design is the problem.

Diagnosing lock issues

When something goes wrong with locks, three things to check:

Lock duration. redis-cli MONITOR (carefully — high traffic) or instrument acquire/release. Long-held locks are a symptom of slow protected work.
Lock contention rate. What percentage of SET NX EX calls return null? If high, the lock is a bottleneck.
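Lock leaks. Keys that never expire and are never released. These usually come from code that sets the key and its expiry in separate commands and crashes in between; TTL key returning -1 is the tell. With SET NX EX, the TTL bounds the damage of a crash before release.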
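
If you instrument acquire and release yourself, the first two checks reduce to a couple of counters. The class below is a hypothetical sketch (`LockMetrics` and its method names are made up for illustration); in practice you would feed the same numbers into whatever metrics system you already run.

```typescript
// Hypothetical in-process counters for lock duration and contention.
class LockMetrics {
  private attempts = 0;
  private misses = 0;
  private maxHeld = 0;

  // Call after every acquire attempt; `acquired` is false when SET NX EX returned null.
  recordAcquire(acquired: boolean): void {
    this.attempts += 1;
    if (!acquired) this.misses += 1;
  }

  // Call after every release with how long the lock was held.
  recordRelease(heldForMs: number): void {
    if (heldForMs > this.maxHeld) this.maxHeld = heldForMs;
  }

  // Fraction of acquire attempts that found the lock already taken.
  contentionRate(): number {
    return this.attempts === 0 ? 0 : this.misses / this.attempts;
  }

  // Longest observed hold time in milliseconds.
  maxHeldMs(): number {
    return this.maxHeld;
  }
}

const metrics = new LockMetrics();
[true, true, false, true].forEach((ok) => metrics.recordAcquire(ok));
[120, 4500, 300].forEach((ms) => metrics.recordRelease(ms));
const rate = metrics.contentionRate(); // 1 miss out of 4 attempts
const longest = metrics.maxHeldMs();
```

Comparing the longest hold time against the TTL shows how close you are to the lock expiring mid-work.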

For Redlock, also monitor: number of Redis instances that responded, time taken to acquire across the quorum, and how many lock-acquire attempts had to retry.

The takeaway

Distributed locks are not the silver bullet they appear to be. A simple SET NX EX with a release Lua script is enough for most uses; pair with idempotency for anything that has correctness consequences. Redlock is for narrow scenarios where idempotency is impossible and a single-Redis lock is genuinely too risky. Postgres advisory locks are underrated and often the right choice. Real consensus systems exist for real consensus problems.

The next time someone says “we need a distributed lock,” ask first whether the operation can be made idempotent. The answer is usually yes, and the lock becomes a nice-to-have instead of a correctness boundary.
