Postgres Deadlocks: Logging the Victim, Reproducing the Race, and Fixing the Lock Order
The batch job runs fine locally and explodes in production with ERROR: 40P01 deadlock detected. Here is how to make Postgres tell you exactly which queries fought, how to reproduce the race in a test script, and the three lock-ordering rules that eliminate deadlocks without guesswork.
The batch inventory sync ran fine for six months. Then one Tuesday at 2:14 p.m., while the sync overlapped with checkout traffic, the error rate jumped from zero to 12%. The logs were full of this:
ERROR: deadlock detected
DETAIL: Process 12345 waits for ShareLock on transaction 67890;
Process 67890 waits for ShareLock on transaction 12345.
HINT: See server log for query details.
CONTEXT: while updating tuple (0,15) in relation "inventory"
The first reaction on every team is the same: “Postgres is broken.” It is not. Postgres did exactly what it is supposed to do. It detected a circular wait between two transactions, picked one as the victim, rolled it back, and let the other finish. The bug is in the application code that handed Postgres two transactions with opposite lock orders. This post is about making Postgres tell you the full story, reproducing that story in a test script, and the three rules that stop deadlocks from being your problem.
No restarts. No connection pool increases. Just working code.
What a deadlock actually is
A deadlock is not a crash. It is not corruption. It is a traffic jam with no exit.
Transaction A locks row 1, then tries to lock row 2. Transaction B locks row 2, then tries to lock row 1. Each holds what the other wants. Postgres waits for deadlock_timeout (default 1 second), builds a wait-for graph, realizes the cycle, and kills the cheaper transaction to break it.
The victim receives error code 40P01. Every query it ran inside that transaction is rolled back. The survivor commits. From the user’s perspective, one request failed with a 500 and the other succeeded. If your retry logic does not know what 40P01 means, the user sees the 500. If your retry logic does know, the victim retries and usually succeeds on the second attempt.
The important insight: deadlocks are almost always deterministic. The same two code paths, hitting the same rows, in the same order, will deadlock every time. Which means you can reproduce them locally and fix the ordering.
Make Postgres tell you exactly who fought
The default log output for a deadlock is useless. You get the error, but not the queries that caused it. Change three settings.
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET deadlock_timeout = '1s';
ALTER SYSTEM SET log_line_prefix = '%t [%p] [%l-1] db=%d,app=%a ';
SELECT pg_reload_conf();
log_lock_waits = onmeans any lock wait that exceedsdeadlock_timeoutgets written to the log before the deadlock is resolved. You see both queries, not just the victim’s stack trace.deadlock_timeout = '1s'is the default, but many cloud providers silently change it. Check it.log_line_prefixincludes the process ID (%p), log line (%l), database (%d), and application name (%a). When you grep the log for the two PIDs in the deadlock detail, you find the exact queries.
After the change, the log looks like this:
2026-05-14 14:14:32 [12345] [4] db=shop,app=batch_sync
process 12345 still waiting for ShareLock on transaction 67890
after 1000.145 ms
Context: SQL statement "UPDATE inventory SET qty = qty - 1 WHERE sku = 'A-002'"
2026-05-14 14:14:32 [67890] [3] db=shop,app=checkout_api
process 67890 detected deadlock with 12345
Context: SQL statement "UPDATE inventory SET qty = qty - 1 WHERE sku = 'A-001'"
Now you know: the batch sync updated A-001 first, then A-002. The checkout API updated A-002 first, then A-001. That opposite order is the entire bug.
You can also monitor the aggregate rate with this query:
SELECT datname, deadlocks
FROM pg_stat_database
WHERE datname = 'shop';
Alert when deadlocks increases by more than one per minute. A healthy system has near-zero deadlocks. If you are seeing them regularly, you have an ordering bug somewhere.
Reproduce the deadlock in a test script
Because deadlocks are deterministic, you can reproduce them with two Node.js clients and a little timing. This script intentionally creates the opposite-order collision.
// deadlock-test.ts
import pg from 'pg';
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
async function worker(name: string, firstSku: string, secondSku: string) {
const client = await pool.connect();
try {
await client.query('BEGIN');
console.log(`[${name}] locking ${firstSku}`);
await client.query('UPDATE inventory SET qty = qty - 1 WHERE sku = $1', [firstSku]);
// Small delay to guarantee the other worker grabs its first lock
await new Promise(r => setTimeout(r, 200));
console.log(`[${name}] attempting ${secondSku}`);
await client.query('UPDATE inventory SET qty = qty - 1 WHERE sku = $1', [secondSku]);
await client.query('COMMIT');
console.log(`[${name}] committed`);
} catch (err: any) {
await client.query('ROLLBACK').catch(() => {});
console.log(`[${name}] rolled back: ${err.code} ${err.message}`);
} finally {
client.release();
}
}
async function main() {
await pool.query(`
CREATE TABLE IF NOT EXISTS inventory (
sku TEXT PRIMARY KEY,
qty INT NOT NULL
)
`);
await pool.query(`INSERT INTO inventory (sku, qty) VALUES ('A-001', 100), ('A-002', 100)
ON CONFLICT (sku) DO UPDATE SET qty = 100`);
// Worker A grabs A-001 first, then A-002.
// Worker B grabs A-002 first, then A-001.
await Promise.all([
worker('A', 'A-001', 'A-002'),
worker('B', 'A-002', 'A-001'),
]);
await pool.end();
}
main();
Run it:
npx ts-node deadlock-test.ts
You will see one worker commit and the other roll back with 40P01 deadlock detected. You have now reproduced the exact failure your production logs show. The fix is not in Postgres. The fix is in the order those UPDATE statements run.
Fix 1: Always acquire locks in the same order
The single most effective deadlock prevention is deterministic lock ordering. If every code path that touches rows A-001 and A-002 locks them in the same sequence — say, sorted by primary key — no two transactions can ever wait on each other.
Before (dangerous):
async function reserveItems(items: { sku: string; qty: number }[]) {
for (const item of items) {
await db.query('UPDATE inventory SET qty = qty - $1 WHERE sku = $2', [item.qty, item.sku]);
}
}
If one caller passes ['A-001', 'A-002'] and another passes ['A-002', 'A-001'], you have a deadlock.
After (safe):
async function reserveItems(items: { sku: string; qty: number }[]) {
const sorted = [...items].sort((a, b) => a.sku.localeCompare(b.sku));
for (const item of sorted) {
await db.query('UPDATE inventory SET qty = qty - $1 WHERE sku = $2', [item.qty, item.sku]);
}
}
Same queries. Same rows. Same result. But now both transactions always lock A-001 before A-002. The cycle is impossible.
This applies to any multi-row write: batched updates, parent-child inserts, ledger entries, seat reservations. Sort by a stable key before locking.
Fix 2: Collapse multiple row locks into one statement
If you can express the work as a single query, Postgres acquires all locks in a single planning step, which is internally consistent and avoids interleaving.
Before:
for (const sku of skus) {
await db.query('UPDATE inventory SET qty = qty - 1 WHERE sku = $1', [sku]);
}
After:
await db.query(
'UPDATE inventory SET qty = qty - 1 WHERE sku = ANY($1::text[])',
[skus],
);
One statement, one set of locks acquired in the order Postgres processes the index scan. No interleaving, no deadlock.
This also applies to deletes, upserts, and SELECT ... FOR UPDATE batches.
Fix 3: Keep transactions short and free of external calls
The longer a transaction holds locks, the wider the window for a deadlock. The worst pattern is this:
await client.query('BEGIN');
await client.query('SELECT * FROM orders WHERE id = $1 FOR UPDATE', [orderId]);
const payment = await stripe.paymentIntents.create({ amount: 5000, currency: 'usd' }); // 800ms lock hold
await client.query('UPDATE orders SET status = $1 WHERE id = $2', ['paid', orderId]);
await client.query('COMMIT');
That Stripe call holds the row lock for 800 milliseconds. Every other request for that order queues. If another transaction already holds a related lock, you have a deadlock. If Stripe is slow, you have a performance incident. If Stripe is down, you have a hanging transaction.
The fix: do external work outside the transaction.
const payment = await stripe.paymentIntents.create({ amount: 5000, currency: 'usd' });
await client.query('BEGIN');
await client.query('SELECT * FROM orders WHERE id = $1 FOR UPDATE', [orderId]);
await client.query('UPDATE orders SET status = $1, payment_intent = $2 WHERE id = $3', [
'paid',
payment.id,
orderId,
]);
await client.query('COMMIT');
Lock time drops from 800ms to sub-millisecond. The deadlock window shrinks to almost nothing. If the Stripe call fails, no database transaction ever started, so there is nothing to roll back.
The retry wrapper you still need
Even with perfect lock ordering, you cannot prevent every deadlock. A VACUUM running in the background, a concurrent ALTER TABLE, or two transactions inserting the same unique key at the same moment can still trigger 40P01. The application must retry the victim.
Here is a small wrapper that catches 40P01 and retries with capped exponential backoff. It only retries the transaction, not the whole HTTP request, so side effects outside the database are not duplicated.
import { PoolClient } from 'pg';
const DEADLOCK_ERROR = '40P01';
const MAX_RETRIES = 3;
export async function withRetry<T>(
fn: (client: PoolClient) => Promise<T>,
pool: pg.Pool,
): Promise<T> {
let lastError: Error | undefined;
for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
const client = await pool.connect();
try {
return await fn(client);
} catch (err: any) {
lastError = err;
if (err.code !== DEADLOCK_ERROR) throw err;
// Capped exponential backoff: 50ms, 100ms, 200ms
const delay = Math.min(50 * 2 ** attempt, 200);
await new Promise(r => setTimeout(r, delay));
} finally {
client.release();
}
}
throw lastError;
}
Usage:
import { withRetry } from './retry';
const result = await withRetry(async (client) => {
await client.query('BEGIN');
await client.query('UPDATE inventory SET qty = qty - 1 WHERE sku = $1', [sku]);
const { rows } = await client.query('SELECT qty FROM inventory WHERE sku = $1', [sku]);
await client.query('COMMIT');
return rows[0];
}, pool);
Do not retry indefinitely. Three attempts with backoff is enough. If you are still deadlocking after three tries, you have an ordering bug that retry is hiding, not fixing.
Why SERIALIZABLE is not the answer
A common suggestion is to switch the transaction isolation level to SERIALIZABLE and “let Postgres handle it.” Postgres does handle it — by aborting more transactions. SERIALIZABLE detects read-write conflicts that READ COMMITTED ignores, and its resolution mechanism is the same 40P01 rollback. Moving to SERIALIZABLE usually increases your deadlock rate, not decreases it. Fix the lock order instead.
The practical checklist
Before you declare a deadlock investigation done, verify:
- Logging is on.
log_lock_waits,deadlock_timeout, and a usefullog_line_prefixare configured. - The race is reproduced. The test script shows the same
40P01locally. - Multi-row writes are sorted. Every code path that updates more than one row orders by primary key before locking.
- Batched when possible.
WHERE id = ANY(...)replaces loops of single-row updates. - No external calls inside transactions. Payment APIs, email sends, and PDF renders happen before
BEGINor afterCOMMIT. - Retry wrapper is deployed.
40P01victims retry three times with jitter. - Deadlocks are monitored.
pg_stat_database.deadlocksis alerted, and the rate trends toward zero after each fix.
Deadlocks are not a database failure. They are a feedback mechanism. Postgres is telling you that two pieces of your code disagree on who gets to go first. Listen, reproduce, sort the order, and the 500s disappear.
A note from Yojji
Database reliability work — configuring the right logging prefix, reproducing races in test scripts, and enforcing deterministic lock ordering across a codebase — is the kind of unglamorous engineering that keeps services stable when traffic patterns change. Yojji engineers regularly do this kind of deep-dive backend work in the Node.js and Postgres systems they build and operate for clients.
Yojji is an international custom software development company founded in 2016, with teams across Europe, the US, and the UK. They specialize in the JavaScript ecosystem, cloud platforms, and the kind of resilient backend architecture that turns mysterious production errors into reproducible, fixable problems.