The Bulkhead Pattern: Why One Slow Endpoint Should Not Drown Your Whole Service
A single slow report endpoint consumed every connection in the pool, and your login API started timing out. Here is how the bulkhead pattern isolates failure domains in Node.js — with semaphores, separate pools, and the fast-fail logic that keeps the rest of your service alive.
The /export endpoint was new. It runs a heavy analytics query, streams a CSV to the client, and takes about eight seconds on a warm cache. You shipped it on Tuesday. By Thursday afternoon, /login, /search, and /checkout were all timing out.
The database was fine. CPU was at 30%. The Redis latency graph was flat. What happened was simpler and more embarrassing: the /export query grabbed every connection in the shared Postgres pool and held them for eight seconds. Every other request that needed the database queued behind those exports. The event loop was not blocked — Node.js was doing its job — but the shared resource was monopolized by one endpoint. /login did not fail because /login was broken; it failed because /export refused to get out of the way.
This is the exact scenario the bulkhead pattern prevents. A bulkhead is a wall in a ship’s hull that isolates compartments. If one floods, the ship stays afloat. In software, a bulkhead isolates failure domains (connection pools, worker threads, memory budgets, or request queues) so a surge or slowdown in one domain cannot drown the others. This post covers the four bulkhead techniques that matter in Node.js, the code for each, and a load test that proves they work.
Why Node.js needs bulkheads more than you think
Node.js has a reputation for resilience because the event loop does not block on I/O. That reputation is half true. The event loop keeps ticking, but your resources — database connections, file descriptors, memory, worker threads — are still finite and shared. If one endpoint hoards them, every endpoint suffers.
The common failure modes:
- Connection pool exhaustion. A single slow query or a burst of long-running requests grabs every slot in a shared pool. Every other request waits in the pool queue or times out.
- Worker pool exhaustion. CPU-bound tasks (PDF generation, image resizing, data parsing) submitted to a shared worker pool or Promise.all queue starve out other tasks.
- Memory pressure. One endpoint accepts a 100MB JSON payload and parses it into a nested object. The heap spikes, GC pauses lengthen, and every request gets slower.
- Downstream cascade. One upstream dependency slows down. Because there is no per-dependency concurrency limit, every request type that touches that dependency piles up together.
The bulkhead pattern fixes all four by partitioning resources. Here is how.
Bulkhead 1: Per-route concurrency limits with semaphores
The simplest and most effective bulkhead is a semaphore that caps how many concurrent requests a single route can process. Not rate-limiting per client IP — limiting total in-flight work for the route itself. If the limit is hit, new requests fail fast with a 503 instead of queuing indefinitely.
This is the difference between “the service is down” and “exports are temporarily unavailable; everything else works.”
import { Semaphore } from 'async-mutex';
import type { Request, Response, NextFunction } from 'express';
class Bulkhead {
  private semaphore: Semaphore;
  private queueSize: number;
  private waiting = 0;
  constructor(concurrency: number, queueSize: number) {
    this.semaphore = new Semaphore(concurrency);
    this.queueSize = queueSize;
  }
  async execute<T>(fn: () => Promise<T>): Promise<T> {
    // Fail fast once the wait queue is full instead of queuing indefinitely.
    if (this.waiting >= this.queueSize) {
      const err = new Error('Bulkhead queue full');
      (err as any).statusCode = 503;
      throw err;
    }
    this.waiting++;
    // Semaphore.acquire() resolves to a [value, release] tuple, not a bare releaser.
    const [, release] = await this.semaphore.acquire();
    this.waiting--;
    try {
      return await fn();
    } finally {
      release();
    }
  }
}
// One bulkhead per failure domain.
const exportBulkhead = new Bulkhead(3, 5); // 3 concurrent exports, queue of 5
const searchBulkhead = new Bulkhead(20, 50); // 20 concurrent searches, queue of 50
function withBulkhead(bulkhead: Bulkhead) {
  return async (req: Request, res: Response, next: NextFunction) => {
    try {
      await bulkhead.execute(async () => {
        // Hold the slot until the route handler has finished responding.
        await new Promise<void>((resolve, reject) => {
          res.once('finish', resolve);
          // 'close' also fires when the client disconnects mid-response;
          // without it, an aborted request would hold its slot forever.
          res.once('close', resolve);
          res.once('error', reject);
          next();
        });
      });
    } catch (err: any) {
      if (err.statusCode === 503) {
        res.status(503).json({ error: 'Service temporarily overloaded for this endpoint' });
      } else {
        next(err);
      }
    }
  };
}
Use it as Express middleware:
app.get('/export', withBulkhead(exportBulkhead), async (req, res) => {
  const csv = await runHeavyAnalyticsQuery();
  res.setHeader('Content-Type', 'text/csv');
  res.send(csv);
});
The numbers matter. I set /export to 3 concurrent because the analytics query is I/O-heavy and the database can comfortably run three of them without hurting OLTP traffic. The queue size of 5 means a small burst waits briefly; anything beyond that gets a fast 503. The client can retry with backoff, or the user sees “export is busy” instead of a 30-second timeout.
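On the client side, "retry with backoff" can be as small as the sketch below. retryOn503 is a hypothetical helper, not part of any library: it treats a 503 as "try again later", doubles the maximum wait on each attempt, and picks a random wait below that cap (full jitter) so clients that were shed together do not all come back together and refill the queue.

```typescript
// Retry a request that the server may shed with 503, using exponential
// backoff with full jitter. Any response object with a numeric `status`
// works, so a fetch Response can be passed straight through.
async function retryOn503<R extends { status: number }>(
  send: () => Promise<R>,
  maxAttempts = 4,
  baseDelayMs = 200,
): Promise<R> {
  for (let attempt = 1; ; attempt++) {
    const res = await send();
    // Success, a non-retryable error, or out of attempts: hand it back.
    if (res.status !== 503 || attempt >= maxAttempts) return res;
    // Wait a random time in [0, base * 2^(attempt - 1)) before the next try.
    const capMs = baseDelayMs * 2 ** (attempt - 1);
    await new Promise<void>((resolve) => setTimeout(resolve, Math.random() * capMs));
  }
}

// Demo against a fake endpoint that sheds the first two attempts.
let attempts = 0;
const fakeExport = async () => ({ status: ++attempts <= 2 ? 503 : 200 });
retryOn503(fakeExport, 4, 50).then((res) => {
  console.log(`status ${res.status} after ${attempts} attempts`); // status 200 after 3 attempts
});
```

With a real endpoint the call would look like retryOn503(() => fetch(exportUrl)), where exportUrl stands in for your /export URL. Capping maxAttempts matters: a client that retries forever turns a fast 503 back into the pile-up the bulkhead was meant to prevent.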
Bulkhead 2: Separate connection pools per domain
Sharing one connection pool across every query in your app is the default, and it is wrong once you have more than one query shape. A single pool means a slow analytical query and a fast user lookup draw from the same finite set of connections. The fix is two pools: one for OLTP, one for analytics, and possibly a third for background jobs.
import { Pool } from 'pg';
const oltpPool = new Pool({
  host: process.env.PGHOST,
  database: process.env.PGDATABASE,
  max: 20, // tight: these are fast queries
  connectionTimeoutMillis: 2000,
  idleTimeoutMillis: 30000,
});
const analyticsPool = new Pool({
  // Optional: route to a read replica, falling back to the primary.
  host: process.env.PG_ANALYTICS_HOST ?? process.env.PGHOST,
  database: process.env.PGDATABASE,
  max: 8, // loose: fewer, longer connections
  connectionTimeoutMillis: 5000,
  idleTimeoutMillis: 60000,
});
// Fast user lookup: OLTP pool
export async function getUser(userId: string) {
  const client = await oltpPool.connect();
  try {
    const result = await client.query('SELECT * FROM users WHERE id = $1', [userId]);
    return result.rows[0];
  } finally {
    client.release();
  }
}
// Heavy report: analytics pool
export async function runReport(startDate: string, endDate: string) {
  const client = await analyticsPool.connect();
  try {
    const result = await client.query(
      'SELECT … FROM events WHERE ts BETWEEN $1 AND $2 GROUP BY …',
      [startDate, endDate]
    );
    return result.rows;
  } finally {
    client.release();
  }
}
The max values are not random. OLTP queries are fast — 5ms to 50ms — so 20 connections can serve hundreds of requests per second. Analytics queries run for seconds, so 8 connections is plenty; if more than 8 reports are requested concurrently, the analyticsPool queues them internally, and the route-level bulkhead from step one gives you the graceful 503.
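The sizing argument is Little's law rearranged: if each query holds a connection for W milliseconds, a pool of N connections can complete at most N × 1000 / W queries per second. A quick check using the latencies quoted above (maxQueriesPerSecond is an illustrative name):

```typescript
// Upper bound on a pool's throughput from Little's law (N = throughput × W).
// Ignores pool bookkeeping and network time, so real numbers land lower.
function maxQueriesPerSecond(poolSize: number, avgQueryMs: number): number {
  return (poolSize * 1000) / avgQueryMs;
}

console.log(maxQueriesPerSecond(20, 50)); // 400  (OLTP pool, slow end)
console.log(maxQueriesPerSecond(20, 5)); // 4000 (OLTP pool, fast end)
console.log(maxQueriesPerSecond(8, 8000)); // 1    (analytics pool, eight-second report)
```

The OLTP pool comfortably covers hundreds of requests per second, while the analytics pool tops out at roughly one eight-second report per second. That ceiling is the reason the route-level bulkhead in front of /export has to shed load instead of letting it queue.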
If you want to go further, give the analytics pool a different host pointing to a read replica. Now the bulkhead is physical: the heavy queries cannot even saturate the primary’s network or disk I/O.
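The isolation itself is easy to see in a toy model. The sketch below uses a hypothetical SlotPool (a stand-in for a connection pool, not pg's API) and times one fast query while four slow queries hold connections: first when everything shares one pool, then with separate analytics and OLTP pools. Durations are scaled down so it finishes in about a second.

```typescript
// Hypothetical stand-in for a connection pool: `size` connections, FIFO waiters.
class SlotPool {
  private free: number;
  private waiters: Array<() => void> = [];

  constructor(size: number) {
    this.free = size;
  }

  async use<T>(work: () => Promise<T>): Promise<T> {
    if (this.free > 0) this.free--;
    else await new Promise<void>((resolve) => this.waiters.push(resolve));
    try {
      return await work();
    } finally {
      const next = this.waiters.shift();
      if (next) next(); // hand the connection straight to the next waiter
      else this.free++;
    }
  }
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Start `slowCount` 300 ms queries on `slowPool`, then time one 5 ms query on `fastPool`.
async function fastQueryLatency(slowPool: SlotPool, fastPool: SlotPool, slowCount: number) {
  const slow = Array.from({ length: slowCount }, () => slowPool.use(() => sleep(300)));
  const start = Date.now();
  await fastPool.use(() => sleep(5));
  const elapsed = Date.now() - start;
  await Promise.all(slow);
  return elapsed;
}

async function main() {
  const shared = new SlotPool(4);
  console.log('shared pool:', await fastQueryLatency(shared, shared, 4), 'ms');
  const analytics = new SlotPool(4);
  const oltp = new SlotPool(4);
  console.log('split pools:', await fastQueryLatency(analytics, oltp, 4), 'ms');
}

main();
```

With the shared pool, the fast query cannot start until a slow query releases its connection. With split pools it starts immediately, because the slow queries can only exhaust their own compartment.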
Bulkhead 3: Isolating CPU-bound work to a dedicated worker pool
Node.js worker threads are not just for offloading math. They are a bulkhead for CPU-bound tasks that would otherwise monopolize the event loop and delay every I/O callback. The mistake most teams make is creating a new worker per task, or worse, running the task on the main thread because “it only takes 200ms.”
Two hundred milliseconds on the main thread is 200 milliseconds where no HTTP request handler, no database callback, and no timer fires.
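You can measure this directly. The sketch below schedules a 0 ms timer, then runs a 200 ms busy loop on the main thread as a stand-in for synchronous CPU work (blockFor and measureTimerDelay are illustrative helpers). The timer cannot fire until the loop returns control to the event loop.

```typescript
// Busy-wait on the main thread: a stand-in for synchronous CPU work
// such as rendering a PDF or parsing a large payload.
function blockFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: no timer, I/O callback, or request handler can run meanwhile
  }
}

// Schedule a 0 ms timer, block for `blockMs`, and report how late the timer fired.
function measureTimerDelay(blockMs: number): Promise<number> {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    blockFor(blockMs);
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`0 ms timer fired after ${delay} ms`); // always at least 200
});
```

Every pending callback in the process sees the same delay, not just the timer, which is why CPU-bound work belongs in its own worker compartment.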
A proper worker bulkhead is a fixed-size pool with a bounded queue. If the queue is full, fail fast.
import { Worker } from 'node:worker_threads';
import { cpus } from 'node:os';
interface Task {
  id: number;
  payload: unknown;
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
}
class WorkerPool {
  private workers: Worker[] = [];
  private idle: Worker[] = []; // workers with no task in flight
  private queue: Task[] = []; // bounded: tasks waiting for an idle worker
  private active = new Map<Worker, Task>(); // the one task each busy worker is running
  private taskId = 0;
  private maxQueue: number;
  constructor(script: string, poolSize: number, maxQueue: number) {
    this.maxQueue = maxQueue;
    for (let i = 0; i < poolSize; i++) {
      const worker = new Worker(script);
      worker.on('message', (msg: { id: number; result?: unknown; error?: string }) => {
        const task = this.active.get(worker);
        if (task) {
          this.active.delete(worker);
          if (msg.error) task.reject(new Error(msg.error));
          else task.resolve(msg.result);
          this.idle.push(worker); // this worker is free again
          this.pump();
        }
      });
      worker.on('error', (err) => {
        // A crashed worker fails its current task instead of leaving it hanging.
        // Replacing the dead worker is omitted for brevity.
        const task = this.active.get(worker);
        this.active.delete(worker);
        task?.reject(err);
      });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }
  execute(payload: unknown): Promise<unknown> {
    if (this.queue.length >= this.maxQueue) {
      return Promise.reject(new Error('Worker pool queue full'));
    }
    return new Promise((resolve, reject) => {
      this.queue.push({ id: ++this.taskId, payload, resolve, reject });
      this.pump();
    });
  }
  // Hand queued tasks to idle workers only, one task per worker. Posting to busy
  // workers would drain the bounded queue into their unbounded message queues,
  // and the fast-fail check in execute() would never trigger.
  private pump() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop()!;
      const task = this.queue.shift()!;
      this.active.set(worker, task);
      worker.postMessage({ id: task.id, payload: task.payload });
    }
  }
  terminate() {
    return Promise.all(this.workers.map((w) => w.terminate()));
  }
}
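// Usage
const pdfPool = new WorkerPool('./pdf-worker.js', cpus().length, 20);
app.post('/invoices/:id/pdf', async (req, res) => {
  try {
    const pdf = await pdfPool.execute({ invoiceId: req.params.id });
    res.contentType('application/pdf');
    // Assuming the worker posts back the PDF bytes: they arrive as a plain
    // Uint8Array (structured clone drops the Buffer prototype), and Express 4
    // only sends Buffers as raw bytes, so wrap them.
    res.send(Buffer.from(pdf as Uint8Array));
  } catch (err: any) {
    if (err.message === 'Worker pool queue full') {
      res.status(503).json({ error: 'PDF generation overloaded' });
    } else {
      res.status(500).json({ error: err.message });
    }
  }
});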
The worker script ./pdf-worker.js stays simple: it listens for messages, does the CPU work, and posts the result back. The main thread never blocks. The pool size is cpus().length because more workers than cores just context-switch. The queue of 20 means a burst of PDF requests waits briefly; anything beyond that gets a fast rejection.
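A minimal sketch of that worker side, assuming the { id, payload } request and { id, result } / { id, error } reply shapes the pool above uses. renderInvoicePdf is a hypothetical placeholder for a real PDF library, and the message handling lives in a plain function (handleJob, also illustrative) so it can be exercised without spawning a thread.

```typescript
// pdf-worker.ts (compiled to pdf-worker.js): the worker side of the pool protocol.
import { parentPort } from 'node:worker_threads';

interface Job {
  id: number;
  payload: { invoiceId: string };
}
interface Reply {
  id: number;
  result?: Uint8Array;
  error?: string;
}

// Hypothetical placeholder for CPU-heavy rendering; swap in a real PDF library.
function renderInvoicePdf(invoiceId: string): Uint8Array {
  return new TextEncoder().encode(`%PDF-1.4 placeholder for invoice ${invoiceId}`);
}

// Turn one job into exactly one reply, echoing the id so the pool can match it.
export function handleJob(job: Job): Reply {
  try {
    if (!job.payload?.invoiceId) throw new Error('missing invoiceId');
    return { id: job.id, result: renderInvoicePdf(job.payload.invoiceId) };
  } catch (err) {
    return { id: job.id, error: (err as Error).message };
  }
}

// parentPort is null on the main thread, so this only wires up inside a Worker.
const port = parentPort;
if (port) {
  port.on('message', (job: Job) => port.postMessage(handleJob(job)));
}
```

Errors are caught and sent back as messages rather than thrown, so one bad job rejects its own promise in the pool without crashing the worker.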
If you do not want to write a pool by hand, piscina and workerpool are production-grade npm packages that do exactly this with better idle tracking and error handling.
Bulkhead 4: Per-dependency concurrency limits
The final bulkhead is for downstream services. If your app calls three external APIs — payment, inventory, and shipping — a slowdown in inventory should not consume every outbound HTTP connection and starve out payment calls.
Most HTTP clients (Axios, node-fetch, undici) share a global connection pool by default. You need separate pools, or at least separate concurrency limits, per dependency.
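import https from 'node:https';
// node-fetch honors the `agent` option. Node's built-in fetch ignores it
// (it takes an undici `dispatcher` instead), so import node-fetch explicitly.
import fetch from 'node-fetch';
// Separate agents = separate connection pools
const paymentAgent = new https.Agent({
  keepAlive: true, // maxFreeSockets only applies when keep-alive is on
  maxSockets: 20,
  maxFreeSockets: 10,
  timeout: 5000, // socket inactivity timeout, not a full request deadline
});
const inventoryAgent = new https.Agent({
  keepAlive: true,
  maxSockets: 10,
  maxFreeSockets: 5,
  timeout: 3000,
});
const shippingAgent = new https.Agent({
  keepAlive: true,
  maxSockets: 10,
  maxFreeSockets: 5,
  timeout: 3000,
});
async function callPaymentApi(payload: unknown) {
  return fetch('https://payment.internal/charge', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
    agent: paymentAgent,
  });
}
async function callInventoryApi(sku: string) {
  return fetch(`https://inventory.internal/stock/${sku}`, {
    agent: inventoryAgent,
  });
}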
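maxSockets is the bulkhead. If the inventory API slows down and its 10 sockets all hang waiting for a response, further inventory calls queue inside inventoryAgent, while the payment API still has its own 20 sockets available. Without this separation, every call shares the default agent, and Node's default http.Agent sets maxSockets to Infinity: nothing caps how many sockets a slow dependency can hold open, so hung requests pile up and consume memory and file descriptors that every other outbound call also needs.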
Pair this with a per-dependency timeout and circuit breaker (covered in earlier posts) and you have a complete resilience stack.
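If your HTTP client does not expose per-agent socket limits, the same isolation can be enforced in application code. The sketch below is a hypothetical DependencyGuard (not a library API) that gives each downstream service its own in-flight cap and its own deadline, using a standard AbortController so a timed-out call is actually cancelled. The limits simply mirror the agent settings above.

```typescript
// One guard per downstream dependency: a cap on in-flight calls plus a deadline.
class DependencyGuard {
  private inFlight = 0;

  constructor(
    private name: string,
    private maxInFlight: number,
    private timeoutMs: number,
  ) {}

  async call<T>(fn: (signal: AbortSignal) => Promise<T>): Promise<T> {
    if (this.inFlight >= this.maxInFlight) {
      // Fail fast: a saturated dependency must not accumulate waiting callers.
      throw new Error(`${this.name}: concurrency limit reached`);
    }
    this.inFlight++;
    const controller = new AbortController();
    // Abort the call if it outlives this dependency's deadline.
    const timer = setTimeout(() => controller.abort(), this.timeoutMs);
    try {
      return await fn(controller.signal);
    } finally {
      clearTimeout(timer);
      this.inFlight--;
    }
  }
}

// Each dependency gets its own compartment.
const guards = {
  payment: new DependencyGuard('payment', 20, 5000),
  inventory: new DependencyGuard('inventory', 10, 3000),
  shipping: new DependencyGuard('shipping', 10, 3000),
};
```

In use, each outbound call goes through its dependency's guard, for example guards.inventory.call((signal) => fetch(url, { signal })). A hung inventory API then rejects new inventory calls immediately while payment calls proceed untouched.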
How to test it
The proof is a load test with two routes: one fast, one artificially slow.
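// Both routes share one database pool, with no bulkheads
import { Pool } from 'pg';
const sharedPool = new Pool({ max: 10 });
// Artificially slow endpoint: holds a pooled connection for 5 seconds, the way
// /export does. A bare setTimeout would hold no shared resource, so it could
// not slow /fast down.
app.get('/slow', async (req, res) => {
  await sharedPool.query('SELECT pg_sleep(5)');
  res.json({ ok: true });
});
// Fast endpoint: a trivial query that only needs a free connection
app.get('/fast', async (req, res) => {
  await sharedPool.query('SELECT 1');
  res.json({ ok: true });
});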
Run autocannon against both:
# 50 concurrent connections to /slow
autocannon -c 50 -d 30 http://localhost:3000/slow &
# Simultaneously, 50 connections to /fast
autocannon -c 50 -d 30 http://localhost:3000/fast
Without bulkheads, /fast latency jumps from 5ms to multiple seconds. The event loop is not blocked; /fast is simply waiting for a database connection that /slow is holding, and the same thing happens with a shared worker pool or shared memory. With the bulkheads above (a semaphore on /slow limiting it to 3 concurrent, a separate database pool, and separate agents), /fast stays flat at 5ms even while /slow queues and 503s.
That is the test you run before you merge. Not unit tests — integration load tests that prove one route cannot drown another.
The takeaway
The bulkhead pattern is not a library you install. It is a design habit: every shared resource in your service should be partitioned by failure domain, and every partition should have a defined capacity with fast-fail behavior when that capacity is exceeded.
Start with one shared resource that has hurt you — probably the database connection pool. Split it into OLTP and analytics pools. Add a semaphore to your slowest endpoint. Give your CPU-bound tasks a fixed worker pool. Separate your HTTP agents by downstream dependency.
The cost is a few extra lines of configuration and some capacity planning. The benefit is that the next time an endpoint misbehaves, you disable one compartment instead of bailing out the whole ship.
A note from Yojji
Building services that stay responsive when one component degrades is not about adding more hardware — it is about drawing the right boundaries between failure domains so a problem in one place stays there. That kind of architectural discipline is what Yojji’s teams bring to full-cycle product builds, from discovery through production operations.
Yojji is an international custom software development company founded in 2016, with offices in Europe, the US, and the UK. They specialize in the JavaScript ecosystem (React, Node.js, TypeScript), cloud platforms (AWS, Azure, GCP), and the kind of resilient backend architecture that keeps services running when the unexpected happens.