Request Coalescing with the Singleflight Pattern: Stop Drowning Your Database on Every Cache Miss

Your cache TTL expires. The next request fetches the user profile from Postgres, takes 200ms, and writes it back to Redis. The problem: that profile endpoint was in the middle of a traffic spike, so there were not ten requests in that 200ms window. There were four hundred. Every single one of them saw an empty cache slot, ran the same SELECT, competed for the same connection pool, and queued behind the same disk read. The database, which was comfortably under 20% CPU a moment ago, is now at 100%, latency is spiking, and the cache refill that should have been a quiet background event became the incident of the afternoon.

This is the cache stampede in its purest form (also called dogpiling or the thundering herd): a single-key miss under concurrency, and it is ruthlessly efficient at turning one expensive query into hundreds. It is a different problem from many different keys expiring at once, which this blog has already covered. The fix is request coalescing, also called the singleflight pattern. One caller runs the query. Every other concurrent caller for the same key waits for that result and receives it when it is ready. The database sees one query, not four hundred.

This post builds a production-grade singleflight implementation in TypeScript, handles the failure modes most tutorials skip (timeouts, errors, memory leaks), and shows how to wire it into a cache layer without turning your data fetcher into a mess.

The shape of the problem

Here is a minimal cache-aside fetcher that looks reasonable and falls over the moment a hot key expires:

async function getUserProfile(userId: string) {
  const cacheKey = `user:${userId}`;
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
  await redis.setex(cacheKey, 60, JSON.stringify(user));
  return user;
}

Under load, the race looks like this:

Request A: cache miss, starts DB query.
Requests B through Z: cache miss (A has not written back yet), each starts its own DB query.
The database runs the same query dozens or hundreds of times in parallel.
Every result gets written back to Redis, so you also pay the Redis write amplification.
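
The race is easy to reproduce without any infrastructure. The sketch below swaps Redis and Postgres for in-memory stand-ins (fakeCache, slowQuery, and queryCount are hypothetical names, and the 50 ms delay is arbitrary) and fires 400 concurrent calls at a cold key:

```typescript
// In-memory stand-ins for Redis and Postgres; all names here are illustrative.
const fakeCache = new Map<string, string>();
let queryCount = 0;

// Simulates a slow database read and counts how often it runs.
async function slowQuery(userId: string): Promise<{ id: string; name: string }> {
  queryCount += 1;
  await new Promise((resolve) => setTimeout(resolve, 50));
  return { id: userId, name: `user-${userId}` };
}

// Same cache-aside shape as getUserProfile above, minus the real clients.
async function getProfileUncoalesced(userId: string) {
  const cacheKey = `user:${userId}`;
  const cached = fakeCache.get(cacheKey);
  if (cached) return JSON.parse(cached) as { id: string; name: string };

  const user = await slowQuery(userId);
  fakeCache.set(cacheKey, JSON.stringify(user));
  return user;
}

// Fire 400 concurrent requests at a cold key and report the query count.
async function simulateHotKeyMiss(): Promise<number> {
  fakeCache.clear();
  queryCount = 0;
  await Promise.all(Array.from({ length: 400 }, () => getProfileUncoalesced('42')));
  return queryCount;
}

simulateHotKeyMiss().then((n) => console.log(`queries issued: ${n}`)); // queries issued: 400
```

Every caller checks the cache before the first query has had a chance to write back, so the count matches the number of callers exactly.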

The fix is not “use read-through instead of cache-aside.” A read-through cache can still issue multiple backend fetches for the same key if it does not coalesce internally. The fix is coalescing at the application layer.

A naive coalescer and why it leaks

The first instinct is a Map of in-flight promises:

const inFlight = new Map<string, Promise<unknown>>();

async function getUserProfileNaive(userId: string) {
  const cacheKey = `user:${userId}`;
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  if (inFlight.has(cacheKey)) {
    return inFlight.get(cacheKey)!;
  }

  const promise = db.query('SELECT * FROM users WHERE id = $1', [userId])
    .then(async (user) => {
      await redis.setex(cacheKey, 60, JSON.stringify(user));
      return user;
    });

  inFlight.set(cacheKey, promise);
  return promise;
}

The leak is obvious once you look for it: the promise never leaves the Map. Every key that is ever fetched stays in inFlight forever. After a day of production traffic, that Map contains millions of stale entries and the process is bloating. Worse, on the second cache miss for the same key, inFlight.has(cacheKey) is still true from six hours ago, so the next caller receives a resolved promise that was for the old data, not the fresh query.

The missing piece is cleanup. When the promise settles, remove the key. But even that is not enough, because a slow query that never returns (or hangs until the caller times out) keeps the key in the map indefinitely. You need a TTL on the in-flight entry itself.
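
The cleanup half of that fix is small. As a minimal sketch (coalesce and pending are illustrative names, not part of any library), deleting the entry in a finally handler covers both success and failure. It does nothing for a promise that never settles, which is why the version in the next section also tracks age:

```typescript
// Illustrative only: a Map of pending promises that cleans up after itself.
const pending = new Map<string, Promise<unknown>>();

function coalesce<T>(key: string, work: () => Promise<T>): Promise<T> {
  const existing = pending.get(key);
  if (existing) return existing as Promise<T>;

  const promise = work().finally(() => {
    // Runs on resolve and on reject, so the key never outlives its flight.
    pending.delete(key);
  });
  pending.set(key, promise);
  return promise;
}
```

Because callers receive the promise returned by finally, the entry is already gone from the map by the time any of them sees the result.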

The production version

Here is a bounded, self-cleaning singleflight implementation. It uses a Map of AbortController-backed entries, with automatic eviction on settlement, error, or timeout.

interface InFlightEntry<T> {
  promise: Promise<T>;
  controller: AbortController;
  startedAt: number;
}

class Singleflight<T> {
  private inFlight = new Map<string, InFlightEntry<T>>();
  private readonly maxEntries: number;
  private readonly maxAgeMs: number;

  constructor(options: { maxEntries?: number; maxAgeMs?: number } = {}) {
    this.maxEntries = options.maxEntries ?? 10_000;
    this.maxAgeMs = options.maxAgeMs ?? 30_000;
  }

  async do(key: string, fn: (signal: AbortSignal) => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key);
    if (existing) {
      if (Date.now() - existing.startedAt > this.maxAgeMs) {
        existing.controller.abort();
        this.inFlight.delete(key);
      } else {
        return existing.promise;
      }
    }

    const controller = new AbortController();
    const startedAt = Date.now();

    // One cleanup handler covers both resolve and reject. The identity check
    // keeps a replaced (stale) flight from deleting its successor's entry.
    const promise: Promise<T> = fn(controller.signal).finally(() => {
      if (this.inFlight.get(key)?.promise === promise) {
        this.inFlight.delete(key);
      }
    });

    if (this.inFlight.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest flight.
      const firstKey = this.inFlight.keys().next().value;
      if (firstKey !== undefined) {
        this.inFlight.get(firstKey)?.controller.abort();
        this.inFlight.delete(firstKey);
      }
    }

    this.inFlight.set(key, { promise, controller, startedAt });
    return promise;
  }
}

The important details:

Cleanup on settlement and error. A single .finally handler removes the key whether the promise resolves or rejects, so failures do not poison the map. It deletes the entry only if the map still holds this flight's own promise; otherwise a stale flight that settles late would remove the fresh flight that replaced it.
Max age. If an entry sits unresolved longer than maxAgeMs, the next caller for the same key aborts the stale flight and starts a fresh one. Callers already waiting on the stale flight keep waiting on its promise. This prevents a hung query from blocking all future cache misses for that key.
Bounded size. If the map hits the limit, the oldest entry is evicted and aborted. When many distinct keys miss at once, the number of in-flight keys can explode; this cap prevents unbounded memory growth.
Abort signal propagation. The work function receives an AbortSignal, so the actual work (the database query, the HTTP fetch, the CPU-intensive computation) can observe cancellation and release resources early.
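
The bounded-size eviction leans on a guarantee of JavaScript's Map: iteration follows insertion order, so the first key returned by keys() is the oldest flight still in the map. A small sketch of that behavior in isolation (evictOldest is a hypothetical helper, not part of the class above):

```typescript
// Hypothetical helper showing the eviction rule on a plain Map.
function evictOldest<V>(map: Map<string, V>, maxEntries: number): string | undefined {
  if (map.size < maxEntries) return undefined;
  // Map iterates in insertion order, so the first key is the oldest entry.
  const oldest: string | undefined = map.keys().next().value;
  if (oldest !== undefined) map.delete(oldest);
  return oldest;
}

const flights = new Map<string, number>([
  ['user:1', 1],
  ['user:2', 2],
  ['user:3', 3],
]);

console.log(evictOldest(flights, 3)); // user:1
console.log([...flights.keys()]); // [ 'user:2', 'user:3' ]
```

Insertion order equals start order here only because entries are written once and never updated in place; a map that re-sets existing keys would need an explicit timestamp comparison instead.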

Wiring it into a cache fetcher

The integration should be invisible to the rest of your application. The fetcher checks cache, then singleflight, then falls back to the real source. The singleflight key must include everything that makes the query unique. Reusing the cache key, as below, is only safe when the cache key already does.

const sf = new Singleflight<{ id: string; name: string }>({
  maxEntries: 5_000,
  maxAgeMs: 10_000,
});

async function getUserProfile(userId: string) {
  const cacheKey = `user:${userId}`;

  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  return sf.do(cacheKey, async (signal) => {
    const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);

    if (signal.aborted) {
      return user;
    }

    await redis.setex(cacheKey, 60, JSON.stringify(user));
    return user;
  });
}

Note the signal.aborted check before the Redis write. If the flight was evicted or aborted while the query was in flight, we still return the user object to the original caller (who is waiting on the promise), but we skip writing a stale result to the cache. The next cache miss will trigger a fresh query. This is a safety valve: it is better to miss the cache write than to cache data that the system has already decided is too old.

The timeout layer most people forget

Singleflight removes duplicate queries, but it does not make the single query faster. If the one query that is allowed through hangs for thirty seconds, every waiter hangs with it. You need a timeout on the coalesced work, not just on the individual HTTP requests.

A simple wrapper:

function withTimeout<T>(ms: number, fn: (signal: AbortSignal) => Promise<T>): Promise<T> {
  return new Promise((resolve, reject) => {
    const controller = new AbortController();
    const timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`Timeout after ${ms}ms`));
    }, ms);

    fn(controller.signal)
      .then(resolve, reject)
      .finally(() => clearTimeout(timer));
  });
}

And in the fetcher:

return sf.do(cacheKey, async (signal) => {
  const user = await withTimeout(5_000, async (innerSignal) => {
    const combined = AbortSignal.any([signal, innerSignal]);
    return db.query('SELECT * FROM users WHERE id = $1', [userId], { signal: combined });
  });
  // ...
});

AbortSignal.any is available in Node.js 20.3 and later (it was also backported to 18.17). It fires when either the singleflight eviction signal or the local timeout signal aborts. If you are on an older runtime, combine them manually by adding listeners to both. The point is that the database driver must actually observe the signal and cancel the query, and the { signal } option in the snippet above assumes a driver that accepts one. Check yours before relying on it: node-postgres (pg), for example, exposes query_timeout and statement_timeout settings but, at the time of writing, no AbortSignal option on query(). If your driver does not support cancellation, the timeout will reject the promise but the query may still run to completion on the server. In that case, keep your database statement_timeout tight so the server cleans up for you.
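
For runtimes without AbortSignal.any, the manual combination can be sketched roughly as follows. anySignal is an illustrative name, not a built-in, and the sketch assumes the runtime supports abort reasons (signal.reason and controller.abort(reason)):

```typescript
// A hand-rolled stand-in for AbortSignal.any on runtimes that lack it.
function anySignal(signals: AbortSignal[]): AbortSignal {
  const controller = new AbortController();
  const listeners = new Map<AbortSignal, () => void>();

  // Detach every listener once one source has fired.
  const cleanup = () => {
    for (const [signal, listener] of listeners) {
      signal.removeEventListener('abort', listener);
    }
    listeners.clear();
  };

  for (const signal of signals) {
    if (signal.aborted) {
      // A source was already aborted: propagate immediately and stop listening.
      controller.abort(signal.reason);
      cleanup();
      break;
    }
    const listener = () => {
      controller.abort(signal.reason);
      cleanup();
    };
    listeners.set(signal, listener);
    signal.addEventListener('abort', listener);
  }
  return controller.signal;
}
```

If no source ever aborts, the listeners stay attached. That is acceptable when both sources are per-flight controllers that are discarded together, and worth revisiting if one of them is a long-lived signal.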

What about errors?

If the single query throws, every waiter receives the same rejection. This is usually correct: if the database is down, all concurrent callers for that key should see the error rather than each one retrying independently and amplifying the failure.

But you probably do not want to cache the error. If you are using a cache layer with a “cache negative results” feature, keep the TTL short (one or two seconds) so a transient failure does not block that key for minutes. The singleflight map already handles this correctly: the entry is deleted on rejection, so the next caller will attempt a fresh query immediately.

One subtle bug: if every waiter wraps the singleflight call in its own retry loop, a failed flight sends all of them back to retry on their own schedules. The failed entry is already gone from the map, so each retry that does not find a live flight starts a new query, and one failure fans out into a retry storm. Move retries inside the singleflight work function, not outside it.

// Bad: every waiter runs its own retry loop around the coalesced call.
return withRetry(() => sf.do(key, () => fetchOnce(key)));

// Good: one fetcher retries inside the flight, everyone else waits.
return sf.do(key, () => withRetry(() => fetchOnce(key)));

Here withRetry and fetchOnce are placeholders for your retry helper and a single fetch attempt. The same rule applies one level up: if each HTTP request handler wraps getUserProfile in its own retry loop, the retries happen outside singleflight. Keep retries at the data-source level, inside the function passed to sf.do.
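
For the retry logic itself, a minimal sketch of a helper meant to run inside the function passed to sf.do might look like this. The name withRetry, its parameters, and the backoff numbers are all illustrative assumptions; the only point is that one loop, owned by the caller that started the flight, does the retrying:

```typescript
// Illustrative retry helper with exponential backoff and full jitter.
async function withRetry<T>(
  work: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 50,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await work();
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        // Random delay in [0, base * 2^attempt) before the next attempt.
        const delay = Math.random() * baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // Every attempt failed: surface the last error to all waiters at once.
  throw lastError;
}
```

Called from inside the work function, the helper retries once on behalf of every waiter, and callers outside still see a single resolved value or a single rejection.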

Cross-process coalescing: do you need it?

The implementation above coalesces within one Node.js process. If you run four containers, a cache miss can still produce four database queries (one per container). For most systems, that is fine. Four queries is not four hundred.

If you genuinely need cross-process coalescing, Redis has a pattern for it: SET lock:user:42 <token> NX EX 5 to elect a leader, LPUSH waiters:user:42 <client_id> for waiters, and BRPOP for blocking. It works, but it adds latency (network round-trips for the locking), complexity, and another failure mode (the leader dies, the waiters hang). In practice, application-level singleflight plus a short cache TTL solves the problem for most teams. Do not build distributed singleflight until you have metrics proving that process-level coalescing is insufficient.

Metrics that prove it is working

Add three metrics so you can verify the behavior in production:

import client from 'prom-client';

const singleflightCoalesced = new client.Counter({
  name: 'singleflight_coalesced_total',
  help: 'Number of requests coalesced into an in-flight query',
  labelNames: ['key_prefix'],
});

const singleflightStarted = new client.Counter({
  name: 'singleflight_started_total',
  help: 'Number of backend queries actually started',
  labelNames: ['key_prefix'],
});

const singleflightDuration = new client.Histogram({
  name: 'singleflight_duration_seconds',
  help: 'Time from request to result for coalesced queries',
  labelNames: ['key_prefix', 'coalesced'],
  buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1],
});

Emit singleflightCoalesced when a caller joins an existing flight, and singleflightStarted when a new flight begins. The ratio between them is your savings. Every coalesced request is a query that never reached the database, so if you see 50,000 coalesced and 100 started, 50,100 callers were served by 100 queries and you just prevented 50,000 redundant ones.
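
Where the counters get incremented matters more than which metrics library holds them. As a sketch, with plain numbers standing in for the prom-client counters above (stats, flights, and coalesceCounted are hypothetical names), the two branches of a coalescer map directly onto the two counters:

```typescript
// Plain counters standing in for singleflightStarted / singleflightCoalesced.
const stats = { started: 0, coalesced: 0 };
const flights = new Map<string, Promise<unknown>>();

function coalesceCounted<T>(key: string, work: () => Promise<T>): Promise<T> {
  const existing = flights.get(key);
  if (existing) {
    stats.coalesced += 1; // this caller joined an in-flight query
    return existing as Promise<T>;
  }
  stats.started += 1; // a real backend query begins
  const promise = work().finally(() => {
    flights.delete(key);
  });
  flights.set(key, promise);
  return promise;
}

// 400 concurrent callers for one key: one query starts, 399 callers join it.
async function demo(): Promise<typeof stats> {
  const slow = () => new Promise<string>((r) => setTimeout(() => r('profile'), 20));
  await Promise.all(Array.from({ length: 400 }, () => coalesceCounted('user:42', slow)));
  return stats;
}

demo().then((s) => console.log(s)); // { started: 1, coalesced: 399 }
```

In a real service the two increments would become labeled counter calls, and the key prefix (here "user") would be the label value rather than the full key, to keep label cardinality bounded.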

The practical checklist

Before you ship singleflight to production, verify these:

The key is fully deterministic. It must include every parameter that changes the result. user:${userId} is good. user:${userId} when the query also depends on a ?include=orders flag is a bug waiting to happen.
The map is bounded and evicted. Unbounded Map growth is a memory leak. Bound it, abort stale entries, and delete on settlement.
The work function handles abort signals. If the driver or client does not support cancellation, at least set a tight server-side timeout so the database does not accumulate zombie queries.
Errors are not cached by the singleflight map. Delete on rejection so the next caller tries again.
Retries live inside the singleflight work. Retries outside it turn a failure into a synchronized retry storm.
Metrics are in place. You need the ratio of coalesced to started to know whether this is actually helping.
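
For the first item on that list, one way to keep keys deterministic is to build them from the full parameter set rather than by hand. A minimal sketch, assuming flat parameters (buildKey and KeyParams are hypothetical names, and the key layout is just one possible convention):

```typescript
// Hypothetical helper: builds a key from a prefix plus every query parameter.
type KeyParams = Record<string, string | number | boolean | undefined>;

function buildKey(prefix: string, params: KeyParams): string {
  const parts = Object.keys(params)
    .filter((name) => params[name] !== undefined)
    .sort() // property order must not change the key
    .map((name) => `${name}=${encodeURIComponent(String(params[name]))}`);
  return [prefix, ...parts].join(':');
}

console.log(buildKey('user', { id: '42' })); // user:id=42
console.log(buildKey('user', { include: 'orders', id: '42' })); // user:id=42:include=orders
```

Encoding the values keeps a stray ":" or "=" inside a parameter from colliding with the separators, and sorting means two call sites that pass the same parameters in a different order still share one flight.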

The working code

Here is the complete, copy-pasteable module. It depends on no libraries beyond Node.js built-ins.

// singleflight.ts
export interface SingleflightOptions {
  maxEntries?: number;
  maxAgeMs?: number;
}

interface Entry<T> {
  promise: Promise<T>;
  controller: AbortController;
  startedAt: number;
}

export class Singleflight<T> {
  private inFlight = new Map<string, Entry<T>>();
  private readonly maxEntries: number;
  private readonly maxAgeMs: number;

  constructor(options: SingleflightOptions = {}) {
    this.maxEntries = options.maxEntries ?? 10_000;
    this.maxAgeMs = options.maxAgeMs ?? 30_000;
  }

  async do(key: string, fn: (signal: AbortSignal) => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key);
    if (existing) {
      if (Date.now() - existing.startedAt > this.maxAgeMs) {
        existing.controller.abort();
        this.inFlight.delete(key);
      } else {
        return existing.promise;
      }
    }

    const controller = new AbortController();
    const startedAt = Date.now();

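    // One cleanup handler covers both resolve and reject. The identity check
    // keeps a replaced (stale) flight from deleting its successor's entry.
    const promise: Promise<T> = fn(controller.signal).finally(() => {
      if (this.inFlight.get(key)?.promise === promise) {
        this.inFlight.delete(key);
      }
    });

    if (this.inFlight.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest flight.
      const firstKey = this.inFlight.keys().next().value;
      if (firstKey !== undefined) {
        this.inFlight.get(firstKey)?.controller.abort();
        this.inFlight.delete(firstKey);
      }
    }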

    this.inFlight.set(key, { promise, controller, startedAt });
    return promise;
  }
}

And a minimal cache fetcher that uses it:

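// Assumption: an ioredis-style client exposing get() and setex(), matching
// the earlier examples. Swap in whatever Redis client your project uses.
import Redis from 'ioredis';
import { Singleflight } from './singleflight.js';

const redis = new Redis();
const sf = new Singleflight<unknown>({ maxEntries: 5_000, maxAgeMs: 10_000 });

export async function fetchThroughCache<T>(
  cacheKey: string,
  fetcher: (signal: AbortSignal) => Promise<T>,
  ttlSec: number,
): Promise<T> {
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached) as T;

  const result = await sf.do(cacheKey, async (signal) => {
    const data = await fetcher(signal);
    // Skip the cache write if this flight was evicted or superseded.
    if (!signal.aborted) {
      await redis.setex(cacheKey, ttlSec, JSON.stringify(data));
    }
    return data;
  });

  // The shared instance is typed as unknown, so narrow back to the caller's T.
  return result as T;
}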

That is it. One bounded map, one promise per key, and a database that sees one query instead of a stampede every time the cache hiccups.

A note from Yojji

The kind of backend performance work that turns a routine cache miss into a non-event — request coalescing, bounded in-flight maps, and careful abort propagation — is exactly the kind of infrastructure detail Yojji’s teams build into the systems they ship for clients.

Yojji is an international custom software development company founded in 2016, with teams across Europe, the US, and the UK. They specialize in the JavaScript ecosystem (React, Node.js, TypeScript), cloud platforms (AWS, Azure, GCP), and full-cycle product engineering — including the caching and data-layer patterns that keep backends stable when traffic patterns change.