The Practical Developer

Database Connection Pool Leaks: Finding the Promise That Never Returns Its Seat

A production API that randomly hangs for thirty seconds and then recovers is often a connection pool inching toward exhaustion. Here is how to detect the leak, trace it to the query that never releases, and build a wrapper that prevents it from happening again.

A close-up of network cables plugged into a server rack, the physical metaphor for a saturated connection pool

The API was fine all morning. Then, at 11:47, response times climbed from 40ms to 30 seconds. Nginx started returning 504 Gateway Timeouts. The Node.js processes were not crashed. CPU was low. Memory was flat. The only suspicious metric was pg_stat_activity: 199 active connections out of a pool limit of 200, and the twentieth request in the queue was waiting.

Thirty seconds later the queue cleared and everything recovered. Then it happened again at 12:15. And 12:42. Each time the pool hit its limit, requests piled up until something timed out and released just enough connections for the backlog to drain. The “random” slowdowns were not random. They were the physics of a connection pool with a slow leak.

This post is how to find that leak, why standard error handling is not enough to prevent it, and the small wrapper that turns a silent pool exhaustion into an alert you can fix in five minutes.

Why connection pools leak

A Postgres pool in Node.js (via pg, node-postgres) hands out a client, runs your query, and expects you to call client.release() so the client goes back to the pool. If you forget the release, or if an error throws before you reach it, that client stays checked out forever. Do this ten times on a pool of twenty and you have halved your capacity. Do it twenty times and the pool is dead.

The leak is not always obvious. It can hide behind three common patterns.

Pattern 1: the missing release on a branch

const client = await pool.connect();
const result = await client.query('SELECT * FROM orders WHERE id = $1', [id]);
if (result.rows.length === 0) {
  throw new NotFoundError();
}
client.release();
return result.rows[0];

Looks safe. But if result.rows.length is zero, the function throws before client.release(). The client is orphaned. Every 404 on this endpoint burns one pool slot permanently.

Pattern 2: the forgotten release in a callback

pool.connect((err, client, release) => {
  if (err) return;
  client.query('SELECT * FROM users', (err, result) => {
    if (err) {
      // forgot release();
      return;
    }
    release();
    handle(result);
  });
});

Callback-style code makes the release path easy to miss. Every error branch that omits release() is a leak.

Pattern 3: the long transaction that never commits

const client = await pool.connect();
await client.query('BEGIN');
await client.query('UPDATE inventory SET count = count - 1 WHERE sku = $1', [sku]);
// an unhandled exception here leaves the transaction open and the client checked out
await client.query('COMMIT');
client.release();

If anything between BEGIN and COMMIT throws, the client is stuck in an idle-in-transaction state and never returns to the pool. On Postgres 14+, idle_in_transaction_session_timeout will eventually kill it, but the default is off. Until then, the slot is burned.

Detection: watch the pool, not the query

The pg pool exposes metrics you should be treating as first-class health data. The most important ones are totalCount, idleCount, and waitingCount.

import { Pool } from 'pg';

const pool = new Pool({
  host: process.env.PGHOST,
  port: Number(process.env.PGPORT),
  user: process.env.PGUSER,
  password: process.env.PGPASSWORD,
  database: process.env.PGDATABASE,
  max: 20,
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

setInterval(() => {
  console.log({
    total: pool.totalCount,
    idle: pool.idleCount,
    waiting: pool.waitingCount,
  });
}, 5000);

A healthy pool under moderate load usually shows idle > 0 and waiting = 0. The leak signature is idle = 0, total = max, and waiting > 0. If you see that pattern, something is checked out and not coming back.

For structured logging, export these as Prometheus metrics or DataDog gauges:

function emitPoolMetrics(pool: Pool) {
  const total = pool.totalCount;
  const idle = pool.idleCount;
  const waiting = pool.waitingCount;
  const inUse = total - idle;

  metrics.gauge('db.pool.connections.total', total);
  metrics.gauge('db.pool.connections.in_use', inUse);
  metrics.gauge('db.pool.connections.waiting', waiting);

  if (waiting > 0 && idle === 0) {
    logger.warn({ total, idle, waiting }, 'pool saturation detected');
  }
}

Alert on db.pool.connections.waiting > 0 for more than two consecutive scrapes. By the time users notice the slowdown, the pool has been saturated for a while.

Tracing a leak to a single query

Once you know the pool is leaking, the next question is where. The fastest way to find out is a small wrapper that logs the stack trace of every acquisition that does not release within a deadline.

import { Pool, PoolClient } from 'pg';

function createInstrumentedPool(config: ConstructorParameters<typeof Pool>[0]) {
  const pool = new Pool(config);
  const activeAcquisitions = new Map<PoolClient, { acquiredAt: number; stack: string }>();

  const originalConnect = pool.connect.bind(pool);

  pool.connect = async (...args: Parameters<typeof originalConnect>) => {
    const client = await originalConnect(...args);
    const stack = new Error().stack ?? '';
    activeAcquisitions.set(client, { acquiredAt: Date.now(), stack });

    const originalRelease = client.release.bind(client);
    client.release = (...releaseArgs: Parameters<typeof originalRelease>) => {
      activeAcquisitions.delete(client);
      return originalRelease(...releaseArgs);
    };

    return client;
  };

  setInterval(() => {
    const now = Date.now();
    for (const [client, info] of activeAcquisitions) {
      const heldMs = now - info.acquiredAt;
      if (heldMs > 5000) {
        console.warn(`Client held for ${heldMs}ms (possible leak). Stack:\n${info.stack}`);
      }
    }
  }, 5000);

  return pool;
}

This monkey-patches pool.connect to record the acquisition time and the stack trace. When client.release() is called, the record is removed. Every five seconds, the checker prints any client still held for more than five seconds. The stack trace points you to the exact line of code that acquired it.

In production, swap the console.warn for a structured log with the stack attached. That single log line is usually enough to find the leak in under a minute.

The fix: a query helper that cannot leak

The permanent fix is to remove the human from the release path. If your codebase allows direct pool.connect() followed by manual client.release(), leaks are inevitable. Someone will add an early return or a try/catch that omits the release.

A safer pattern is a withClient helper that owns the lifecycle:

import { Pool, PoolClient } from 'pg';

async function withClient<T>(
  pool: Pool,
  fn: (client: PoolClient) => Promise<T>,
): Promise<T> {
  const client = await pool.connect();
  try {
    return await fn(client);
  } finally {
    client.release();
  }
}

Every query path uses it:

export async function getOrderById(pool: Pool, id: string) {
  return withClient(pool, async (client) => {
    const result = await client.query('SELECT * FROM orders WHERE id = $1', [id]);
    if (result.rows.length === 0) {
      throw new NotFoundError(`Order ${id} not found`);
    }
    return result.rows[0];
  });
}

client.release() is guaranteed to run because it lives in finally. An early return, a thrown error, or an async rejection all execute the cleanup. The withClient helper is about fifteen lines and it eliminates the entire class of bugs caused by forgetting to release.

For transactions, extend the helper to handle rollback on error:

async function withTransaction<T>(
  pool: Pool,
  fn: (client: PoolClient) => Promise<T>,
): Promise<T> {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    const result = await fn(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK').catch(() => {});
    throw err;
  } finally {
    client.release();
  }
}

Even if fn throws, ROLLBACK runs, then release() runs. The catch on ROLLBACK prevents a double-throw if the connection is already broken.

When you must use pool.query directly

pool.query() is a convenience that acquires, runs a single query, and releases automatically. It is leak-proof for single statements:

const result = await pool.query('SELECT * FROM orders WHERE id = $1', [id]);

The danger is using pool.query() inside a loop or transaction. Never call pool.query() ten times and expect it to use the same connection. Each call gets a random client from the pool. For transactions, always use withTransaction or pool.connect() explicitly.

Setting safety rails on the pool itself

Even with perfect release logic, set timeouts that protect you from edge cases.

const pool = new Pool({
  max: 20,
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
  // if a client is checked out longer than 10s, something is wrong
  // note: this is application-level logic, not a built-in pg option
});

On the Postgres side, set idle_in_transaction_session_timeout to kill transactions that sit idle:

ALTER DATABASE app SET idle_in_transaction_session_timeout = '30s';

And set statement_timeout so a single runaway query cannot hold a connection forever:

ALTER DATABASE app SET statement_timeout = '10s';

These are safety nets, not fixes. They will kill leaked transactions eventually, but the pool will still saturate while they wait. Combine them with the withClient helper and the leak detector so you find the root cause instead of treating the symptom.

Putting it together: a leak-resistant repository pattern

Here is a small but complete repository base class that wraps everything above into a pattern your team can copy for every table:

import { Pool, PoolClient, QueryResult } from 'pg';

export class Repository {
  constructor(protected pool: Pool) {}

  protected async query<T>(text: string, values?: unknown[]): Promise<QueryResult<T>> {
    return this.pool.query(text, values);
  }

  protected async withClient<T>(fn: (client: PoolClient) => Promise<T>): Promise<T> {
    const client = await this.pool.connect();
    try {
      return await fn(client);
    } finally {
      client.release();
    }
  }

  protected async withTransaction<T>(fn: (client: PoolClient) => Promise<T>): Promise<T> {
    const client = await this.pool.connect();
    try {
      await client.query('BEGIN');
      const result = await fn(client);
      await client.query('COMMIT');
      return result;
    } catch (err) {
      await client.query('ROLLBACK').catch(() => {});
      throw err;
    } finally {
      client.release();
    }
  }
}

Concrete repositories extend it:

export class OrderRepository extends Repository {
  async findById(id: string) {
    return this.withClient(async (client) => {
      const result = await client.query('SELECT * FROM orders WHERE id = $1', [id]);
      return result.rows[0] ?? null;
    });
  }

  async decrementInventory(sku: string) {
    return this.withTransaction(async (client) => {
      const result = await client.query(
        'UPDATE inventory SET count = count - 1 WHERE sku = $1 RETURNING count',
        [sku],
      );
      const count = result.rows[0]?.count;
      if (count < 0) {
        throw new Error('Inventory would go negative');
      }
      return count;
    });
  }
}

No raw pool.connect() anywhere in application code. No manual release(). The only way to leak a connection is to bypass the repository, which is a clear code-review red flag.

The takeaway

A connection pool leak is not a database bug. It is a resource-management bug that manifests at the database. The symptoms look like random slowdowns, mysterious 504s, and “it fixed itself” recoveries that happen when the queue finally times out.

Fix it in three layers:

  1. Observe it. Export totalCount, idleCount, and waitingCount from the pg pool. Alert when waiting > 0 and idle = 0.
  2. Trace it. Use the instrumented wrapper to capture the stack trace of any acquisition that exceeds a five-second deadline.
  3. Prevent it. Replace every raw pool.connect() with a withClient or withTransaction helper that puts release() in finally. Never let application code own the release path.

The difference between a team that debugs pool leaks in hours and one that debugs them in minutes is whether they shipped the helper before the incident or after it.


A note from Yojji

Production reliability work like pool instrumentation, timeout governance, and safe resource wrappers is not glamorous, but it is what separates a system that survives traffic spikes from one that folds under them. Yojji’s engineering teams build these patterns into the Node.js and Postgres backends they ship for clients, treating connection management, query safety, and observability as first-class architecture concerns rather than afterthoughts.

Yojji is an international custom software development company founded in 2016, with teams across Europe, the US, and the UK. They specialize in the JavaScript ecosystem (React, Node.js, TypeScript), cloud platforms (AWS, Azure, GCP), and the kind of backend resilience that turns a potential 3 AM page into a metric you fixed at 4 PM on a Tuesday.