PostgreSQL Connection Timeouts in Node.js: The Four Timers That Prevent Production Outages

The production pager went off at 3:14 AM. All API endpoints were returning 502 Bad Gateway. The Node.js processes were still running. The Postgres server was up and accepting connections. But every query was hanging until the load balancer’s 30-second timeout kicked in, which caused a cascading retry storm that kept the connection pool saturated for twelve minutes after the root cause was resolved.

The root cause was not a slow query. It was not an infrastructure failure. It was a DNS resolution hiccup on the database hostname that lasted four seconds. The connection pool had no timeout configured for establishing connections, so every request that tried to grab a fresh connection waited indefinitely on a connection attempt that was never going to complete. By the time the DNS cache refreshed, every request in flight had been retried five times, and the pool was full of orphaned connection attempts that the OS had to clean up.

This is the silent failure mode of misconfigured database connection timeouts. It does not throw a clear error. It just queues work behind a connection that is never going to complete. And because it looks like a full system outage from the client’s perspective, every automated retry cycle makes it worse.

Here are the four timers that prevent this scenario, how to set each one with node-postgres, and the failure modes they protect against.

Timer 1: connectionTimeoutMillis

This is the maximum time the client will wait to establish a connection to the Postgres server. On a Pool, the same setting also bounds how long a caller waits for a free client when the pool is already at max. In node-postgres (the pg package), it defaults to 0, which means no timeout at all. That is the wrong default for production applications.

When a connection attempt times out, the pool rejects the pending connect() or query() call with a timeout error, and the caller gets a clean rejection it can handle. Without this timeout, the request simply keeps awaiting while the OS waits for a SYN-ACK from a server that may have restarted, changed IP, or become unreachable due to a firewall rule change.

const { Pool } = require('pg');

const pool = new Pool({
  host: 'db.production.internal',
  port: 5432,
  user: 'app',
  database: 'appdb',
  connectionTimeoutMillis: 3000, // 3 seconds; the pg default of 0 means no timeout
});

Set this to 3 seconds for internal networks (same VPC, sub-millisecond latency) and 5 seconds for cross-region or VPN connections. A connection that takes longer than 3 seconds inside a data center is not going to succeed if you wait longer. It is going to eat a connection slot and amplify the problem.

The exception is serverless environments like AWS Lambda with RDS Proxy. Cold starts can trigger connection setup times of 5-8 seconds, so a 10-second timeout is appropriate there. But for a persistent Node.js server, 3 seconds is the ceiling.
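A minimal sketch of what failing fast can look like at the request boundary, assuming a pg-style pool whose connect() rejects once connectionTimeoutMillis elapses. The helper name handleWithDb and the response shape are hypothetical, not part of node-postgres.

```javascript
// Sketch: turn a failed or timed-out checkout into a fast 503 instead of a hang.
// `pool` is anything with a pg-style connect() that resolves to a client
// exposing query() and release(). `handleWithDb` is a hypothetical helper.
async function handleWithDb(pool, sql, params = []) {
  let client;
  try {
    // Rejects when the connection (or a free pool slot) is not available in time.
    client = await pool.connect();
  } catch (err) {
    return { status: 503, body: { error: 'database unavailable' }, cause: err.message };
  }
  try {
    const result = await client.query(sql, params);
    return { status: 200, body: result.rows };
  } finally {
    // Always hand the client back, even when the query throws.
    client.release();
  }
}
```

The caller can return the 503 immediately, which keeps the load balancer from retrying into a pool that cannot serve the request anyway.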

Timer 2: idleTimeoutMillis

Postgres connections in the pool are reused across requests. When a request finishes, the connection returns to the pool and waits for the next caller. idleTimeoutMillis controls how long an idle connection stays in the pool before the pool closes it.

The default in pg is 10_000 (10 seconds). That is too short for most workloads.

const pool = new Pool({
  connectionTimeoutMillis: 3000,
  idleTimeoutMillis: 30000, // 30 seconds
  max: 20,
});

Why 30 seconds? A connection that has been idle for 10 seconds is still warm (TCP state established, Postgres backend ready, prepared statements possibly cached). Closing it after 10 seconds means that any lull longer than that empties the pool, and the next burst of requests has to open every connection from scratch. Postgres then has to fork up to 20 new backends under load, which spikes CPU and latency.

The tradeoff is that idle connections consume Postgres server resources. Each idle backend takes about 5-10 MB of memory on the server side. With a max of 20, that is 100-200 MB for the idle pool. For most applications that is negligible. If you are running on a tiny Postgres instance (1 GB RAM) with hundreds of connections, lower the idle timeout to 10 seconds, but do not go lower than that.

The real risk with idleTimeoutMillis is setting it to 0, which disables idle connection recycling. With idleTimeoutMillis: 0, a connection that is checked back in stays in the pool forever. Over hours or days, TCP middleboxes (NAT gateways, load balancers, proxies) silently drop these idle connections. The pool thinks they are healthy. The first query on a stale connection fails with read ECONNRESET, or hangs if the middlebox drops packets without sending a reset. The pool does not retry the query for you: it discards the broken client and rejects the call, so one unlucky request fails unless the application retries it on a fresh connection. This is a common source of intermittent errors in production Node.js apps.

Always set a non-zero idle timeout. 30 seconds is a good starting point.
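Because the pool rejects rather than retries, an application-level retry for idempotent reads is one way to absorb a stale connection. The following is a sketch under assumptions: queryWithStaleRetry is a hypothetical helper, and the list of error codes treated as "stale" is a guess to be tuned against your own logs.

```javascript
// Socket-level error codes that usually mean the connection died while idle.
// This list is an assumption; extend it to match what your logs show.
const STALE_CONNECTION_CODES = new Set(['ECONNRESET', 'EPIPE', 'ETIMEDOUT']);

// Hypothetical helper: retry an idempotent query on a stale-connection error.
// `pool` is anything with a pg-style query(text, params) method.
async function queryWithStaleRetry(pool, text, params = [], retries = 1) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await pool.query(text, params);
    } catch (err) {
      const stale = STALE_CONNECTION_CODES.has(err.code);
      // Give up on non-network errors and once the retry budget is spent.
      if (!stale || attempt >= retries) throw err;
      // Otherwise loop: the next pool.query() should get a different client.
    }
  }
}
```

Only use this for statements that are safe to run twice. A write that failed mid-flight may or may not have been applied.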

Timer 3: statement_timeout (Postgres-side)

Everything above controls the client-side behavior. This one controls the server side. statement_timeout is a Postgres configuration parameter that kills any query that runs longer than the specified number of milliseconds.

This is the most important timeout for preventing cascading failures. Without it, a single slow query can hold a connection for minutes. While it holds the connection, that connection is unavailable for other requests. The pool shrinks. Other requests queue up. Latency climbs. Queue depth grows. Memory usage rises. Eventually the process runs out of memory or the load balancer starts returning 502s.

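Set it globally or per database:

-- Instance-wide default (written to postgresql.auto.conf; takes effect after a reload)
ALTER SYSTEM SET statement_timeout = '30s';
SELECT pg_reload_conf();

-- Or as the default for every new session on one database
ALTER DATABASE appdb SET statement_timeout = '30s';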

Or set it in the Node.js pool initialization by executing a startup query:

const pool = new Pool({
  connectionTimeoutMillis: 3000,
  idleTimeoutMillis: 30000,
  max: 20,
});

// Runs every time the pool opens a new connection, not once at startup
pool.on('connect', (client) => {
  client.query("SET statement_timeout = '30s'")
    .catch((err) => console.error('Failed to set statement_timeout:', err));
});

30 seconds is a reasonable default for most web APIs. Adjust based on your slowest legitimate query. If you have a reporting endpoint that legitimately runs for 60 seconds, do not raise the global timeout. Use SET LOCAL statement_timeout inside that specific transaction:

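async function runReport() {
  const client = await pool.connect();
  try {
    // SET LOCAL only takes effect inside a transaction block
    await client.query('BEGIN');
    await client.query("SET LOCAL statement_timeout = '120s'");
    const result = await client.query(reportQuery);
    await client.query('COMMIT');
    return result.rows;
  } catch (err) {
    // Roll back so the connection is returned to the pool in a clean state
    await client.query('ROLLBACK').catch(() => {});
    throw err;
  } finally {
    client.release();
  }
}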

The key insight: statement_timeout protects the database from runaway queries. It is not a replacement for query optimization. If you are hitting it regularly, the fix is not to raise the limit. The fix is to EXPLAIN ANALYZE the query and add the missing index.
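node-postgres also accepts statement_timeout, lock_timeout, and query_timeout as connection options, all in milliseconds. The first two are sent to the server when each connection starts; query_timeout is a client-side deadline on the response. The sketch below builds such a config from environment variables. The function name buildPoolConfig and the DB_* variable names are hypothetical.

```javascript
// Sketch of a config builder. `buildPoolConfig` and the DB_* variable names
// are hypothetical; PGHOST and PGDATABASE are the usual libpq-style names.
function buildPoolConfig(env = {}) {
  // Parse a positive integer of milliseconds, or fall back to a default.
  const ms = (value, fallback) => {
    const n = Number.parseInt(value, 10);
    return Number.isFinite(n) && n > 0 ? n : fallback;
  };
  return {
    host: env.PGHOST || 'localhost',
    database: env.PGDATABASE || 'appdb',
    connectionTimeoutMillis: ms(env.DB_CONNECT_TIMEOUT_MS, 3000),
    idleTimeoutMillis: ms(env.DB_IDLE_TIMEOUT_MS, 30000),
    // Sent to Postgres as session defaults when each connection starts.
    statement_timeout: ms(env.DB_STATEMENT_TIMEOUT_MS, 30000),
    lock_timeout: ms(env.DB_LOCK_TIMEOUT_MS, 10000),
    // Client-side deadline, set slightly above statement_timeout so the server
    // normally gets to cancel first (this ordering is an assumption).
    query_timeout: ms(env.DB_QUERY_TIMEOUT_MS, 35000),
  };
}
```

Usage would be new Pool(buildPoolConfig(process.env)). Some connection poolers reject startup parameters they do not recognize, in which case the SET-on-connect approach is the fallback.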

The lock_timeout companion

There is a related Postgres parameter you should set alongside statement_timeout: lock_timeout. This controls how long a statement waits to acquire a lock before it is cancelled. Without it, a query blocked by an ACCESS EXCLUSIVE lock from an ALTER TABLE or VACUUM FULL will sit on the connection until statement_timeout fires, even though it is doing zero work.

ALTER DATABASE appdb SET lock_timeout = '10s';

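This prevents the common scenario where a migration or maintenance operation holds, or is waiting for, an ACCESS EXCLUSIVE lock on a table, and every later query on that table queues behind it for minutes at a time. Keep lock_timeout shorter than statement_timeout, or it will never fire first.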
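When either timeout fires, node-postgres surfaces the PostgreSQL SQLSTATE on the error's code property, which makes the two cases easy to tell apart in logs and metrics. A small sketch, where classifyTimeout is a hypothetical helper name:

```javascript
// SQLSTATE codes defined by PostgreSQL.
const QUERY_CANCELED = '57014';     // statement_timeout fired (also used for manual cancels)
const LOCK_NOT_AVAILABLE = '55P03'; // lock_timeout fired (also used by NOWAIT)

// Hypothetical helper mapping a pg error to a label for metrics or logs.
function classifyTimeout(err) {
  if (!err || typeof err.code !== 'string') return 'other';
  if (err.code === QUERY_CANCELED) return 'statement_timeout_or_cancel';
  if (err.code === LOCK_NOT_AVAILABLE) return 'lock_timeout';
  return 'other';
}
```

Counting these two labels separately shows whether the problem is slow queries or lock contention, which call for different fixes.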

Timer 4: TCP keepalives

TCP keepalives are the OS-level watchdog for dead connections. When a connection goes silent (server crashes without closing the socket, network partition), the keepalive mechanism detects it and closes the local socket, which triggers a clean error in the pool.

Postgres and node-postgres both support keepalive configuration, but the defaults are notoriously conservative. node-postgres leaves keepAlive off unless you enable it, and Linux waits 7200 seconds (2 hours) of idle time before sending the first keepalive probe. That means a dead connection that is kept in the pool can go unnoticed for hours.

const pool = new Pool({
  connectionTimeoutMillis: 3000,
  idleTimeoutMillis: 30000,
  max: 20,
  keepAlive: true,
  keepAliveInitialDelayMillis: 10000, // 10 seconds
});

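This tells the OS to start sending keepalive probes once the connection has been idle for 10 seconds. The probe interval and probe count are not controlled by this option. Current Node.js documents them as 1-second intervals and 10 probes when keepalive is enabled, so an unreachable server is detected roughly 20 seconds after the connection went quiet. On runtimes that leave the kernel values in place, detection takes longer.

Without keepalives, a connection to a server that has been hard-restarted or had its network interface taken down will appear healthy to the pool until a query is attempted on it. connectionTimeoutMillis does not apply at that point, because the connection already exists. The query hangs until the kernel gives up retransmitting, which can take many minutes, and the pool does not retry it. During that hang, the connection is occupied and cannot serve other requests.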

Set keepAlive: true and keepAliveInitialDelayMillis: 10000 on every pool. It costs nothing in normal operation and saves you from silent connection death.
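The detection time is simple arithmetic: idle time before the first probe, plus the probe interval multiplied by the number of unanswered probes. The sketch below compares the Linux kernel defaults with the values the Node.js documentation lists for socket.setKeepAlive. The function name is hypothetical, and the Node.js values should be treated as version-dependent.

```javascript
// Worst-case seconds before an idle dead peer is detected:
// idle time before the first probe + (probe interval * unanswered probes).
function keepaliveDetectionSeconds({ idleSeconds, intervalSeconds, probeCount }) {
  return idleSeconds + intervalSeconds * probeCount;
}

// Linux kernel defaults: tcp_keepalive_time, tcp_keepalive_intvl, tcp_keepalive_probes.
const linuxDefaults = { idleSeconds: 7200, intervalSeconds: 75, probeCount: 9 };

// Values the Node.js docs list for setKeepAlive with a 10 s initial delay
// (TCP_KEEPINTVL=1, TCP_KEEPCNT=10); assumed here, check your runtime.
const nodeTenSecondDelay = { idleSeconds: 10, intervalSeconds: 1, probeCount: 10 };

console.log(keepaliveDetectionSeconds(linuxDefaults));      // 7875
console.log(keepaliveDetectionSeconds(nodeTenSecondDelay)); // 20
```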

Putting it all together: the production pool config

Here is the complete pool configuration that covers all four timers, with annotations for each setting:

const { Pool } = require('pg');

const pool = new Pool({
  // Connection
  host: process.env.PGHOST || 'localhost',
  port: parseInt(process.env.PGPORT || '5432', 10),
  user: process.env.PGUSER || 'app',
  password: process.env.PGPASSWORD,
  database: process.env.PGDATABASE || 'appdb',

  // Timer 1: How long to wait for a TCP connection
  connectionTimeoutMillis: 3000,

  // Timer 2: How long an idle connection stays in the pool
  idleTimeoutMillis: 30000,

  // Timer 4: TCP keepalive detection
  keepAlive: true,
  keepAliveInitialDelayMillis: 10000,

  // Pool sizing
  max: 20,
  min: 2,            // Exempt 2 idle connections from the idle timeout (recent pg versions; does not pre-warm)

  // Connection recycling
  maxUses: 7500,     // Close and replace a connection after 7500 checkouts
});

// Timer 3: statement_timeout and lock_timeout, set on every new connection
pool.on('connect', (client) => {
  client.query("SET statement_timeout = '30s'; SET lock_timeout = '10s'")
    .catch((err) => console.error('Failed to set session timeouts:', err));
});

// Log pool errors that don't have a specific query context
pool.on('error', (err) => {
  console.error('Unexpected pool error:', err);
});

This config gives you:

Fail fast on connection failure (3 seconds instead of waiting indefinitely)
Warm pool with bounded churn (30-second idle timeout, 2 connections exempt from it)
Kill runaway queries (30-second statement timeout, 10-second lock timeout)
Detect dead connections (keepalive probes start after 10 idle seconds)
Recycle long-lived connections (maxUses bounds per-connection state such as cached plans)
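The checklist above can also be enforced mechanically at startup. This is a sketch of one possible lint, not a node-postgres feature: checkTimers and its field names statementTimeoutMs and lockTimeoutMs are hypothetical, and all values are in milliseconds.

```javascript
// Hypothetical startup lint for the four timers. Returns a list of warnings;
// an empty list means every timer is set and ordered sensibly.
function checkTimers({ connectionTimeoutMillis, idleTimeoutMillis, keepAlive,
                       statementTimeoutMs, lockTimeoutMs }) {
  const warnings = [];
  if (!connectionTimeoutMillis) {
    warnings.push('connectionTimeoutMillis is unset or 0: connects can wait indefinitely');
  }
  if (idleTimeoutMillis === 0) {
    warnings.push('idleTimeoutMillis is 0: idle connections are never closed');
  }
  if (!statementTimeoutMs) {
    warnings.push('statement_timeout is unset: a slow query can hold a connection for minutes');
  }
  if (lockTimeoutMs && statementTimeoutMs && lockTimeoutMs >= statementTimeoutMs) {
    warnings.push('lock_timeout is not shorter than statement_timeout, so it never fires first');
  }
  if (keepAlive !== true) {
    warnings.push('keepAlive is off: dead idle connections may go unnoticed');
  }
  return warnings;
}
```

Logging the result once at boot, or failing the deploy when the list is non-empty, keeps a later config change from silently removing one of the timers.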

Testing your timeouts

A pool config is only useful if you verify the timeouts actually fire when expected. Here is a test that exercises each one:

const { Pool } = require('pg');
const assert = require('assert');

async function testTimeouts() {
  // Test 1: connection timeout to an address that should not answer
  const timeoutPool = new Pool({
    host: '10.255.255.1', // private address assumed to be unreachable from the test host
    port: 5432,
    connectionTimeoutMillis: 2000,
    idleTimeoutMillis: 1000,
  });

  const start = Date.now();
  try {
    // assert.rejects fails the test if the query unexpectedly succeeds
    await assert.rejects(timeoutPool.query('SELECT 1'));
    const elapsed = Date.now() - start;
    assert.ok(elapsed >= 1900 && elapsed <= 4000,
      `Connection timeout should fire around 2s, took ${elapsed}ms`);
    console.log(`PASS: connection timeout in ${elapsed}ms`);
  } finally {
    await timeoutPool.end();
  }

  // Test 2: statement timeout with a slow query
  const statementPool = new Pool({
    host: process.env.PGHOST || 'localhost',
    database: 'testdb',
    connectionTimeoutMillis: 3000,
    idleTimeoutMillis: 1000,
  });

  // SET is per-session, so run it and the slow query on the same client
  const client = await statementPool.connect();
  try {
    await client.query("SET statement_timeout = '500ms'");
    const start2 = Date.now();
    // 57014 is the SQLSTATE Postgres reports when statement_timeout fires
    await assert.rejects(client.query('SELECT pg_sleep(10)'), { code: '57014' });
    const elapsed = Date.now() - start2;
    assert.ok(elapsed >= 400 && elapsed <= 2000,
      `Statement timeout should fire around 500ms, took ${elapsed}ms`);
    console.log(`PASS: statement timeout in ${elapsed}ms`);
  } finally {
    client.release();
    await statementPool.end();
  }

  console.log('All timeout tests passed');
}

testTimeouts().catch((err) => {
  console.error(err);
  process.exitCode = 1; // make CI fail when a timeout test fails
});

Run this against a local Postgres instance or in your CI pipeline. If any timeout takes significantly longer than expected, your infrastructure layer (Docker networking, VPN, cloud firewall) may be swallowing timeout signals. That is important information to have before a production incident, not during one.

What happens when you get it wrong

Every timeout in this post maps to a real production failure mode I have seen or debugged:

Missing/Incorrect Setting	Failure Mode
No `connectionTimeoutMillis`	Brief DNS hiccup freezes all connection attempts. Pool saturates with waiting callers. API returns 502s for minutes after DNS recovers.
No `idleTimeoutMillis` (set to 0)	Idle connections quietly die inside NAT gateways. First query on a stale connection gets `ECONNRESET` and is rejected. Intermittent failed requests unless the application retries.
No `statement_timeout`	One missing index causes a sequential scan that takes 45 seconds. Every connection eventually runs the slow query. Pool locks up. Whole API goes down.
No TCP keepalives	Server hard-restarts. Pool thinks all connections are healthy. First query after restart hangs until the OS gives up on the dead socket, which can take minutes. Every request queues.

The common thread in every case is that the application did not fail fast. It waited. And waiting turned a 4-second blip into a 12-minute recovery.

The takeaway

Database connection timeout configuration is not a set-and-forget detail. It is the primary defense against cascading failures in any application that depends on Postgres. The defaults in pg are safe for a development environment but unsuitable for production. They tolerate too much uncertainty before failing, and that tolerance amplifies small infrastructure problems into system-wide outages.

Set all four timers. Test them in CI. Monitor pool.waitingCount in production (the number of queries waiting for a connection). If waitingCount ever exceeds zero during normal traffic, your pool is too small or your timeouts are too long.
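A minimal sketch of that monitoring, assuming a pg Pool, which exposes totalCount, idleCount, and waitingCount. The helper name poolSnapshot is hypothetical, and max is passed in explicitly rather than read from the pool.

```javascript
// `pool` is a pg Pool (or any object exposing the same three counters).
// `poolSnapshot` is a hypothetical helper; `max` is the configured pool size.
function poolSnapshot(pool, max) {
  const { totalCount, idleCount, waitingCount } = pool;
  const inUse = totalCount - idleCount; // clients currently checked out
  return {
    inUse,
    idle: idleCount,
    waiting: waitingCount,
    utilization: max > 0 ? inUse / max : 0,
    saturated: waitingCount > 0, // callers are queued behind the pool
  };
}
```

Calling this on a timer, for example every 10 seconds, and shipping the result to your metrics system gives an early signal before callers start hitting connectionTimeoutMillis.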

The goal is not to prevent failures. The goal is to make failures fast, visible, and isolated. A query that times out in 3 seconds and returns a 503 to the caller is better than a query that hangs for 30 seconds, consumes a connection, and blocks 19 other requests behind it.

Fail fast, recover faster.
