Your Node.js HTTP Client Is the Bottleneck: Connection Pool Tuning That Works

You run a load test against your API. CPU is at 40%, memory is flat, database p95 is under 10 ms. Everything looks healthy. Then the error rate jumps: ConnectTimeoutError, UND_ERR_CONNECT_TIMEOUT, or the generic socket hang up. You scale the API pods from four to eight. The spikes shrink but do not disappear. You scale to twelve. Same pattern, higher bill.

The downstream service you are calling is fast. It handles 10,000 RPS in its own load tests. The problem is not the service. The problem is that your Node.js client opens, negotiates, and closes a TCP connection for every single request, or it exhausts a small pool and queues requests behind a gate that has nothing to do with your business logic.

This post shows how Node.js manages HTTP connections, how to read the real signals, and the two config lines that fix most pool-related latency spikes.

Where the default behavior hurts you

Node.js 18+ ships global.fetch powered by undici, which uses a connection pool under the hood. Before that, most production code used node:http, axios, or node-fetch with an http.Agent. In every case, there is a pool: a set of reusable TCP connections to the same origin.

The defaults are not tuned for a backend, and they differ by client:

undici’s default connections per origin: unlimited (connections: null)
axios with the default agent: Infinity (unbounded, which is its own disaster)
http.globalAgent.maxSockets: Infinity; since Node 19 the global agent also enables keep-alive
undici’s default keepAliveTimeout: 4 seconds

A small explicit cap, such as the six connections per origin that browsers use for HTTP/1.1, sounds fine until your service is a microservice that makes three downstream calls per request and a burst of traffic sends eighteen requests to the same origin at once. Six grab a connection, twelve wait in a FIFO queue. The queue time shows up as latency, not CPU load, so your dashboards lie to you.

The Infinity case is worse. Every concurrent request opens its own TCP connection. Eventually you hit the local port range limit, the ephemeral port table fills with TIME_WAIT sockets, and new connections fail with EADDRNOTAVAIL or connect timeouts even though the target is healthy.

Neither extreme is right for a backend. You need a bounded pool, sized to your concurrency, with keep-alive tuned to your infrastructure.
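To make the queueing cost of a capped pool concrete, here is a minimal sketch that models the eighteen-requests, six-connections case. The function name and the fixed 50 ms service time are assumptions for illustration; real traffic is burstier and service times vary.

```javascript
// Hypothetical model: all requests arrive at t = 0 and each one holds a
// connection for exactly serviceMs. Returns the latency of every request.
function simulatePoolLatencies(requestCount, connections, serviceMs) {
  // freeAt[i] is the time at which connection i can take its next request.
  const freeAt = new Array(connections).fill(0);
  const latencies = [];
  for (let i = 0; i < requestCount; i++) {
    // FIFO: the next request takes whichever connection frees up first.
    let slot = 0;
    for (let j = 1; j < connections; j++) {
      if (freeAt[j] < freeAt[slot]) slot = j;
    }
    const finish = freeAt[slot] + serviceMs;
    freeAt[slot] = finish;
    latencies.push(finish); // arrival was t = 0, so latency equals finish time
  }
  return latencies;
}

const capped = simulatePoolLatencies(18, 6, 50);
console.log(Math.max(...capped)); // 150: the last six requests queued for 100 ms

const sized = simulatePoolLatencies(18, 18, 50);
console.log(Math.max(...sized)); // 50: nobody queued
```

The downstream took 50 ms in both runs. The extra 100 ms in the capped run is pure client-side queue time, which is why it never shows up on the downstream's own dashboards.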

Measuring before you fix

Do not guess at pool size. Measure concurrent connections from the client side and active sockets on the host.

From inside the Node process, undici exposes per-pool counters through the stats property of a Pool:

import { Pool } from 'undici';

// One pool per origin. Send traffic through it with pool.request(...)
// or fetch(url, { dispatcher: pool }) so the counters reflect real load.
const origin = 'https://billing.internal';
const pool = new Pool(origin, {
  connections: 64,
  keepAliveTimeout: 30_000,
});

setInterval(() => {
  const stats = pool.stats;
  console.log(JSON.stringify({
    origin,
    connected: stats.connected,
    free: stats.free,
    pending: stats.pending,
    queued: stats.queued,
    running: stats.running,
  }));
}, 10_000).unref();

Watch pending and queued. Both count requests, not handshakes: pending is requests that have been accepted but not yet put on a socket, and queued is requests held back because every connection is busy. If queued is consistently above zero under load, your pool is too small for your concurrency.
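One way to turn "consistently above zero" into a decision is to summarize a window of samples. This is a sketch, not an undici feature: the function name and the 50% threshold are assumptions, and the input is simply the snapshots you already log.

```javascript
// samples: array of stats snapshots, each with a numeric `queued` field.
// minRatio is an assumed threshold: flag the pool when at least this
// fraction of samples saw requests waiting for a connection.
function summarizeQueueing(samples, minRatio = 0.5) {
  const queuedSamples = samples.filter((s) => s.queued > 0).length;
  const ratio = samples.length === 0 ? 0 : queuedSamples / samples.length;
  const maxQueued = samples.reduce((m, s) => Math.max(m, s.queued), 0);
  return { ratio, maxQueued, undersized: ratio >= minRatio };
}

// Hypothetical window of four samples taken under load.
const summary = summarizeQueueing([
  { queued: 0 },
  { queued: 12 },
  { queued: 7 },
  { queued: 3 },
]);
console.log(summary); // { ratio: 0.75, maxQueued: 12, undersized: true }
```

A single spike in queued is usually a burst. A high ratio across the window is the signal that the pool cap, not the downstream, is setting your latency.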

If you cannot instrument undici directly, fall back to operating-system metrics. You want the number of sockets in ESTABLISHED or TIME_WAIT to the downstream IP:

# Count sockets to a specific downstream by state
# -H drops the header row so wc -l counts only sockets
ss -Htan state established dst 10.0.4.17 | wc -l
ss -Htan state time-wait dst 10.0.4.17 | wc -l

If time-wait is in the tens of thousands, you are opening too many connections and not reusing them. If established flatlines at a suspicious round number like 6 or 12, you are likely hitting a pool cap set somewhere in your client stack.
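If you would rather collect both counts in one pass, for example from a sidecar script, the same check can be sketched as a small parser over plain `ss -tan` output. The function name and the sample rows are hypothetical, and the parser assumes the usual column order (State, Recv-Q, Send-Q, Local, Peer) and IPv4 addresses.

```javascript
// Counts sockets per TCP state for one peer IP from `ss -tan` output.
function countSocketStates(ssOutput, peerIp) {
  const counts = {};
  for (const line of ssOutput.split('\n')) {
    const cols = line.trim().split(/\s+/);
    // Skip the header row and anything that is not a socket row.
    if (cols.length < 5 || cols[0] === 'State') continue;
    const peer = cols[4];
    const ip = peer.slice(0, peer.lastIndexOf(':')); // strip the port
    if (ip !== peerIp) continue;
    counts[cols[0]] = (counts[cols[0]] ?? 0) + 1;
  }
  return counts;
}

// Hypothetical sample; real output comes from running `ss -tan`.
const sample = [
  'State      Recv-Q Send-Q Local Address:Port  Peer Address:Port',
  'ESTAB      0      0      10.0.2.5:41812      10.0.4.17:443',
  'ESTAB      0      0      10.0.2.5:41814      10.0.4.17:443',
  'TIME-WAIT  0      0      10.0.2.5:41790      10.0.4.17:443',
  'ESTAB      0      0      10.0.2.5:52110      10.0.9.3:5432',
].join('\n');

console.log(countSocketStates(sample, '10.0.4.17')); // { ESTAB: 2, 'TIME-WAIT': 1 }
```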

The fix: configure undici or the agent

If you are on Node 18+ and using global.fetch, the cleanest fix is a custom undici.Agent registered as the global dispatcher. This replaces the implicit default for every fetch call in the process.

import { Agent, setGlobalDispatcher } from 'undici';

const agent = new Agent({
  connections: 128,
  keepAliveTimeout: 30_000,
  keepAliveMaxTimeout: 30_000,
  connect: {
    timeout: 5_000, // connect timeout in ms; TLS certificate verification stays on
  },
});

setGlobalDispatcher(agent);

connections: 128 means up to 128 sockets per origin, per Node.js process. Tune this to the peak concurrency one process sends to one origin, not some abstract multiple. A good starting point: (expected concurrent requests to this origin) × 1.5. If your API process handles 200 concurrent requests and each calls the downstream once, 200 × 1.5 = 300. If the downstream is called multiple times per request, multiply accordingly.
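The starting-point formula above is easy to encode so the number lives next to the config instead of in someone's head. A minimal sketch; the function and parameter names are illustrative, and the 1.5 headroom factor is the rule of thumb from this post, not an undici recommendation.

```javascript
// Suggests a per-origin pool size from per-process concurrency.
function suggestPoolSize({ concurrentRequests, callsPerRequest = 1, headroom = 1.5 }) {
  return Math.ceil(concurrentRequests * callsPerRequest * headroom);
}

console.log(suggestPoolSize({ concurrentRequests: 200 })); // 300
console.log(suggestPoolSize({ concurrentRequests: 200, callsPerRequest: 3 })); // 900
```

Treat the result as a first guess to validate against the queued counter under load, then adjust.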

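keepAliveTimeout: 30_000 keeps idle sockets open for 30 seconds when the server does not send its own keep-alive hint. The default of 4 seconds is conservative. In a server process talking to a fixed set of upstreams, reconnecting after every 4 idle seconds is pure waste. Keep the value below the idle timeout of the upstream and of any load balancer in between, or you will reuse sockets the other side has already closed.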

If you are still on axios or raw http.request, pass an explicit agent:

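import http from 'node:http';
import https from 'node:https';
import axios from 'axios';

const agentOptions = {
  maxSockets: 128,
  maxFreeSockets: 128,
  keepAlive: true,
  keepAliveMsecs: 30_000, // initial delay for TCP keep-alive probes
  timeout: 30_000,        // socket inactivity timeout; also closes idle pooled sockets
};

// http:// and https:// URLs each need an agent of the matching type.
const httpAgent = new http.Agent(agentOptions);
const httpsAgent = new https.Agent(agentOptions);

const client = axios.create({
  baseURL: 'https://billing.internal',
  httpAgent,
  httpsAgent,
  timeout: 5_000, // per-request timeout enforced by axios
});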

The key fields:

maxSockets: upper bound on concurrent connections per origin. The default Infinity is dangerous in a server.
maxFreeSockets: how many idle sockets to keep open. Set it equal to maxSockets so you do not throw away warm connections.
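keepAlive: without this, every request is a new TCP handshake.
keepAliveMsecs: initial delay before TCP keep-alive probes are sent on a pooled socket. It is not an idle timeout.
timeout: socket inactivity timeout, distinct from the HTTP-level request timeout. The agent also destroys pooled sockets that sit idle this long, so a small value undoes keep-alive.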

When pooling is not enough: TIME_WAIT and ephemeral ports

Even with a correctly sized pool, you can still exhaust ports if connections churn faster than the OS cleans them up. TCP requires the side that closes first to hold the socket in TIME_WAIT for twice the maximum segment lifetime; Linux fixes this at 60 seconds. A socket in TIME_WAIT still occupies an ephemeral port.

The default ephemeral port range on Linux is 32,768–60,999, giving about 28,000 ports. If your process opens and closes 500 connections per second to one downstream address, 30,000 sockets sit in TIME_WAIT after a minute, which is more than the range holds. New connections fail even though the downstream has capacity.
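The arithmetic is worth running against your own churn rate. The sketch below assumes the Linux defaults quoted above (a 60-second TIME_WAIT and the 32768–60999 port range) and a single destination address and port; the function name is illustrative.

```javascript
// Estimates whether a steady connection churn rate exhausts ephemeral ports.
function timeWaitBudget({
  closesPerSecond,
  timeWaitSeconds = 60,
  portLow = 32768,
  portHigh = 60999,
}) {
  const ports = portHigh - portLow + 1;
  // In steady state, every socket closed in the last timeWaitSeconds
  // is still holding a port.
  const socketsInTimeWait = closesPerSecond * timeWaitSeconds;
  return {
    ports,
    socketsInTimeWait,
    exhausted: socketsInTimeWait > ports,
    maxSustainableClosesPerSecond: Math.floor(ports / timeWaitSeconds),
  };
}

console.log(timeWaitBudget({ closesPerSecond: 500 }));
// { ports: 28232, socketsInTimeWait: 30000, exhausted: true,
//   maxSustainableClosesPerSecond: 470 }
```

Under these assumptions, anything above roughly 470 closed connections per second to one destination cannot be sustained, which is a low bar for a client that does not reuse sockets.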

Fixes, in order of preference:

Reuse connections. A warm pool with keep-alive should rarely close sockets. If you see high TIME_WAIT, check whether your upstream is sending Connection: close or whether your own keepAliveTimeout is too short.
Enable net.ipv4.tcp_tw_reuse. This lets the kernel reuse TIME_WAIT sockets for new outgoing connections when TCP timestamps show it is safe. It relies on net.ipv4.tcp_timestamps being enabled, which is the default, and it has no effect on incoming connections. Do not use tcp_tw_recycle; it broke clients behind NAT and was removed in Linux 4.12.

sysctl -w net.ipv4.tcp_tw_reuse=1

Increase the ephemeral port range. Only if the above is not enough:

sysctl -w net.ipv4.ip_local_port_range="15000 65000"

Run a connection proxy. If you have thousands of short-lived connections, consider a local socks or HTTP proxy that multiplexes, or switch to HTTP/2 where a single TCP connection carries many streams. Node.js undici supports HTTP/2 with the allowH2 flag.

Do not forget DNS caching

Connection pooling keeps DNS off the request path only while connections stay warm. The pool is keyed by the origin string, but every new connection starts with a lookup, and Node.js does not cache DNS by default. Each lookup is a getaddrinfo call on the libuv thread pool, which has four threads by default, so a burst of new connections to https://billing.internal can stall behind slow lookups.

Under load, DNS lookups become a hidden bottleneck. Either run a local resolver like systemd-resolved or dnsmasq on the host, or cache lookups in the process:

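import dns from 'node:dns';

const lookupCache = new Map();
const TTL_MS = 60_000;

// The connect layer calls lookup(hostname, options, callback), so the cache
// must keep the callback signature of dns.lookup rather than return a promise.
function cachedLookup(hostname, options, callback) {
  if (typeof options === 'function') {
    callback = options;
    options = {};
  }
  // With { all: true } the result is an array, so cache it under its own key.
  const key = `${hostname}:${options.family ?? 0}:${options.all ? 'all' : 'one'}`;
  const hit = lookupCache.get(key);
  if (hit && Date.now() < hit.expiry) {
    // Answer asynchronously, as dns.lookup always does.
    process.nextTick(callback, null, hit.address, hit.family);
    return;
  }
  dns.lookup(hostname, options, (err, address, family) => {
    if (err) return callback(err);
    lookupCache.set(key, { address, family, expiry: Date.now() + TTL_MS });
    callback(null, address, family);
  });
}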

Pass this to undici via the connect option:

const agent = new Agent({
  connections: 128,
  keepAliveTimeout: 30_000,
  connect: {
    lookup: cachedLookup,
    timeout: 5_000,
  },
});

This removes DNS latency from the connection path, whether your resolver is slow or fast. Keep the TTL short if the downstream sits behind DNS-based load balancing or failover, because a cached address delays the switch to new targets.

Putting it together: a production-ready fetch wrapper

Here is a small module you can drop into a service. It wires pool sizing, DNS caching, keep-alive, and reasonable defaults.

// lib/httpClient.js
import { Agent, Pool, setGlobalDispatcher } from 'undici';
import dns from 'node:dns';

const dnsCache = new Map();

// connect.lookup is called like dns.lookup: (hostname, options, callback).
function makeCachedLookup(ttlMs = 60_000) {
  return function cachedLookup(hostname, options, callback) {
    if (typeof options === 'function') {
      callback = options;
      options = {};
    }
    const key = `${hostname}:${options.family ?? 0}:${options.all ? 'all' : 'one'}`;
    const cached = dnsCache.get(key);
    if (cached && Date.now() < cached.expiry) {
      process.nextTick(callback, null, cached.address, cached.family);
      return;
    }
    dns.lookup(hostname, options, (err, address, family) => {
      if (err) return callback(err);
      dnsCache.set(key, { address, family, expiry: Date.now() + ttlMs });
      callback(null, address, family);
    });
  };
}

export function configureHttpClient(opts = {}) {
  // Track the Pool created for each origin so its stats can be read later.
  const pools = new Map();

  const agent = new Agent({
    connections: opts.connections ?? 128,
    keepAliveTimeout: opts.keepAliveTimeout ?? 30_000,
    keepAliveMaxTimeout: opts.keepAliveMaxTimeout ?? 30_000,
    connect: {
      lookup: makeCachedLookup(opts.dnsTtlMs),
      timeout: opts.connectTimeout ?? 5_000,
    },
    // Same as the default factory, plus bookkeeping for stats.
    factory(origin, poolOpts) {
      const pool = new Pool(origin, poolOpts);
      pools.set(new URL(origin).origin, pool);
      return pool;
    },
  });

  setGlobalDispatcher(agent);

  return {
    // Returns the pool's stats, or null before the first request to origin.
    getPoolStats(origin) {
      try {
        return pools.get(new URL(origin).origin)?.stats ?? null;
      } catch {
        return null;
      }
    },
  };
}

Initialize it once at startup:

import { configureHttpClient } from './lib/httpClient.js';

const metrics = configureHttpClient({ connections: 256 });

setInterval(() => {
  const stats = metrics.getPoolStats('https://billing.internal');
  if (stats) {
    console.log('billing_pool_queued', stats.queued);
    console.log('billing_pool_running', stats.running);
  }
}, 15_000).unref();

After this, fetch() anywhere in your code uses the tuned pool. No wrapper needed for every call. No accidental new Agent() in a helper library that erases your config.

Practical takeaway

Connection pool misconfiguration is one of those problems that looks like anything else. Your dashboards show CPU, memory, and database time, but they rarely export http_client_queued_requests. You chase the wrong metric, scale the wrong tier, and spend money on pods that are just waiting for a socket.

The fix is three steps:

Measure. Export pool stats or use ss to find queued requests and TIME_WAIT sockets.
Size the pool. Set connections (or maxSockets) to your peak concurrency per origin, not the default.
Keep connections alive. Set keep-alive to at least 30 seconds, cache DNS, and monitor queued as a first-class metric.

Throwing pods at a pool bottleneck is like adding lanes to a highway that ends in a toll booth with one gate. Fix the gate.

A note from Yojji

The gap between “it works on my machine” and “it stays up under production concurrency” is often not in the business logic — it is in the plumbing: connection pools, DNS caching, and kernel tuning. Yojji’s engineering teams handle these details as a matter of course when they build and scale backends for clients, whether that means Node.js microservices, cloud-native APIs, or infrastructure that does not fall over when traffic doubles.

Yojji is an international custom software development company founded in 2016, with teams across Europe, the US, and the UK. They specialize in the JavaScript ecosystem (React, Node.js, TypeScript), cloud platforms (AWS, Azure, GCP), and the kind of backend reliability engineering that keeps services responsive when the load test becomes real traffic.