The Libuv Thread Pool Trap: Why Node.js Async APIs Stall Under Load

The /export endpoint had been fast for months. A background worker read 200MB log files, compressed them with zlib.createGzip, and uploaded them to S3. It was entirely async: fs.createReadStream, pipeline, crypto.createHash for checksums. Nothing blocked the event loop. Then the traffic team doubled the number of export jobs, and everything went sideways.

Health checks stayed green. Event loop lag was under 2 ms. CPU sat at 15%. But p99 latency for the export endpoint jumped from 800 ms to 5.2 seconds. Worse, every other endpoint on the same service started sporadically timing out. fs.readFile calls that normally resolved in 5 ms were taking 400 ms. bcrypt.hash that should finish in 100 ms stretched past a second. There were no errors in the logs, no memory spikes, and no blocked event loop warnings. The service just felt sluggish in a way that did not make sense.

The culprit was the libuv thread pool, and here is why it is invisible to almost every monitoring stack.

What the libuv thread pool actually does

Node.js is single-threaded in JavaScript, but it is not single-threaded underneath. The event loop runs on the main thread, and libuv manages a separate pool of threads for work that cannot complete without blocking. Tasks that rely on blocking system calls or heavy computation (file system I/O, dns.lookup, some crypto operations, compression) are handed to this pool. When a thread finishes, it queues the callback back onto the event loop.

The default size of this pool is four threads.

That might have been generous in 2009. In 2026, four threads is a bottleneck waiting to happen. Every fs.readFile, fs.writeFile, crypto.pbkdf2, zlib.deflate, and dns.lookup call that is in flight at the same time occupies one of those four threads, and so does bcrypt.hash from the native bcrypt npm package. When all four are busy, the fifth request queues. The sixth queues behind it. They will wait, quietly, for a thread to free up, even though every single call is technically “async.”

This is not an event loop block. The event loop keeps ticking. Timers fire. HTTP requests parse. But the callbacks for any thread-pool task just sit in the worker queue until a thread becomes available. The symptoms are maddening because every metric you normally watch looks fine.
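To see that distinction on your own machine, the sketch below saturates the pool with crypto.pbkdf2 calls while sampling event loop delay with monitorEventLoopDelay from perf_hooks. It is an illustration under assumptions: `saturateAndMeasure`, the task count, and the iteration count are hypothetical choices, and the numbers depend on your hardware and pool size.

```javascript
import crypto from 'node:crypto';
import { monitorEventLoopDelay, performance } from 'node:perf_hooks';
import { promisify } from 'node:util';

const pbkdf2 = promisify(crypto.pbkdf2);

// Saturate the pool with `count` CPU-heavy tasks while sampling event loop delay.
// The iteration count is illustrative; it only needs to make each task slow.
async function saturateAndMeasure(count = 8) {
  const loopDelay = monitorEventLoopDelay({ resolution: 10 });
  loopDelay.enable();

  const start = performance.now();
  await Promise.all(
    Array.from({ length: count }, () =>
      pbkdf2('password', 'salt', 100_000, 64, 'sha512')),
  );
  const wallMs = performance.now() - start;

  loopDelay.disable();
  return {
    wallMs,
    // The histogram reports nanoseconds; convert to milliseconds.
    loopDelayP99Ms: loopDelay.percentile(99) / 1e6,
  };
}

console.log(await saturateAndMeasure());
```

On a machine with spare cores, the loop delay typically stays small relative to the wall time, which is exactly why loop-lag dashboards stay green while pool-bound callbacks arrive late.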

The APIs that quietly consume threads

Not every async Node.js API uses the thread pool. Network sockets are driven by the kernel's readiness notifications (epoll on Linux, kqueue on macOS and the BSDs), and timers and signals are handled by the event loop itself, so none of them touch the pool. But a surprising amount of common work drops into the pool:

API family	Examples
File system	`fs.readFile`, `fs.writeFile`, `fs.stat`, `fs.access`, `fs.mkdir`, `fs.copyFile`
Crypto	`crypto.pbkdf2`, `crypto.scrypt`, `crypto.randomFill`, `bcrypt.hash` (third-party native `bcrypt` package)
Compression	`zlib.deflate`, `zlib.inflate`, `zlib.gzip`, `zlib.brotliCompress`
DNS	`dns.lookup` (all current Node.js versions; `dns.resolve*` uses c-ares and bypasses the pool)

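The dns.lookup case is worth calling out. It calls getaddrinfo on the thread pool in every current Node.js release, and it is what http, https, and net use by default to resolve hostnames. If your service opens thousands of HTTP connections to different hosts, every new connection triggers a lookup that occupies one of those four threads. The dns.resolve family (dns.resolve4, dns.resolve6, and friends) queries DNS servers over the network through c-ares and never touches the pool, but it does not consult /etc/hosts. We covered DNS caching in another post, but the thread pool angle is the reason lookups can stall even when the DNS server is fast.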
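One way to keep repeated lookups off the pool is to cache results and share in-flight lookups, so that many concurrent connections to the same host cost a single pool task. The following is a minimal sketch, not a production resolver: `createCachedLookup`, the fixed `ttlMs`, and the injectable `lookupFn` are hypothetical names for illustration, and the cache ignores the record's real TTL and never evicts old hosts.

```javascript
import dns from 'node:dns';

// Wrap a promise-returning lookup function with a fixed-TTL cache.
// Concurrent calls for the same hostname share one in-flight lookup.
function createCachedLookup(
  lookupFn = (hostname) => dns.promises.lookup(hostname),
  ttlMs = 30_000, // illustrative TTL, not taken from the DNS record
) {
  const cache = new Map(); // hostname -> { promise, expiresAt }

  return function cachedLookup(hostname) {
    const now = Date.now();
    const hit = cache.get(hostname);
    if (hit && hit.expiresAt > now) {
      return hit.promise;
    }

    const promise = lookupFn(hostname);
    cache.set(hostname, { promise, expiresAt: now + ttlMs });

    // Do not cache failures: drop the entry so the next call retries.
    promise.catch(() => {
      if (cache.get(hostname)?.promise === promise) {
        cache.delete(hostname);
      }
    });

    return promise;
  };
}
```

Calling `createCachedLookup()` with no arguments wraps dns.promises.lookup; passing a different `lookupFn` makes the wrapper easy to exercise without touching the network.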

fs calls are the silent killers in practice. They hit the pool because POSIX file system calls do not have async equivalents in the kernel (io_uring is changing this on Linux, but Node.js support is still emerging). Every file read is a blocking system call that libuv runs on a pool thread.

Reproducing the stall in 30 lines

Here is a script that demonstrates the behavior without any external dependencies. It fires eight concurrent fs.readFile calls against the same file. With a four-thread pool, only four run in parallel. The rest queue.

import fs from 'node:fs';
import { performance } from 'node:perf_hooks';

const FILE = './test-file.txt';

// Create a 1MB file to ensure the reads take enough time to overlap
fs.writeFileSync(FILE, 'x'.repeat(1024 * 1024));

async function measureRead(id) {
  const start = performance.now();
  await fs.promises.readFile(FILE);
  const duration = performance.now() - start;
  console.log(`read ${id}: ${duration.toFixed(1)}ms`);
}

async function run() {
  const start = performance.now();
  await Promise.all(Array.from({ length: 8 }, (_, i) => measureRead(i)));
  console.log(`total wall time: ${(performance.now() - start).toFixed(1)}ms`);
}

run();

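An idealized run looks like this. Exact numbers vary by machine, and because a single readFile is several pool tasks under the hood (open, stat, one or more reads, close), real output is often less cleanly banded: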

read 0: 12.3ms
read 1: 12.1ms
read 2: 12.5ms
read 3: 12.4ms
read 4: 24.1ms
read 5: 24.3ms
read 6: 24.2ms
read 7: 24.5ms
total wall time: 24.8ms

The first four finish in parallel. The next four wait for a thread. The wall time is roughly 2x a single read because there are only four workers. Now imagine those reads are 50MB log files, or the pool is also occupied by gzip and bcrypt tasks. Eight concurrent exports can turn into a 20-second pipeline even though each file operation is “non-blocking.”
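The same queueing can be reproduced with crypto.pbkdf2, where each call is a single CPU-bound pool task, so the two bands tend to show up more clearly than with readFile. This is a sketch assuming the default pool size of four and a machine with at least four cores; `timedPbkdf2`, `runBatch`, and the iteration count are illustrative choices.

```javascript
import crypto from 'node:crypto';
import { performance } from 'node:perf_hooks';
import { promisify } from 'node:util';

const pbkdf2 = promisify(crypto.pbkdf2);

// Illustrative work size; tune it so one call takes tens of milliseconds.
const ITERATIONS = 100_000;

// Time one pbkdf2 call from request to completion (queue wait plus work).
async function timedPbkdf2(id) {
  const start = performance.now();
  await pbkdf2('password', 'salt', ITERATIONS, 64, 'sha512');
  return { id, ms: performance.now() - start };
}

// Fire `count` calls at once; results come back in call order.
async function runBatch(count) {
  return Promise.all(
    Array.from({ length: count }, (_, i) => timedPbkdf2(i)),
  );
}

const results = await runBatch(8);
for (const { id, ms } of results) {
  console.log(`pbkdf2 ${id}: ${ms.toFixed(1)}ms`);
}
```

With four pool threads, calls 0 to 3 typically finish in about one task's duration and calls 4 to 7 in about two. Rerunning with UV_THREADPOOL_SIZE=8 in the environment should collapse the two bands into one if the machine has enough cores.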

The silent symptom checklist

Because the event loop stays unblocked, most monitoring completely misses this. Look for these signals instead:

Latency spikes that do not correlate with CPU, memory, or event loop lag. The event loop is idle, but callbacks from fs or crypto arrive late.
Timeouts on health checks or outgoing requests that use fs or crypto. The task itself is fast, but it waited in the thread pool queue.
fs or crypto operations that get slower as concurrency rises, even on SSDs. Disk I/O is not the bottleneck. Thread availability is.
No thread pool metrics on your dashboards at all. Node.js does not expose thread pool queue depth by default, so unless you instrument it yourself, saturation is invisible.
A sudden fix when you switch to in-memory caches or streams, but you do not know why. You removed file system pressure from the pool.

How to actually measure it

Node.js exposes async_hooks and perf_hooks to observe thread pool behavior. The most direct diagnostic is measuring the duration from when an operation is requested to when its callback fires. If fs.readFile takes 400 ms wall-clock but only 5 ms of actual disk time, the difference is queue wait.

Here is a minimal instrumentation that patches the fs module to emit timing:

import fs from 'node:fs';
import { performance } from 'node:perf_hooks';

const originals = {
  readFile: fs.readFile.bind(fs),
  writeFile: fs.writeFile.bind(fs),
};

function instrument(name, fn) {
  return function (...args) {
    const start = performance.now();
    const cb = args[args.length - 1];

    if (typeof cb !== 'function') {
      return fn(...args); // no callback to wrap; pass the call through untouched
    }

    args[args.length - 1] = function (err, result) {
      const duration = performance.now() - start;
      console.log(JSON.stringify({
        event: 'fs_timing',
        op: name,
        durationMs: Math.round(duration * 100) / 100,
        timestamp: new Date().toISOString(),
      }));
      cb(err, result);
    };

    return fn(...args);
  };
}

fs.readFile = instrument('readFile', originals.readFile);
fs.writeFile = instrument('writeFile', originals.writeFile);

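For a production-grade version, record these durations with perf_hooks (performance.mark and performance.measure, or a histogram from createHistogram) instead of console.log. Node.js does not report how long a task sat in the pool queue separately from how long it ran, so the practical proxy for work time is the duration of the same operation when the pool is idle. Compare that baseline to the wall time under load. A large gap between the two means queue contention.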

You can also use trace_event profiling with --trace-event-categories node.async_hooks,node.perf and inspect the trace in Chrome DevTools. Look for long gaps between init and before hooks on thread-pool operations.
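A complementary approach, analogous to sampling event loop lag, is a canary probe: periodically submit a trivial pool task and record how long it takes to come back. The sketch below assumes crypto.randomBytes with a callback as the cheap pool task; `probePoolLatency` and `startPoolProbe` are hypothetical helper names, and the probe itself briefly occupies a pool slot, so keep the interval coarse.

```javascript
import crypto from 'node:crypto';
import { performance } from 'node:perf_hooks';

// Submit one tiny thread-pool task and resolve with its round-trip time in ms.
// The work itself is negligible, so most of the measured time is queue wait.
function probePoolLatency() {
  return new Promise((resolve, reject) => {
    const start = performance.now();
    crypto.randomBytes(1, (err) => {
      if (err) return reject(err);
      resolve(performance.now() - start);
    });
  });
}

// Sample the pool on an interval and hand each reading to `report`.
// Returns a function that stops the probe.
function startPoolProbe(report, intervalMs = 1000) {
  const timer = setInterval(async () => {
    try {
      report(await probePoolLatency());
    } catch (err) {
      console.error('pool probe failed', err);
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for the probe
  return () => clearInterval(timer);
}
```

Feed the readings into whatever gauge or histogram your metrics library provides. A probe that normally returns in well under a millisecond and starts taking tens of milliseconds is a direct signal that tasks are waiting for a thread.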

Fix 1: raise the thread pool size (carefully)

The blunt fix is UV_THREADPOOL_SIZE. It sets the number of threads in the libuv pool. libuv reads it once, when the pool is first created, so set it in the environment before the process starts. The maximum is 1024, but you almost never want that.

export UV_THREADPOOL_SIZE=16
node server.js

Or in a Dockerfile:

ENV UV_THREADPOOL_SIZE=16
CMD ["node", "server.js"]

More threads mean more concurrent file system, crypto, and DNS operations. The trade-off is memory and context-switching cost. Each thread consumes a small amount of RSS (typically 1-2MB for the stack, plus whatever work it holds). Going from 4 to 16 is usually safe. Going to 128 on a 512MB container is not, unless you know the workload is thread-pool bound and you have the memory.

The right size depends on your workload profile. If your service does occasional bcrypt hashing and file reads, 8-12 is usually enough. If you are running a log-processing pipeline with heavy zlib and concurrent large readFile calls, you might need 32 or more.

Do not guess. Measure queue depth (via the wall-vs-work timing above), pick a size that flattens latency without ballooning RSS, and cap it.
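One way to measure rather than guess is to run the same pool-bound workload in child processes with different UV_THREADPOOL_SIZE values and compare wall times. The sketch below is only a starting point: the pbkdf2 workload, the iteration count, the candidate sizes, and the `measure` helper are all illustrative, and a real sweep should use a workload shaped like your own.

```javascript
import { spawnSync } from 'node:child_process';

// Child script: run 16 concurrent pbkdf2 calls and print total wall time in ms.
// The workload and iteration count are illustrative stand-ins.
const CHILD = `
  const crypto = require('node:crypto');
  const start = process.hrtime.bigint();
  let pending = 16;
  for (let i = 0; i < 16; i++) {
    crypto.pbkdf2('password', 'salt', 50000, 64, 'sha512', (err) => {
      if (err) throw err;
      if (--pending === 0) {
        const ms = Number(process.hrtime.bigint() - start) / 1e6;
        console.log(ms.toFixed(1));
      }
    });
  }
`;

// Run the child once with the given pool size and return its wall time in ms.
function measure(poolSize) {
  const { stdout, status } = spawnSync(process.execPath, ['-e', CHILD], {
    env: { ...process.env, UV_THREADPOOL_SIZE: String(poolSize) },
    encoding: 'utf8',
  });
  if (status !== 0) throw new Error(`child exited with status ${status}`);
  return Number(stdout.trim());
}

for (const size of [4, 8, 16]) {
  console.log(`UV_THREADPOOL_SIZE=${size}: ${measure(size).toFixed(1)}ms`);
}
```

For CPU-bound tasks like this one, gains usually flatten once the pool size passes the number of available cores. I/O-bound work such as file reads can keep benefiting from a larger pool.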

Fix 2: limit concurrency at the application layer

Another approach is to stop overloading the pool in the first place. If your service runs eight export jobs in parallel but each job hits fs and zlib, you are fighting yourself. Use a semaphore or a p-limit-style concurrency limit to keep only N jobs running at once, where N matches your thread pool capacity.

Here is a minimal semaphore that works without dependencies:

class Semaphore {
  constructor(max) {
    this.max = max;
    this.count = 0;
    this.queue = [];
  }

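  async acquire() {
    if (this.count < this.max) {
      this.count++;
      return;
    }
    // Full: wait until release() hands this caller a slot.
    await new Promise((resolve) => this.queue.push(resolve));
  }

  release() {
    const next = this.queue.shift();
    if (next) {
      // Hand the slot straight to the next waiter. The count stays the same,
      // so a new acquire() cannot slip in ahead and exceed the limit.
      next();
    } else {
      this.count--;
    }
  }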
}

const pool = new Semaphore(4); // match UV_THREADPOOL_SIZE

async function safeExport(job) {
  await pool.acquire();
  try {
    await runExport(job); // uses fs, zlib, crypto
  } finally {
    pool.release();
  }
}

This does not speed up a single export. What it does is protect the rest of your service. With concurrency capped at four, the fifth export waits at the application layer, where you can see and measure it, instead of silently queueing in libuv. Health checks, other endpoints, and unrelated fs calls are far less likely to be starved, especially if you set the cap a little below the pool size to leave them a free thread.

Fix 3: move heavy work out of the thread pool entirely

UV_THREADPOOL_SIZE helps, but it is not a panacea. Some work does not belong in the main Node.js process at all.

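Heavy CPU-bound hashing (bcrypt, Argon2, PBKDF2 with high rounds): Move to a dedicated microservice or a pool of worker threads. Worker threads are separate OS threads, each with its own V8 isolate and event loop, and they are not drawn from the libuv pool. Inside the worker, call the synchronous variant (for example crypto.pbkdf2Sync), because the async variants still queue on the libuv pool, which is shared by the whole process.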
Large file compression: Stream with zlib in small chunks rather than buffering the whole file, or offload to a background job worker that runs on separate nodes.
Bulk file reads: If you are reading 50MB files repeatedly, consider an in-memory cache or a shared volume read once at startup. Each readFile blocks a thread for the full duration of the kernel read.
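For the compression case, a streaming version might look like the sketch below. `gzipFile` and its default `chunkSize` are illustrative. Each chunk is a short pool task, so other fs and crypto work can interleave between chunks. The total amount of pool work is not reduced, so this complements a concurrency cap rather than replacing it.

```javascript
import fs from 'node:fs';
import zlib from 'node:zlib';
import { pipeline } from 'node:stream/promises';

// Gzip `src` into `dest` chunk by chunk instead of buffering the whole file.
// `chunkSize` controls how much is read (and compressed) per pool task.
async function gzipFile(src, dest, chunkSize = 64 * 1024) {
  await pipeline(
    fs.createReadStream(src, { highWaterMark: chunkSize }),
    zlib.createGzip(),
    fs.createWriteStream(dest),
  );
}

// Example usage (paths are placeholders):
//   await gzipFile('./app.log', './app.log.gz');
```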

Worker threads and child_process are both better homes for work that would otherwise pin a libuv thread for hundreds of milliseconds. The event loop stays free, and the libuv pool stays available for short tasks.
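As a rough illustration of the worker-thread route, the sketch below hashes with crypto.pbkdf2Sync inside a worker, so the computation runs on the worker's own thread rather than on the libuv pool. `hashInWorker` is a hypothetical helper, the worker source is inlined as a string only to keep the example self-contained, and spawning a worker per call is too expensive for real traffic, where a pool of long-lived workers is the usual choice.

```javascript
import { Worker } from 'node:worker_threads';

// Worker body, inlined as a string for a self-contained example.
// It uses the synchronous pbkdf2 so the hashing runs on the worker's own
// thread instead of being queued on the shared libuv pool.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require('node:worker_threads');
  const crypto = require('node:crypto');
  const { password, salt, iterations } = workerData;
  const key = crypto.pbkdf2Sync(password, salt, iterations, 64, 'sha512');
  parentPort.postMessage(key.toString('hex'));
`;

// Hash in a one-off worker and resolve with the derived key as hex.
function hashInWorker(password, salt, iterations = 100_000) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, {
      eval: true,
      workerData: { password, salt, iterations },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

const hex = await hashInWorker('password', 'salt');
console.log(hex.length); // 128 hex characters for a 64-byte key
```

The main thread only awaits a message, so neither the event loop nor the four pool threads are tied up while the hash is computed.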

The decision tree

Symptom	Likely cause	Fix
Event loop is idle, but `fs`/`crypto` callbacks are late	Thread pool queueing	Raise `UV_THREADPOOL_SIZE`, limit concurrency
DNS lookups spike under load	`dns.lookup` hitting the 4-thread pool	Add DNS caching, use `dns.resolve*` where appropriate, or raise `UV_THREADPOOL_SIZE`
`bcrypt.hash` slows down all requests during signups	One slow task pinning a pool thread	Move to worker threads or dedicated service
Large gzip operations block smaller `fs` reads	Mixed workload overwhelming fixed pool	Separate heavy work to workers or cap concurrency
Timeouts that vanish when you add pods	Fewer jobs per process = less pool contention	Right-size pool or limit concurrency instead of scaling horizontally

The takeaway

Node.js async APIs are not magic. fs.readFile and bcrypt.hash are async at the JavaScript level, but underneath they run on a small, fixed thread pool that defaults to four workers. When that pool saturates, the queue grows silently. Your event loop is healthy. Your CPU is bored. Your service is still slow.

Start by measuring. Instrument fs and crypto wall times, compare them to actual work times, and look for queue wait. If you see it, raise UV_THREADPOOL_SIZE modestly, cap concurrency at the application layer so you do not fight yourself, and move long-running work to worker threads or external services. Do all three, and the “ghost latency” that had no explanation disappears.

Your code is not slow. It is just waiting for a thread.
