Node.js Garbage Collection Tuning: Stop Letting V8 Pause Your Event Loop
Your p99 latency spikes every few minutes and they align perfectly with garbage collection pauses. Adding RAM does not help. Here is how V8's generational GC actually works, which flags change its behavior, and the monitoring setup that tells you if the tuning worked.
Your latency graph is clean for hours, then a 400 ms spike appears out of nowhere. It does not correlate with traffic, database slow queries, or deployments. It correlates with nothing you can find in application logs. Then you enable --trace-gc and realize the spikes are exactly aligned with V8's full mark-sweep-compact collections. The garbage collector is doing its job, but it is doing it at the worst possible moment, and the default heap limits mean it waits until the last second to do the expensive work.
Most Node.js services run with default V8 heap settings. That means the garbage collector grows the old generation until it either hits a computed limit based on available memory or the container OOM killer intervenes. On a 4 GB container, the old space can balloon to 1.8 GB before V8 decides a full collection is necessary. At that size, a mark-sweep-compact pause can take hundreds of milliseconds. For a service handling 10,000 RPS, that is a catastrophe: during a single 400 ms pause, roughly 4,000 requests pile up behind a frozen event loop.
This post is not a computer science lecture. It is the three startup flags you set, the one monitoring snippet you add, and the deployment rule that prevents your next latency spike from being a GC pause.
How V8 decides when to collect
V8 splits the heap into two generations: young and old. Young generation collections, called scavenges, are fast and frequent. They copy live objects out of the “from” semi-space into the “to” semi-space, discard the rest, and pay only for the objects that survive. Most objects die young, so scavenges are cheap.
Old generation collections, called mark-sweep-compact, are the expensive ones. V8 walks the entire old heap, marks reachable objects, sweeps dead ones, and compacts live objects to reduce fragmentation. The cost is proportional to the size of the live set, not the allocation rate. A 2 GB heap with 1.5 GB live takes longer to collect than a 1 GB heap with 500 MB live, even if both allocate at the same rate.
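To see the two generations in a running process, v8.getHeapSpaceStatistics() lists V8's heap spaces by name. A minimal sketch: new_space is where scavenges happen and old_space is what mark-sweep-compact walks; the other spaces in the list (code, large objects, and so on) vary between Node.js versions.

```javascript
const v8 = require('v8');

const toMb = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;

// List each V8 heap space with its used and committed size in MB.
function listHeapSpaces() {
  return v8.getHeapSpaceStatistics().map((space) => ({
    name: space.space_name,
    usedMb: toMb(space.space_used_size),
    sizeMb: toMb(space.space_size)
  }));
}

// new_space = young generation (scavenges), old_space = old generation
// (mark-sweep-compact). Other entries differ by Node.js version.
for (const s of listHeapSpaces()) {
  console.log(`${s.name.padEnd(24)} used ${s.usedMb} MB of ${s.sizeMb} MB`);
}
```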
The default max-old-space-size is computed at startup based on available physical memory, which, depending on the Node.js version, may be the host's memory rather than the container's limit. On a container with a 1 GB limit, it might default to roughly 1.4 GB on a 64-bit machine. That is already above the container limit before you count what else lives in RSS outside the V8 heap: C++ memory, Buffers, and TLS overhead. V8 will push the heap close to its limit, then trigger a full GC. If the live set is large, the pause is long.
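Before changing anything, it helps to see what limit your process actually got. This sketch compares V8's heap_size_limit with host memory and, where it is readable, the cgroup memory limit. The cgroup file paths are assumptions about a typical Linux container (cgroup v2 first, then v1); on other systems the container limit comes back as null.

```javascript
const fs = require('fs');
const os = require('os');
const v8 = require('v8');

const MB = 1024 * 1024;

// Read the container memory limit in MB, or return null if none is visible.
// Assumed paths: cgroup v2 first, then cgroup v1.
function readContainerLimitMb() {
  const candidates = [
    '/sys/fs/cgroup/memory.max',
    '/sys/fs/cgroup/memory/memory.limit_in_bytes'
  ];
  for (const file of candidates) {
    let raw;
    try {
      raw = fs.readFileSync(file, 'utf8').trim();
    } catch {
      continue; // this cgroup layout is not present
    }
    const bytes = Number(raw); // "max" (cgroup v2, unlimited) becomes NaN
    // cgroup v1 reports a huge sentinel when unlimited, so treat anything
    // at or above host memory as "no limit".
    if (Number.isFinite(bytes) && bytes > 0 && bytes < os.totalmem()) {
      return Math.round(bytes / MB);
    }
    return null;
  }
  return null;
}

function describeHeapDefaults() {
  return {
    heapLimitMb: Math.round(v8.getHeapStatistics().heap_size_limit / MB),
    hostMemoryMb: Math.round(os.totalmem() / MB),
    containerLimitMb: readContainerLimitMb()
  };
}

// Run with no heap flags inside the container to see the defaults.
console.log(describeHeapDefaults());
```

If heapLimitMb is at or above containerLimitMb, the defaults leave no room for memory outside the heap.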
The three flags that matter
1. --max-old-space-size: cap the heap before the container does
The single most important flag is --max-old-space-size. It sets the hard ceiling for the old generation. You want this ceiling to be lower than your container memory limit, because Node.js uses memory outside the V8 heap.
A practical rule: set --max-old-space-size to about 70% of your container's memory limit, then subtract a fixed buffer if large Buffers or native modules hold memory outside the heap. On a 1 GB container, 70% is roughly 700 MB:
node --max-old-space-size=700 server.js
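This forces V8 to run full collections earlier and more often. That sounds bad, but a 50 ms collection every minute is usually far easier on tail latency than a 400 ms collection every ten minutes, even though ten 50 ms pauses add up to slightly more total GC time (500 ms versus 400 ms). Your p99 thanks you.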
2. --max-semi-space-size: tame the scavenges
The young generation uses two semi-spaces. By default, each can grow to 16 MB on 64-bit systems. If your service allocates bursts of temporary objects (JSON parsing, image processing, buffer transforms), the semi-space can fill before those objects are done being used. The scavenge then finds them still alive, and objects that survive a second scavenge are promoted to old space even if they die moments later. This is premature promotion, and it means more expensive full collections.
You can increase the semi-space size to give large temporary objects more room to die young:
node --max-semi-space-size=64 --max-old-space-size=700 server.js
Do not set this to half your heap. A 512 MB semi-space means two 512 MB semi-spaces, so the young generation alone exceeds 1 GB, and every scavenge must copy whatever is still alive in that space, so an oversized semi-space can turn cheap scavenges into long ones. The sweet spot is usually 32-128 MB for typical API workloads.
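To check whether a bigger semi-space actually helps your allocation pattern, you can run the same workload under different --max-semi-space-size values and count collections with --trace-gc. This is a rough sketch: the workload is a synthetic stand-in for a JSON-heavy request path (an assumption, not a benchmark), the counts depend on the Node.js version and machine, and it scans both output streams to be safe.

```javascript
const { spawnSync } = require('child_process');

// Synthetic stand-in for a JSON-heavy request path: parse a ~100 KB JSON
// payload repeatedly and keep the last 20 results alive for a while.
const WORKLOAD = `
  const payload = JSON.stringify(
    Array.from({ length: 2000 }, (_, i) => ({ id: i, name: 'item' + i, tags: ['a', 'b', 'c'] }))
  );
  const recent = [];
  for (let i = 0; i < 300; i++) {
    recent.push(JSON.parse(payload));
    if (recent.length > 20) recent.shift();
  }
`;

// Run the workload in a fresh Node.js process with the given semi-space
// size (in MB) and count the collections --trace-gc reports.
function countCollections(semiSpaceMb) {
  const result = spawnSync(
    process.execPath,
    ['--trace-gc', `--max-semi-space-size=${semiSpaceMb}`, '-e', WORKLOAD],
    { encoding: 'utf8' }
  );
  if (result.status !== 0) {
    throw new Error(`child failed: ${result.stderr}`);
  }
  // Scan both streams so the count does not depend on where the trace goes.
  const lines = (result.stdout + result.stderr).split('\n');
  return {
    semiSpaceMb,
    scavenges: lines.filter((line) => line.includes('Scavenge')).length,
    fullGcs: lines.filter((line) => /ms: Mark-(sweep|Compact)/.test(line)).length
  };
}

for (const mb of [1, 16, 64]) {
  console.log(countCollections(mb));
}
```

If the full-GC count drops as the semi-space grows, temporary objects were being promoted prematurely. If it does not move, the old generation is filling with objects that genuinely live long, and a bigger semi-space will not help.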
3. --heapsnapshot-near-heap-limit: debug the pause, not just the crash
When a full GC does not free enough memory, V8 will try again, then again, then crash with an out-of-memory error. By then, the container is already unhealthy. The flag --heapsnapshot-near-heap-limit=1 tells Node.js to write at most one heap snapshot to disk when heap usage approaches the limit, before the process crashes:
node --max-old-space-size=700 --max-semi-space-size=64 --heapsnapshot-near-heap-limit=1 server.js
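The snapshot lands in the working directory. You can load it into Chrome DevTools and see what was alive at the peak. This is invaluable because it helps you tell whether the pause was caused by a leak (unbounded growth) or simply a heap that is too large for the workload. Writing the snapshot takes time and extra memory, both inside and outside the V8 heap, so keep headroom under the container limit or the OOM killer may get there first.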
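The near-limit snapshot only fires at the edge of a crash. To separate a leak from a large working set earlier, you can take two snapshots some minutes apart and compare them in the Comparison view of the DevTools Memory panel. A minimal sketch using v8.writeHeapSnapshot(); the directory and file naming are placeholders. Node.js also has a built-in --heapsnapshot-signal flag if you would rather trigger snapshots with a signal than with code.

```javascript
const fs = require('fs');
const path = require('path');
const v8 = require('v8');

// Write a heap snapshot into `dir` and return the file path.
// v8.writeHeapSnapshot() is synchronous: it blocks the event loop and
// needs extra memory while it runs, so trigger it sparingly in production.
function takeHeapSnapshot(dir) {
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `heap-${process.pid}-${Date.now()}.heapsnapshot`);
  return v8.writeHeapSnapshot(file);
}
```

Objects whose counts keep growing between the two snapshots are leak candidates; a large but stable set of objects points to a working set that needs a bigger heap.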
Reading --trace-gc before you add instrumentation
You do not need a PerformanceObserver to get a quick signal. The --trace-gc flag prints every collection to stdout. A typical line looks like this:
[12345:0x...] 12345 ms: Mark-sweep 234.5 (289.2) -> 189.2 (289.2) MB, 42.1 / 0.0 ms
The format is: [pid:isolate] timestamp ms: type before_heap (total_heap) -> live_heap (total_heap) MB, pause_ms / incremental_ms.
The first number after the arrow is the heap in use after the collection; after a full collection, that approximates the live set. If that number climbs steadily across full collections, you likely have a leak. If it stays flat but sits close to the heap limit, you simply have a large working set and need a bigger --max-old-space-size or more pods.
Add --trace-gc to your container startup for a single day, grep the logs for full collections (Mark-sweep in the sample line; recent Node.js versions print Mark-Compact), and plot pause duration against time. If the pauses exceed your latency budget, you have a GC tuning problem, not a code problem. Once you see the pattern, remove the flag and switch to the PerformanceObserver approach in the monitoring section below for continuous monitoring. You do not want --trace-gc enabled permanently, because the extra log volume can drown your logging pipeline.
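Plotting pause duration needs the trace lines in structured form. A sketch of a parser for the format above, plus a summary of full collections that includes a least-squares slope of the after-GC heap size (the leak signal). The regular expression is written against the sample line in this post and accepts both Mark-sweep and Mark-Compact labels; the trace is V8 diagnostic output rather than a documented interface, so check it against your Node.js version.

```javascript
// Matches one --trace-gc line in the format described above, e.g.
// [12345:0x...] 12345 ms: Mark-sweep 234.5 (289.2) -> 189.2 (289.2) MB, 42.1 / 0.0 ms
// Also tolerates a parenthesized reason such as "(reduce)" after the type.
const TRACE_GC_LINE =
  /\[(\d+):[^\]]*\]\s+([\d.]+) ms: (Scavenge|Mark-sweep|Mark-Compact)\s+(?:\([^)]*\)\s+)?([\d.]+) \(([\d.]+)\) -> ([\d.]+) \(([\d.]+)\) MB, ([\d.]+) \/ ([\d.]+) ms/;

function parseTraceGcLine(line) {
  const m = TRACE_GC_LINE.exec(line);
  if (!m) return null;
  return {
    pid: Number(m[1]),
    timeMs: Number(m[2]),
    type: m[3],
    beforeMb: Number(m[4]),
    afterMb: Number(m[6]),
    pauseMs: Number(m[8])
  };
}

// Least-squares slope of after-GC heap size, in MB per minute.
// A steadily positive slope over hours suggests a leak.
function slopeMbPerMinute(series) {
  if (series.length < 2) return 0;
  const n = series.length;
  const meanT = series.reduce((sum, [t]) => sum + t, 0) / n;
  const meanY = series.reduce((sum, [, y]) => sum + y, 0) / n;
  let num = 0;
  let den = 0;
  for (const [t, y] of series) {
    num += (t - meanT) * (y - meanY);
    den += (t - meanT) ** 2;
  }
  return den === 0 ? 0 : (num / den) * 60000;
}

// Summarize the full collections in a block of --trace-gc output.
function summarizeFullGcs(logText) {
  const full = logText
    .split('\n')
    .map((line) => parseTraceGcLine(line))
    .filter((e) => e !== null && e.type !== 'Scavenge');
  return {
    count: full.length,
    maxPauseMs: full.reduce((max, e) => Math.max(max, e.pauseMs), 0),
    afterMbPerMinute: slopeMbPerMinute(full.map((e) => [e.timeMs, e.afterMb]))
  };
}
```

Feed it a day of captured logs: a large maxPauseMs with a flat slope is a tuning problem, while a steadily positive afterMbPerMinute is the leak case that tuning will not fix.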
The production server setup
Here is the Dockerfile that applies the tuning flags and the server bootstrap that exposes the monitoring endpoint.
FROM node:20-alpine
WORKDIR /app
COPY . .
ENV NODE_ENV=production
ENV UV_THREADPOOL_SIZE=128
CMD ["node", "--max-old-space-size=700", "--max-semi-space-size=64", "--heapsnapshot-near-heap-limit=1", "server.js"]
And the health check endpoint that reports heap pressure:
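const v8 = require('v8');
const http = require('http');
// Report how much of the V8 heap limit is in use. heap_size_limit covers the
// whole heap (young generation included), so it reads a bit higher than the
// --max-old-space-size value.
function getHeapPressure() {
  const stats = v8.getHeapStatistics();
  const used = stats.used_heap_size;
  const limit = stats.heap_size_limit;
  return {
    usedMb: Math.round(used / 1024 / 1024),
    limitMb: Math.round(limit / 1024 / 1024),
    percentUsed: Math.round((used / limit) * 100)
  };
}
// Detect whether an old-space cap was actually passed (command line or
// NODE_OPTIONS) instead of hardcoding gcTuned: true.
const gcTuned = [...process.execArgv, process.env.NODE_OPTIONS || '']
  .some((arg) => arg.includes('--max-old-space-size'));
const server = http.createServer((req, res) => {
  if (req.url === '/health') {
    const pressure = getHeapPressure();
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({
      status: 'ok',
      heap: pressure,
      gcTuned
    }));
    return;
  }
  res.writeHead(200);
  res.end('ok');
});
server.listen(3000, () => {
  console.log('Server listening on port 3000');
  console.log('Heap limit:', getHeapPressure().limitMb, 'MB');
});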
Monitoring GC events in application code
Flags are static. Runtime monitoring tells you if the tuning worked. Node.js exposes GC events through perf_hooks. The following snippet logs every full (mark-sweep-compact) collection and every incremental marking event, with its duration:
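const { PerformanceObserver, constants } = require('perf_hooks');
// Map Node.js GC kind constants to readable names. (The raw values are
// 1 = scavenge, 4 = mark-sweep-compact, 8 = incremental marking,
// 16 = weak callbacks; 2 is not a full collection.)
const GC_NAMES = {
  [constants.NODE_PERFORMANCE_GC_MINOR]: 'scavenge',
  [constants.NODE_PERFORMANCE_GC_MAJOR]: 'markSweepCompact',
  [constants.NODE_PERFORMANCE_GC_INCREMENTAL]: 'incrementalMarking',
  [constants.NODE_PERFORMANCE_GC_WEAKCB]: 'weakCallbacks'
};
const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // Since Node.js 16, kind and flags live on entry.detail; the old
    // top-level entry.kind and entry.flags properties are deprecated.
    const { kind, flags } = entry.detail;
    // Only log expensive events (mark-sweep-compact or incremental marking)
    if (kind === constants.NODE_PERFORMANCE_GC_MAJOR ||
        kind === constants.NODE_PERFORMANCE_GC_INCREMENTAL) {
      console.log(JSON.stringify({
        event: 'gc',
        kind: GC_NAMES[kind],
        durationMs: Math.round(entry.duration * 100) / 100,
        flags,
        timestamp: new Date().toISOString()
      }));
    }
  }
});
obs.observe({ entryTypes: ['gc'] });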
Feed this to your structured logging pipeline. Alert when durationMs exceeds 50 ms for markSweepCompact. That is your signal that the live set is too large for the heap size you picked.
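Per-event log lines are easy to alert on, but a rolling summary is easier to expose on the /health endpoint or a metrics scrape. A sketch of a small tracker, assuming a 50 ms budget and a one-minute window; GcPauseTracker and trackMajorGcs are hypothetical names, and the perf_hooks wiring reads entry.detail.kind as in Node.js 16 and later.

```javascript
const { PerformanceObserver, constants } = require('perf_hooks');

// Hypothetical helper: a rolling window of full-GC pauses that a /health
// or /metrics handler can report.
class GcPauseTracker {
  constructor({ budgetMs = 50, windowMs = 60000 } = {}) {
    this.budgetMs = budgetMs;
    this.windowMs = windowMs;
    this.events = []; // { at, durationMs }, oldest first
  }

  // Record one mark-sweep-compact pause; returns true if it broke the budget.
  record(durationMs, at = Date.now()) {
    this.events.push({ at, durationMs });
    this.prune(at);
    return durationMs > this.budgetMs;
  }

  // Drop events older than the window.
  prune(now) {
    const cutoff = now - this.windowMs;
    while (this.events.length > 0 && this.events[0].at < cutoff) this.events.shift();
  }

  summary(now = Date.now()) {
    this.prune(now);
    const durations = this.events.map((e) => e.durationMs);
    return {
      count: durations.length,
      maxMs: durations.length > 0 ? Math.max(...durations) : 0,
      totalMs: durations.reduce((a, b) => a + b, 0),
      overBudget: durations.filter((d) => d > this.budgetMs).length
    };
  }
}

// Feed full collections from perf_hooks into a tracker.
function trackMajorGcs(tracker) {
  const obs = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      if (entry.detail.kind === constants.NODE_PERFORMANCE_GC_MAJOR) {
        if (tracker.record(entry.duration)) {
          console.warn(`full GC took ${entry.duration.toFixed(1)} ms, over ${tracker.budgetMs} ms budget`);
        }
      }
    }
  });
  obs.observe({ entryTypes: ['gc'] });
  return obs;
}
```

Returning tracker.summary() from the health handler gives you count, worst pause, and budget breaches per minute without shipping every GC event to the log pipeline.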
The deployment rule
Set your Kubernetes memory limit, then compute the Node.js flag from it. Never set --max-old-space-size equal to the container limit. A service with a 512 MB limit and --max-old-space-size=512 will be OOM-killed as its heap approaches the cap, because the process needs memory beyond the heap: headroom for the collector itself, plus the C++ memory for libuv, OpenSSL, and any native addons.
Here is the rule we use:
max_old_space = floor(container_limit_mb * 0.7) - 64
max_semi_space = min(128, floor(container_limit_mb * 0.05))
For a 2 GB (2048 MB) container: --max-old-space-size=1369 --max-semi-space-size=102 (round to 128).
For a 512 MB container: --max-old-space-size=294 --max-semi-space-size=25 (round to 32).
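The rule is simple enough to compute in a startup script instead of by hand. A sketch, where computeHeapFlags is a hypothetical helper; rounding the semi-space up to a power of two is an assumption that generalizes the "(round to 32)" step, not a documented V8 requirement.

```javascript
// Hypothetical helper implementing the deployment rule:
//   max_old_space  = floor(container_limit_mb * 0.7) - 64
//   max_semi_space = min(128, floor(container_limit_mb * 0.05)), rounded up
//                    to a power of two (assumption)
function computeHeapFlags(containerLimitMb) {
  const maxOldSpace = Math.floor(containerLimitMb * 0.7) - 64;
  if (maxOldSpace <= 0) {
    throw new RangeError(`container limit ${containerLimitMb} MB is too small for this rule`);
  }
  const rawSemiSpace = Math.min(128, Math.floor(containerLimitMb * 0.05));
  let semiSpace = 1;
  while (semiSpace < rawSemiSpace) semiSpace *= 2; // 25 -> 32, 102 -> 128
  return {
    maxOldSpaceSizeMb: maxOldSpace,
    maxSemiSpaceSizeMb: semiSpace,
    argv: [`--max-old-space-size=${maxOldSpace}`, `--max-semi-space-size=${semiSpace}`]
  };
}

for (const limit of [512, 1024, 2048]) {
  console.log(limit, computeHeapFlags(limit).argv.join(' '));
}
// 512 --max-old-space-size=294 --max-semi-space-size=32
// 1024 --max-old-space-size=652 --max-semi-space-size=64
// 2048 --max-old-space-size=1369 --max-semi-space-size=128
```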
Add a startup log that prints the effective heap limit. When your next incident starts, the first line in the logs should tell you whether the process was tuned or running defaults.
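A minimal sketch of that startup line, assuming flags arrive either on the command line or through NODE_OPTIONS; heapConfigSummary is a hypothetical name. heap_size_limit covers the whole heap, young generation included, so expect it to read somewhat higher than the --max-old-space-size value.

```javascript
const v8 = require('v8');

// Hypothetical helper: gather what you need to tell "tuned" from "defaults".
// Matches the dashed flag spellings only (not --max_old_space_size).
function heapConfigSummary() {
  const sources = [...process.execArgv, ...(process.env.NODE_OPTIONS || '').split(/\s+/)];
  const gcFlags = sources.filter((arg) =>
    /^--(max-old-space-size|max-semi-space-size|heapsnapshot-near-heap-limit)\b/.test(arg)
  );
  return {
    heapLimitMb: Math.round(v8.getHeapStatistics().heap_size_limit / 1024 / 1024),
    gcFlags,
    tuned: gcFlags.some((arg) => arg.startsWith('--max-old-space-size'))
  };
}

// Make this the first line your service logs.
console.log(JSON.stringify({ event: 'startup', ...heapConfigSummary() }));
```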
What this does not fix
If your live set is growing because of a leak, no amount of heap tuning will save you. A smaller heap will just OOM faster. Use the flags to make GC predictable, then use heap snapshots to find the leak.
If your workload is genuinely memory-heavy (image processing, large ML models), consider worker threads for the heavy work and keep the main thread heap small. Worker threads get their own V8 isolate and their own heap limit.
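worker_threads exposes this directly: the resourceLimits option on a Worker caps that worker's old generation independently of the main thread. A minimal sketch, where the summing task is a stand-in for real heavy work and the worker body is inlined with eval: true only to keep the example in one file.

```javascript
const { Worker } = require('worker_threads');

// Worker body inlined with eval: true only to keep this sketch in one file.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require('worker_threads');
  const v8 = require('v8');
  // Stand-in for heavy work such as image processing.
  const sum = workerData.reduce((a, b) => a + b, 0);
  parentPort.postMessage({
    sum,
    heapLimitMb: Math.round(v8.getHeapStatistics().heap_size_limit / 1024 / 1024)
  });
`;

// Run one task in a worker whose old generation is capped separately
// from the main thread's heap.
function runInWorker(data, { maxOldGenerationSizeMb = 256 } = {}) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, {
      eval: true,
      workerData: data,
      resourceLimits: { maxOldGenerationSizeMb }
    });
    worker.once('message', resolve);
    // Hitting the limit terminates the worker and surfaces here as an error.
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

runInWorker([1, 2, 3], { maxOldGenerationSizeMb: 64 }).then((result) => {
  console.log(result); // sum plus the worker's own heap limit in MB
});
```

Because the worker's heap is separate, a burst of large allocations there is collected against the worker's limit and does not inflate the main thread's old generation.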
Summary
The default V8 heap behavior is optimized for desktop Chrome, not a containerized API server. It grows lazily and collects rarely, which turns every full GC into a latency event.
- Cap the old space at 70% of your container memory minus a buffer.
- Increase semi-space size if you see premature promotion in heap snapshots.
- Enable heap snapshots near the limit so you can inspect the peak.
- Monitor markSweepCompact duration via perf_hooks and alert on it.
- Log the configured heap limit at startup.
That is the tuning. The result is not zero GC cost, but predictable GC cost that fits inside your latency budget.
A note from Yojji
Tuning V8 garbage collection for predictable latency in containerized environments is exactly the kind of low-level backend refinement that separates prototypes from production systems. Yojji is an international custom software development company with offices in Europe, the US, and the UK. Their senior engineers routinely work through these kinds of Node.js runtime details to keep backend services stable under real traffic.