Linux OOM Killer in Production: Why Your Node.js Containers Die Without a Stack Trace
Your pod restarts at random. No error in the application log. No uncaughtException. The process just vanishes. The culprit is the Linux OOM killer, and fixing it means understanding the gap between what Node.js thinks it allocated and what the kernel is actually tracking.
The alert was short and useless: CrashLoopBackOff. I pulled the application logs and found the last entry was a routine GET /health at 14:23:07. Then nothing. No stack trace. No FATAL. No uncaughtException. The process simply stopped writing logs and the container restarted eight seconds later.
Kubernetes reported the reason immediately if you knew where to look: OOMKilled. The pod had hit its memory limit and the Linux kernel had stepped in to protect the rest of the node. But the application had no idea it was dying. V8 never ran out of heap. The garbage collector was not complaining. From Node.js’s perspective, everything was fine until the kernel sent SIGKILL, which cannot be caught, blocked, or ignored.
This is the OOM (Out-Of-Memory) killer, and it is one of the most confusing production failures because your application code is usually innocent. The problem lives in the gap between what Node.js thinks it is using, what the container runtime thinks it is using, and what the kernel actually counts against the cgroup limit. This post covers how the OOM killer makes its decisions, why containers amplify the confusion, how to read the evidence after the fact, and the application and platform changes that stop it from happening in the middle of your Tuesday afternoon.
How the OOM killer decides who dies
When a Linux system runs out of available memory, the kernel cannot allocate a page for a process that requests it. At that moment it has two choices: wait (and hope something frees memory) or kill something. The OOM killer chooses the latter. It walks every process, assigns an oom_score, and sends SIGKILL to the highest one.
The score is calculated from several factors:
- Memory footprint: How much memory the process occupies (resident pages, plus swap and page tables on current kernels). Bigger processes score higher.
- Memory usage ratio: That footprint is scaled against the memory available to the process, which is the cgroup limit when a limit applies.
- Niceness and runtime: Kernels before 2.6.36 also weighed process niceness and how long the process had been running. Current kernels ignore both, so do not count on a long-lived or high-priority process being spared.
- oom_score_adj: A user-configurable adjustment. -1000 means “never kill this.” +1000 means “kill this first.”
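To see where a process currently stands, the kernel exposes both values under /proc. The sketch below reads them from Node.js. The helper name readOomScores is hypothetical, and it returns null on systems that do not have a Linux-style /proc.

```javascript
import fs from 'node:fs';

// Read the kernel's current OOM score and the user adjustment for a process.
// Returns null when /proc is unavailable (for example on macOS or Windows).
function readOomScores(pid = 'self') {
  try {
    const score = fs.readFileSync(`/proc/${pid}/oom_score`, 'utf8');
    const adj = fs.readFileSync(`/proc/${pid}/oom_score_adj`, 'utf8');
    return {
      oomScore: Number.parseInt(score, 10),
      oomScoreAdj: Number.parseInt(adj, 10),
    };
  } catch {
    return null;
  }
}

console.log(readOomScores());
```

Running this inside a container shows the adjustment the container runtime applied to your process, which is a quick way to check how exposed it is relative to its neighbors.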
In a containerized world, there is a critical twist. Kubernetes sets a cgroup memory limit on every container that declares one. When a container’s memory usage (as counted by the cgroup memory controller) crosses that limit, the kernel triggers the OOM killer inside that cgroup scope. It does not wait for the whole node to run out of RAM. A single container can OOM itself even if the node has 80% memory free.
This is where the confusion starts. Your container limit is 1 GiB. Node.js process.memoryUsage().heapUsed reports 600 MiB. You should have 424 MiB of headroom. But the kernel kills the pod anyway. Why?
Cgroup memory accounting: what counts against your limit
The cgroup memory controller tracks more than just your process RSS. In Kubernetes, the memory.limit_in_bytes (or memory.max in cgroup v2) is enforced by counting:
- Anonymous memory (process RSS): The actual physical pages mapped by your application.
- Page cache: Files read from disk that Linux keeps in memory. Node.js itself does not directly create much page cache, but log shippers, temporary file uploads, and npm install in init containers do.
- Kernel memory: Sockets, TCP buffers, inode caches, and slab allocations charged to your cgroup.
- Buffer cache: In older kernels and certain runtimes, I/O buffers for files opened by the process.
- tmpfs mounts: If you mount an emptyDir without medium: Memory, it is disk-backed and page-cached. If you mount it with medium: Memory, it counts directly against memory limits.
- Shared memory segments: If your application (or a sidecar) uses POSIX shared memory or /dev/shm, that counts.
The most common surprise for Node.js services is that the V8 heap itself is only part of the story. The default heap limit depends on the Node.js version and the memory V8 detects (older 64-bit builds capped the old generation at about 1.5 GB), and you can override it with --max-old-space-size. But the process also allocates memory outside the V8 heap for:
- ArrayBuffers and WASM memory: These live in the V8 external memory space, not the JS heap.
- Native addons: Any C++ addon (database drivers, image processing libraries, gRPC) allocates native memory via malloc or new.
- Thread stacks: Worker threads each consume a few MB of stack space outside the heap.
- Libuv and Node.js internals: Buffers for network I/O, event loop watchers, and TLS session caches.
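When you add these up, a Node.js process whose heapUsed is 600 MB can easily have an RSS of 900 MB. If your Kubernetes limit is 1 GiB and you also have 200 MB of page cache from log files or npm caches in /tmp, the cgroup sees about 1.1 GiB of charged memory. The kernel first tries to reclaim clean page cache, but dirty pages waiting for writeback and tmpfs pages cannot simply be dropped, and when reclaim cannot free enough the OOM killer fires.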
This explains the most common OOM symptom: the application is not leaking, it is just paying for memory that does not show up in heapUsed.
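The arithmetic behind that scenario can be written down as a small budget check. This is only a sketch: cgroupBudget and its field names are made up for illustration, and it treats page cache as fully charged, which is the worst case because the kernel can usually reclaim clean cache before it kills anything.

```javascript
const MiB = 1024 * 1024;

// Sum the components the cgroup charges and compare them with the limit.
function cgroupBudget({ limitBytes, rssBytes, pageCacheBytes = 0, kernelBytes = 0 }) {
  const chargedBytes = rssBytes + pageCacheBytes + kernelBytes;
  return {
    chargedBytes,
    headroomBytes: limitBytes - chargedBytes,
    overLimit: chargedBytes > limitBytes,
  };
}

const budget = cgroupBudget({
  limitBytes: 1024 * MiB, // 1 GiB container limit
  rssBytes: 900 * MiB, // heap plus native allocations
  pageCacheBytes: 200 * MiB, // log files and caches in /tmp
});

console.log(budget.overLimit, budget.headroomBytes / MiB); // prints: true -76
```

A heapUsed of 600 MB never appears in this calculation, which is why watching it alone gives no warning.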
Reading the evidence after the kill
When a pod is OOMKilled, Kubernetes stores the reason in the container status:
kubectl describe pod your-pod-name | grep -A 5 "Last State"
You want to see:
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Exit code 137 is 128 + 9, where 9 is SIGKILL. If you are not sure whether it was OOM or someone ran kubectl delete, look at the node kernel logs:
kubectl get node $NODE_NAME -o jsonpath='{.spec.providerID}'
# Then on the node:
journalctl -k | grep -i "killed process"
The kernel logs the exact process ID, the process name, and the memory statistics that triggered the kill:
oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=...,mems_allowed=0,...
Memory cgroup out of memory: Killed process 12345 (node) total-vm:...kB, anon-rss:...kB, file-rss:...kB, shmem-rss:...kB
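The critical numbers are anon-rss (anonymous memory, mostly your heap and native allocations) and file-rss (resident file-backed pages, such as memory-mapped files and shared libraries). If file-rss is large, the process is mapping a lot of file data. If anon-rss alone is close to the limit, your process itself is heavy. If both are well below the limit, the rest of the charge is coming from elsewhere in the cgroup: page cache, tmpfs, kernel memory, or another process such as a sidecar.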
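If you collect these kernel lines in a log pipeline, pulling the numbers out programmatically makes incidents easier to compare. The following is a sketch under assumptions: parseOomKillLine is a hypothetical helper, the regular expression only covers the field layout shown above, and the sample line uses invented numbers.

```javascript
// Extract the fields of a "Killed process" kernel log line.
// Returns null when the line does not match the expected shape.
function parseOomKillLine(line) {
  const match = line.match(
    /Killed process (\d+) \(([^)]+)\).*?total-vm:(\d+)kB, anon-rss:(\d+)kB, file-rss:(\d+)kB, shmem-rss:(\d+)kB/,
  );
  if (!match) return null;
  const [, pid, name, totalVm, anonRss, fileRss, shmemRss] = match;
  return {
    pid: Number(pid),
    name,
    totalVmKb: Number(totalVm),
    anonRssKb: Number(anonRss),
    fileRssKb: Number(fileRss),
    shmemRssKb: Number(shmemRss),
  };
}

// Made-up sample line for illustration; the numbers are not from a real incident.
const sample =
  'Memory cgroup out of memory: Killed process 12345 (node) ' +
  'total-vm:2000000kB, anon-rss:1000000kB, file-rss:20000kB, shmem-rss:0kB';

console.log(parseOomKillLine(sample));
```

Kernel log formats vary between versions, so treat a null result as "unknown format" rather than "no kill happened".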
Inside the container, before the kill, you can also read the cgroup memory statistics directly:
cat /sys/fs/cgroup/memory/memory.stat
On cgroup v2 systems (most modern Kubernetes clusters):
cat /sys/fs/cgroup/memory.stat
Look for anon and file. The sum of these plus kernel_stack, slab, and sock gives you the memory usage that the kernel compares against your limit.
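The same file can be read from inside the process. Below is a minimal sketch that parses memory.stat and adds up the categories named above. The function names are hypothetical, the key names are the cgroup v2 ones, and the total is an approximation rather than the exact figure the kernel enforces.

```javascript
import fs from 'node:fs';

// Parse the "key value" lines of a cgroup memory.stat file into an object.
function parseMemoryStat(text) {
  const stats = {};
  for (const line of text.split('\n')) {
    const [key, value] = line.trim().split(/\s+/);
    if (key && value !== undefined) stats[key] = Number(value);
  }
  return stats;
}

// Add up the categories named above; missing keys count as zero.
function approximateChargedBytes(stats) {
  const keys = ['anon', 'file', 'kernel_stack', 'slab', 'sock'];
  return keys.reduce((sum, key) => sum + (stats[key] ?? 0), 0);
}

// cgroup v2 path; on cgroup v1 the file lives under /sys/fs/cgroup/memory/
// and uses different key names (for example rss and cache).
try {
  const stats = parseMemoryStat(fs.readFileSync('/sys/fs/cgroup/memory.stat', 'utf8'));
  console.log(approximateChargedBytes(stats));
} catch {
  console.log('memory.stat not readable on this system');
}
```

Exporting anon and file as separate gauges is usually more useful than the sum, because it shows whether growth comes from the process or from cache.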
Node.js memory visibility: what the process can see
Node.js exposes some memory metrics via process.memoryUsage():
const usage = process.memoryUsage();
console.log({
  rss: usage.rss,
  heapTotal: usage.heapTotal,
  heapUsed: usage.heapUsed,
  external: usage.external,
  arrayBuffers: usage.arrayBuffers,
});
- rss: What the operating system thinks the whole process uses. This is the closest to anon-rss from the cgroup perspective.
- heapTotal / heapUsed: What V8 has allocated for JavaScript objects.
- external: Memory used by C++ objects bound to JavaScript objects managed by V8, outside the JS heap. This includes Buffer data and external strings.
- arrayBuffers: Memory backing ArrayBuffer and SharedArrayBuffer instances, including Node.js Buffers. It is also included in external, but it is reported separately because it is the most common source of “heap is low but RSS is high.”
If arrayBuffers or external is climbing while heapUsed stays flat, you have native memory growth. Common causes:
- Streaming large payloads into Buffer objects without pipeline backpressure.
- Loading large files into ArrayBuffer via fs.readFile.
- Native database drivers queuing result sets in unmanaged memory.
- WASM modules with large linear memories.
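For the first cause in that list, the usual remedy is to let pipeline() from node:stream/promises carry backpressure from the consumer back to the producer. The sketch below is a generic illustration, not code from any particular service: countBytes is a hypothetical helper that consumes a stream chunk by chunk without ever holding the full payload.

```javascript
import { pipeline } from 'node:stream/promises';
import { Readable, Writable } from 'node:stream';

// Consume a readable stream chunk by chunk. Memory use stays bounded by the
// stream buffers because pipeline() pauses the source when the sink is busy.
async function countBytes(source) {
  let total = 0;
  const sink = new Writable({
    write(chunk, _encoding, callback) {
      total += chunk.length;
      callback(); // signal readiness for the next chunk
    },
  });
  await pipeline(source, sink);
  return total;
}

// Usage with an in-memory source; a real service would pass req or a file stream.
const demoSource = Readable.from([Buffer.alloc(10), Buffer.alloc(5)]);
countBytes(demoSource).then((bytes) => console.log(bytes)); // prints: 15
```

The same shape works for any sink that processes chunks incrementally, such as a hash, a parser, or a write to disk.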
Here is a small diagnostic function you can drop into an existing /metrics endpoint to track the dangerous gap:
import os from 'node:os';
import fs from 'node:fs';

function getMemoryMetrics() {
  const mem = process.memoryUsage();
  const systemFree = os.freemem();
  const systemTotal = os.totalmem();

  // Best-effort cgroup memory limit detection: try cgroup v1, then v2.
  let cgroupLimit;
  try {
    const v1 = fs.readFileSync('/sys/fs/cgroup/memory/memory.limit_in_bytes', 'utf8');
    cgroupLimit = parseInt(v1, 10);
  } catch {
    try {
      const v2 = fs.readFileSync('/sys/fs/cgroup/memory.max', 'utf8');
      // cgroup v2 reports the literal string "max" when no limit is set.
      cgroupLimit = v2.trim() === 'max' ? systemTotal : parseInt(v2, 10);
    } catch {
      cgroupLimit = systemTotal;
    }
  }
  // cgroup v1 reports a huge sentinel value when unlimited; cap it at physical RAM.
  if (!Number.isFinite(cgroupLimit) || cgroupLimit > systemTotal) {
    cgroupLimit = systemTotal;
  }

  // rss already includes resident external and ArrayBuffer memory, so adding
  // mem.external on top of it would count that memory twice.
  const effectiveRss = mem.rss;

  return {
    process_heap_used_bytes: mem.heapUsed,
    process_rss_bytes: mem.rss,
    process_external_bytes: mem.external ?? 0,
    process_array_buffers_bytes: mem.arrayBuffers ?? 0,
    process_effective_rss_bytes: effectiveRss,
    cgroup_memory_limit_bytes: cgroupLimit,
    cgroup_usage_ratio: Number((effectiveRss / cgroupLimit).toFixed(4)),
    system_free_bytes: systemFree,
  };
}
Export cgroup_usage_ratio to Prometheus and alert when it exceeds 0.75. Do not alert on heapUsed alone. It will lie to you.
Common root causes in Node.js services
1. Large file uploads into memory.
If you accept file uploads and store them in Buffer or ArrayBuffer before sending them to S3, every concurrent upload adds to rss and external. The fix is to hand the request stream to the uploader instead of buffering it. Upload from @aws-sdk/lib-storage is not a stream itself, so it cannot be the destination of pipeline(); it takes the stream as Body and is awaited through done():
import { Upload } from '@aws-sdk/lib-storage';

// Bad: loads entire file into memory
// const buffer = await fs.readFile(uploadPath);

// Good: Upload reads req part by part, so only a few parts are in memory at once
const upload = new Upload({
  client: s3Client,
  params: { Bucket: 'uploads', Key: filename, Body: req },
});
await upload.done();
2. Native addons that allocate outside V8.
Sharp (image processing), libxmljs, and some database client libraries allocate native buffers that do not count against heapUsed. These allocations show up in process.memoryUsage().external only when the addon reports them to V8, so track RSS directly as the reliable signal.
3. Worker thread memory not visible in the main thread heap.
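Each worker thread has its own V8 heap and adds to the process RSS. When you call process.memoryUsage() from the main thread, rss covers the entire process, but heapTotal, heapUsed, external, and arrayBuffers describe only the calling thread, so worker heaps are invisible in the main thread’s heap numbers. If you spawn workers for CPU-intensive tasks, you must account for them in your container limit: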
import os from 'node:os';

const workerCount = os.cpus().length; // one worker per visible CPU core
const baseRssEstimate = 300 * 1024 * 1024; // 300 MB base
const workerRssEstimate = 200 * 1024 * 1024; // 200 MB per worker
// Total estimate in bytes, with 30% headroom on top
const containerLimit = (baseRssEstimate + workerCount * workerRssEstimate) * 1.3;
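An estimate like this is easier to trust when each worker's heap is actually capped. One complementary option, not part of the sizing formula above, is the resourceLimits setting of the Worker constructor. In this sketch measureWorkerHeapLimit is a hypothetical helper and 64 is an arbitrary example value. The worker reports the heap limit that V8 applied to its own isolate.

```javascript
import { Worker } from 'node:worker_threads';

// Worker body, evaluated as CommonJS source because of `eval: true`.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  const v8 = require('node:v8');
  parentPort.postMessage(v8.getHeapStatistics().heap_size_limit);
`;

// Start a worker with a capped old-generation heap and resolve with the
// heap size limit (in bytes) that the worker observes for itself.
function measureWorkerHeapLimit(maxOldGenerationSizeMb) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      resourceLimits: { maxOldGenerationSizeMb },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

measureWorkerHeapLimit(64).then((bytes) => {
  console.log(`worker heap limit: ${Math.round(bytes / (1024 * 1024))} MiB`);
});
```

The cap only covers the worker's V8 heap. Native allocations and ArrayBuffers created inside the worker still count toward process RSS, so the per-worker estimate needs margin on top of it.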
4. Sidecars stealing the cgroup budget.
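Istio proxy, Fluent Bit log shippers, and Vault agents all run in the same pod as your app. Kubernetes memory limits are normally declared per container, so a sidecar without its own limit is constrained only by whatever sits above it: a pod-level limit where the cluster supports one, or the node’s free memory. A log shipper that buffers a burst of stderr output can then push the pod or node into an OOM condition, killing your Node.js app in the crossfire.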
Always set per-container limits in your deployment spec:
spec:
  containers:
    - name: api
      resources:
        limits:
          memory: "1Gi"
    - name: istio-proxy
      resources:
        limits:
          memory: "256Mi"
If you only set the pod-level limit, the sum of all containers must fit inside it, but any single container can grow until the pod limit is hit, taking the others down with it.
5. --max-old-space-size mismatched to the cgroup limit.
On older Node.js versions, V8 caps the old generation heap at about 1.5 GB on 64-bit systems. Newer versions derive the default from the memory V8 detects, which may not match your container limit. If your Kubernetes limit is 1 GiB and V8 believes it can grow the heap to 1.5 GB, it will happily try, and the OOM killer will stop it at 1 GiB. The result is a process that behaves like it ran out of heap (frequent GC, growing latency) but actually died from the kernel.
Set --max-old-space-size to roughly 75% of your container memory limit:
env:
  - name: NODE_OPTIONS
    value: "--max-old-space-size=768"
resources:
  limits:
    memory: "1Gi"
This gives V8 a clear ceiling below the cgroup limit, so the garbage collector has a chance to reclaim memory before the kernel intervenes. The remaining 25% is headroom for external memory, native allocations, and page cache.
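The 75% rule is simple enough to compute at deploy time, and the running process can confirm what it received. The helper name suggestMaxOldSpaceMiB below is hypothetical. The second log line uses v8.getHeapStatistics(), whose heap_size_limit covers the whole V8 heap and is therefore typically somewhat higher than the old-space setting.

```javascript
import v8 from 'node:v8';

const MiB = 1024 * 1024;

// Suggest a --max-old-space-size value (in MiB) as a fraction of the container limit.
function suggestMaxOldSpaceMiB(containerLimitBytes, fraction = 0.75) {
  return Math.floor((containerLimitBytes * fraction) / MiB);
}

console.log(suggestMaxOldSpaceMiB(1024 * MiB)); // prints: 768 (for a 1 GiB limit)

// The heap ceiling this process is actually running with, in MiB.
console.log(Math.round(v8.getHeapStatistics().heap_size_limit / MiB));
```

Logging the second value at startup is a cheap way to catch a deployment where NODE_OPTIONS did not reach the process.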
Kernel OOM behavior tuning
You cannot disable the OOM killer without fundamentally changing how Linux handles memory pressure. What you can do is make it less surprising and more informative.
Enable the OOM killer log
Ensure your kernel is configured to log kills (it is by default on most distributions, but verify):
sysctl vm.oom_dump_tasks=1
sysctl vm.oom_kill_allocating_task=0
oom_dump_tasks=1 logs every process in the cgroup when the kill happens, which helps you identify whether a sidecar or the main process was the largest consumer.
oom_kill_allocating_task=0 lets the kernel kill the largest process, which is usually what you want. Setting it to 1 kills whichever process triggered the allocation that crossed the limit, which might be an innocent process that happened to allocate at the wrong moment.
Consider memory overcommit
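Linux defaults to vm.overcommit_memory=0, which uses a heuristic to allow or deny allocations. Mode 1 always grants allocations, and mode 2 enforces strict accounting. This is a node-wide setting, so on Kubernetes it usually belongs to whoever manages the nodes, and the kubelet typically expects mode 1. For containerized Node.js, do not try to tune your way out of OOM kills here. Size your limits correctly instead.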
Use memory.min or memory.low in cgroup v2
If your cluster runs cgroup v2, you can set memory.min to guarantee a baseline reservation for the main container, making it less likely the kernel will choose your app when memory pressure hits:
# Not natively supported in Kubernetes Pod specs as of 1.30,
# but achievable via a custom scheduler or init container that writes to cgroupfs.
For most teams, the simpler fix is accurate sizing and explicit per-container limits.
Preventing OOM: the sizing checklist
Before your next deploy, verify:
- --max-old-space-size is set to 70-80% of the container memory limit.
- process.memoryUsage().rss and external are exported as metrics, not just heapUsed.
- An alert fires when cgroup_usage_ratio exceeds 0.75.
- File uploads and large responses use streaming, not in-memory buffering.
- Worker thread count and memory usage are included in the container limit estimate.
- Every container in the pod has its own memory limit, not just the pod-level limit.
- Sidecars are sized explicitly and their logs are checked after any OOM incident.
- Native addons are audited for external memory allocation.
The takeaway
The OOM killer is not a bug. It is Linux doing exactly what it was designed to do when memory is exhausted. The problem is that containers create a layer of indirection between your application and the kernel, and the metrics Node.js exposes by default do not show the full picture.
If you are only watching heapUsed, you are flying blind. Start tracking RSS and external memory. Size your V8 heap limit below your cgroup limit. Stream large payloads. Give every sidecar its own budget. And when a pod dies without a log entry, go straight to kubectl describe and journalctl -k before you spend an afternoon looking for a leak that was never there.