Your API pods show green health checks while clients get connection refused errors. The culprit is not your application. It is the Linux file descriptor limit, and the fix is a mix of kernel tuning, pool sizing discipline, and monitoring that most teams skip.
Streaming multi-gigabyte files through your Node.js server burns bandwidth, memory, and connection pools. Here is the direct-to-S3 upload pattern that moves the bytes past your API entirely, with presigned URLs, multipart upload logic, and the security guardrails most tutorials skip.
Your p99 latency spikes every few minutes, and the spikes align perfectly with garbage collection pauses. Adding RAM does not help. Here is how V8's generational GC actually works, which flags change its behavior, and the monitoring setup that tells you whether the tuning worked.
You are running an 8-core server and Node.js uses one core. Here is the cluster module wiring (shared-nothing workers, externalized state, and graceful shutdown) that turns unused silicon into HTTP throughput without touching Kubernetes.
You added read replicas to scale reads. Then users started seeing 404s for records they just created. Here is the request-scoped routing pattern that fixes replication lag without giving up the performance win.
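A minimal sketch of one way request-scoped routing can work, not necessarily the exact approach the article takes. It assumes the client carries a last-write timestamp between requests (for example in a cookie or session) and that replication lag stays under a known bound. The names `RequestScopedRouter`, `lastWriteAtMs`, and `stickyWindowMs` are hypothetical and introduced only for illustration.

```typescript
// Hypothetical per-request router. T stands in for whatever connection or
// pool type the application uses for its primary and replica databases.
class RequestScopedRouter<T> {
  private primary: T;
  private replica: T;
  private lastWriteAtMs: number | null; // assumed to come from a cookie or session
  private stickyWindowMs: number;       // assumed upper bound on replication lag
  private now: () => number;
  private wroteInThisRequest = false;

  constructor(
    primary: T,
    replica: T,
    lastWriteAtMs: number | null,
    stickyWindowMs: number,
    now: () => number = Date.now,
  ) {
    this.primary = primary;
    this.replica = replica;
    this.lastWriteAtMs = lastWriteAtMs;
    this.stickyWindowMs = stickyWindowMs;
    this.now = now;
  }

  // Writes always go to the primary and pin later reads in this request to it.
  forWrite(): T {
    this.wroteInThisRequest = true;
    return this.primary;
  }

  // Reads use the replica unless this request, or a recent earlier request
  // from the same client, performed a write.
  forRead(): T {
    const recentWrite =
      this.lastWriteAtMs !== null &&
      this.now() - this.lastWriteAtMs < this.stickyWindowMs;
    return this.wroteInThisRequest || recentWrite ? this.primary : this.replica;
  }

  // Marker to persist for the client's next request: the current time if this
  // request wrote, otherwise the marker it arrived with (possibly null).
  writeMarker(): number | null {
    return this.wroteInThisRequest ? this.now() : this.lastWriteAtMs;
  }
}
```

One router is created per incoming request, so only clients that have just written pay the cost of reading from the primary; everyone else keeps reading from replicas. The sticky window is a tuning choice: too short and stale reads reappear, too long and the primary absorbs read traffic it does not need to.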