AWS Lambda Cold Starts in Node.js: Why They Happen and How to Fix Them

Your API is snappy under steady traffic. But hit an endpoint that hasn’t been invoked in 15 minutes, and it takes four seconds to respond. Your load tester says 30ms p50; your users say “this app is slow.”

The gap is the cold start. It is Lambda’s most well-known problem, and it is solvable. Not by throwing provisioned concurrency at it (that costs as much as an EC2 instance), but by understanding exactly what happens during a cold start and shrinking each phase.

Here are the breakdown, the measurements, and the code changes that drop cold start from 4 seconds to under 300ms.

The anatomy of a cold start

Every Lambda cold start goes through four phases. If you fix them in order, you get the most improvement per hour spent.

Phase	What happens	Typical duration	Fix leverage
Download	Lambda pulls your code from S3	100-800ms	Smaller deployment package
Runtime init	AWS boots a Node.js process	50-200ms	Nothing (AWS managed)
Module load	Node.js reads and evaluates your code	200-2000ms	Tree-shaking, bundling, lazy import
Handler execution	Your handler code runs for the first time	50-500ms	Warm DB connections, lazy init

The download and module load phases account for 60-80% of the total cold start. Those are the ones you control.

Measure first

Before optimizing, you need per-phase timing. Wrap your handler like this:

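// handler.js
import { performance } from 'node:perf_hooks'
// handleRequest is your existing request logic; the path is a placeholder.
import { handleRequest } from './handle-request.js'

// Static imports are evaluated before this line runs, so this value is the
// time from process start to the end of module load (runtime init + module load).
const initPhase = performance.now()

let coldStart = true

export const handler = async (event, context) => {
  // Capture and clear the flag so only the first invocation reports.
  const wasCold = coldStart
  coldStart = false

  const handlerStart = performance.now()
  const response = await handleRequest(event)

  if (wasCold) {
    const timings = {
      cold: true,
      initPhase,
      handlerPhase: performance.now() - handlerStart,
    }
    console.log('COLD_START_BREAKDOWN', JSON.stringify(timings))
  }

  return response
}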

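Deploy this (a fresh deployment always starts cold) or let the function sit idle for at least 15 minutes. Invoke it once and check CloudWatch for the COLD_START_BREAKDOWN log line. To find the slowest modules, temporarily switch suspect imports to dynamic import() calls and time each one. Static import statements are hoisted, so markers placed around them measure nothing.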
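One way to place those markers is a small wrapper around dynamic import(). The sketch below is illustrative, not part of the original handler: timedImport, importTimings, and loadSuspects are hypothetical names, and node:zlib and node:path stand in for whatever heavy modules you suspect.

```javascript
// Per-module load times in milliseconds, keyed by import specifier.
const importTimings = {}

// Time a dynamic import(). Static `import` statements are hoisted and
// evaluated before any module code runs, so they cannot be wrapped like this.
async function timedImport(specifier) {
  const start = performance.now()
  const mod = await import(specifier)
  importTimings[specifier] = performance.now() - start
  return mod
}

// Hypothetical usage: node:zlib and node:path stand in for your own modules.
async function loadSuspects() {
  const zlib = await timedImport('node:zlib')
  const path = await timedImport('node:path')
  console.log('IMPORT_TIMINGS', JSON.stringify(importTimings))
  return { zlib, path }
}
```

Run this against the unbundled source: a bundler cannot resolve a specifier that is only known at run time, and a module that was already loaded elsewhere will report a time close to zero.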

Strategy 1: Shrink the package

The fastest cold starts come from the smallest deployment packages. A 50MB zip takes notably longer to download from S3 than a 500KB one. This is the highest-leverage change.

What is in your package

Run this on your deployment artifact:

unzip -l function.zip | sort -k1,1nr | head -20

I have seen production packages contain:

node_modules/aws-sdk (the monolithic v2 SDK, 14MB) when the function only calls DynamoDB. The modular SDK v3 client packages are tree-shakeable, so you ship only the client you use.
Dev dependencies that should be --omit=dev
Source maps (.map files) that serve no purpose in Lambda
Entire test fixtures and documentation files

Fix it with esbuild

Bundle your handler with esbuild. It tree-shakes unused imports, produces a single file, and runs in under a second:

// build.mjs
import * as esbuild from 'esbuild'

await esbuild.build({
  entryPoints: ['src/handler.js'],
  outfile: 'dist/index.js',
  bundle: true,
  minify: true,
  platform: 'node',
  target: 'node20',
  external: ['@aws-sdk/*'], // resolved at run time from the SDK bundled with the Lambda runtime
  sourcemap: false,
})

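The critical detail: external: ['@aws-sdk/*']. Lambda’s Node.js 18 and later runtimes already include the AWS SDK v3 at /var/runtime/node_modules/@aws-sdk (the /opt directory is where layers are mounted, not the built-in SDK). Excluding it from your bundle shaves megabytes off the zip and does not cost any download time since it is already on the execution environment. The trade-off is that you run whatever SDK version the runtime ships, not the one pinned in your lockfile.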

After bundling, your dist/index.js should be under 500KB, and the zip (which is already compressed) should be under 200KB.

Deployment package size targets

Size	Cold start download time	Verdict
50MB	600-1200ms	Unacceptable for anything latency-sensitive
10MB	200-400ms	Average
1MB	50-100ms	Good
< 500KB	20-50ms	Excellent

Target under 1MB. Under 200KB if you have no native dependencies.
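A size budget only holds if the build enforces it. As a rough sketch, a post-build step could fail when the bundle exceeds the target. checkBundleSize, assertBundleSize, and the 1MB default are assumptions for illustration; dist/index.js is the outfile from the esbuild script above.

```javascript
const ONE_MB = 1024 * 1024

// Pure check so the budget logic is easy to test in isolation.
function checkBundleSize(sizeBytes, budgetBytes = ONE_MB) {
  const kb = (bytes) => `${(bytes / 1024).toFixed(1)}KB`
  return {
    ok: sizeBytes <= budgetBytes,
    message: `bundle is ${kb(sizeBytes)} (budget ${kb(budgetBytes)})`,
  }
}

// Reads the file size and throws when the bundle is over budget.
async function assertBundleSize(file, budgetBytes = ONE_MB) {
  const { stat } = await import('node:fs/promises')
  const { size } = await stat(file)
  const result = checkBundleSize(size, budgetBytes)
  if (!result.ok) throw new Error(`Over budget: ${result.message}`)
  return result
}

// Hypothetical usage at the end of build.mjs:
//   await assertBundleSize('dist/index.js')
```

Failing the build is a design choice; a warning may be enough while you are still trimming dependencies.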

Strategy 2: Lazy import everything not needed at init

Even with bundling, your module load phase can be slow if your handler file imports everything at the top level. The entire import chain is evaluated before the Lambda Runtime can invoke your handler.

Bad: eager imports

// hurts cold start
import { DynamoDBClient } from '@aws-sdk/client-dynamodb'
import { S3Client } from '@aws-sdk/client-s3'
import { SQSClient } from '@aws-sdk/client-sqs'
import { createLogger } from './logger.js'
import { validateInput } from './validator.js'
import { transformPayload } from './transformer.js'

All six imports are evaluated before your handler runs. If transformer.js imports a heavy CSV parser, that parser is loaded even for requests that do not touch CSV data.

Good: lazy imports

// minimal top-level imports
let _ddb, _s3, _sqs, _log

async function getDdb() {
  if (!_ddb) {
    const { DynamoDBClient } = await import('@aws-sdk/client-dynamodb')
    _ddb = new DynamoDBClient({ region: process.env.AWS_REGION })
  }
  return _ddb
}

async function getS3() { /* same pattern */ }

async function getSqs() { /* same pattern */ }

async function getLogger() {
  if (!_log) {
    const { createLogger } = await import('./logger.js')
    _log = createLogger()
  }
  return _log
}

The savings are measurable. A handler that imports five SDK clients at the top level cold-starts in about 1.2s. With lazy await import(), the same handler cold-starts in 450ms because nothing is loaded until the first time it is actually used.

The lazy import pattern has a specific benefit for AWS SDK v3: each client is a separate package. If you use DynamoDB, S3, and SQS, but a single invocation only touches DynamoDB, the S3 and SQS clients are never loaded at all.
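The getter functions above all repeat the same check-then-import shape. One possible way to factor it out is a memoizing helper. Note that lazy is a hypothetical name, not an SDK or Node.js API. Caching the promise rather than the resolved value means overlapping callers share one import, and clearing it on failure lets the next call retry.

```javascript
// Returns a function that runs `factory` on first call and caches the promise.
function lazy(factory) {
  let promise
  return () => {
    if (!promise) {
      promise = Promise.resolve()
        .then(factory)
        .catch((err) => {
          promise = undefined // do not cache failures; allow a retry
          throw err
        })
    }
    return promise
  }
}

// Illustrative usage; node:zlib stands in for a heavy dependency.
const getZlib = lazy(() => import('node:zlib'))

// With the SDK it might look like this (not executed here):
// const getDdb = lazy(async () => {
//   const { DynamoDBClient } = await import('@aws-sdk/client-dynamodb')
//   return new DynamoDBClient({ region: process.env.AWS_REGION })
// })
```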

What to always import eagerly

Some things should load early despite the cost:

Configuration and environment variable validation. If a required env var is missing, you want to fail fast at init time, not 500ms into the first request.
Observability setup (tracing, metrics). These should capture the full invocation, including the cold start itself.
Any singleton that every invocation will use. If your function always calls the database, initialize the connection eagerly.

Strategy 3: Reuse connections and clients across invocations

This is the most well-known Lambda optimization and also the most frequently screwed up. The rule: anything outside the handler function body persists across invocations on the same execution environment.
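A throwaway function makes the rule visible. In this sketch (all names are illustrative), the counter and the random ID live at module scope, so repeated invocations on one execution environment see the same ID and an increasing count, while a new environment starts over.

```javascript
// Module scope: evaluated once per execution environment.
const environmentId = Math.random().toString(36).slice(2, 8)
let invocationCount = 0

// In a real module this would be `export const handler = ...`.
const handler = async () => {
  invocationCount += 1 // the handler body runs on every invocation
  return { environmentId, invocationCount, cold: invocationCount === 1 }
}
```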

// BAD: new client every invocation
export const handler = async (event) => {
  const ddb = new DynamoDBClient({ region: process.env.AWS_REGION })
  // ...
}

Every invocation, not just the cold one, creates a new DynamoDB client and negotiates a new TLS connection, because the previous client’s keep-alive connections are thrown away with it. This adds 100-300ms to each request.

// GOOD: reuse client across invocations
import { DynamoDBClient } from '@aws-sdk/client-dynamodb'

const ddb = new DynamoDBClient({
  region: process.env.AWS_REGION,
  maxAttempts: 3,
})

export const handler = async (event) => {
  // ddb client is already warm
}

The same principle applies to database connections:

// reuse database connections across invocations
let pool

async function getPool() {
  if (!pool) {
    const { createPool } = await import('./db.js')
    pool = createPool(process.env.DATABASE_URL)
    // test the connection during cold start
    const conn = await pool.connect()
    try {
      await conn.query('SELECT 1')
    } finally {
      conn.release() // return the connection even if the probe query fails
    }
  }
  return pool
}

export const handler = async (event) => {
  const db = await getPool()
  const result = await db.query('SELECT * FROM users WHERE id = $1', [event.userId])
  return { statusCode: 200, body: JSON.stringify(result.rows) }
}

The SELECT 1 query on initial connection serves two purposes: it verifies the database is reachable (failing fast if the DB is down), and it warms a real connection in the pool so the first user request does not pay the TCP handshake cost.
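Failing fast only works if the warm-up itself cannot hang. A connection attempt to an unreachable database may otherwise wait until the function times out. A minimal sketch of bounding it, assuming a hypothetical withTimeout helper and an arbitrary 2-second limit:

```javascript
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms)
  })
  // Clear the timer either way so it never keeps the event loop busy.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

// Hypothetical usage inside getPool(), with the pool from the example above:
//   const conn = await withTimeout(pool.connect(), 2000, 'db warm-up')
```

The timeout only stops the wait. It does not cancel the underlying connection attempt, so pair it with the driver's own connection timeout setting if it has one.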

Strategy 4: Lambda SnapStart (not available for Node.js, skip this)

AWS offers SnapStart for the Java, Python, and .NET managed runtimes, but not for Node.js. If you are on Node.js, this section is a dead end until AWS adds support. Do not hold your breath. The three strategies above will cover you.

Putting it all together: the optimized handler

Here is a production-ready Lambda handler that combines all three strategies:

// handler.js - optimized for cold start

// --- eager init (small, always needed) ---
import 'dotenv/config'
import { createLogger } from './logger.js'

const log = createLogger()
const REQUIRED_ENV = ['DATABASE_URL', 'TABLE_NAME']
for (const key of REQUIRED_ENV) {
  if (!process.env[key]) {
    throw new Error(`Missing required env var: ${key}`)
  }
}

// --- lazy init helpers ---
let _ddb, _pool

async function getDdb() {
  if (!_ddb) {
    const { DynamoDBClient } = await import('@aws-sdk/client-dynamodb')
    const { DynamoDBDocumentClient } = await import('@aws-sdk/lib-dynamodb')
    const client = new DynamoDBClient({ region: process.env.AWS_REGION, maxAttempts: 3 })
    _ddb = DynamoDBDocumentClient.from(client)
  }
  return _ddb
}

async function getPool() {
  if (!_pool) {
    const { default: pg } = await import('pg')
    const { Pool } = pg
    _pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 2 })
    // warm a connection
    const client = await _pool.connect()
    try {
      await client.query('SELECT 1')
    } finally {
      client.release() // always return the connection to the pool
    }
    log.info('database pool warmed')
  }
  return _pool
}

// --- cold start tracking ---
let coldStart = true

export const handler = async (event) => {
  // Capture and clear the flag up front so a failed first invocation
  // does not make the next one look cold as well.
  const wasCold = coldStart
  coldStart = false
  const start = performance.now()

  try {
    const ddb = await getDdb()
    const pool = await getPool()

    const { default: handlerLogic } = await import('./logic.js')
    const result = await handlerLogic(event, { ddb, pool })

    if (wasCold) {
      // First-invocation time only: lazy imports plus connection warm-up.
      const duration = performance.now() - start
      log.info({ coldStartDuration: `${duration.toFixed(0)}ms` }, 'cold start complete')
    }

    return { statusCode: 200, body: JSON.stringify(result) }

  } catch (err) {
    log.error({ err, event }, 'handler error')
    return { statusCode: 500, body: JSON.stringify({ error: 'Internal server error' }) }
  }
}

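Deploy this. Run it after a 15-minute idle period. Your CloudWatch log should show a cold start duration under 300ms. That figure covers the first invocation only (lazy imports plus connection warm-up); runtime startup and module load are reported separately as Init Duration on the REPORT log line. If it is still above that, check the download phase (package size) or the module load phase (heavy imports in logic.js).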

Measuring the improvement

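# Requires AWS CLI v2 (on v1, drop the --cli-binary-format option).
# 50 warm invocations: read Duration from the REPORT line in the log tail
for i in {1..50}; do
  aws lambda invoke --function-name my-function \
    --cli-binary-format raw-in-base64-out --payload '{}' \
    --log-type Tail --query 'LogResult' --output text out.json \
    | base64 --decode | grep REPORT | grep -o 'Duration: [0-9.]* ms' | head -1 | awk '{print $2}'
done | awk '{sum+=$1} END {print "avg warm:", sum/NR, "ms"}'

# 1 cold invocation (wait 15 min first): Init Duration appears only on cold starts
aws lambda invoke --function-name my-function \
  --cli-binary-format raw-in-base64-out --payload '{}' \
  --log-type Tail --query 'LogResult' --output text out2.json \
  | base64 --decode | grep REPORT | grep -o 'Init Duration: [0-9.]* ms'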

On a real project I benchmarked these changes:

Metric	Before	After
Package size	23MB	340KB
Cold start (p50)	3.2s	280ms
Cold start (p99)	4.8s	450ms
Warm invocation	12ms	11ms

The warm invocation cost is unchanged. The cold start dropped by 91%. The change was one build script rewrite and about 40 lines of restructuring in the handler.
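For reference, the table's arithmetic can be reproduced from raw samples. This sketch uses the nearest-rank percentile method, which is an assumption (the method behind the benchmark is not stated), and the sample array is made up for illustration. Only the 3.2s and 280ms figures come from the table.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples')
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length))
  return sorted[rank - 1]
}

// Percentage reduction from `before` to `after`.
function reductionPercent(before, after) {
  return ((before - after) / before) * 100
}

// Made-up cold start samples in milliseconds.
const samples = [280, 300, 450, 290, 310]
console.log('p50:', percentile(samples, 50)) // 300
console.log('p99:', percentile(samples, 99)) // 450

// p50 figures from the table: 3.2s before, 280ms after.
console.log('reduction:', Math.round(reductionPercent(3200, 280)) + '%') // 91%
```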

When none of this is enough

Some workloads need sub-100ms cold starts, or they cannot tolerate cold starts at all (user-facing APIs with strict latency SLAs). In those cases:

Provisioned Concurrency keeps n execution environments warm at all times. It costs the same as running n t3.nano instances 24/7. Budget accordingly.
Scheduled warmers (an EventBridge rule, formerly CloudWatch Events, that pings your function every 5 minutes) prevent the environment from being reclaimed. They cost almost nothing but are fragile: one ping keeps only one environment warm, so concurrent requests still hit cold starts, and a Lambda deployment invalidates all warm environments, leaving the new version cold until the next ping cycle.
Run on a server. If cold starts are unacceptable and provisioned concurrency is too expensive, Lambda is not the right compute model for that workload. Fargate, ECS, or a plain EC2 instance with an autoscaler are simpler for latency-sensitive, steady-traffic APIs.
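If you do use a scheduled warmer, the handler should return early on the ping so it does not run business logic or touch the database. A minimal sketch, assuming the ping arrives as a standard EventBridge scheduled event (source aws.events, detail-type Scheduled Event); isWarmerPing and withWarmer are hypothetical names.

```javascript
// True for events produced by an EventBridge scheduled rule.
function isWarmerPing(event) {
  return Boolean(event)
    && event.source === 'aws.events'
    && event['detail-type'] === 'Scheduled Event'
}

// Wraps a handler so warmer pings return immediately.
function withWarmer(fn) {
  return async (event, context) => {
    if (isWarmerPing(event)) return { warmed: true }
    return fn(event, context)
  }
}

// Hypothetical usage: export const handler = withWarmer(realHandler)
```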

The takeaway

Cold starts in Node.js Lambda are a measurable, fixable problem. Three strategies cover 95% of the improvement:

Bundle with esbuild, exclude the AWS SDK, target under 500KB.
Lazy-import everything that is not needed on every invocation.
Hoist clients and connections to module scope, with a warm-up query.

Measure before and after. If your cold start is under 300ms after these changes, move on to a real problem. If it is not, look at deployment size first, then native dependencies, then VPC configuration. VPC-attached functions no longer pay the multi-second network interface setup penalty they did before 2019, but connections routed through a NAT gateway can still add latency to the first outbound calls, so avoid VPC unless you absolutely need RDS or ElastiCache.

This will not make Lambda as fast as a hot server. But it will close the gap enough that your users stop noticing.
