Structured Logging With Pino: The 60 Lines That Make Your 3 a.m. Debugging Possible

A customer emails you at 03:14 saying “I clicked Pay, got a 500, and was charged anyway.” You open the production logs. You see this:

[2026-05-21 03:14:07] processing request
[2026-05-21 03:14:07] db connected
[2026-05-21 03:14:07] error: cannot read property of undefined
[2026-05-21 03:14:07] processing request
[2026-05-21 03:14:07] db connected

Whose request was that? Which user? Which charge? Was the second request a retry of the first? The string-formatted, timestamped wall of text gives you nothing to grep against. You spend an hour cross-referencing IDs across three log streams and finally find the bug, ten minutes before standup.

The fix is structured logging — JSON logs, one event per line, with a correlation ID that ties every log line for a single request together. Sixty lines of Node.js, one library, and a habit change. Here is exactly what that looks like, why each piece matters, and the five queries that turn it into a debugging superpower.

What “structured” actually means

A structured log entry is a JSON object, not a sentence. Compare:

[2026-05-21 03:14:07] user 4821 charge failed: card declined (amount: 1500)

…to:

{"level":"error","time":1716258847123,"req_id":"01HXYZ...","user_id":4821,"charge_id":"ch_abc","amount_cents":1500,"event":"charge_failed","reason":"card_declined","msg":"charge failed"}

The first is a story you have to read. The second is a record you can query. In a log aggregator, the JSON version lets you write event:charge_failed AND reason:card_declined and get every relevant line back in 200ms. The string version requires either a regex or your eyeballs.

Every observability tool worth using — Datadog, Logtail, Grafana Loki, ClickHouse, even jq against a file — works dramatically better against JSON. The investment to switch is small. The investment to not switch compounds with every incident.

Why Pino over Winston, Bunyan, or `console.log`

Three reasons. First, Pino is roughly 5× faster than Winston for the same output, because it writes JSON synchronously to a stream and offloads anything fancy (rotation, transports, pretty-printing) to a separate process. Second, its API is smaller — there are basically four methods to learn (info, warn, error, child) and one config object. Third, the Pino ecosystem assumes structured logs from day one; child loggers, redaction, and serializers all just work without ceremony.

You can argue Winston has more transports. You can argue Bunyan has the cleanest CLI. In production, the thing that matters is “did the log line get written before the process died, and is it parseable.” Pino wins both, and the maintainer (Matteo Collina, who also maintains parts of Node core) has been steadily de-risking it for years.

The 60 lines

Three files: a logger module, a request-context module that carries a correlation ID through the whole request, and the Express wiring. Together, exactly enough to make every log line traceable.

// logger.ts
import pino from 'pino';

export const logger = pino({
  level: process.env.LOG_LEVEL ?? 'info',
  // Pino's default time is epoch ms — keep it. Human-readable timestamps
  // are an aggregator concern; epoch is precise and timezone-free.
  base: {
    service: process.env.SERVICE_NAME ?? 'api',
    env: process.env.NODE_ENV ?? 'development',
    version: process.env.GIT_SHA ?? 'dev',
  },
  // Redact common secret fields before they ever hit the JSON stream.
  redact: {
    paths: [
      'req.headers.authorization',
      'req.headers.cookie',
      '*.password',
      '*.token',
      '*.card_number',
    ],
    censor: '[REDACTED]',
  },
  // Production: raw JSON to stdout. Local: pretty-print via pino-pretty.
  transport: process.env.NODE_ENV === 'development'
    ? { target: 'pino-pretty', options: { colorize: true, singleLine: true } }
    : undefined,
});

// request-context.ts
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';
import type { Logger } from 'pino';

type Context = { reqId: string; logger: Logger };

const storage = new AsyncLocalStorage<Context>();

export function runWithContext<T>(ctx: Context, fn: () => T): T {
  return storage.run(ctx, fn);
}

export function getLogger(): Logger {
  // Falls back to the root logger outside a request — useful for boot logs.
  return storage.getStore()?.logger ?? require('./logger').logger;
}

export function getReqId(): string | undefined {
  return storage.getStore()?.reqId;
}

export function newReqId(): string {
  return randomUUID();
}

// middleware.ts
import type { Request, Response, NextFunction } from 'express';
import { logger } from './logger';
import { runWithContext, newReqId } from './request-context';

export function requestLogger(req: Request, res: Response, next: NextFunction) {
  // Honor an upstream correlation id (load balancer, gateway, calling service).
  const reqId = (req.header('x-request-id') as string) || newReqId();
  res.setHeader('x-request-id', reqId);

  // A child logger inherits the root config and adds reqId/route/method
  // to every line written through it. This is the line that ties the
  // entire request together.
  const child = logger.child({
    req_id: reqId,
    method: req.method,
    route: req.path,
  });

  const start = process.hrtime.bigint();

  res.on('finish', () => {
    const durMs = Number(process.hrtime.bigint() - start) / 1e6;
    child.info(
      { status: res.statusCode, duration_ms: Math.round(durMs) },
      'request completed',
    );
  });

  runWithContext({ reqId, logger: child }, () => next());
}

Wire it once at the top of the chain:

import express from 'express';
import { requestLogger } from './middleware';
import { getLogger } from './request-context';

const app = express();
app.use(express.json());
app.use(requestLogger);

app.post('/api/charges', async (req, res) => {
  const log = getLogger(); // child logger — already has req_id baked in
  log.info({ user_id: req.body.user_id, amount: req.body.amount }, 'charge_attempt');

  try {
    const charge = await createCharge(req.body);
    log.info({ charge_id: charge.id }, 'charge_succeeded');
    res.json(charge);
  } catch (err) {
    log.error({ err, user_id: req.body.user_id }, 'charge_failed');
    res.status(500).json({ error: 'internal_error' });
  }
});

That is the whole change. Three small modules, one middleware, and a habit of calling getLogger() inside every handler instead of importing the root logger.

Why each piece is there

A few of these decisions are easy to skip and will quietly bite you later.

AsyncLocalStorage instead of passing logger as a parameter. You could thread the logger through every function call, and some teams do. In practice, that breaks the moment a downstream library calls back into your code (a Postgres callback, an event emitter, a setTimeout). AsyncLocalStorage follows the async chain automatically — even into setImmediate, Promise.then, and most well-behaved libraries. The cost is roughly 2% CPU at high request rates, which is far cheaper than the alternative (logs without correlation IDs, or every function having a log parameter).

req_id as a UUID, not a counter. A monotonic counter is tempting because it is small. It also collides the moment you scale to multiple pods. UUIDs are 36 characters and cost nothing meaningful in storage. randomUUID() from node:crypto is fast enough that you will not measure it.

Honoring an upstream x-request-id header. When a request arrives from your load balancer, API gateway, or another internal service, that caller almost certainly already generated a correlation ID. Use it. The whole point of distributed tracing is that one ID flows from the edge all the way to the database. Generating a fresh one in your service breaks the chain and forces you to JOIN logs across services on timestamps and user IDs — which is exactly the work you are trying to eliminate.

Setting x-request-id on the response. Now your error responses can include the ID, your frontend can surface it to users (“Error: contact support with reference 01HXYZ…”) and your support team can paste a single string into the log search bar. Tickets that used to take twenty minutes to localize take ten seconds.

Redaction, not “remember to not log secrets.” People log auth headers by accident — usually by serializing the entire request object during a hard-to-reproduce bug. Pino’s redact option is applied at the serialization layer; it does not depend on every developer remembering. Add password, token, card_number, ssn, and anything else that should not be in your log retention. The wildcard syntax (*.password) catches it at any depth.

A service, env, and version on every line. If you operate more than one service, you will eventually pipe all their logs into one stream. Without those three fields, you cannot tell which one emitted the line. Add them once in base and they show up on every log entry for free.

event as a snake_case constant. This is a convention, not a code change, but it pays back forever. Pick a vocabulary — charge_attempt, charge_succeeded, charge_failed, webhook_received, db_pool_exhausted — and use those exact strings as the event field. Now you can write dashboards on top of them (“count of charge_failed per minute, grouped by reason”) instead of grepping log message strings that someone will reword next quarter.

Five queries that turn this into a debugging tool

Once the fields are consistent, the queries are stupidly powerful. These work in any decent log aggregator (Datadog, Logtail, Grafana, even jq against a flat file).

1. The full story of one request. A customer pastes the x-request-id from the error page they screenshotted:

req_id:01HXYZ7K2P...

Every log line for that request — middleware, handler, database, downstream service — comes back ordered. No JOINs.

2. All errors in the last hour, grouped by event.

level:error AND time:[now-1h TO now] | stats count by event

You instantly see whether the spike is concentrated in one code path (a deploy regression, usually) or spread evenly across the service (an upstream outage, usually).

3. P99 duration per route.

event:request_completed | percentile(duration_ms, 99) by route

This is your own slow-endpoint dashboard, free, with a one-line config in any aggregator. No APM agent required.

4. The “weird user” filter. Every customer support ticket about a charging bug should resolve to:

user_id:4821 AND time:[2026-05-21 TO 2026-05-22] | sort by time

The reason this matters: when a single user reports a bug, half the time it is actually two bugs hiding behind the same complaint. The full timeline tells you how many.

5. Pre-incident baseline. Run weekly and stash the result:

level:warn OR level:error | stats count by event, service

If db_pool_exhausted rises from 4/week to 800/week, that is your incident, and you have caught it three days before it page-storms on-call.

Common mistakes and the fixes

A few things go wrong in practice that the toy version does not show.

Logging objects with circular references. Pino handles most of these, but a Mongoose document or a Bull queue job can have hidden circular structures that produce [Object] placeholders. Fix it once with a custom serializer — serializers: { user: (u) => ({ id: u.id, email: u.email }) } — and the field becomes both safe and consistent.

Log shipping that drops lines under load. If you pipe stdout to a sidecar, the sidecar’s buffer will overflow during a traffic spike. Pino writes synchronously by default, which protects ordering but blocks the event loop on a slow stdout. Under bursty load, switch to pino.transport({ target: 'pino/file', options: { destination: '/proc/1/fd/1' } }) or use Vector/Fluent Bit with a backpressure-aware shipper. The defaults are fine until you measure dropped lines; the moment you measure them, fix it.

Pretty-printing in production. A common mistake: developers love pino-pretty so much they leave it on in production. It uses ~3× more CPU than raw JSON and prevents your aggregator from parsing fields. Keep transport conditional on NODE_ENV, never on a feature flag.

Logging at info for things that should be debug. The temptation to log every database query at info is strong. Resist it. info is for events that matter to a human reading the timeline of a request: “request started,” “charge succeeded,” “external API responded.” debug is for noise you turn on per-environment. If your info line count grows linearly with traffic and you cannot afford the storage, every team eventually disables logging entirely — which is worse than verbose logging.

Logging the same event in multiple places. Pick one place per event. If you log charge_succeeded in the handler and again in the database hook and again in a downstream emitter, your dashboards get triple-counted and your ratio queries lie. The convention “one event field per logical event, emitted once” pays back forever.

Forgetting to log the err field correctly. Pino has a built-in serializer for objects under the err key — it captures message, stack, name, and any custom fields. Always pass errors as log.error({ err, ...context }, 'message'), never as log.error('failed: ' + err.message). The first version gives you a structured error object you can err.code:ECONNRESET-query against. The second is back to grepping strings.

The metric that proves it shipped

The number to track is “median time from incident report to first useful log query.” Before structured logging, that number is between fifteen minutes and two hours, depending on how big the haystack is and how good your search skills are. After structured logging plus correlation IDs, it should be under sixty seconds: paste the req_id from the error page, sort by time, read the timeline.

If you cannot measure that directly, a decent proxy is “percentage of incidents where the postmortem cites a specific log line versus general log behavior.” Before, most postmortems say “we saw elevated 5xx” — vague. After, they say “at 03:14:07 the request 01HXYZ... failed in charge_failed with reason card_declined_after_capture, which then caused the duplicate write.” That level of specificity is what makes a postmortem actionable.

The takeaway

Most teams put off structured logging because it feels like a refactor. It isn’t. It is one new dependency, three small modules, and a one-week period of replacing console.log with getLogger() whenever you touch a file. After that, every incident gets faster to debug, every dashboard gets easier to build, and the next time a customer pastes you a request ID, you find the bug before they finish typing the follow-up email.

The version of this you build under pressure, the night of an outage, is always worse than the version you build on a quiet Tuesday. Build it on the Tuesday.

A note from Yojji

The kind of plumbing this post describes — observability that holds up under real load, request-scoped context that survives the messy parts of an async stack, dashboards that surface the right number on the right day — is the unglamorous middle of every reliable backend. It is also the kind of work Yojji has been shipping since 2016.

Yojji is an international custom software development company with offices across Europe, the US, and the UK. Their teams specialize in the JavaScript stack (React, Node.js, TypeScript), cloud platforms (AWS, Azure, Google Cloud), and microservices architecture. They run dedicated, senior outstaffed teams for long-running engagements, plus full-cycle product work covering discovery, design, development, QA, and DevOps.

If your team would rather hire the practice of building observable, debuggable services than learn it the hard way after a 3 a.m. page, Yojji is worth a conversation.