The Practical Developer

Node.js Metrics with Prometheus: The Instrumentation That Stops You Guessing During an Outage

Logs show one request. Traces show a path. Metrics show the shape of the whole system. Here is the prom-client setup that turns your Node.js service from a black box into a dashboard you can read at 2 a.m., with the four metric types, the Express middleware, and the PromQL queries that actually predict failures.


The pager goes off at 02:17. The on-call engineer opens the logs and sees a wall of 500s. The trace sample rate is 1%, so there are maybe three spans to look at, all from twenty minutes ago. The dashboard shows a red line, but nobody knows what the line means because it was added by an intern six months ago and the tooltip says “api_latency_p95” with no unit. Is traffic up? Is the database slow? Is one pod OOMing?

The logs will not tell you the shape of the system; they tell you the shape of a single request. Traces show you a path, but only for the requests you sampled. The only thing that compresses a million requests into a single number is a metric, and most Node.js services ship none at all.

This post is the starter kit: prom-client, four metric types, an Express middleware that instruments every route, the PromQL queries that turn raw counters into actionable graphs, and the cardinality trap that destroys Prometheus servers if you get labels wrong.

Why logs alone fail during an incident

Logs are great for “what exactly happened to request 7f3a9c?” Metrics are for “what changed across the fleet at 02:17?” You cannot grep a million lines of JSON in the time it takes a customer to open a support chat. You cannot aggregate traces after the fact if you decided not to sample them.

A metric is a time series: a name, a set of labels, and a sequence of timestamped values. Because it is pre-aggregated inside your process, emitting it costs little more than incrementing a number in memory. Querying it means reading a small, indexed set of samples instead of scanning raw text. This is why a Grafana dashboard can render a six-hour latency trend in a few hundred milliseconds, while the equivalent log query can time out after thirty seconds.
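To make that definition concrete, here is a minimal sketch of a time series as a data structure. The type names and the seriesKey helper are illustrative assumptions, not prom-client types; the point is only that a series is identified by its name plus its full label set.

```typescript
// Hypothetical model of one time series; these names are illustrative,
// not types exported by prom-client.
type Labels = Record<string, string>;

interface Sample {
  timestampMs: number;
  value: number;
}

interface TimeSeries {
  name: string;
  labels: Labels;
  samples: Sample[];
}

// A series is identified by its name plus its full label set, so the
// key must not depend on the order in which the labels were written.
function seriesKey(metricName: string, labels: Labels): string {
  const pairs = Object.keys(labels)
    .sort()
    .map((labelName) => `${labelName}="${labels[labelName]}"`);
  return `${metricName}{${pairs.join(',')}}`;
}

const exampleSeries: TimeSeries = {
  name: 'http_requests_total',
  labels: { route: '/api/charges', method: 'POST', status: '200' },
  samples: [
    { timestampMs: 0, value: 41 },
    { timestampMs: 15000, value: 57 },
  ],
};

const exampleKey = seriesKey(exampleSeries.name, exampleSeries.labels);
```

Changing any label value produces a different key, and therefore a different series. That is the mechanism behind the cardinality problems discussed later.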

Most teams have logs because Pino is easy. Most teams have traces because OpenTelemetry is automatic. Metrics are the third pillar that gets skipped because it feels like “ops work” instead of “code work.” It is not. It is code work, and it is thirty minutes of setup that pays off on the very first incident.

The library: prom-client

prom-client is the standard Prometheus client for Node.js. It keeps metrics in memory, serializes them in the Prometheus text format for an HTTP endpoint, validates label names, and ships default histogram buckets.

npm install prom-client

Import the registry and create the metrics you actually need before the server starts listening. Prometheus is a pull-based system: your service exposes an endpoint and waits for the scraper (usually Prometheus itself) to call /metrics at its configured interval, commonly every 15 or 30 seconds.

Counter: the workhorse

A counter only goes up, or resets to zero on restart. Use it for request totals, error totals, and any business event that accumulates over time.

import { Counter } from 'prom-client';

export const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status'],
});

Inside the Express middleware, increment it once per response on the finish event. The middleware also records latency in httpRequestDuration, the histogram defined in the next section:

import type { Request, Response, NextFunction } from 'express';

export function metricsMiddleware(req: Request, res: Response, next: NextFunction) {
  const start = process.hrtime.bigint();

  res.on('finish', () => {
    const status = res.statusCode.toString();
    // Use the matched route template, never the raw path: req.path is
    // unbounded for unmatched URLs (404s) and would explode cardinality.
    const route = req.route ? req.baseUrl + req.route.path : 'unmatched';
    httpRequestsTotal.inc({ method: req.method, route, status });

    const durationSec = Number(process.hrtime.bigint() - start) / 1e9;
    httpRequestDuration.observe({ method: req.method, route, status }, durationSec);
  });

  next();
}

Labels are powerful and dangerous. The combination of labels defines a unique time series. A counter with three labels that each have ten possible values creates one thousand series. A single counter with a user_id label creates one series per user and will melt your Prometheus server. The hard rule: never put an unbounded or high-cardinality value in a label. Status code, route, method, error type, and outcome (success/failure) are safe. User ID, session ID, request ID, email, or random GUID are not.
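The arithmetic behind that rule is a plain product. The sketch below computes the worst-case series count for one metric from the number of distinct values each label can take. The worstCaseSeries helper and the 50,000-user figure are made up for illustration.

```typescript
// Worst-case number of time series for a single metric: the product of
// the number of distinct values each label can take.
function worstCaseSeries(labelCardinalities: Record<string, number>): number {
  return Object.keys(labelCardinalities)
    .map((labelName) => labelCardinalities[labelName])
    .reduce((product, distinctValues) => product * distinctValues, 1);
}

// Three bounded labels with ten values each.
const boundedSeries = worstCaseSeries({ method: 10, route: 10, status: 10 });

// Hypothetical: the same metric with a user_id label and 50,000 users.
const seriesWithUserId = worstCaseSeries({
  method: 10,
  route: 10,
  status: 10,
  user_id: 50000,
});
```

In practice not every combination occurs, so the real count is usually lower than the product. The product is still the number to budget for.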

Histogram: the latency truth teller

A histogram counts observations into buckets. It is the right way to measure latency because it stores the distribution, not just the average.

import { Histogram } from 'prom-client';

export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
});

The default buckets in prom-client top out at ten seconds and start at five milliseconds. For a JSON API with a p95 under 200ms, most of your data lands in two buckets, which is useless. Tailor the buckets to your service. If your p99 is normally 120ms, use buckets at 10ms, 25ms, 50ms, 75ms, 100ms, 150ms, 200ms, 300ms, 500ms. Fine-grained buckets around your normal latency give you precision where it matters. The coarse buckets above that catch regressions.
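A quick way to see the difference is to sort a handful of latencies into both bucket layouts. This is a hand-rolled illustration, not prom-client internals: the bucketize helper and the sample latencies are assumptions, and it counts per bucket rather than cumulatively (as Prometheus does) so the pile-up is easier to read.

```typescript
// Count how many observations fall into each bucket (non-cumulative).
// The final slot is the implicit +Inf bucket.
function bucketize(upperBounds: number[], observations: number[]): number[] {
  const counts = upperBounds.map(() => 0);
  counts.push(0);
  for (const value of observations) {
    let index = upperBounds.length; // default to +Inf
    for (let i = 0; i < upperBounds.length; i++) {
      if (value <= upperBounds[i]) {
        index = i;
        break;
      }
    }
    counts[index] += 1;
  }
  return counts;
}

// Hypothetical latencies in seconds for a service with a p99 near 120ms.
const sampleLatencies = [0.06, 0.07, 0.08, 0.09, 0.095, 0.11, 0.12, 0.13, 0.16, 0.22];

const defaultBuckets = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10];
const tailoredBuckets = [0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.5];

// Default layout: everything lands in the 0.1 and 0.25 buckets.
const defaultCounts = bucketize(defaultBuckets, sampleLatencies);
// Tailored layout: the same data spreads across five buckets.
const tailoredCounts = bucketize(tailoredBuckets, sampleLatencies);
```

With the default layout, a shift from 110ms to 240ms would not move a single observation to a different bucket. With the tailored layout it would cross two boundaries and show up on the graph.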

Why not a Summary? Prometheus summaries calculate quantiles client-side using a sliding time window. They cannot be aggregated across pods. If pod A has a p99 of 80ms and pod B has a p99 of 400ms, there is no mathematically correct way to merge those two summaries into a fleet-wide p99. Histograms can be aggregated because Prometheus computes histogram_quantile() server-side from the bucket counts. Always use histograms for latency unless you have a single-instance deployment.
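The aggregation problem is easy to reproduce with raw numbers. The sketch below uses a nearest-rank percentile over made-up samples for two pods and compares the average of the per-pod p99 values with the p99 of the pooled traffic. The pod sizes and latencies are invented for illustration.

```typescript
// Nearest-rank percentile over raw samples. Illustrative only; this is
// not how Prometheus or prom-client computes quantiles.
function percentile(samples: number[], p: number): number {
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil(p * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

function repeatValue(value: number, times: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < times; i++) out.push(value);
  return out;
}

// Hypothetical fleet, latencies in milliseconds:
// pod A serves most of the traffic quickly, pod B serves a little slowly.
const podA = repeatValue(80, 985);
const podB = repeatValue(400, 15);

// Merging the two summaries by averaging their p99 values.
const averagedP99 = (percentile(podA, 0.99) + percentile(podB, 0.99)) / 2;

// The p99 of the traffic the fleet actually served.
const pooledP99 = percentile(podA.concat(podB), 0.99);
```

Here the averaged figure is 240ms while the pooled p99 is 400ms, because the slow pod's requests make up more than 1% of all traffic. No weighting scheme fixes this in general, since the per-pod quantiles have already discarded the distribution.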

Gauge: the current state

Counters answer “how many.” Gauges answer “how many right now.” Active requests, memory, CPU, event loop lag, database connection pool size.

import { Gauge } from 'prom-client';

export const activeRequests = new Gauge({
  name: 'http_active_requests',
  help: 'Number of active HTTP requests',
});

export const eventLoopLag = new Gauge({
  name: 'nodejs_event_loop_lag_seconds',
  help: 'Event loop lag in seconds',
});

export const memoryHeapUsed = new Gauge({
  name: 'nodejs_heap_used_bytes',
  help: 'Used heap size in bytes',
});

Wire the active request gauge inside the same middleware:

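export function metricsMiddleware(req: Request, res: Response, next: NextFunction) {
  activeRequests.inc();
  const start = process.hrtime.bigint();

  // 'close' fires after a finished response and after an aborted
  // connection, so the gauge does not leak when a client disconnects early.
  res.once('close', () => activeRequests.dec());

  res.on('finish', () => {
    const status = res.statusCode.toString();
    // Use the matched route template, never the raw path, to keep the
    // route label bounded.
    const route = req.route ? req.baseUrl + req.route.path : 'unmatched';
    httpRequestsTotal.inc({ method: req.method, route, status });

    const durationSec = Number(process.hrtime.bigint() - start) / 1e9;
    httpRequestDuration.observe({ method: req.method, route, status }, durationSec);
  });

  next();
}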

For event loop lag, sample it on an interval using node:perf_hooks:

import { monitorEventLoopDelay } from 'node:perf_hooks';

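const h = monitorEventLoopDelay({ resolution: 10 });
h.enable(); // start sampling once, before the first interval fires

setInterval(() => {
  eventLoopLag.set(h.mean / 1e9); // nanoseconds to seconds
  h.reset(); // clear the histogram so each reading covers one interval
}, 10000);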

For heap memory, read it directly from the V8 engine:

import { getHeapStatistics } from 'node:v8';

setInterval(() => {
  memoryHeapUsed.set(getHeapStatistics().used_heap_size);
}, 10000);

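Do not sample process-level gauges such as CPU, memory, or event loop lag on every request. That is wasteful: these values change on multi-second boundaries, and Prometheus typically scrapes every 15 seconds anyway. The active-request gauge is the exception, because it has to track each request as it starts and ends.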

Custom business metrics: the signal in the noise

Infrastructure metrics tell you when the service is on fire. Business metrics tell you when the business is on fire. A service that takes payments and reserves inventory might expose:

export const paymentAttemptsTotal = new Counter({
  name: 'payment_attempts_total',
  help: 'Total payment attempts',
  labelNames: ['currency', 'outcome'],
});

export const itemsReservedTotal = new Counter({
  name: 'items_reserved_total',
  help: 'Total inventory reservations',
  labelNames: ['warehouse', 'outcome'],
});

Emission is one line at the point of action:

paymentAttemptsTotal.inc({
  currency: charge.currency,
  outcome: charge.success ? 'success' : 'declined',
});

These metrics are invaluable because they bridge engineering and operations. The 500 error rate might spike because the payment gateway changed its response format. If the code records a distinct gateway_error outcome for that case, payment_attempts_total{outcome="gateway_error"} will show, to within one scrape interval, when the format changed, while http_requests_total{status="500"} just tells you that something broke.
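For the gateway_error series to exist, something has to emit that outcome. One way to do it is a small classifier that maps every charge result onto a fixed set of label values. The ChargeResult shape and its field names are assumptions for illustration, not a real payment SDK.

```typescript
// Hypothetical shape of a charge result; the field names are assumptions.
interface ChargeResult {
  success: boolean;
  gatewayError?: boolean; // true when the gateway itself failed or replied with garbage
}

type PaymentOutcome = 'success' | 'declined' | 'gateway_error';

// Map every result onto a small, fixed set of label values, so the
// outcome label can never produce more than three series per currency.
function paymentOutcome(charge: ChargeResult): PaymentOutcome {
  if (charge.success) return 'success';
  if (charge.gatewayError) return 'gateway_error';
  return 'declined';
}
```

With a helper like this, the emission line would pass outcome: paymentOutcome(charge) instead of the inline ternary.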

Keep business metrics in the same registry as infrastructure metrics. The Prometheus server does not care about the semantic difference, and having one scrape endpoint means one source of truth.

The /metrics endpoint

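Create a dedicated route. It is common to serve it on a separate port so the load balancer does not route public traffic to it. The example uses 9090, which is also the default port of the Prometheus server itself, so pick a different one if both run on the same host.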

import express from 'express';
import { register } from 'prom-client';

const metricsApp = express();

metricsApp.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

metricsApp.listen(9090, () => {
  console.log('Metrics server listening on :9090');
});

In cluster mode, each worker maintains its own counters. If Prometheus scrapes through the load balancer and reaches a different worker each time, the counters will jump around, because each scrape returns one worker’s private totals rather than the sum. Fix this with prom-client’s aggregation, served from the primary process:

import { AggregatorRegistry } from 'prom-client';
import cluster from 'node:cluster';

const aggregatorRegistry = new AggregatorRegistry();

if (cluster.isPrimary) {
  metricsApp.get('/metrics', async (req, res) => {
    const metrics = await aggregatorRegistry.clusterMetrics();
    res.set('Content-Type', aggregatorRegistry.contentType);
    res.send(metrics);
  });
}

Alternatively, run a single metrics server outside the workers and funnel metrics to it via IPC. Simpler still: run one Node.js process per pod and rely on Kubernetes service discovery so that Prometheus scrapes each pod directly. The per-pod scrape is often the cleanest approach because you get per-instance statistics for free.

PromQL queries that turn data into answers

Metrics without queries are just numbers. Here are the four PromQL expressions you will use every week.

1. Request rate per route.

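sum by (route) (rate(http_requests_total[5m]))

A bare rate(http_requests_total[5m]) returns one series per method, route, status, and instance; summing by route collapses that to one line per route. To focus on a single endpoint, add a filter: sum(rate(http_requests_total{route="/api/charges"}[5m])). This tells you traffic shape. A sudden drop often means an upstream routing change, not a code bug.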

2. Error ratio.

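sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))

This is your SLI. Alert when it exceeds 0.01 (1%). The sum() on both sides matters: without it, PromQL pairs series with identical label sets, so each 5xx series is divided by itself and the ratio is always 1. The reason to divide rates instead of raw counter values is that rate() normalizes for the scrape interval and compensates for counter resets. Always rate your counters before doing math on them.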
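To see why raw counter values are unsafe, consider a process that restarts mid-window. The sketch below is a simplified model of reset handling, not the Prometheus implementation (which also extrapolates to the window edges). The function names and scrape data are assumptions.

```typescript
interface CounterSample {
  timestampSec: number;
  value: number;
}

// Total increase across a window of raw counter samples.
// A drop in value is treated as a restart: the counter began again from
// zero, so the whole post-reset value counts as new increase.
function increaseWithResets(samples: CounterSample[]): number {
  let total = 0;
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i].value - samples[i - 1].value;
    total += delta >= 0 ? delta : samples[i].value;
  }
  return total;
}

// Per-second rate over the time span covered by the samples.
function perSecondRate(samples: CounterSample[]): number {
  if (samples.length < 2) return NaN;
  const elapsed = samples[samples.length - 1].timestampSec - samples[0].timestampSec;
  return increaseWithResets(samples) / elapsed;
}

// Hypothetical scrapes every 15s; the process restarts between t=30 and t=45.
const scrapes: CounterSample[] = [
  { timestampSec: 0, value: 1000 },
  { timestampSec: 15, value: 1030 },
  { timestampSec: 30, value: 1060 },
  { timestampSec: 45, value: 20 },
  { timestampSec: 60, value: 50 },
];

// Subtracting raw values goes negative across the restart.
const naiveRate = (scrapes[4].value - scrapes[0].value) / 60;
// The reset-aware version sees 30 + 30 + 20 + 30 = 110 requests in 60 seconds.
const resetAwareRate = perSecondRate(scrapes);
```

The naive subtraction reports a negative request rate, which is nonsense. The reset-aware calculation recovers a steady rate of roughly two requests per second.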

3. p99 latency.

histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
)

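The key detail: sum by (le) aggregates the bucket counts across pods and across every other label, then histogram_quantile estimates the percentile from the aggregated buckets. If you drop the sum entirely, Prometheus computes a separate quantile for every method, route, status, and instance series, which is rarely what you want for a fleet-wide dashboard. If you keep sum but omit by (le), the le label is aggregated away and the query returns nothing.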
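The same two steps can be sketched in plain code: add cumulative bucket counts across pods, then interpolate within the bucket that contains the target rank. This is a simplified approximation of what histogram_quantile does; the real function handles more edge cases. The Bucket type, helper names, and pod data are assumptions.

```typescript
// One cumulative bucket: the number of observations with value <= le.
interface Bucket {
  le: number; // upper bound in seconds; Infinity for the +Inf bucket
  count: number;
}

// Add cumulative counts from several pods, bucket by bucket.
// Assumes every pod uses the same boundaries in the same order.
function mergeBuckets(pods: Bucket[][]): Bucket[] {
  return pods[0].map((bucket, i) => ({
    le: bucket.le,
    count: pods.reduce((sum, pod) => sum + pod[i].count, 0),
  }));
}

// Simplified quantile estimate using linear interpolation inside the
// bucket that contains the target rank.
function estimateQuantile(q: number, buckets: Bucket[]): number {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  for (let i = 0; i < buckets.length; i++) {
    if (buckets[i].count >= rank) {
      // If the rank lands in +Inf, fall back to the highest finite bound.
      if (buckets[i].le === Infinity) return buckets[i - 1].le;
      const lowerBound = i === 0 ? 0 : buckets[i - 1].le;
      const lowerCount = i === 0 ? 0 : buckets[i - 1].count;
      const inBucket = buckets[i].count - lowerCount;
      return lowerBound + (buckets[i].le - lowerBound) * ((rank - lowerCount) / inBucket);
    }
  }
  return NaN;
}

// Hypothetical cumulative bucket counts from two pods.
const podABuckets: Bucket[] = [
  { le: 0.1, count: 900 },
  { le: 0.25, count: 990 },
  { le: 0.5, count: 1000 },
  { le: Infinity, count: 1000 },
];
const podBBuckets: Bucket[] = [
  { le: 0.1, count: 100 },
  { le: 0.25, count: 600 },
  { le: 0.5, count: 1000 },
  { le: Infinity, count: 1000 },
];

const fleetBuckets = mergeBuckets([podABuckets, podBBuckets]);
const fleetP99 = estimateQuantile(0.99, fleetBuckets);
```

The estimate is only as precise as the bucket containing the rank. This is why bucket boundaries near your normal latency matter.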

4. Active request saturation.

http_active_requests

If you know your service handles roughly 100 concurrent requests per instance before latency degrades, this gauge tells you how close each instance is to the cliff. It is often a better autoscaling signal than CPU for Node.js, because a single-threaded event loop can be saturated while overall CPU utilization on a multi-core host still looks low.

Testing that your metrics are correct

Metrics are code, and code should be tested. prom-client exposes register.getMetricsAsJSON() for exactly this.

import assert from 'node:assert/strict';
import test from 'node:test';
import { register } from 'prom-client';
import { httpRequestsTotal } from './metrics';

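test('failed charge increments error counter', async () => {
  // Reset the values but keep the metric registered. Removing it from the
  // registry would make it disappear from getMetricsAsJSON().
  httpRequestsTotal.reset();

  httpRequestsTotal.inc({ method: 'POST', route: '/api/charges', status: '500' });

  // getMetricsAsJSON() returns a Promise in recent prom-client releases.
  const metrics = await register.getMetricsAsJSON();
  const counter = metrics.find((m) => m.name === 'http_requests_total');
  assert.ok(counter, 'http_requests_total should be registered');
  const value = counter.values.find(
    (v) => v.labels.status === '500'
  )?.value;

  assert.strictEqual(value, 1);
});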

Test the middleware, too. Spin up an Express app, hit a route, scrape the metrics endpoint, and assert that the counter exists with value 1. It takes thirty lines and it prevents the “our metrics went to zero after the refactor” surprise.
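For the scrape-and-assert step, it helps to parse the text that /metrics returns rather than matching substrings. Below is a minimal parser sketch for sample lines in the Prometheus text format. It is deliberately incomplete: it does not handle escaped quotes, trailing timestamps, or exemplars. The function names and the sample body are assumptions for illustration.

```typescript
interface ParsedSample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

// Parse one sample line, e.g.
//   http_requests_total{method="GET",route="/health",status="200"} 1
// Returns null for anything that does not look like "name{labels} value".
function parseSampleLine(line: string): ParsedSample | null {
  const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$/.exec(line.trim());
  if (!match) return null;
  const labels: Record<string, string> = {};
  const labelPattern = /([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g;
  let pair: RegExpExecArray | null;
  while ((pair = labelPattern.exec(match[2] || '')) !== null) {
    labels[pair[1]] = pair[2];
  }
  return { name: match[1], labels, value: Number(match[3]) };
}

// Find the first sample with the given name whose labels include all of
// the expected label values.
function findSample(
  scrapeBody: string,
  metricName: string,
  expected: Record<string, string>
): ParsedSample | undefined {
  for (const line of scrapeBody.split('\n')) {
    if (line.charAt(0) === '#' || line.trim() === '') continue; // HELP/TYPE comments
    const sample = parseSampleLine(line);
    if (!sample || sample.name !== metricName) continue;
    const allMatch = Object.keys(expected).every(
      (labelName) => sample.labels[labelName] === expected[labelName]
    );
    if (allMatch) return sample;
  }
  return undefined;
}

// Hypothetical scrape output after one health check and three failed charges.
const scrapeBody = [
  '# HELP http_requests_total Total HTTP requests',
  '# TYPE http_requests_total counter',
  'http_requests_total{method="GET",route="/health",status="200"} 1',
  'http_requests_total{method="POST",route="/api/charges",status="500"} 3',
].join('\n');

const failedCharges = findSample(scrapeBody, 'http_requests_total', { status: '500' });
```

In a middleware test, scrapeBody would be the response text from the metrics endpoint, and the assertion would check the value of the returned sample.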

The cardinality trap

The most common way to destroy a Prometheus setup is to put a high-cardinality label on a high-frequency metric.

Bad:

  • httpRequestsTotal.inc({ user_id: req.user.id })
  • httpRequestsTotal.inc({ request_id: req.id })
  • httpRequestsTotal.inc({ search_query: req.query.q })

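Each unique combination of label values is a time series, and Prometheus keeps every active series in memory. A user_id label with ten thousand active users multiplies every existing method, route, and status combination by ten thousand. A request_id label is worse: every request creates a brand-new series that is written once and never updated again. The server runs out of RAM, the disk fills with WAL files, and scrape times explode.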

Safe labels: method, route (the matched route template, not the raw URL), status, outcome, error_type, region, environment. Unsafe labels: anything with more than a few hundred distinct values, anything unique per request, anything user-supplied.
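A cheap guard is to force every label value through a function that can only return a bounded set. The two helpers below are a sketch of that idea. Their names, the 'other' and 'unmatched' fallbacks, and the region list are assumptions, not part of prom-client or Express.

```typescript
// Pass a value through only if it is on a known allowlist; otherwise
// collapse it into a single fallback value.
function boundedLabel(value: string, allowed: string[], fallback = 'other'): string {
  return allowed.indexOf(value) !== -1 ? value : fallback;
}

// Build the route label from the matched route template. When nothing
// matched (for example a 404 on a random URL), use one fixed value
// instead of the raw path.
function routeLabel(baseUrl: string, routePath: string | undefined): string {
  return routePath === undefined ? 'unmatched' : baseUrl + routePath;
}

// Hypothetical allowlist of deployment regions.
const KNOWN_REGIONS = ['eu-west-1', 'us-east-1'];

const regionLabel = boundedLabel('ap-south-9', KNOWN_REGIONS);
const matchedRoute = routeLabel('/api', '/charges/:id');
const unmatchedRoute = routeLabel('', undefined);
```

The allowlist costs a little fidelity, since unknown values collapse into one series. In exchange, a bug or a hostile client cannot mint new series.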

If you really need per-user analytics, ship an event to a data warehouse or analytics pipeline. Metrics are for aggregates, not individuals.

What to ship today

You do not need a metrics server, a Grafana instance, and twenty dashboards to get value. Here is the minimal path:

  1. Install prom-client.
  2. Add http_requests_total (Counter) and http_request_duration_seconds (Histogram) with the Express middleware.
  3. Expose /metrics on a side port.
  4. Fetch /metrics locally with curl and confirm both metrics appear in the output.
  5. Add an error-ratio check to your existing alerting path, even if that path is just a cron job that curls /metrics and pages when the 5xx share of requests looks wrong.
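For the cron-job variant of the last step, the check needs two scrapes, because the raw counters only ever grow. A minimal sketch, assuming the job has already summed the request and 5xx counters from each scrape into a snapshot. The Snapshot shape, function names, and the 1% default threshold are illustrative.

```typescript
// Totals extracted from one scrape of /metrics (hypothetical shape).
interface Snapshot {
  total: number; // sum of http_requests_total across all label sets
  errors: number; // sum of http_requests_total where status is 5xx
}

// Error ratio over the interval between two scrapes.
// Returns null when there was no traffic, or when a counter went
// backwards (a restart); this sketch simply skips that interval.
function errorRatio(previous: Snapshot, current: Snapshot): number | null {
  const requests = current.total - previous.total;
  const errors = current.errors - previous.errors;
  if (requests <= 0 || errors < 0) return null;
  return errors / requests;
}

function shouldPage(ratio: number | null, threshold = 0.01): boolean {
  return ratio !== null && ratio > threshold;
}

// Hypothetical scrapes one minute apart: 1,000 new requests, 30 new errors.
const previousScrape: Snapshot = { total: 10000, errors: 20 };
const currentScrape: Snapshot = { total: 11000, errors: 50 };

const intervalRatio = errorRatio(previousScrape, currentScrape);
const pageNow = shouldPage(intervalRatio);
```

This is a stopgap. Once a Prometheus server is in place, the error-ratio query from earlier replaces it and handles resets and windows properly.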

The next time the pager fires at 02:17, you will open a graph that shows request rate, error ratio, and latency in the same view. You will know within ten seconds whether the problem is traffic, a bad deploy, or a downstream dependency. That is the difference between observability and hope.

