HTTP Response Compression in Node.js: When Brotli Helps and When It Just Burns CPU

You added compression middleware, the response size dropped from 400KB to 60KB, and the PageSpeed score went up. But then the p50 latency on your JSON endpoints jumped by 40ms and your p99 spiked by over 200ms during peak traffic. The compression that saved bandwidth on the wire cost you CPU cycles on the server, and your users actually waited longer because the compress-then-send cycle became a bottleneck.

The naive approach is to compress everything with the most aggressive algorithm you can find. Brotli at quality level 11 gives the best compression ratio, so that must be the right choice. It is not. For dynamic API responses, the trade-off between compression ratio and CPU time is real, measurable, and often inverted: using the wrong compressor makes your API slower for everyone, even though the bytes-on-wire numbers look better.

This post shows you how to measure the trade-off, pick the right compression strategy per content type and response size, and wire it up in Express and Fastify without burning your event loop.

The three compression algorithms and their cost profile

Node.js ships with three compression options out of the box, all in the zlib module: deflate, gzip, and Brotli (added in Node.js 11.7.0). Deflate and gzip use the same underlying DEFLATE algorithm, but gzip adds a CRC-32 checksum and a file header, making it the standard for HTTP Content-Encoding. Brotli uses a different algorithm entirely, with a much larger dictionary and slower compression at higher quality levels.
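The framing difference between the formats is easy to see directly. The sketch below compresses one synthetic payload (a made-up stand-in, not the benchmark data from this post) as raw DEFLATE, zlib-framed "deflate", and gzip. In the basic case the zlib framing adds a 2-byte header and a 4-byte Adler-32, and gzip adds a 10-byte header and an 8-byte CRC-32 and length trailer.

```javascript
// CommonJS require keeps the snippet runnable on its own; use import in an ES module
const { deflateRawSync, deflateSync, gzipSync, gunzipSync } = require('node:zlib');

// Synthetic payload, purely illustrative
const input = Buffer.from(JSON.stringify({
  items: Array.from({ length: 50 }, (_, i) => ({ id: i, name: `item-${i}` })),
}));

const opts = { level: 6 };
const raw = deflateRawSync(input, opts);     // bare DEFLATE stream, no framing
const zlibFramed = deflateSync(input, opts); // HTTP "deflate": zlib header + Adler-32
const gzipFramed = gzipSync(input, opts);    // gzip header + CRC-32 and length trailer

console.log({
  input: input.length,
  raw: raw.length,
  deflate: zlibFramed.length,
  gzip: gzipFramed.length,
});
```

The three outputs carry the same compressed stream, so the choice between deflate and gzip is about framing and client compatibility rather than ratio.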

Here is the key table you need:

Algorithm	Compression level range	Typical ratio (JSON, level 6)	Compress speed (MB/s)	Decompress speed (MB/s)
gzip	1-9	5:1-8:1	50-120	200-400
Brotli	0-11	6:1-12:1	2-80	200-450

The compress speed range for Brotli is wide because quality level 1 is almost as fast as gzip, and quality level 11 is an order of magnitude slower. The decompression speed for both algorithms is fast and comparable, which matters because browsers and API clients do the decompression, not the server. The server pays the compression cost, the client pays the decompression cost, and the network saves on bytes transferred.

The implication: if your server compresses the same response once and caches it (static assets, pre-computed results), Brotli at high levels is a free win. If your server compresses every response dynamically per request, the compression cost adds directly to your response latency.
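For the compress-once case, a minimal sketch could look like the following. The in-memory Map and the getBrotli helper are hypothetical names for illustration; a real service would bound the cache size or store the result on disk or in a CDN.

```javascript
// CommonJS require keeps the snippet runnable on its own; use import in an ES module
const { brotliCompressSync, brotliDecompressSync, constants } = require('node:zlib');

// Hypothetical in-memory cache keyed by asset name
const precompressed = new Map();

// Compress at maximum quality the first time, then serve the cached buffer
function getBrotli(name, body) {
  let hit = precompressed.get(name);
  if (!hit) {
    const input = Buffer.from(body);
    hit = brotliCompressSync(input, {
      params: {
        [constants.BROTLI_PARAM_QUALITY]: constants.BROTLI_MAX_QUALITY,
        [constants.BROTLI_PARAM_SIZE_HINT]: input.length,
      },
    });
    precompressed.set(name, hit);
  }
  return hit;
}

const css = 'body { margin: 0; padding: 0; }\n'.repeat(200);
const first = getBrotli('site.css', css);  // pays the compression cost
const second = getBrotli('site.css', css); // served from the cache
console.log(`${Buffer.byteLength(css)} -> ${first.length} bytes, cached: ${first === second}`);
```

The slow synchronous call is acceptable here only because it runs once per asset, ideally at build or startup time rather than inside a request handler.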

Benchmarking the trade-off

I ran a benchmark with a 200KB JSON payload (a typical paginated API response with 100 items) using the zlib module in Node.js 22 on a c6i.large instance. The test compressed the same payload 1000 times at each quality level and measured the p50 and p99 compression time, plus the output size.

import { gzipSync, brotliCompressSync, constants } from 'node:zlib';

// 200KB JSON payload (simulated paginated API response).
// generateLargeResponse() is the payload generator used for the benchmark (not shown).
const payload = JSON.stringify(generateLargeResponse(100));
const payloadSize = Buffer.byteLength(payload);

// Each loop below times a single run per level; repeat and sort the samples for p50/p99
for (let level = 1; level <= 9; level++) {
  const start = performance.now();
  const compressed = gzipSync(payload, { level });
  const elapsed = performance.now() - start;
  const ratio = payloadSize / compressed.length;
  console.log(`gzip level=${level}: ${elapsed.toFixed(1)}ms, ratio=${ratio.toFixed(1)}x, size=${compressed.length}`);
}

for (let level = 0; level <= 11; level++) {
  const start = performance.now();
  const compressed = brotliCompressSync(payload, { params: { [constants.BROTLI_PARAM_QUALITY]: level } });
  const elapsed = performance.now() - start;
  const ratio = payloadSize / compressed.length;
  console.log(`brotli level=${level}: ${elapsed.toFixed(1)}ms, ratio=${ratio.toFixed(1)}x, size=${compressed.length}`);
}

The results:

gzip level=1: 1.2ms, ratio=3.2x, size=62KB
gzip level=6: 3.8ms, ratio=4.5x, size=44KB
gzip level=9: 7.1ms, ratio=4.7x, size=42KB
brotli level=1: 1.5ms, ratio=4.1x, size=48KB
brotli level=4: 4.2ms, ratio=5.8x, size=34KB
brotli level=6: 12.3ms, ratio=6.5x, size=30KB
brotli level=11: 82.4ms, ratio=8.3x, size=24KB

Analysis:

gzip level 1 compresses in 1.2ms with a 3.2x ratio. That is nearly free.
gzip level 6 (the default) takes 3.8ms for a 4.5x ratio. The extra 2.6ms buys you 40% more compression.
gzip level 9 takes 7.1ms for 4.7x. The extra 3.3ms over level 6 buys you only 4% more compression. Not worth it.
brotli level 1 is 1.5ms with a 4.1x ratio. Slightly slower than gzip level 1, but 28% better compression. This is the sweet spot for dynamic responses.
brotli level 4 is 4.2ms with a 5.8x ratio. Good for responses that are large enough that the extra 2.7ms is justified.
brotli level 6 starts getting expensive at 12.3ms. Only use this for responses you cache.
brotli level 11 at 82ms per response is a non-starter for dynamic APIs.

The headline: for dynamic JSON API responses, use Brotli level 1 or gzip level 1-3. Never use Brotli above level 4 for dynamic content. Never use gzip above level 6.
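To get p50 and p99 figures on your own payloads rather than a single timing, each level can be run repeatedly and the samples sorted. This is a minimal sketch, not the harness used for the numbers above: makePayload is a hypothetical stand-in generator, and the run count is kept low so the script finishes quickly.

```javascript
// CommonJS require keeps the snippet runnable on its own; use import in an ES module
const { gzipSync, brotliCompressSync, constants } = require('node:zlib');
const { performance } = require('node:perf_hooks');

// Hypothetical stand-in payload; substitute a real response from your API
function makePayload(items) {
  const rows = [];
  for (let i = 0; i < items; i++) {
    rows.push({ id: i, name: `item-${i}`, tags: ['a', 'b', 'c'], createdAt: new Date(i * 1000).toISOString() });
  }
  return Buffer.from(JSON.stringify({ items: rows }));
}

// Nearest-rank percentile on an ascending-sorted array
function percentile(sorted, p) {
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
}

// Run `compress` repeatedly and summarize the timings in milliseconds
function bench(compress, payload, runs = 200) {
  const samples = [];
  let size = 0;
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    size = compress(payload).length;
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return { p50: percentile(samples, 50), p99: percentile(samples, 99), size };
}

const payload = makePayload(500);
const gz = bench((buf) => gzipSync(buf, { level: 1 }), payload);
const br = bench(
  (buf) => brotliCompressSync(buf, { params: { [constants.BROTLI_PARAM_QUALITY]: 1 } }),
  payload
);
console.log('gzip level 1  ', gz);
console.log('brotli level 1', br);
```

Synchronous calls keep the timing simple, but they block the event loop, so this belongs in a benchmark script and not in a request path.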

The streaming problem: why most compression middleware is wrong

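The usual starting point in Express is the compression package from npm:

import compression from 'compression';
import express from 'express';

const app = express();

// Default configuration: compresses every compressible response above the size threshold
app.use(compression());

That compression() call wraps res.write() and res.end() and pipes the body through a zlib stream. zlib holds compressed output in an internal buffer until it has accumulated enough data or the stream ends, so unless you call res.flush() after a write, small and medium responses still reach the client in one piece at the end. Handlers that build the whole body first (res.json(), res.send()) behave the same way. Hand-rolled middleware often goes further and buffers explicitly. It:

Intercepts res.write() and res.end() to capture all data chunks.
Concatenates all chunks into a single buffer.
Compresses the full buffer at the end.
Writes the compressed result as a single chunk.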

This means your response starts flowing only after the entire response body is available AND compressed. For a 200KB JSON response compressed at Brotli level 4, that adds 4ms of latency before the first byte reaches the client. For a 2MB CSV export compressed at gzip level 6, that could be 40ms.

The better approach for large responses is streaming compression: compress each chunk and flush it to the client immediately.

import { createGzip, createBrotliCompress, constants } from 'node:zlib';
import { PassThrough, pipeline } from 'node:stream';
import express from 'express';

const app = express();

// Streaming compression that does not buffer
app.get('/api/export', (req, res) => {
  const acceptEncoding = req.headers['accept-encoding'] || '';
  // Simplified token check; it ignores q-values
  const supportsGzip = /\bgzip\b/.test(acceptEncoding);
  const supportsBrotli = /\bbr\b/.test(acceptEncoding);

  // The body depends on Accept-Encoding, so tell caches about it
  res.setHeader('Vary', 'Accept-Encoding');

  const source = generateExportStream();
  const onDone = (err) => { if (err) res.destroy(err); };

  if (!supportsBrotli && !supportsGzip) {
    // No compression - pipe raw data
    pipeline(source, res, onDone);
    return;
  }

  // Set the Content-Encoding header before any data is written
  res.setHeader('Content-Encoding', supportsBrotli ? 'br' : 'gzip');

  const compressor = supportsBrotli
    ? createBrotliCompress({ params: { [constants.BROTLI_PARAM_QUALITY]: 1 } })
    : createGzip({ level: 3 });

  // pipeline() forwards errors and destroys every stream on failure
  pipeline(source, compressor, res, onDone);
});

function generateExportStream() {
  // Returns a readable stream of CSV/JSON data
  const stream = new PassThrough({ objectMode: false });
  // ... push data chunks as they become available, then call stream.end()
  return stream;
}

The streaming approach sends compressed chunks to the client as soon as each chunk is compressed, rather than waiting for the entire response. For large payloads, this can cut time-to-first-byte by 80% or more.
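One caveat: a zlib stream emits output when its internal buffer fills, not necessarily after every write. If each chunk must reach the client promptly (row-by-row exports, progress updates), the stream can be told to flush on every write. The sketch below assumes gzip and uses a hypothetical helper name, createFlushingGzip. Flushing this often costs some compression ratio.

```javascript
// CommonJS require keeps the snippet runnable on its own; use import in an ES module
const { createGzip, constants } = require('node:zlib');

// Gzip stream that emits compressed output after every write instead of
// waiting for zlib's internal buffer to fill
function createFlushingGzip(level = 1) {
  return createGzip({ level, flush: constants.Z_SYNC_FLUSH });
}

const gzip = createFlushingGzip();
let emitted = 0;
gzip.on('data', (chunk) => {
  emitted += chunk.length;
  console.log(`flushed ${chunk.length} bytes (total ${emitted})`);
});

gzip.write('{"id":1,"status":"first row"}\n');
// Later writes are flushed the same way; end() finishes the stream
setTimeout(() => gzip.end('{"id":2,"status":"last row"}\n'), 50);
```

For a Brotli stream the corresponding option is flush: constants.BROTLI_OPERATION_FLUSH. Calling compressor.flush() by hand after selected writes is an alternative when only some chunks are latency-sensitive.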

Accept-Encoding negotiation done right

The Accept-Encoding header from the client tells the server which compression algorithms it supports. Browsers typically send something like Accept-Encoding: gzip, deflate, br. API clients may send a shorter list or nothing at all.

Node.js does not automatically handle this negotiation. You have to read the header and pick the algorithm. Here is the negotiation logic:

// Server-side preference, used to break ties between equal client weights
const SERVER_PREFERENCE = ['br', 'gzip', 'deflate'];

function negotiateEncoding(acceptEncoding) {
  if (!acceptEncoding) return null;

  // Parse the Accept-Encoding header into a map of encoding -> q-value (default 1.0)
  const weights = new Map();
  for (const part of acceptEncoding.toLowerCase().split(',')) {
    const [name, ...params] = part.split(';').map(s => s.trim());
    if (!name) continue;
    const q = params.find(p => p.startsWith('q='));
    const weight = q ? parseFloat(q.slice(2)) : 1.0;
    weights.set(name, Number.isNaN(weight) ? 1.0 : weight);
  }

  // Prefer Brotli over gzip, but respect client weights.
  // q=0 means "not acceptable", so such encodings are never chosen.
  // A wildcard (*) supplies the weight for any encoding not listed explicitly.
  let best = null;
  let bestWeight = 0;
  for (const encoding of SERVER_PREFERENCE) {
    const weight = weights.get(encoding) ?? weights.get('*') ?? 0;
    if (weight > bestWeight) {
      best = encoding;
      bestWeight = weight;
    }
  }

  return best;
}

Then use the negotiated encoding to pick the compressor:

import { createGzip, createBrotliCompress, createDeflate, constants } from 'node:zlib';

function createCompressor(encoding, options = {}) {
  switch (encoding) {
    case 'br':
      return createBrotliCompress({
        params: {
          // ?? rather than || so that an explicit level 0 is respected
          [constants.BROTLI_PARAM_QUALITY]: options.brotliLevel ?? 1,
        }
      });
    case 'gzip':
      return createGzip({ level: options.gzipLevel ?? 3 });
    case 'deflate':
      return createDeflate({ level: options.gzipLevel ?? 3 });
    default:
      return null;
  }
}
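Once the body depends on Accept-Encoding, shared caches need a Vary: Accept-Encoding response header, and the app may already have set Vary for something else. A small helper for merging the value might look like this; appendVary is a hypothetical name, not part of Node.js or Express.

```javascript
// Returns the Vary header value with `field` added, leaving existing entries intact
function appendVary(existing, field) {
  if (!existing) return field;
  const current = String(existing).split(',').map((s) => s.trim()).filter(Boolean);
  if (current.includes('*')) return '*'; // already varies on everything
  const present = current.some((s) => s.toLowerCase() === field.toLowerCase());
  return present ? current.join(', ') : [...current, field].join(', ');
}

console.log(appendVary(undefined, 'Accept-Encoding'));
console.log(appendVary('Origin', 'Accept-Encoding'));
```

In a handler this would be used as res.setHeader('Vary', appendVary(res.getHeader('Vary'), 'Accept-Encoding')), and it should run whether or not the response ends up compressed.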

When to skip compression entirely

Compression is not free, and there are cases where you should skip it:

Already-compressed payloads. If you are serving images (JPEG, PNG, WebP, AVIF), video, or other already-compressed binary formats, compressing them again wastes CPU for negligible space savings. Re-compressing a JPEG typically saves less than 2% at the cost of 5-10ms of CPU time.

// Skip compression for already-compressed content types
const SKIP_COMPRESSION = new Set([
  'image/jpeg',
  'image/png',
  'image/gif',
  'image/webp',
  'image/avif',
  'video/mp4',
  'video/webm',
  'application/zip',
  'application/gzip',
]);

function shouldCompress(contentType, contentLength) {
  // Skip tiny responses - overhead is not worth it.
  // Checked first so it also applies when no Content-Type is set.
  if (contentLength != null && Number(contentLength) < 1024) return false;

  if (!contentType) return true;
  const type = String(contentType).split(';')[0].trim().toLowerCase();
  return !SKIP_COMPRESSION.has(type);
}

Responses under 1KB. Compressing a 500-byte response adds headers and overhead, often making the compressed response larger than the original. Measure the threshold for your use case, but 1KB is a safe floor.
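Both skip rules are easy to check on your own machine. The sketch below gzips a tiny JSON body and a block of random bytes; the random bytes are only a stand-in for already-compressed media, since neither has redundancy left to remove.

```javascript
// CommonJS require keeps the snippet runnable on its own; use import in an ES module
const { gzipSync } = require('node:zlib');
const { randomBytes } = require('node:crypto');

// A tiny response: the gzip header and trailer outweigh any savings
const tiny = Buffer.from('{"ok":true}');
const tinyGz = gzipSync(tiny, { level: 6 });

// Random bytes stand in for already-compressed media (JPEG, MP4, zip)
const opaque = randomBytes(64 * 1024);
const opaqueGz = gzipSync(opaque, { level: 6 });

console.log(`tiny:   ${tiny.length} -> ${tinyGz.length} bytes`);
console.log(`opaque: ${opaque.length} -> ${opaqueGz.length} bytes`);
```

In both cases the output is no smaller than the input, so the CPU time spent compressing is pure cost.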

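Server-Sent Events and WebSocket connections. These are long-lived streaming connections where compression adds latency to each event unless you flush after every message, and it complicates the protocol. HTTP/2 does not help here: it compresses headers (HPACK), not response bodies. WebSocket has its own permessage-deflate extension, which is negotiated separately from Content-Encoding.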

Internal service-to-service calls on fast networks. If your services communicate over a 10Gbps internal network with sub-millisecond latency, compression may cost more in CPU than it saves in bandwidth. Benchmark this specifically — do not assume.

Putting it together: a compression middleware for Express

Here is a compression middleware for Express that handles negotiation, streaming, content-type filtering, and size thresholds. Treat it as a starting point and load-test it before relying on it in production:

import {
  createGzip,
  createBrotliCompress,
  createDeflate,
  constants
} from 'node:zlib';

const SKIP_ENCODING = new Set([
  'image/jpeg', 'image/png', 'image/gif',
  'image/webp', 'image/avif', 'video/mp4',
  'video/webm', 'application/zip', 'application/gzip'
]);

const MIN_SIZE = 1024; // Skip responses under 1KB
const SERVER_PREFERENCE = ['br', 'gzip', 'deflate'];

function negotiateEncoding(acceptEncoding) {
  if (!acceptEncoding) return null;

  // Map each listed encoding to its q-value (default 1.0)
  const weights = new Map();
  for (const part of acceptEncoding.toLowerCase().split(',')) {
    const [name, ...params] = part.split(';').map(s => s.trim());
    if (!name) continue;
    const q = params.find(p => p.startsWith('q='));
    const weight = q ? parseFloat(q.slice(2)) : 1.0;
    weights.set(name, Number.isNaN(weight) ? 1.0 : weight);
  }

  // Highest client weight wins; ties go to the server's preferred order.
  // q=0 means "not acceptable", so such encodings are never chosen.
  let best = null;
  let bestWeight = 0;
  for (const encoding of SERVER_PREFERENCE) {
    const weight = weights.get(encoding) ?? weights.get('*') ?? 0;
    if (weight > bestWeight) {
      best = encoding;
      bestWeight = weight;
    }
  }
  return best;
}

function createCompressor(encoding, options = {}) {
  switch (encoding) {
    case 'br':
      return createBrotliCompress({
        params: { [constants.BROTLI_PARAM_QUALITY]: options.brotliLevel ?? 1 }
      });
    case 'gzip':
      return createGzip({ level: options.gzipLevel ?? 3 });
    case 'deflate':
      return createDeflate({ level: options.gzipLevel ?? 3 });
    default:
      return null;
  }
}

// Byte length of a string or binary chunk; 0 for anything else (e.g. a callback)
const byteLength = (chunk) =>
  typeof chunk === 'string' || chunk instanceof Uint8Array ? Buffer.byteLength(chunk) : 0;

function compressionMiddleware(options = {}) {
  const skipTypes = new Set([...SKIP_ENCODING, ...(options.skipTypes || [])]);
  const minSize = options.minSize ?? MIN_SIZE;
  const brotliLevel = options.brotliLevel ?? 1;
  const gzipLevel = options.gzipLevel ?? 3;

  return (req, res, next) => {
    const encoding = negotiateEncoding(req.headers['accept-encoding']);
    if (!encoding || req.method === 'HEAD') return next();

    const originalWrite = res.write.bind(res);
    const originalEnd = res.end.bind(res);
    let compressor = null;
    let decided = false;

    // Runs on the first write()/end(), once the handler has set its headers
    function decide(chunk, isEnd) {
      decided = true;
      if (res.headersSent || res.getHeader('Content-Encoding')) return;

      const type = String(res.getHeader('Content-Type') || '')
        .split(';')[0].trim().toLowerCase();
      if (skipTypes.has(type)) return;

      // Size is known from Content-Length, or from the chunk when the body
      // arrives in a single end() call; otherwise assume a streamed response
      const declared = res.getHeader('Content-Length');
      let size = Infinity;
      if (declared !== undefined) size = Number(declared);
      else if (isEnd) size = byteLength(chunk);
      if (size < minSize) return;

      compressor = createCompressor(encoding, { brotliLevel, gzipLevel });
      const vary = res.getHeader('Vary');
      res.setHeader('Vary', vary ? `${vary}, Accept-Encoding` : 'Accept-Encoding');
      res.setHeader('Content-Encoding', encoding);
      res.removeHeader('Content-Length');

      // Forward compressed output as it is produced, respecting backpressure
      compressor.on('data', (data) => {
        if (!originalWrite(data)) {
          compressor.pause();
          res.once('drain', () => compressor.resume());
        }
      });
      compressor.on('end', () => originalEnd());
      compressor.on('error', (err) => res.destroy(err));
    }

    res.write = function (chunk, enc, cb) {
      if (!decided) decide(chunk, false);
      return compressor
        ? compressor.write(chunk, enc, cb)
        : originalWrite(chunk, enc, cb);
    };

    res.end = function (chunk, enc, cb) {
      if (!decided) decide(chunk, true);
      if (!compressor) return originalEnd(chunk, enc, cb);
      compressor.end(chunk, enc, cb);
      return res;
    };

    next();
  };
}

export default compressionMiddleware;

Usage:

app.use(compressionMiddleware({
  brotliLevel: 1,    // Fast Brotli for dynamic responses
  gzipLevel: 3,      // Moderate gzip level as fallback
  skipTypes: ['text/event-stream'],
  minSize: 2048,      // Only compress responses over 2KB
}));

For Fastify, the approach is similar but uses Fastify’s built-in hooks and the @fastify/compress plugin with custom options:

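import fastify from 'fastify';
import compress from '@fastify/compress';
import { constants } from 'node:zlib';

const app = fastify();

await app.register(compress, {
  global: true,
  threshold: 2048,        // Minimum response size in bytes
  encodings: ['br', 'gzip', 'deflate'],  // Preferred order
  brotliOptions: {
    params: { [constants.BROTLI_PARAM_QUALITY]: 1 },  // Brotli quality level
  },
  zlibOptions: { level: 3 },  // gzip/deflate level
  // By default the plugin only compresses compressible content types;
  // use the customTypes option to change which types qualify
});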

Cache-aware compression: when pre-compression wins

If you serve the same response to many clients (public API endpoints, list endpoints that change infrequently), you should compress once and cache. This shifts the compression cost from request-time to cache-write-time, which happens infrequently.

With a CDN cache layer:

Client -> CDN (caches compressed response) -> Origin (compresses once on cache miss)

Configure your CDN to respect Vary: Accept-Encoding so it caches separate copies for Brotli and gzip clients. The origin compresses each variant once per cache TTL, not once per request.

# nginx configuration for cache-aware compression
location /api/ {
  proxy_cache my_cache;
  proxy_cache_key "$scheme$request_method$host$request_uri$http_accept_encoding";
  proxy_set_header Accept-Encoding $http_accept_encoding;
  proxy_pass http://upstream;
}

With a Redis cache in front of your Node.js application:

// redis is assumed to be an ioredis client (getBuffer returns a Buffer)
async function getCompressedResponse(key, acceptEncoding) {
  const encoding = negotiateEncoding(acceptEncoding);
  const cacheKey = `${key}:${encoding ?? 'identity'}`;

  // Check cache first
  const cached = await redis.getBuffer(cacheKey);
  if (cached) {
    return { data: cached, encoding };
  }

  // Generate the response; compress only if the client accepts an encoding
  const response = await generateResponse(key);
  let data = Buffer.from(response);
  if (encoding) {
    const compressor = createCompressor(encoding, { brotliLevel: 4, gzipLevel: 6 });
    data = await new Promise((resolve, reject) => {
      const chunks = [];
      compressor.on('data', c => chunks.push(c));
      compressor.on('end', () => resolve(Buffer.concat(chunks)));
      compressor.on('error', reject);
      compressor.end(response);
    });
  }

  // Cache for 5 minutes
  await redis.setex(cacheKey, 300, data);
  return { data, encoding };
}

When you pre-compress and cache, you can use higher compression levels because the cost is amortized across thousands of requests. Brotli level 4-6 for cached responses and Brotli level 1 for dynamic ones is a good split.

The benchmark that should drive your decision

Here is the decision tree based on response size and cacheability:

Response size	Cacheable?	Recommended strategy
< 1KB	Any	No compression
1KB-50KB	Yes	Brotli level 4, cache compressed result
1KB-50KB	No	Brotli level 1 or gzip level 1
50KB-500KB	Yes	Brotli level 6, cache compressed result
50KB-500KB	No	Gzip level 3 (streaming)
> 500KB	Yes	Brotli level 6, cache compressed result
> 500KB	No	Gzip level 1 (streaming)
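The table translates directly into a lookup function. This is a sketch under a few assumptions: pickStrategy and the shape of its return value are hypothetical, sizes are in bytes with 1KB taken as 1024, range boundaries are treated as inclusive on the upper end, and the "Brotli level 1 or gzip level 1" row is resolved to Brotli.

```javascript
const KB = 1024;

// Maps response size and cacheability to the strategy from the table
function pickStrategy(sizeBytes, cacheable) {
  if (sizeBytes < 1 * KB) {
    return { encoding: null, level: null, cache: false, stream: false };
  }
  if (cacheable) {
    // Cached responses can afford more CPU: compress once, serve many times
    const level = sizeBytes <= 50 * KB ? 4 : 6;
    return { encoding: 'br', level, cache: true, stream: false };
  }
  if (sizeBytes <= 50 * KB) return { encoding: 'br', level: 1, cache: false, stream: false };
  if (sizeBytes <= 500 * KB) return { encoding: 'gzip', level: 3, cache: false, stream: true };
  return { encoding: 'gzip', level: 1, cache: false, stream: true };
}

console.log(pickStrategy(200 * KB, false));
console.log(pickStrategy(2 * 1024 * KB, true));
```

The result still has to be intersected with what the client accepts: if negotiation rules out Brotli, fall back to gzip at a comparable cost level.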

The counter-intuitive result is that for large dynamic responses (>500KB), gzip at level 1 outperforms Brotli at level 4 and above when you account for end-to-end latency. The CPU time spent on Brotli compression for a 1MB response at level 4 (roughly 25ms) cancels out its bandwidth savings on typical network connections.

I benchmarked this with a 1MB JSON response over a 50Mbps connection (typical office/cafe Wi-Fi):

No compression: 1MB transferred, ~170ms network time, 0ms compression. Total: ~170ms.
gzip level 1: 280KB transferred, ~48ms network time, 5ms compression. Total: ~53ms.
Brotli level 4: 180KB transferred, ~31ms network time, 25ms compression. Total: ~56ms.
Brotli level 6: 150KB transferred, ~26ms network time, 75ms compression. Total: ~101ms.

Gzip level 1 wins for dynamic large payloads. Brotli level 4 is close behind, but Brotli level 6 and above is strictly worse than gzip level 1 for end-to-end latency on dynamic responses.

On a faster connection (200Mbps home fiber), the gap widens:

No compression: ~40ms network, 0ms compression. Total: ~40ms.
gzip level 1: ~11ms network, 5ms compression. Total: ~16ms.
Brotli level 4: ~7ms network, 25ms compression. Total: ~32ms.

On a fast enough network, no compression at all can be faster than Brotli. Always benchmark against your actual network conditions and payload sizes.
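The trade-off can be approximated as total time = compression time + bytes on the wire / bandwidth. The sketch below plugs in the sizes and compression times quoted above and solves for the bandwidth at which two options tie. It is a simplified model that ignores round-trip latency and protocol overhead, so its transfer times come out slightly below the measured ones. The function names are hypothetical.

```javascript
const KB = 1024;

// Transfer time in ms for `bytes` over a link of `mbps` megabits per second
// (1 Mbps = 1000 bits per millisecond)
function transferMs(bytes, mbps) {
  return (bytes * 8) / (mbps * 1000);
}

// Compression time plus transfer time, in ms
function totalMs(option, mbps) {
  return option.compressMs + transferMs(option.bytes, mbps);
}

// Bandwidth (Mbps) at which the cheaper-but-larger option `a` and the
// slower-but-smaller option `b` take the same total time
function breakEvenMbps(a, b) {
  const savedBits = (a.bytes - b.bytes) * 8;
  const extraMs = b.compressMs - a.compressMs;
  return savedBits / extraMs / 1000;
}

// Sizes and compression times from the 1MB benchmark in this post
const none = { bytes: 1024 * KB, compressMs: 0 };
const gzip1 = { bytes: 280 * KB, compressMs: 5 };
const brotli4 = { bytes: 180 * KB, compressMs: 25 };

console.log(`gzip 1 vs Brotli 4 tie at ${breakEvenMbps(gzip1, brotli4).toFixed(1)} Mbps`);
console.log(`none vs gzip 1 tie at ${breakEvenMbps(none, gzip1).toFixed(0)} Mbps`);
console.log(`gzip 1 at 50 Mbps: ${totalMs(gzip1, 50).toFixed(1)} ms`);
```

Under this model, Brotli level 4 only beats gzip level 1 below roughly 41 Mbps, and skipping compression only beats gzip level 1 above roughly 1.2 Gbps, which is in line with the advice to benchmark internal service calls separately.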

TL;DR for your next deploy

Use Brotli level 1 for dynamic JSON API responses. It compresses slightly better than gzip level 1 with similar CPU cost.
Use gzip level 1 to 3 for large dynamic responses: level 3 from 50KB to 500KB, level 1 above 500KB. The streaming implementation is simpler and the CPU cost is predictable.
Pre-compress and cache when the same response goes to many clients. This lets you use higher compression levels without affecting request latency.
Skip compression for responses under 1KB and for image/video payloads. The overhead is not worth it.
Stream large responses instead of buffering. Node.js streams handle this elegantly and reduce time-to-first-byte.
Always benchmark against your actual payloads. The numbers in this post are for a specific JSON size and hardware. Your mileage depends on your data structure, compression dictionary effectiveness, and available CPU.

The right compression strategy is not “always use the best algorithm.” It is “use the cheapest algorithm that gets your payload under the wire fast enough.” Measure first, configure second, optimize third.

A note from Yojji

Getting compression right requires understanding both your payload characteristics and your infrastructure limits. The wrong choice burns CPU and adds latency. The right choice requires benchmarking, tuning, and ongoing monitoring as your data shapes change. Yojji is an international custom software development company, founded in 2016, with offices in Europe, the US, and the UK. Their teams specialize in the JavaScript stack (React, Node.js, TypeScript), cloud platforms (AWS, Azure, Google Cloud), and full-cycle delivery covering everything from application architecture to production performance tuning. If your team needs to ship APIs that are fast on every network condition without burning engineering cycles on every middleware decision, Yojji builds and tunes the infrastructure layer for you.