NDJSON Streaming with Node.js: Send Partial Responses Before You Have the Full Payload
Your API endpoint builds a full JSON array before sending anything, so the first byte arrives after the last database query. Switch to newline-delimited JSON streaming and get the first item to the client in milliseconds, with backpressure handling, proper error recovery, and no extra dependencies.
Your API returns a list of 10,000 search results. The database returns the first matching row in 30 milliseconds. The last one arrives 1.7 seconds later. But your API endpoint waits for all of them, serializes the entire array into JSON, sets Content-Length, and then sends the response. The client sits on a loading spinner for two seconds before it can render anything.
This is the buffered response antipattern, and it is everywhere. Any endpoint that returns a collection built from database rows, paginated API calls, or computed results suffers from it. The fix is not to paginate smaller (though you should do that too). The fix is to stream the response as newline-delimited JSON (NDJSON), so the client gets usable data the moment the first row is ready.
I have seen this pattern cut median visual-render time from 1.8 seconds to 320 milliseconds on a search endpoint, with zero changes to the database query. The client starts painting results while the server is still fetching and serializing the remaining rows.
This post covers the server implementation in Node.js (plain http, Express, and Fastify), the client fetch handler that parses NDJSON, the error-recovery protocol, and the one case where you should not stream.
The problem with buffered JSON arrays
When you build a JSON array in memory and serialize it all at once, the response timeline looks like this:
Time 0ms --- Begin request
Time 30ms --- First DB row available (server waits)
Time 700ms --- All rows fetched
Time 1200ms --- JSON.stringify(rows) done
Time 1200ms --- First byte sent to client
Time 1200ms --- Client starts parsing (must download entire body first)
The client cannot do anything useful until JSON.parse succeeds, which requires the entire body. For a 5 MB response over a 10 Mbps mobile connection, that is 4 seconds of blank screen.
With NDJSON streaming, the timeline changes:
Time 0ms --- Begin request
Time 30ms --- First DB row available
Time 31ms --- First NDJSON line sent to client
Time 31ms --- Client parses first row, renders it
Time 32ms --- First row visible on screen
Time 50ms --- Second row available, sent, parsed, rendered
...
Time 700ms --- Last row sent
Time 701ms --- Client renders last row
The first paint happens in ~30ms instead of 1200ms+. The user sees results almost instantly, and more results stream in as the server finishes processing them.
What NDJSON looks like
Newline-delimited JSON is exactly what it sounds like: one JSON object per line, terminated by \n.
{"id": "row-1", "title": "First result", "score": 0.95}
{"id": "row-2", "title": "Second result", "score": 0.92}
{"id": "row-3", "title": "Third result", "score": 0.87}
No enclosing array brackets. No commas between objects. Each line is independently parseable. The client reads line by line and processes each row as it arrives.
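To make the "independently parseable" property concrete, here is a minimal round-trip sketch. The helper names encodeNdjson and decodeNdjson are hypothetical, chosen for illustration, and the Row type simply mirrors the sample lines above. It relies on the fact that JSON.stringify escapes newlines inside strings, so a serialized row never contains a raw line break.

```typescript
// Hypothetical helpers for illustration; these names do not come from any library.
type Row = { id: string; title: string; score: number };

// Serialize rows to NDJSON: one JSON document per line, each terminated by "\n".
function encodeNdjson(rows: Row[]): string {
  return rows.map((row) => JSON.stringify(row) + '\n').join('');
}

// Parse an NDJSON string back into rows, ignoring empty lines.
function decodeNdjson(text: string): Row[] {
  return text
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as Row);
}

const rows: Row[] = [
  { id: 'row-1', title: 'First result', score: 0.95 },
  // A newline inside a string is escaped as \n by JSON.stringify,
  // so it cannot break the one-document-per-line framing.
  { id: 'row-2', title: 'Second\nresult', score: 0.92 },
  { id: 'row-3', title: 'Third result', score: 0.87 },
];

const wire = encodeNdjson(rows);

// The first line can be parsed on its own, before the rest has arrived.
const firstLine = wire.slice(0, wire.indexOf('\n'));
const firstRow = JSON.parse(firstLine) as Row;
console.log(firstRow.id, decodeNdjson(wire).length);
```

A JSON array offers no such cut point: the parser needs the closing bracket before it can hand back the first element.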
The response headers signal the content type:
Content-Type: application/x-ndjson
Transfer-Encoding: chunked
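No Content-Length. HTTP/1.1 chunked transfer encoding lets the server write lines as they become available. A zero-length chunk marks the end of the body, so the connection does not have to close and can be reused. Node.js applies chunked encoding automatically when you do not set Content-Length, so the explicit header is optional, and HTTP/2 does not use it at all because it frames data itself.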
Server implementation in plain Node.js
Here is the minimal streaming endpoint using Node’s built-in http module, with a simulated database cursor that yields rows one at a time:
import http from 'node:http';
import { Readable } from 'node:stream';

// Simulate a DB cursor that yields rows with a delay.
// The query argument is unused here; it only mirrors a real search signature.
async function* queryRows(_query = '') {
  for (let i = 0; i < 10000; i++) {
    // Simulate fetching the next row (30ms each)
    await new Promise((r) => setTimeout(r, 30));
    yield { id: `row-${i}`, title: `Result ${i}`, score: Math.random() };
  }
}

const server = http.createServer(async (req, res) => {
  if (req.url === '/search' && req.method === 'GET') {
    res.writeHead(200, {
      'Content-Type': 'application/x-ndjson',
      'Transfer-Encoding': 'chunked',
      'Cache-Control': 'no-cache',
    });
    for await (const row of queryRows()) {
      const line = JSON.stringify(row) + '\n';
      if (!res.write(line)) {
        // Backpressure: client is reading slower than we produce
        await new Promise((resolve) => res.once('drain', resolve));
      }
    }
    res.end();
    return;
  }
  res.writeHead(404).end();
});

server.listen(3001);
Each res.write() call hands the NDJSON line to the socket right away. Awaiting 'drain' respects backpressure: when the client reads slowly, the kernel's socket buffer fills up, Node starts queuing writes in the response's own in-process buffer, and res.write() returns false. Pausing until 'drain' fires keeps that queue small. Without the pause, a fast producer keeps appending to the in-process buffer, and heap usage grows until the process runs out of memory.
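One gap in the loop above is a client that disconnects while the server is waiting for 'drain': the event never fires, so the await never settles and the row source keeps running. The following is a sketch of one way to handle that, not the only one. The names waitForDrainOrClose, writeLine, and streamRows are hypothetical, and the code assumes the destination emits 'close' when the connection goes away, which Node's Writable streams (including http.ServerResponse) do by default.

```typescript
import { Writable } from 'node:stream';

// Resolves on whichever comes first: the buffer drains or the stream closes.
function waitForDrainOrClose(dest: Writable): Promise<void> {
  return new Promise((resolve) => {
    const done = () => {
      dest.off('drain', done);
      dest.off('close', done);
      resolve();
    };
    dest.once('drain', done);
    dest.once('close', done);
  });
}

// Writes one NDJSON line. Resolves to false when the destination is gone,
// so the caller can stop producing rows.
async function writeLine(dest: Writable, row: unknown): Promise<boolean> {
  if (dest.destroyed || dest.writableEnded) return false;
  if (dest.write(JSON.stringify(row) + '\n')) return true;
  await waitForDrainOrClose(dest);
  return !dest.destroyed;
}

// Streams rows until the source is exhausted or the destination disappears.
// Breaking out of `for await` calls the generator's return(), so any
// `finally` block in the row source (for example, closing a cursor) still runs.
async function streamRows(dest: Writable, rows: AsyncIterable<unknown>): Promise<number> {
  let written = 0;
  for await (const row of rows) {
    if (!(await writeLine(dest, row))) break;
    written++;
  }
  return written;
}
```

In the handler above, the loop body would become a single call such as await streamRows(res, queryRows()), followed by res.end() if the response is still open.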
Express with async iteration
Express does not natively support async request handlers returning streams cleanly, but you can write to the response object directly. Here is the same pattern in Express with a reusable streaming helper:
import express from 'express';

const app = express();

function ndjsonStream(res: express.Response, rows: AsyncIterable<unknown>) {
  res.writeHead(200, {
    'Content-Type': 'application/x-ndjson',
    'Transfer-Encoding': 'chunked',
    'Cache-Control': 'no-cache',
  });
  return (async () => {
    for await (const row of rows) {
      const line = JSON.stringify(row) + '\n';
      if (!res.write(line)) {
        await new Promise((resolve) => res.once('drain', resolve));
      }
    }
    res.end();
  })();
}

app.get('/search', async (req, res) => {
  try {
    const rows = queryRows(req.query.q as string);
    await ndjsonStream(res, rows);
  } catch (err) {
    // If the stream has not started, send a proper error response.
    if (!res.headersSent) {
      res.status(500).json({ error: 'Stream failed to start' });
    } else {
      // Stream already started. Write an error line so the client
      // can distinguish a partial result from a complete failure.
      res.write(JSON.stringify({ error: 'Partial failure', message: String(err) }) + '\n');
      res.end();
    }
  }
});
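The error handling split matters. Until res.writeHead is called, you can still send a proper JSON error response with a 500 status. Once it has been called, res.headersSent is true, the 200 status is committed, and you are committed to NDJSON. Note that ndjsonStream calls writeHead before it pulls the first row, so any failure while iterating takes the second branch; the first branch only covers errors thrown before the helper gets that far. An error line in the stream lets the client decide what to do with the partial results instead of getting a truncated stream with no explanation.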
Fastify with native stream support
Fastify can send a Node.js stream for you through reply.send(), but to keep the same explicit write-and-drain loop, this example writes to the underlying raw response and tells Fastify to step aside:
import Fastify from 'fastify';

const fastify = Fastify();

fastify.get('/search', async (req, reply) => {
  // Take over the response before touching reply.raw, so Fastify does not
  // try to send its own response or error payload later.
  reply.hijack();

  reply.raw.writeHead(200, {
    'Content-Type': 'application/x-ndjson',
    'Transfer-Encoding': 'chunked',
    'Cache-Control': 'no-cache',
  });

  const { q } = req.query as { q?: string };
  for await (const row of queryRows(q ?? '')) {
    const line = JSON.stringify(row) + '\n';
    if (!reply.raw.write(line)) {
      await new Promise((resolve) => reply.raw.once('drain', resolve));
    }
  }
  reply.raw.end();
  return reply;
});

await fastify.listen({ port: 3002 });
The reply.hijack() call is critical, and it belongs before the first write to reply.raw. Without it, Fastify still considers the reply its own: it tries to send a response when your handler returns, or an error payload if your handler throws mid-stream, and both collide with the bytes you have already written. Hijacking tells Fastify: “I handled the entire response lifecycle manually.”
Client: parsing NDJSON from a streaming fetch
The server side is only half the equation. The client needs to read the stream line by line as it arrives. The same handler works in the browser and in Node.js:
// Browser or Node.js 18+ (global fetch)
type SearchRow = { title: string; score: number };

async function* streamNdjson(url: string): AsyncGenerator<unknown[]> {
  const response = await fetch(url);
  if (!response.ok || !response.body) {
    throw new Error(`Stream request failed: ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const buffer: unknown[] = [];

  function pushLine(line: string) {
    if (line.length > 0) {
      try {
        buffer.push(JSON.parse(line));
      } catch {
        // Malformed line - log and skip
        console.warn('Skipping malformed NDJSON line:', line);
      }
    }
  }

  let remainder = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // { stream: true } keeps a multi-byte character that is split
    // across two chunks intact
    const chunk = decoder.decode(value, { stream: true });
    const lines = (remainder + chunk).split('\n');
    // The last element may be incomplete - save as remainder
    remainder = lines.pop() ?? '';
    for (const line of lines) {
      pushLine(line);
    }
    // Yield whatever we have accumulated
    if (buffer.length > 0) {
      yield buffer.splice(0, buffer.length);
    }
  }

  // Flush any bytes the decoder is still holding, then process the
  // final remainder if the stream did not end with \n
  remainder += decoder.decode();
  if (remainder.length > 0) {
    pushLine(remainder);
  }
  if (buffer.length > 0) {
    yield buffer.splice(0, buffer.length);
  }
}

// Usage in the browser
(async () => {
  const results = document.getElementById('results');
  for await (const batch of streamNdjson('/search?q=nodejs')) {
    // Build the whole batch off-DOM, then attach it in one operation
    const fragment = document.createDocumentFragment();
    for (const row of batch as SearchRow[]) {
      const el = document.createElement('div');
      el.textContent = `${row.title} (score: ${row.score.toFixed(2)})`;
      fragment.appendChild(el);
    }
    results!.appendChild(fragment);
  }
})();
The client yields batches of parsed rows, not single rows, so the UI can do a single DOM batch update per read cycle. On a fast connection each reader.read() may return multiple lines. Batching avoids layout thrash.
Error recovery: the partial-result protocol
Streaming introduces a problem that buffered responses do not have: what happens when the server crashes mid-stream? The client gets an incomplete set of results and cannot easily tell the difference between “the stream ended because all results were sent” and “the stream ended because the server died.”
The fix is a termination line that signals a clean end:
// Server: count rows as you write them, then send a sentinel line before res.end()
let rowCount = 0;
for await (const row of queryRows()) {
  res.write(JSON.stringify(row) + '\n'); // drain handling omitted for brevity
  rowCount++;
}
res.write(JSON.stringify({ __done: true, totalRows: rowCount }) + '\n');
res.end();

// Client: check for sentinel
for await (const batch of streamNdjson('/search')) {
  for (const row of batch) {
    if (row.__done) {
      console.log(`Completed: ${row.totalRows} rows`);
      continue; // Do not render this sentinel row
    }
    renderRow(row);
  }
}
If the client reaches the end of the stream without seeing __done: true, it knows the stream was truncated and could retry or display a warning.
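The truncation check described above can be sketched as a small consumer that collects rows and reports whether the sentinel arrived. The names collectUntilDone, isSentinel, and StreamOutcome are hypothetical. The extra comparison between totalRows and the number of rows received is an assumption added here as a cheap consistency check; it is not part of the protocol as stated.

```typescript
type StreamOutcome<T> = { rows: T[]; complete: boolean; declaredTotal: number | null };

// Type guard for the sentinel line: { __done: true, totalRows: <number> }.
function isSentinel(value: unknown): value is { __done: true; totalRows: number } {
  return (
    typeof value === 'object' &&
    value !== null &&
    (value as { __done?: unknown }).__done === true &&
    typeof (value as { totalRows?: unknown }).totalRows === 'number'
  );
}

// Drains a stream of batches (such as the output of streamNdjson) and
// separates data rows from the sentinel.
async function collectUntilDone<T>(batches: AsyncIterable<unknown[]>): Promise<StreamOutcome<T>> {
  const rows: T[] = [];
  let complete = false;
  let declaredTotal: number | null = null;
  for await (const batch of batches) {
    for (const item of batch) {
      if (isSentinel(item)) {
        complete = true;
        declaredTotal = item.totalRows;
        continue; // the sentinel is metadata, not a result row
      }
      rows.push(item as T);
    }
  }
  // Assumption: a sentinel whose count disagrees with what actually
  // arrived is treated as a truncated stream as well.
  if (complete && declaredTotal !== rows.length) complete = false;
  return { rows, complete, declaredTotal };
}
```

A caller can then branch on the outcome: render outcome.rows either way, and retry or show a warning when outcome.complete is false.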
When NOT to stream NDJSON
Streaming NDJSON is not free. There are cases where a buffered JSON array is the right call.
The response is small. If you return fewer than 50 rows and the full payload fits in a single TCP packet, the overhead of chunked encoding and client-side streaming parsing is wasted complexity. Just send the array.
The client is not a browser. If your API consumer is another server that needs the entire dataset before it can start processing (aggregations, batch exports, data pipelines), streaming buys it nothing. Let such consumers request a buffered format explicitly via the Accept header or a ?format=json query parameter.
You need strong error guarantees. If a partial response is unacceptable (financial transactions, medical data), do not stream. The atomicity of a single JSON response is worth the latency. Send the full result or send nothing.
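Your response is highly compressible and bandwidth is the bottleneck. To keep NDJSON lines flowing through gzip, the server has to flush the compressor after each line or small group of lines, and every flush costs some compression ratio. A single large JSON array compressed in one pass avoids those flushes, so it usually comes out smaller. If bandwidth matters more than time to first row, serve gzipped JSON arrays and paginate instead.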
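For the server-to-server case above, the opt-out through the Accept header or ?format=json could look roughly like this. The function name chooseFormat is hypothetical, and the policy it encodes (NDJSON by default, buffered JSON only on explicit request) is one reading of the advice above rather than a standard. The sketch deliberately ignores q-values in the Accept header.

```typescript
type ResponseFormat = 'ndjson' | 'json';

// Decide the response format from the Accept header and an optional
// ?format= query parameter. Assumption: streaming is the default, and
// buffered JSON is served only when the client asks for it explicitly.
function chooseFormat(accept: string | undefined, formatParam?: string): ResponseFormat {
  if (formatParam === 'json') return 'json';

  // Reduce "type/subtype;q=0.9" entries to bare lowercase media types.
  const types = (accept ?? '')
    .split(',')
    .map((part) => (part.split(';')[0] ?? '').trim().toLowerCase());

  const wantsJson = types.includes('application/json');
  const wantsNdjson = types.includes('application/x-ndjson');
  return wantsJson && !wantsNdjson ? 'json' : 'ndjson';
}

console.log(chooseFormat('application/json'));      // json
console.log(chooseFormat('*/*'));                    // ndjson
console.log(chooseFormat('text/html', 'json'));      // json
```

In a handler, the result would select between the streaming path and a plain buffered res.json(rows) path.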
A production-ready NDJSON helper
Here is a self-contained helper you can drop into a project today:
import { type ServerResponse } from 'node:http';

type StreamResult = { __done: true; totalRows: number } | Record<string, unknown>;

export class NdjsonStreamer {
  private headersSent = false;
  private rowsWritten = 0;

  constructor(private readonly res: ServerResponse) {}

  start(): void {
    this.res.writeHead(200, {
      'Content-Type': 'application/x-ndjson',
      'Transfer-Encoding': 'chunked',
      'Cache-Control': 'no-cache',
    });
    this.headersSent = true;
  }

  async write(row: Record<string, unknown>): Promise<void> {
    if (!this.headersSent) this.start();
    this.rowsWritten++;
    const line = JSON.stringify(row) + '\n';
    if (!this.res.write(line)) {
      await new Promise((resolve) => this.res.once('drain', resolve));
    }
  }

  end(): void {
    if (!this.headersSent) this.start();
    const sentinel: StreamResult = { __done: true, totalRows: this.rowsWritten };
    this.res.end(JSON.stringify(sentinel) + '\n');
  }

  abort(reason: string): void {
    if (!this.headersSent) {
      this.res.writeHead(500, { 'Content-Type': 'application/json' });
      this.res.end(JSON.stringify({ error: reason }));
      return;
    }
    this.res.write(JSON.stringify({ __error: true, message: reason }) + '\n');
    this.res.end();
  }
}
Usage:
app.get('/search', async (req, res) => {
  const stream = new NdjsonStreamer(res);
  try {
    for await (const row of queryRows()) {
      await stream.write(row);
    }
    stream.end();
  } catch (err) {
    stream.abort(String(err));
  }
});
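This handles header-sent detection (so you can call start() lazily), backpressure with drain, the sentinel line, and an error line when the stream fails midway. It is under 40 lines and covers the core of the streaming contract. It does not detect client disconnects, so add a 'close' listener on the response if your row source is expensive to keep running.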
The backpressure trap
The most common bug I see in NDJSON server implementations is ignoring the return value of res.write(). Node’s Writable stream has internal buffering. res.write() returns false when the internal buffer reaches the highWaterMark (16 KB by default in older Node.js releases, 64 KB since Node.js 22). The highWaterMark is a threshold, not a hard limit: if you ignore the return value and keep writing, data keeps accumulating in memory until the process exhausts its heap.
The drain handler in the examples above is not optional. It is the difference between a stream that handles 10,000 rows and one that OOMs at 2,000. If you cannot await the drain (e.g., in a synchronous loop), pipe through a Transform stream with an appropriate highWaterMark instead:
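import { Readable, Transform } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Create a fresh Transform per request; a stream instance cannot be reused.
function createSerializer() {
  return new Transform({
    objectMode: true,
    highWaterMark: 100, // Buffer roughly 100 rows before pausing the source
    transform(row, _enc, callback) {
      callback(null, JSON.stringify(row) + '\n');
    },
  });
}

// Inside the request handler, after writing the headers.
// pipeline() propagates backpressure like pipe() does, and it also
// forwards errors and destroys every stream if one of them fails.
await pipeline(Readable.from(queryRows()), createSerializer(), res);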
The Transform stream’s highWaterMark limits how many rows are held in memory when the downstream (the client) is slow. Combined with backpressure from the TCP socket, this prevents unbounded memory growth without explicit drain handling.
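To see the effect of the write() return value in isolation, here is a small experiment sketch that needs no HTTP server. It uses a deliberately slow Writable as a stand-in for a slow client. The names createSlowSink and measurePeakBuffer, the 1 KB highWaterMark, and the 1 ms delay per chunk are all assumptions picked to make the effect visible quickly.

```typescript
import { once } from 'node:events';
import { Writable } from 'node:stream';

// A slow sink standing in for a slow client: it takes about 1 ms to
// accept each chunk and signals backpressure once 1 KB is queued.
function createSlowSink(): Writable {
  return new Writable({
    highWaterMark: 1024,
    write(_chunk, _enc, callback) {
      setTimeout(() => callback(), 1);
    },
  });
}

// Writes `count` NDJSON lines and returns the largest number of bytes
// that were ever queued inside the sink at one time.
async function measurePeakBuffer(respectBackpressure: boolean, count = 200): Promise<number> {
  const sink = createSlowSink();
  const line = JSON.stringify({ title: 'x'.repeat(100) }) + '\n';
  let peak = 0;
  for (let i = 0; i < count; i++) {
    const ok = sink.write(line);
    peak = Math.max(peak, sink.writableLength);
    // write() returned false: the queue has reached the highWaterMark.
    if (!ok && respectBackpressure) await once(sink, 'drain');
  }
  sink.end();
  await once(sink, 'finish');
  return peak;
}

(async () => {
  const bounded = await measurePeakBuffer(true);
  const unbounded = await measurePeakBuffer(false);
  console.log({ bounded, unbounded });
})();
```

With the drain wait in place, the peak stays close to the highWaterMark regardless of how many rows are written. Without it, the peak grows linearly with the row count, which is the memory growth described above.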
The takeaway
Buffered JSON arrays are the default because they are the easiest thing to write. But any endpoint that returns a collection built from slower data sources (database queries, API calls, computed results) benefits from NDJSON streaming. The pattern is small: iterate, serialize, write, await drain, repeat. Add a sentinel line for error detection. Use the NdjsonStreamer helper or pipe through a Transform with a capped highWaterMark.
The first row is written to the socket within about a millisecond of the first database result becoming available. Everything after that is incremental rendering. Your users stop staring at loading spinners and start seeing results while your server is still working.
Before your next API review, run through this checklist:
- Does this endpoint return a collection that could be rendered incrementally?
- Is the response larger than a single TCP packet (roughly 1,500 bytes)?
- Would the user benefit from seeing partial results before the full response is ready?
- Have you handled backpressure with drain or pipe() with a capped buffer?
- Does the client know whether the stream completed successfully or was truncated?
If the answer to all five is yes, switch to NDJSON streaming. If the answer to the first three is no, keep the buffered array. You have both tools now. Use the one that fits the problem.