Change Data Capture with PostgreSQL: Sync Your Search, Cache, and Analytics Without the Double-Write Tax
You write to Postgres, then write to Elasticsearch, then invalidate Redis, then hope nothing failed halfway through. CDC streams the WAL to every downstream consumer in order, with exactly-once semantics, and eliminates the dual-write pattern that corrupts data under load.
Your order service writes a new row to the orders table. Then it calls the search service to index the order so the customer can find it in their history. Then it publishes a message to the analytics pipeline. Then it invalidates the cached order count on the dashboard. Somewhere in that chain, the search call times out. The order is in Postgres but not in search. The customer refreshes the page, sees nothing, and opens a support ticket.
You have a dual-write problem. Every piece of data you care about lives in Postgres, but five other systems need a copy of it, and keeping those copies consistent is a distributed systems problem you did not sign up for. The application code that writes the order is now also responsible for orchestrating a fragile multi-step sync to Elasticsearch, Redis, BigQuery, and whoever else needs the data.
The fix is Change Data Capture (CDC): stream the write-ahead log (WAL) from Postgres and let downstream consumers read the changes directly, in order, without the application layer orchestrating anything.
This post shows what CDC looks like in practice, how to set it up with Postgres logical replication and the pg driver, the consumers you actually need to write, and the trap that makes most CDC pipelines silently lose data on schema changes.
What CDC actually is
Change Data Capture is a pattern that observes every insert, update, and delete on a database table and streams those changes to external systems. Instead of your application code calling multiple services after every write, the database emits a record of the change, and a consumer picks it up asynchronously.
There are two main approaches:
Polling-based CDC. Your consumer runs SELECT * FROM table WHERE updated_at > $last_checked every N seconds. Simple to build. Falls apart with deletes (the row is gone), renames (the user changes their name back and forth), and high-volume tables (the query gets slower as the table grows, and you miss rows between poll intervals unless you use FOR UPDATE SKIP LOCKED).
Log-based CDC. Your consumer connects to Postgres’s replication protocol and reads the write-ahead log directly. Every change is captured at the storage level: inserts, updates, deletes, even schema changes. No polling, no updated_at columns, no missed rows. This is the production-grade approach.
Postgres has supported logical replication since version 9.4 (released 2014). The feature is mature, well-documented, and does not require any extensions for basic use. What it gives you is a stream of changes decoded into a readable format (JSON, protobuf, or a relational representation) that you can consume from any language that speaks the Postgres wire protocol.
Why dual-writes are the default and why they break
Dual-writes are the path of least resistance. A controller creates a record, then fires off side effects:
async function createOrder(req: Request, res: Response) {
const order = await orderRepo.insert(req.body);
// Side effect 1: index for search
await searchClient.index('orders', order.id, order);
// Side effect 2: invalidate cache
await cache.del(`order-count:${order.userId}`);
// Side effect 3: send to analytics
await analytics.emit('order.created', order);
res.status(201).json(order);
}
Three side effects, three points of failure. If the search index call times out after 5 seconds, the whole request fails. The order is in Postgres but not in search. If the cache invalidation throws an exception, the next request sees a stale count. If analytics is down, the request handler crashes unless every side effect is wrapped in its own try/catch, and even then, you have to decide whether to swallow the error or retry.
Adding a fourth consumer means editing the controller. Adding a fifth means more surface area for failure. The application layer becomes an orchestration layer for data synchronization, and that is the wrong place for it.
CDC inverts the architecture. The controller writes one thing (the order to Postgres) and returns. The CDC pipeline picks up the change and delivers it to every consumer asynchronously. If Elasticsearch is down, the pipeline retries. If Redis is down, the pipeline retries. If the analytics pipeline is slow, it gets its own stream position and does not block anything else.
async function createOrder(req: Request, res: Response) {
const order = await orderRepo.insert(req.body);
res.status(201).json(order);
// That is it. The CDC pipeline handles the rest.
}
That is the architectural win. Not the specifics of the tooling, but the decoupling.
Setting up Postgres logical replication
Logical replication requires two Postgres configuration changes. Both can be set via ALTER SYSTEM or in postgresql.conf:
wal_level = logical
max_replication_slots = 5 # at least 1 per consumer
Set wal_level to logical and restart Postgres. Then create a publication for the tables you want to replicate:
CREATE PUBLICATION order_events
FOR TABLE orders, order_items, order_status_history;
A publication defines the set of tables whose changes you want to stream. You can publish all tables (FOR ALL TABLES) but scoping to specific tables makes the pipeline easier to reason about and reduces WAL decode overhead.
Each consumer creates a replication slot that tracks its position in the WAL:
SELECT pg_create_logical_replication_slot(
'order_consumer',
'pgoutput'
);
The slot holds the WAL position. If the consumer goes offline for an hour, the slot retains the position. When the consumer reconnects, it resumes from where it left off. The trade-off is that the WAL cannot be recycled past the oldest slot position, so a dead consumer that never reconnects causes WAL bloat and eventually fills the disk. Monitor replication slot lag as a first-class alert.
The Node.js consumer
The pg driver supports the replication protocol natively through Client#replication. Here is a consumer that connects to the logical replication slot and processes changes as they arrive.
import { Client } from 'pg';
import { Transform, Writable } from 'node:stream';
interface ChangeEvent {
action: 'insert' | 'update' | 'delete';
table: string;
schema: string;
timestamp: bigint;
old?: Record<string, unknown>;
new?: Record<string, unknown>;
}
async function startConsumer(slotName: string) {
const client = new Client({
connectionString: process.env.DATABASE_URL,
// replication mode is required for logical replication
connection: { database: process.env.PGDATABASE },
});
await client.connect();
const stream = await client.replication.startLogical({
slotName,
plugin: 'pgoutput',
startLsn: '0/0', // '0/0' starts from the oldest available WAL
});
const parser = new Transform({
objectMode: true,
transform(raw: Buffer, _encoding, callback) {
const messages = parsePgoutput(raw);
for (const msg of messages) {
this.push(msg);
}
callback();
},
});
const consumer = new Writable({
objectMode: true,
highWaterMark: 64,
async write(event: ChangeEvent, _encoding, callback) {
try {
await handleChange(event);
callback();
} catch (err) {
// Log the error and acknowledge the LSN anyway.
// See the "error handling" section below.
console.error(`Failed to process event:`, err);
callback();
}
},
});
stream.pipe(parser).pipe(consumer);
return { client, stream, consumer };
}
The pgoutput plugin decodes the WAL into protocol buffer messages. The parser converts those into ChangeEvent objects your application can work with. Each event includes the table name, the operation type, and the before/after column values.
The consumer handlers
Each downstream system gets its own handler function, registered for the appropriate table and operation:
async function handleChange(event: ChangeEvent): Promise<void> {
switch (event.table) {
case 'orders':
if (event.action === 'insert' || event.action === 'update') {
await Promise.all([
searchClient.index('orders', event.new!.id, event.new),
cache.del(`order-count:${event.new!.user_id}`),
]);
} else if (event.action === 'delete') {
await Promise.all([
searchClient.delete('orders', event.old!.id),
cache.del(`order-count:${event.old!.user_id}`),
]);
}
break;
case 'order_items':
if (event.action === 'insert') {
await analytics.emit('item_purchased', event.new);
}
break;
case 'order_status_history':
await webhookClient.send('order.status_changed', event.new);
break;
}
}
Notice that handleChange does not orchestrate anything. It looks at the event and dispatches to the appropriate downstream systems. If the search index call fails, the error is logged but the pipeline moves on. The change is not lost; it is still in the WAL, and a separate retry mechanism (or a dead-letter queue) can reprocess it later.
The schema change trap
CDC pipelines have one sharp edge that bites every team eventually: schema changes. When you run ALTER TABLE orders ADD COLUMN discount numeric(10,2), the publication still publishes changes for the orders table. But the consumer’s ChangeEvent shape no longer matches. The new column appears in the event, and the handler that destructures event.new might silently ignore it or throw depending on how you access it.
The fix is to version your consumer schema or use a schema registry. The simplest production approach is to store the schema version alongside each event and let the consumer handle migration:
interface ChangeEvent {
action: 'insert' | 'update' | 'delete';
table: string;
schema: string;
schemaVersion: number; // bumped on every ALTER TABLE
timestamp: bigint;
old?: Record<string, unknown>;
new?: Record<string, unknown>;
}
When schemaVersion changes, the consumer pauses and runs a migration function that updates its internal column mapping. During the migration window, events are buffered in memory or written to a staging table. Once the migration completes, the consumer resumes from the buffered position.
For teams that cannot tolerate even a short pause, the more robust approach is to use a schema registry like Apicurio or Confluent Schema Registry alongside Protobuf-encoded WAL output. The schema registry maps each schema version to a numeric ID included in the message. Consumers fetch the latest schema from the registry when the ID changes and adapt on the fly. This is the Debezium approach, and it works well at scale, though it adds operational complexity.
Idempotency and ordering
CDC pipelines deliver changes in the order they were committed to the WAL, but downstream systems might not preserve that order naturally. If you update order 42’s status from “pending” to “shipped” and then to “delivered” within a second, both updates hit the search index in the right order on the wire. But if the search index client batches writes internally, the “delivered” update could land before the “shipped” update.
The fix is to include an LSN or commit timestamp in every event and let the consumer use it for ordering:
async function updateSearchIndex(event: ChangeEvent): Promise<void> {
const existing = await searchClient.get('orders', event.new!.id);
if (existing && existing._lsn >= event.lsn) {
// Already processed a newer or identical change
return;
}
await searchClient.index('orders', event.new!.id, {
...event.new,
_lsn: event.lsn,
});
}
The LSN is a monotonic position in the WAL. No two commits share the same LSN. By storing the last-processed LSN in the target document, the consumer becomes idempotent: it can safely retry or receive the same event twice.
When to add Debezium and Kafka
The Node.js-native approach above works well for teams with fewer than ten tables and moderate throughput (under 1,000 changes per second). Beyond that, three problems emerge:
-
Connection management. Each consumer opens its own replication connection. Ten consumers means ten replication slots and ten WAL decoder processes on the Postgres server. Postgres handles this fine until it does not, and the failure mode is WAL bloat that cascades to an outage.
-
Full snapshots. When you add a new consumer that needs to start from the beginning, you have to snapshot the entire table, then catch up on WAL changes that arrived during the snapshot. The two-phase snapshot (lock table, copy data, release lock, replay WAL) is tricky to get right, and a naive implementation holds locks longer than production can tolerate.
-
Schema evolution at scale. Mapping column renames and type changes across multiple consumers without a schema registry is tedious and error-prone.
Debezium solves all three. It runs as a set of Kafka Connect connectors, one per database. Each connector manages a single replication slot and publishes changes to a Kafka topic per table. Consumers read from Kafka topics instead of connecting to Postgres directly. The architecture looks like this:
Postgres WAL -> Debezium connector -> Kafka topic `dbserver1.public.orders` -> Consumer
Debezium handles full snapshots automatically (with snapshot.mode controlling whether it takes an initial snapshot, resumes from the last offset, or recovers from a schema-only state). It integrates with the Confluent Schema Registry for Protobuf or Avro schemas. It tracks schema changes as separate messages so consumers can adapt.
The cost is operational: you need Kafka and Kafka Connect in your infrastructure. For teams already running Kafka as a message bus, adding Debezium is a small operational increment. For teams that are not, the Node.js-native approach above is the right starting point.
The checklist that prevents silent data loss
Before you put a CDC pipeline in production, verify each of these:
-
wal_level = logicalis set on the primary. If you use a replica for CDC, ensurehot_standby_feedback = onso the replica’s WAL is not recycled prematurely. - Replication slots are monitored. Alert when slot lag exceeds 10 minutes. A disconnected consumer inflates the WAL and fills the disk.
- Every consumer stores its last-processed LSN in a durable store and resumes from it on restart.
- Schema changes are deployed with a consumer migration plan. Either version the consumer schema or use
ALTER TABLE ... ADD COLUMN IF NOT EXISTSto make column additions backward-compatible with old WAL events. - The consumer writes failed events to a dead-letter table before acknowledging them. Logging and moving on silently loses data.
- You have tested the pipeline with a schema change. Run
ALTER TABLE ... ADD COLUMN ...while the pipeline is running and verify the consumer adapts. - The consumer is idempotent. Replaying the same WAL position produces the same state in downstream systems.
Takeaway
CDC eliminates the dual-write pattern that makes application code brittle and downstream systems inconsistent. Instead of orchestrating five side effects per write, your controller writes once and the WAL stream handles distribution. The Postgres-native approach using pgoutput is production-ready for moderate throughput and straightforward to implement in Node.js. Teams that outgrow it graduate to Debezium and Kafka for schema registry integration, snapshot management, and operational isolation.
Dual-writes are the default because they are the obvious thing to type. CDC is the alternative because it is the correct architectural boundary between your database and everyone who needs its data.
A note from Yojji
Building a data pipeline that reliably streams production changes to search, cache, and analytics systems without losing events or corrupting state is the kind of infrastructure work that separates a prototype from a platform. It requires deep familiarity with Postgres replication internals, careful consumer design for idempotency, and operational monitoring to catch slot lag before it fills a disk. Yojji’s teams build this kind of data infrastructure regularly, from CDC pipelines and event-driven architectures to full-cycle product engineering across the JavaScript ecosystem and cloud platforms on AWS, Azure, and Google Cloud.
Yojji is an international custom software development company founded in 2016, with offices in Europe, the US, and the UK. Their senior engineering teams specialize in the JavaScript stack (React, Node.js, TypeScript), cloud-native deployments, and the kind of production-hardened backend architecture that makes data flow reliably between systems without the heroics.