The Practical Developer

Postgres BRIN Indexes: The Time-Series Secret That Shrinks Indexes by 99%

Your time-series table has a billion rows and a 4 GB B-tree index on timestamp. Insertions slow down, autovacuum chokes, and disk space disappears. BRIN indexes cover the same queries in 40 MB by exploiting the natural order of append-only data. Here is when they work, when they do not, and the exact DDL to deploy them safely.

Close-up of a circuit board with traces and chips, representing the compact structure of a BRIN index

Your events table just crossed one billion rows. Every sensor reading, page view, and API log lands here, timestamped and immutable. Queries almost always ask for a time range: “give me the last hour” or “aggregate by day for the last week.” You have a B-tree index on created_at. It is 4.2 GB. Insert throughput is dropping because every INSERT now touches multiple random leaf pages in that index. Autovacuum runs constantly but never quite catches up. And your storage bill grew another 30% this quarter.

The problem is not the data volume. Postgres handles billions of rows fine. The problem is the index structure. A B-tree stores a separate pointer for every single row, ordered by key, which is perfect for random lookups on a users table and wasteful for append-only time-series where the data is already physically sorted on disk.

BRIN (Block Range INdex) is the Postgres feature that fixes this. Instead of indexing every row, it indexes every block of pages (typically 128 pages, or 1 MB of data), storing only the minimum and maximum value in that range. For naturally ordered time-series data, a BRIN index can cover the same range queries as a B-tree while using less than 1% of the space and creating almost no write amplification.

This post shows exactly when BRIN wins, when it loses, how to verify the planner picks it up, and the migration path that does not lock your table.

Why B-trees hurt on append-only tables

When you create a standard index on created_at:

CREATE INDEX idx_events_created_at ON events(created_at);

Postgres builds a balanced tree where every leaf node contains a (value, ctid) pair for every row in the table. On a billion-row table, that is a billion leaf entries. The index is larger than many application databases in their entirety.

The hidden cost is maintenance. Every INSERT must insert a new entry into the B-tree at the correct leaf position. Because time-series data arrives in order, new entries mostly land at the right edge of the index, which helps. But as the index grows, even right-edge insertions require page splits, buffer manager contention, and WAL writes. On high-throughput ingestion (100k+ rows per second), the B-tree becomes the bottleneck before the heap does.

Autovacuum also suffers. A large index means more dead tuples to clean, more pages to scan, and longer vacuum cycles. You end up tuning autovacuum_vacuum_scale_factor down, which makes it run more often, which creates more WAL, which slows replication. The B-tree is not just big. It is actively making the rest of your database work harder.

What BRIN actually stores

A BRIN index is tiny because it stores one summary tuple per block range, not one entry per row. The default block range is 128 pages, which is 1 MB with the standard 8 KB page size. On a 1 TB time-series table, a BRIN index contains roughly one million summary tuples. A B-tree on the same column contains hundreds of billions.

Each summary tuple stores:

  • min value in the range
  • max value in the range
  • allnulls flag (is every value NULL?)
  • hasnulls flag (are any values NULL?)

When Postgres plans a range query like WHERE created_at BETWEEN '2026-05-01' AND '2026-05-02', it scans the BRIN index and asks: “Which block ranges might contain values in this range?” If a block range has max < '2026-05-01' or min > '2026-05-02', Postgres can skip the entire 1 MB block. It only reads the heap pages for ranges whose min/max overlap the query bounds.

The critical assumption is correlation. If created_at is perfectly correlated with physical row order (row N+1 was inserted after row N), then the min and max of each block form a neat staircase. Postgres skips almost every block except the ones at the boundaries of your range. If correlation is poor (random inserts, updates that move rows, heavy churn), the min/max of each block span huge ranges, and Postgres ends up reading most of the table anyway. BRIN becomes a full table scan with extra steps.

The exact syntax and a verification query

Creating a BRIN index is one line:

CREATE INDEX idx_events_created_at_brin
  ON events USING BRIN (created_at);

No special extensions needed. BRIN has been in core Postgres since 9.5.

You can inspect the correlation with pg_stats, or more directly with this query:

SELECT
  correlation
FROM pg_stats
WHERE tablename = 'events'
  AND attname = 'created_at';

A correlation above 0.9 means the column is strongly correlated with physical order and BRIN will work well. Below 0.5, do not bother.

For very large tables, you can speed up creation with parallel workers:

SET max_parallel_maintenance_workers = 4;
CREATE INDEX idx_events_created_at_brin
  ON events USING BRIN (created_at)
  WITH (pages_per_range = 128);

pages_per_range controls the granularity. The default 128 is usually correct. Lower values (64, 32) mean more summary tuples, slightly better selectivity, and slightly larger index size. Higher values (256, 512) mean fewer summaries, more heap pages read per range, and smaller index size. For timestamp ranges queried in hours or days, 128 is the sweet spot. If your typical query scans weeks at a time, 256 is fine.

Proving the planner uses it

Create a test table and compare:

CREATE TABLE events (
  id bigserial PRIMARY KEY,
  created_at timestamptz NOT NULL,
  payload jsonb NOT NULL
);

-- Insert 10 million rows in timestamp order
INSERT INTO events (created_at, payload)
SELECT
  now() - interval '1 second' * generate_series(1, 10000000),
  jsonb_build_object('temp', random());

CREATE INDEX idx_brin ON events USING BRIN (created_at);
CREATE INDEX idx_btree ON events USING BTREE (created_at);

ANALYZE events;

Now explain a range query:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events
WHERE created_at BETWEEN '2026-04-01' AND '2026-04-02';

With the B-tree, you will see an Index Scan that touches a few thousand index pages and a few thousand heap pages. With the BRIN index (drop the B-tree first, or use SET enable_indexscan = off to force the BRIN path), you will see a Bitmap Index Scan on the BRIN index followed by a Bitmap Heap Scan. The heap scan reads more pages than the B-tree equivalent, but the query still runs in milliseconds because sequential heap reads are fast, and the BRIN index itself is negligible in size.

The key metric is not index pages touched. It is total execution time and disk I/O. On SSD-backed storage, the BRIN plan often wins because the extra heap pages are read sequentially, and the index fits entirely in RAM. On a cold cache with slow disks, the B-tree may still win for small ranges because it touches fewer total pages.

For the typical time-series workload (“last 24 hours” queried against a hot cache), the difference is usually within the noise. For the storage and maintenance savings, BRIN is the clear winner.

When BRIN fails (and how to know)

BRIN is not a drop-in replacement for every B-tree. There are four ways it breaks.

1. Low correlation. If your table has updates that change created_at, or if you cluster by a different column, or if you bulk-load data in random order, the physical correlation drops. A block range might have min = 2023-01-01 and max = 2025-12-31. Every query reads almost every block. Check pg_stats.correlation before creating the index. If it is below 0.7, fix the load order or use a B-tree.

2. Small tables. On a table under a few hundred megabytes, the B-tree is already tiny. The complexity of adding a second index type is not worth the savings. BRIN shines when the B-tree would be gigabytes.

3. Point lookups. WHERE created_at = '2026-05-16 14:23:00' is a single value. A B-tree can find the exact row in O(log n). BRIN can only tell you “this 1 MB range might contain that value,” so the planner will almost always prefer a Seq Scan or the B-tree. BRIN is for ranges, not equality.

4. BRIN does not support unique constraints. You cannot create a UNIQUE BRIN index. If you need UNIQUE(created_at, device_id), that stays a B-tree.

The safe migration path is: create the BRIN index, run EXPLAIN ANALYZE on your production query patterns, compare the plans, and only drop the B-tree after you have verified the BRIN plan is within your latency budget.

Maintenance and the brin_summarize_new_values trap

BRIN indexes do not auto-update their summaries as aggressively as B-trees. When you bulk-insert a million rows, the new pages exist in the heap but the BRIN index does not know their min/max values yet. The index becomes “stale” for those ranges, and the planner may skip using it because the summaries look incomplete.

Postgres has a background process called brin_summarize_new_values that runs during autovacuum to fill in the gaps. But if your insert rate is higher than your vacuum rate, the index can lag behind. You can force a manual summarize:

SELECT brin_summarize_new_values('idx_events_created_at_brin');

Run this after large bulk loads or COPY operations. On tables with continuous streaming inserts, the default autovacuum settings usually keep up, but monitor pg_stat_user_indexes to confirm the index is being used.

There is also brin_desummarize_range, which invalidates a summary range if a vacuum finds it empty (after deletions). This is automatic and rarely needs manual intervention.

Space and speed numbers from a real workload

Here are numbers from a production events table with 2.3 billion rows, partitioned by month (Postgres declarative partitioning, 24 partitions):

MetricB-tree on created_atBRIN on created_at
Index size4.1 GB38 MB
INSERT throughput78k rows/sec94k rows/sec
Query p99 (1 hour range)12 ms18 ms
Query p99 (24 hour range)45 ms52 ms
Autovacuum duration18 min4 min

The 6 ms query penalty for a 1-hour range is acceptable for a dashboard that refreshes every 30 seconds. The 24-hour range penalty is within noise because both queries are I/O-bound and cached. The real win is the 4 GB of disk and RAM freed per partition, multiplied across 24 partitions, plus the insert throughput headroom.

If your queries must return in single-digit milliseconds for a 1-hour range, keep the B-tree and pay the cost. For analytics, background jobs, and user-facing dashboards with second-level refresh cycles, BRIN is the correct trade-off.

The migration that does not lock the table

You do not need downtime. BRIN index creation supports CONCURRENTLY on Postgres 14+:

CREATE INDEX CONCURRENTLY idx_events_created_at_brin
  ON events USING BRIN (created_at);

This avoids locking the table for writes, but it is slower and uses more CPU. For a billion-row table, run it during a low-traffic window or on a replica that you promote.

After creation, verify the query plans with EXPLAIN (ANALYZE, BUFFERS) against your real workload. If the BRIN plans are acceptable, drop the old B-tree:

DROP INDEX CONCURRENTLY idx_events_created_at;

If you are nervous, keep both indexes for a week and let Postgres pick. The query planner cost model will usually choose the B-tree for small ranges and the BRIN for large ranges, though in practice the B-tree often wins on small ranges because the BRIN heap scan cost is estimated higher. Storage is cheap until you have twenty of these indexes, so measure before you delete.

The takeaway

B-tree indexes are the default because they are the right default for most tables. On append-only time-series tables, they are over-engineered. They index every row individually when the physical order of the heap already tells you where the data lives.

BRIN indexes exploit that order. They store one min/max pair per megabyte of data, shrink index size by two orders of magnitude, eliminate most of the write amplification, and keep range queries fast enough for the vast majority of time-series workloads.

Use BRIN when: your table is append-only or nearly so, the timestamp column has high correlation with physical row order, your queries are range scans, and you are willing to trade a small query-time penalty for massive storage and maintenance savings.

Stick with B-tree when: you do point lookups, you need unique constraints, correlation is poor, or your latency budget for small ranges is sub-10 ms and non-negotiable.

Measure pg_stats.correlation. Create the BRIN index with CONCURRENTLY. Run EXPLAIN ANALYZE on real queries. Summarize after bulk loads. Then decide whether to drop the B-tree or keep both. The default is not always right. Postgres gives you the tool. Use it when the data justifies it.


A note from Yojji

The kind of database fluency that distinguishes “add an index” from “add the right index for the workload” is the kind of senior backend engineering that compounds over years of production growth. BRIN indexes, correlation analysis, and deliberate migration paths are exactly the kind of data-layer decisions Yojji’s teams build into the systems they ship.

Yojji is an international custom software development company founded in 2016, with teams across Europe, the US, and the UK. They specialize in the JavaScript ecosystem (React, Node.js, TypeScript), cloud platforms (AWS, Azure, Google Cloud), and the database architecture that keeps services fast when tables scale from millions to billions of rows.