Postgres HOT Updates and Fillfactor: Why Not All Writes Are Created Equal

The UPDATE latency on our orders table jumped from 4 ms to 180 ms over the course of a week. Transaction volume was flat. CPU and disk I/O were flat. The query plan was a simple Index Scan using orders_pkey followed by Update on orders. No N+1. No missing index. The latency was pure write amplification: every single UPDATE was forcing Postgres to create new versions of every index tuple, the indexes were bloating faster than autovacuum could clean them, and the table pages were packed so tightly that new row versions had nowhere to land on the same page.

This is the story of how we learned that Postgres does not “just update a row.” It versions rows via MVCC, maintains indexes per version, and has a specific optimization (HOT updates) that only works under narrow conditions. When HOT fails, every UPDATE writes N+1 tuples: one new heap tuple plus one new tuple in every index. For a table with five indexes, one UPDATE becomes six physical writes. Tuning fillfactor and understanding when HOT applies is how you stop that amplification at the source.

What Postgres actually does on UPDATE

Postgres uses Multi-Version Concurrency Control (MVCC). An UPDATE does not overwrite the existing row in place. It marks the old row as dead and inserts an entirely new row version on the heap somewhere. This is why Postgres does not need read locks for consistency: readers see a snapshot of rows that existed at the start of their transaction.

The catch is that every index stores a pointer (a ctid, or physical tuple ID) to the heap location of every indexed version of the row. When a new row version is created, every index that covers the table needs a new entry pointing to the new version. That is why an UPDATE on a table with five indexes does not do five writes. It does five index writes plus one heap write.

In the worst case, an UPDATE that changes an indexed column forces a new entry in every single one of those indexes even if the column value did not change. The old index entries point to the old heap tuple, which is now dead. The new index entries point to the new heap tuple. Postgres cannot reuse the old index entries because the ctid changed.

HOT updates: the optimization most teams do not know they lost

HOT stands for Heap-Only Tuple. It is a Postgres optimization that avoids writing new index tuples when an UPDATE only changes non-indexed columns and the new row version can fit on the same page as the old one.

Here is how it works. When a HOT update happens, Postgres inserts the new row version on the same table page as the old version and chains them together using a special pointer. The indexes still point to the old ctid, but that old tuple now has a forwarding pointer to the new version on the same page. When an index scan reaches that ctid, Postgres follows the chain to find the visible version. No new index entries are written. No index bloating. One heap write, zero index writes.

The three conditions for a HOT update are strict:

The update does not touch any indexed columns. If the column is in any index (including composite indexes and partial indexes), HOT is out.
The new row version fits on the same page as the old one. If the page is full, Postgres must put the new version on a different page, which changes the ctid and breaks the chain.
The table does not have a UNIQUE constraint on the updated columns. There are edge cases with unique indexes where Postgres conservatively avoids HOT.

The second condition is where fillfactor comes in. By default, Postgres packs table pages to 100% capacity on a fresh INSERT or VACUUM FULL. If every page is full, there is no room for a HOT update to place the new version next to the old one. Every UPDATE becomes a non-HOT update, and all your indexes start bloating immediately.

Diagnosing whether HOT is working

You do not need pg_stat_statements to see this. The first signal is usually in pg_stat_user_tables:

SELECT
  relname AS table_name,
  n_tup_upd,
  n_tup_hot_upd,
  CASE WHEN n_tup_upd > 0
    THEN ROUND(n_tup_hot_upd::numeric / n_tup_upd * 100, 2)
    ELSE 0
  END AS hot_rate_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 1000
ORDER BY n_tup_upd DESC;

A healthy HOT rate on an update-heavy table should be 80% or higher. We saw orders at 12%. That meant 88% of our updates were writing five new index tuples each. With five indexes, that is 4.4 physical index writes per UPDATE instead of zero.

Next, check whether HOT updates are even possible for the columns you are updating. Run this to see which indexes cover which columns:

SELECT
  t.relname AS table_name,
  i.relname AS index_name,
  array_agg(a.attname ORDER BY array_position(ix.indkey, a.attnum)) AS indexed_columns,
  ix.indisunique AS is_unique
FROM pg_index ix
JOIN pg_class t ON t.oid = ix.indrelid
JOIN pg_class i ON i.oid = ix.indexrelid
JOIN pg_attribute a ON a.attrelid = t.oid AND a.attnum = ANY(ix.indkey)
WHERE t.relname = 'orders'
GROUP BY t.relname, i.relname, ix.indisunique
ORDER BY t.relname, i.relname;

If you are running UPDATE orders SET status = 'shipped', updated_at = now() WHERE id = $1, and status or updated_at is indexed, HOT is impossible for that statement. The fix is either to drop the index on updated_at (it is usually not query-selective enough to justify the write cost) or to redesign the access pattern.

The most common mistake is indexing updated_at for sorting. If you need “latest 20 orders” frequently, a composite index on (user_id, created_at DESC) is usually enough. A standalone index on updated_at is just a tax on every write.

Fillfactor: leaving room for HOT on the page

Fillfactor controls how tightly Postgres packs rows into table pages. It defaults to 100 for tables, which means INSERT and VACUUM FULL fill pages completely. For tables that receive frequent updates, that default is aggressive and wrong.

When fillfactor is set to 70, Postgres leaves 30% of each page empty. That slack space is reserved for HOT updates. If an UPDATE changes only non-indexed columns, the new row version can land in that empty space on the same page. HOT succeeds. No index writes.

The trade-off is that 30% of your table storage is now air. For read-mostly tables, that is wasted disk. For tables with hundreds of updates per second, it is a bargain. You trade 30% more table pages for 70-90% fewer index writes.

Set it with:

ALTER TABLE orders SET (fillfactor = 70);

This does not rewrite the table. It only affects future INSERT, UPDATE, and VACUUM operations. To benefit immediately, you need to rewrite the table so existing pages are re-packed at 70%:

-- Method 1: VACUUM FULL (blocks the table; do this during maintenance)
VACUUM FULL orders;

-- Method 2: pg_repack (online, no long locks)
-- pg_repack -d production -t orders

After running VACUUM FULL or pg_repack, inspect pgstattuple to confirm the pages have slack:

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('orders');

Look at tuple_percent versus free_percent. On a table with fillfactor = 70, tuple_percent should be roughly 70% or lower after a rewrite. If it is still 90%+, the rewrite did not happen.

Measuring index bloat in real time

HOT updates are the best defense against index bloat, but you still need to measure whether the defense is holding. Use the standard bloat query (simplified here for clarity):

WITH btree_index_atts AS (
  SELECT n.nspname, c.relname, i.relname AS index_name,
         c.reltuples, c.relpages, psai.attnum
  FROM pg_index pi
  JOIN pg_class c ON c.oid = pi.indrelid
  JOIN pg_namespace n ON n.oid = c.relnamespace
  JOIN pg_class i ON i.oid = pi.indexrelid
  CROSS JOIN LATERAL unnest(pi.indkey) WITH ORDINALITY AS psai(attnum, ord)
  WHERE c.relname = 'orders' AND pi.indisvalid
)
SELECT
  index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  CASE WHEN relpages > 0 THEN ROUND(100 * (relpages - (reltuples * (avg_width + 8) / (current_setting('block_size')::int - 24)))::numeric / relpages, 1)
       ELSE 0
  END AS bloat_pct
FROM pg_index
JOIN pg_class ON pg_class.oid = pg_index.indexrelid
WHERE indrelid = 'orders'::regclass
ORDER BY pg_relation_size(indexrelid) DESC;

A more practical approach is to install pgstattuple and use pgstatindex:

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- For each index on the table:
SELECT * FROM pgstatindex('orders_pkey');
SELECT * FROM pgstatindex('orders_status_idx');

Look at avg_leaf_density. On a healthy B-tree index that is mostly HOT-updated, this should be 70-90%. When it drops below 50%, you have significant bloat. The index is using twice as many pages as it needs, which means twice as many disk reads for the same range scan.

Our orders_status_idx was at 31% avg_leaf_density. It was 340 MB and should have been 110 MB. The fix was not to REINDEX. The fix was to make HOT updates possible again by dropping the updated_at index (which we were not using for actual query plans) and setting fillfactor = 70. After a pg_repack and two days of production traffic, avg_leaf_density climbed to 78% and UPDATE latency dropped from 180 ms back to 4 ms.

The per-statement smoke test

Before you commit to a fillfactor change, prove that HOT is actually possible for your workload. Create a test table, populate it, and inspect whether updates are HOT:

-- Create a test table with the same index pattern as production
CREATE TABLE test_orders (
  id bigserial PRIMARY KEY,
  user_id bigint NOT NULL,
  status text NOT NULL DEFAULT 'pending',
  metadata jsonb,
  created_at timestamptz NOT NULL DEFAULT now(),
  updated_at timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX idx_test_orders_user_id ON test_orders(user_id);
CREATE INDEX idx_test_orders_status ON test_orders(status);

-- Insert some rows
INSERT INTO test_orders (user_id, status, metadata)
SELECT g, 'pending', '{"sku": "abc"}'
FROM generate_series(1, 100000) AS g;

-- Reset stats
SELECT pg_stat_reset_single_table_counters('test_orders'::regclass);

-- Update a non-indexed column (metadata is not indexed)
UPDATE test_orders SET metadata = '{"sku": "xyz"}' WHERE id <= 10000;

-- Check HOT rate
SELECT n_tup_upd, n_tup_hot_upd,
  ROUND(n_tup_hot_upd::numeric / NULLIF(n_tup_upd, 0) * 100, 2) AS hot_pct
FROM pg_stat_user_tables
WHERE relname = 'test_orders';

In this case, hot_pct should be 100 because metadata is not in any index. Now drop and recreate with an index on metadata, run the same update, and watch hot_pct drop to zero.

CREATE INDEX idx_test_orders_metadata ON test_orders USING gin (metadata);
SELECT pg_stat_reset_single_table_counters('test_orders'::regclass);
UPDATE test_orders SET metadata = '{"sku": "xyz"}' WHERE id <= 10000;
-- hot_pct will now be 0

That zero is the cost of your GIN index on a jsonb column that changes frequently. Whether it is worth it depends on whether you query metadata more often than you update it. Most teams add the GIN index because JSONB is “flexible” and never audit how many writes it taxes.

Setting fillfactor for the right tables only

Do not set fillfactor = 70 globally. It is a per-table knob. Apply it selectively:

Tables with > 20% UPDATE/DELETE ratio relative to INSERT. If a table is append-only, you are wasting 30% of your disk for no benefit.
Tables where updates change only a small subset of columns. If every UPDATE changes indexed columns, HOT is impossible regardless of fillfactor.
Tables with 3+ indexes. The write amplification of non-HOT updates compounds with each index.

After changing fillfactor and rewriting, monitor the HOT rate and bloat density for a week. If bloat is still growing faster than autovacuum can reclaim, tighten your autovacuum_vacuum_scale_factor for that table:

ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.05);

This tells Postgres to trigger autovacuum when 5% of the table is dead tuples instead of the default 20%. The threshold is checked after every transaction, so lowering it increases vacuum frequency. That is usually fine if your table is small enough that a vacuum takes seconds, not minutes. For very large tables, use autovacuum_vacuum_threshold with a base row count instead.

What `VACUUM` can and cannot fix

VACUUM reclaims dead tuples and marks their space as reusable. It does not shrink the table file on disk. It does not rebuild indexes. It also cannot turn a non-HOT update into a HOT update retroactively. If you had a week of non-HOT updates because your pages were full, VACUUM frees the dead tuples but the index entries are already bloated. You need REINDEX (online with Postgres 12+ via REINDEX INDEX CONCURRENTLY) or a table rewrite to recover the space.

The sequence for fixing a bloated, HOT-starved table is:

Drop unnecessary indexes (especially on updated_at).
Set fillfactor = 70 (or tune to your actual slack needs).
Rewrite the table (pg_repack for zero-downtime).
Rebuild bloated indexes (REINDEX INDEX CONCURRENTLY idx_name).
Lower autovacuum_vacuum_scale_factor to prevent re-bloat.
Monitor pg_stat_user_tables and pgstatindex weekly.

A practical takeaway

The next time someone says “our database is slow,” do not reach for connection pooling or a bigger instance. Run the HOT rate query. If it is below 50%, you are paying a write-amplification tax on every update. Index bloat is a symptom. The disease is either indexing columns that change too often, or packing table pages so tightly that Postgres has nowhere to chain a HOT update. Fix the disease and the symptoms disappear.

A note from Yojji

The kind of work this post describes (reading pg_stat_user_tables to discover that 88% of your writes are amplified, deciding whether a GIN index on a churn-heavy JSONB column is worth the cost, and tuning fillfactor so HOT updates stay on the same page) is the unglamorous database engineering that keeps production fast without inflating the infrastructure bill. It is also exactly the kind of backend craft Yojji’s senior engineers bring to the full-cycle products they build.