Postgres Partial Indexes: Stopping Soft Deletes from Ruining Your Query Performance

Your users table has 10 million rows. Two hundred thousand of those users are active. The rest were soft-deleted over the years, their deleted_at timestamp quietly marking them as gone.

You look up a user by email: SELECT * FROM users WHERE email = '[email protected]' AND deleted_at IS NULL. You have an index on email. The query planner says “Index Scan” and returns in 2 ms. You ship it.

Six months later the table has 50 million rows. The same query takes 45 ms. The index is 2 GB. Your working set no longer fits in RAM. What changed? The index on email now covers 50 million rows, even though your application never queries deleted users. Postgres is crawling through a graveyard to find the living.

This is the soft-delete performance cliff, and it is predictable. The fix is a partial index: an index that only includes the rows your application actually queries. Not a new table, not a partitioning scheme, not a materialized view. Just a WHERE clause in your CREATE INDEX statement.

This post shows the exact syntax, the planner proof that it works, the write-amplification trade-off, and the three ways teams accidentally neutralize them.

The soft-delete trap

Most ORMs default to soft deletes. Rails, Django, Laravel, Prisma, TypeORM (all of them can add a deleted_at column and hide rows from SELECT queries automatically). The generated SQL looks innocent:

SELECT * FROM users
WHERE email = '[email protected]'
  AND deleted_at IS NULL
LIMIT 1;

The problem is the index. A standard index on email covers every row in the table, including the 9.8 million deleted ones. When Postgres executes the query, it walks the index to find [email protected], then checks deleted_at IS NULL for each match. If Alice was deleted three years ago, Postgres finds her in the index, fetches the heap tuple, sees deleted_at is set, and discards the row. The work was already done.

As the table grows, the index grows linearly with total history, not with active data. The shared_buffers cache that held the entire index in memory six months ago now only holds a fraction. Every lookup starts touching disk.

The fix is to tell Postgres exactly what you already told your ORM: “we only query rows where deleted_at IS NULL.”

Creating a partial index

CREATE INDEX idx_users_email_active
ON users (email)
WHERE deleted_at IS NULL;

That WHERE clause is the entire trick. The index only contains entries for rows where deleted_at IS NULL. If 98% of your table is soft-deleted, the index is roughly 98% smaller than the full index. On disk, it is not 2 GB. It is 40 MB.

More importantly, the index is structurally unable to contain deleted rows. When Postgres plans the query WHERE email = ? AND deleted_at IS NULL, it recognizes that the partial index satisfies both conditions. It never even considers deleted rows.

Here is the before-and-after EXPLAIN (ANALYZE, BUFFERS) on a 50M-row table with 2M active rows:

Full index:

Index Scan using idx_users_email on users
  Index Cond: (email = '[email protected]')
  Filter: (deleted_at IS NULL)
  Rows Removed by Filter: 1
  Buffers: shared hit=12 read=3
  Time: 42.315 ms

Partial index:

Index Scan using idx_users_email_active on users
  Index Cond: (email = '[email protected]')
  Buffers: shared hit=2
  Time: 0.142 ms

The difference is not just size. The partial index is a covering index for the query condition. There is no Filter line because the index itself enforces the predicate. The planner does not need to check the heap to reject deleted rows. The deleted rows are not in the index at all.

When partial indexes beat partitioning

Some teams solve this by partitioning: one partition for active rows, one for deleted. Partitioning works, but it is schema surgery. Foreign keys become complicated. Some queries need to scan all partitions. Your ORM may not support it cleanly.

Partial indexes are a zero-downtime, zero-migration alternative. They live in the same table. They do not break foreign keys. They do not require application changes. The only requirement is that your query predicate matches the index predicate exactly.

The three ways teams break partial indexes

Partial indexes are not fire-and-forget. Three mistakes make them invisible to the planner.

1. The predicate does not match

Postgres matches partial indexes with literal equality. If your index is WHERE deleted_at IS NULL, but your ORM generates WHERE deleted_at IS NULL AND archived = false, the planner will not use the partial index because the index does not guarantee archived = false.

Fix: make the partial index match the query.

CREATE INDEX idx_users_email_active_unarchived
ON users (email)
WHERE deleted_at IS NULL AND archived = false;

If your ORM always adds both conditions, the index must include both. The cost is a slightly larger index, but it is still far smaller than a full index.

2. Functions and casts in the predicate

If your query uses a function that transforms the column, the planner cannot match the index unless the index uses the exact same expression. This is where partial indexes overlap with expression indexes.

-- Query generated by an ORM using lower() for case-insensitive lookup
SELECT * FROM users
WHERE lower(email) = '[email protected]'
  AND deleted_at IS NULL;

-- Wrong: planner cannot use this
CREATE INDEX idx_users_email_active ON users (email)
WHERE deleted_at IS NULL;

-- Right: match the expression exactly
CREATE INDEX idx_users_lower_email_active
ON users (lower(email))
WHERE deleted_at IS NULL;

The partial index predicate and the index expression must both match the query.

3. Parameterized queries with generic plans

Prepared statements and ORM query builders sometimes use generic plans that do not inline the literal NULL check. If the planner sees deleted_at = $1 where $1 might be NULL or a timestamp, it cannot prove the query satisfies deleted_at IS NULL. It falls back to the full index or a seq scan.

Fix: ensure the ORM generates a literal IS NULL in the SQL, or set plan_cache_mode = force_custom_plan if generic plans are ruining partial index usage. Most modern ORMs do the right thing here, but it is worth verifying with EXPLAIN.

Measuring the win

Size comparison is immediate:

SELECT
  pg_size_pretty(pg_relation_size('idx_users_email')) AS full_index,
  pg_size_pretty(pg_relation_size('idx_users_email_active')) AS partial_index;

On a table with 50M rows and 2M active rows, you will see something like 2145 MB versus 86 MB.

For query performance, use EXPLAIN (ANALYZE, BUFFERS, TIMING) before and after. Look for:

Filter: (deleted_at IS NULL) disappearing
Buffers: shared read= dropping toward zero
Execution time dropping from double-digit milliseconds to sub-millisecond

For cache pressure, monitor pg_stat_user_indexes over time. The idx_blks_hit versus idx_blks_read ratio on the partial index should be higher than the full index because the working set fits in memory.

Beyond soft deletes: status columns and time ranges

Soft deletes are the most common case, but partial indexes work anywhere your queries repeatedly filter on the same predicate.

Status columns. An orders table with status = 'pending', 'shipped', 'delivered' will have queries that always look for pending orders. A partial index on WHERE status = 'pending' stays tiny even after millions of completed orders accumulate.

Time ranges. A notifications table where the application only queries rows from the last 30 days can use:

CREATE INDEX idx_notifications_user_recent
ON notifications (user_id, created_at)
WHERE created_at > now() - interval '30 days';

This is not a perfect filter (the predicate uses a volatile function, so the index only contains rows that satisfied the condition at creation time), but for append-only time-series data it is often good enough. For stricter bounds, combine it with table partitioning on created_at and use the partial index inside each partition.

Boolean flags. An is_published flag on an articles table, an is_verified flag on a users table, or an is_archived flag on a projects table all create the same pattern: a small active set and a large historical set. If your queries always include AND is_archived = false, the partial index belongs there.

The pattern is always the same: identify the filter that appears in every hot query, move it into the index predicate, and verify that the planner agrees.

When the planner still says no: a debug checklist

Sometimes you build the partial index, run EXPLAIN, and Postgres still scans the full index or the table. Here is the checklist:

Run ANALYZE on the table. Partial indexes rely on up-to-date statistics. If the planner thinks the table is tiny because autovacuum has not run, it may prefer a seq scan. Force stats with ANALYZE users;.
Check for implicit casts. If the query compares a text column against a varchar parameter, or an integer against a bigint, the planner may see the predicate as a function call and refuse the match. Make the types line up.
Look for OR conditions. WHERE deleted_at IS NULL OR is_admin = true cannot use a partial index on deleted_at IS NULL because the index does not contain rows where is_admin = true but deleted_at is set. Rewrite to a UNION or accept the full scan.
Verify the index is valid. A failed CREATE INDEX CONCURRENTLY can leave an invalid index that the planner ignores. Check pg_index for indisvalid = false.
Check for enable_indexscan = off or custom planner settings. Some monitoring tools or ORMs disable index scans for testing. Reset with SET enable_indexscan = on;.

The write-amplification trade-off

Partial indexes are not free. Every UPDATE or DELETE that touches the indexed columns must maintain the index. But here is the key: partial indexes maintain fewer entries. An update to a deleted row does not touch the partial index at all because the row is not in it. An update to an active row touches both the full index and the partial index, but the full index was already being maintained.

The real cost is the extra index on INSERT. When you insert a new active row, Postgres must write to both indexes. On high-throughput insert workloads (logging, events, telemetry), adding a partial index on a hot table can add 10-20% write latency. If your table is insert-heavy and query-light, a partial index may not pay for itself.

The rule: partial indexes are for tables where reads dominate, where a clear subset of rows is queried repeatedly, and where the query predicate is stable.

Composite partial indexes

Single-column partial indexes are the intro. The production version is usually composite:

CREATE INDEX idx_orders_user_status_created
ON orders (user_id, created_at)
WHERE status = 'pending';

This index is tiny (only pending orders) and perfectly answers:

SELECT * FROM orders
WHERE user_id = 123
  AND status = 'pending'
ORDER BY created_at DESC;

The planner will use it for index-only scans if the selected columns are in the index or the table is visible enough. Even with heap fetches, the scan is bounded to the small pending set.

Dropping the old index

Do not leave the full index in place after the partial index is working. Postgres will happily maintain both, and the query planner may even pick the full index if its statistics are slightly off. Verify with EXPLAIN that the partial index is chosen, then:

DROP INDEX idx_users_email;

This frees disk space and removes write amplification. Keep the partial index.

Summary

Partial indexes are the closest thing Postgres has to a “view over an index.” They are small, fast, and require no application changes. The only requirement is honesty about what your application actually queries.

If your table has soft deletes, status flags, or archival columns, and your queries always filter on them, a partial index is almost always the right answer. Build it, verify it with EXPLAIN (ANALYZE, BUFFERS), drop the bloated full index, and move on.

A note from Yojji

The kind of database optimization work that turns a routine lookup into a sub-millisecond query (partial indexes, planner verification, and honest predicate analysis) is exactly the kind of backend engineering Yojji’s teams build into the systems they ship for clients.