Recursive CTEs In Postgres: How To Query A Tree Without N Round Trips

The product needs to render a comment thread. Each comment has a parent_id. The naive approach: fetch the top-level comments, then for each one fetch its children, then for each of those fetch grandchildren. For a 200-comment thread, that’s 200+ round trips. The page takes 4 seconds to load.

Recursive Common Table Expressions (CTEs) are the SQL feature that handles this in one query. Org charts, comment threads, file trees, “find all dependencies of dependency X” — all the same shape, all solvable by the same query pattern.

This post is the working pattern, applied to the four common shapes (descendants, ancestors, paths, totals), with the performance considerations that decide whether it scales to thousands or millions of rows.

The pattern

A recursive CTE has two parts: a base case and a recursive step. They are unioned together:

WITH RECURSIVE tree AS (
  -- Base case: the top of the tree.
  SELECT id, parent_id, name, 0 AS depth
  FROM nodes
  WHERE parent_id IS NULL

  UNION ALL

  -- Recursive step: find children of nodes already in `tree`.
  SELECT n.id, n.parent_id, n.name, t.depth + 1
  FROM nodes n
  JOIN tree t ON n.parent_id = t.id
)
SELECT * FROM tree ORDER BY depth, id;

The base case finds the root. The recursive step looks up children of nodes already in the result. Postgres keeps applying the recursive step until no new rows are produced. The result is the entire tree.

UNION ALL is faster than UNION (the latter deduplicates). Only use UNION if you have actual duplicates, which a tree shouldn’t.

Pattern 1: Find all descendants of a node

A comment thread under one root comment:

WITH RECURSIVE descendants AS (
  SELECT id, parent_id, body, 0 AS depth
  FROM comments
  WHERE id = $1               -- the root comment

  UNION ALL

  SELECT c.id, c.parent_id, c.body, d.depth + 1
  FROM comments c
  JOIN descendants d ON c.parent_id = d.id
)
SELECT * FROM descendants ORDER BY depth, id;

200 comments returned in one round trip instead of 200.

Pattern 2: Find all ancestors

The reverse — given a leaf, walk up:

WITH RECURSIVE ancestors AS (
  SELECT id, parent_id, name
  FROM nodes
  WHERE id = $1               -- the starting node

  UNION ALL

  SELECT n.id, n.parent_id, n.name
  FROM nodes n
  JOIN ancestors a ON a.parent_id = n.id
)
SELECT * FROM ancestors;

Useful for breadcrumbs, permission inheritance (“does this user have access via any parent group?”), category-tree navigation.

Pattern 3: The full path as a string

Often you want not just “the row” but “the path from root to it”:

WITH RECURSIVE tree AS (
  SELECT id, parent_id, name, name::text AS path
  FROM nodes
  WHERE parent_id IS NULL

  UNION ALL

  SELECT n.id, n.parent_id, n.name, t.path || ' > ' || n.name
  FROM nodes n
  JOIN tree t ON n.parent_id = t.id
)
SELECT id, path FROM tree;

Result:

id   path
1    Engineering
2    Engineering > Backend
3    Engineering > Backend > Platform
4    Engineering > Frontend

This is the materialized-path pattern computed on the fly. For frequently-queried paths, store them denormalized; for occasional queries, the recursive CTE is fine.

Pattern 4: Aggregating up the tree

Sum the budget of every department, including all sub-departments:

WITH RECURSIVE descendants AS (
  SELECT id, id AS root_id FROM departments
  UNION ALL
  SELECT d.id, root.root_id
  FROM descendants root
  JOIN departments d ON d.parent_id = root.id
)
SELECT root_id, sum(budget) AS total_budget
FROM descendants
JOIN departments ON departments.id = descendants.id
GROUP BY root_id;

Each row in descendants is “this descendant belongs to this root.” The aggregation rolls them up.

Cycle detection

A buggy data set might have cycles (A -> B -> A). A recursive CTE without cycle detection runs forever and consumes memory.

Postgres 14+ supports CYCLE:

WITH RECURSIVE tree AS (
  SELECT id, parent_id FROM nodes WHERE id = $1
  UNION ALL
  SELECT n.id, n.parent_id FROM nodes n JOIN tree t ON n.parent_id = t.id
) CYCLE id SET is_cycle USING path
SELECT * FROM tree;

CYCLE id SET is_cycle USING path tells Postgres to track visited IDs and stop when a cycle is detected.

For older versions:

WITH RECURSIVE tree AS (
  SELECT id, parent_id, ARRAY[id] AS path
  FROM nodes WHERE id = $1
  UNION ALL
  SELECT n.id, n.parent_id, t.path || n.id
  FROM nodes n
  JOIN tree t ON n.parent_id = t.id
  WHERE NOT (n.id = ANY(t.path))    -- skip if already visited
)
SELECT * FROM tree;

Performance: indexes and depth

Two things determine whether your recursive CTE is fast:

1. Index on the join column. The recursive step joins on parent_id (or whatever your reference column is). Without an index, every recursion is a sequential scan. With one, it’s an index lookup per row.

CREATE INDEX nodes_parent_id_idx ON nodes (parent_id);

For a tree of 1M nodes, the difference is “10 ms” vs “30 seconds.”

2. Tree depth and breadth. A balanced tree of 1M nodes is fast. A pathologically unbalanced tree (one long chain of 100k nodes) does 100k recursion steps, each adding to the result set. Postgres handles this fine but it’s not free.

For trees deeper than ~10k levels, you may need to limit depth:

WITH RECURSIVE tree AS (
  SELECT id, parent_id, 0 AS depth FROM nodes WHERE id = $1
  UNION ALL
  SELECT n.id, n.parent_id, t.depth + 1
  FROM nodes n JOIN tree t ON n.parent_id = t.id
  WHERE t.depth < 100              -- safety limit
)
SELECT * FROM tree;

The materialized-path alternative

For trees you read more than write, denormalizing the path stops being a recursive query at all:

ALTER TABLE nodes ADD COLUMN path text;
-- Backfill. Then on every insert/update, recompute path.

SELECT * FROM nodes WHERE path LIKE '/eng/backend/%';
-- All descendants of /eng/backend.

A B-tree or GiST index on path makes this very fast. Trade-off: you maintain path on every update — usually fine for trees that don’t change often.

For very-write-heavy or very-deep trees, ltree is a Postgres extension specifically for hierarchical paths with native operators.

Closure tables: the third option

For aggressive read performance with mutable trees, use a closure table:

CREATE TABLE node_closure (
  ancestor_id   bigint,
  descendant_id bigint,
  depth         int,
  PRIMARY KEY (ancestor_id, descendant_id)
);
-- One row per (ancestor, descendant) pair, including self.

Now “find all descendants” is a single index lookup with no recursion:

SELECT n.* FROM nodes n
JOIN node_closure c ON c.descendant_id = n.id
WHERE c.ancestor_id = $1;

Trade-off: the closure table is O(N²) worst case. For a 10k-node deep tree, that’s 100M closure rows. Use only for trees where reads massively outnumber writes.

The four traps

1. Forgetting indexes. A recursive CTE without an index on the parent_id is the slowest version of every query.

2. Using UNION instead of UNION ALL. UNION is slower (deduplicates). For tree traversal, you don’t have duplicates; use UNION ALL.

3. Selecting too much in the base case. If your base case returns 100k rows, the recursion explodes from there. Constrain the base.

4. Not testing on production-size data. A recursive CTE that flies on 100 rows can crawl on 1M. Always test with realistic volume.

When to use what

A practical decision tree:

Tree changes often, reads are occasional: recursive CTE.
Tree changes rarely, reads are frequent: materialized path.
Tree changes rarely, reads are extremely frequent and complex: closure table.
Hierarchical paths are the primary access pattern: ltree extension.

For most teams’ first hierarchical query, recursive CTE is the right starting point.

The takeaway

Hierarchical data is not unfriendly to SQL. Recursive CTEs handle descendants, ancestors, paths, and rolled-up aggregations in single queries that replace N+1 application-level round trips. Index the join column, use UNION ALL, watch for cycles, and limit depth as a safety net.

The next time you find yourself writing a loop in application code that fetches children-of-children, write the CTE instead. One query, one round trip, problem solved.

A note from Yojji

The kind of database fluency that turns “we need a recursive query” from a five-meeting investigation into a ten-line CTE is the kind of senior backend skill Yojji’s teams bring to client work — including the data-modeling judgment that decides whether to query, materialize, or denormalize.

Yojji is an international custom software development company founded in 2016, with teams across Europe, the US, and the UK. They specialize in the JavaScript ecosystem (React, Node.js, TypeScript), cloud platforms (AWS, Azure, GCP), and Postgres-backed application engineering — including the SQL patterns that decide whether a hierarchical feature scales.