SQL Window Functions: The Five Patterns That Replace Half Your Application Code

The team needs “the top three orders per customer by value.” Somebody writes a query that pulls all orders, groups them in JavaScript, sorts each group, and slices the first three. It works on a small dataset. On the real production data — 50 million orders — it loads 4 GB into the API process and times out.

The same query in SQL with a window function is one statement, runs in 200 ms, and returns 30 KB. Window functions are the underused SQL feature that replaces a huge category of “pull the data and process it in code” patterns. This post is the five window-function patterns that cover 90% of real use cases.

What a window function is, in 30 seconds

A window function does an aggregation over a window of rows without collapsing the rows into one. Compare:

-- Aggregation: one row per customer.
SELECT customer_id, sum(amount) FROM orders GROUP BY customer_id;

-- Window function: one row per order, with the customer's total alongside.
SELECT customer_id, amount, sum(amount) OVER (PARTITION BY customer_id) AS customer_total
FROM orders;

The OVER (...) clause is what makes it a window function. PARTITION BY slices the data; ORDER BY orders within each slice; the function computes per-row over the slice.

That is the entire idea. The five patterns below are variations on it.

Pattern 1: Top N per group

The motivating example. Top three orders per customer:

SELECT customer_id, id AS order_id, amount, rk
FROM (
  SELECT customer_id, id, amount,
         row_number() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rk
  FROM orders
) ranked
WHERE rk <= 3;

row_number() assigns 1, 2, 3, … within each PARTITION BY customer_id, ordered by amount. The outer query keeps only ranks 1–3.

row_number() vs rank() vs dense_rank():

row_number() — strict 1, 2, 3, 4. Ties broken arbitrarily.
rank() — 1, 1, 3, 4 (ties get the same rank, next rank skips).
dense_rank() — 1, 1, 2, 3 (ties get the same rank, next rank does not skip).

For “top N” queries with ties you want kept, use dense_rank().

Pattern 2: Running totals

SELECT created_at::date AS day,
       sum(amount) AS daily,
       sum(sum(amount)) OVER (ORDER BY created_at::date) AS running_total
FROM orders
GROUP BY 1
ORDER BY 1;

The outer sum() OVER (ORDER BY ...) computes a cumulative sum across rows in the order specified. Result:

day         daily   running_total
2023-04-01    120          120
2023-04-02    340          460
2023-04-03    180          640

For monthly running totals, partition by month:

sum(daily) OVER (PARTITION BY date_trunc('month', day) ORDER BY day) AS running_month

Pattern 3: Lead and lag — comparing rows to neighbors

lag() gives you the previous row’s value; lead() gives you the next. Useful for “compared to yesterday” or session-gap analysis.

-- Day-over-day change in daily orders.
SELECT day, daily,
       daily - lag(daily, 1) OVER (ORDER BY day) AS change_vs_yesterday,
       round(100.0 * (daily - lag(daily) OVER (ORDER BY day)) / lag(daily) OVER (ORDER BY day), 1) AS pct_change
FROM daily_summary;

For session detection — “two events more than 30 minutes apart belong to different sessions”:

SELECT user_id, event_at,
       CASE WHEN event_at - lag(event_at) OVER (PARTITION BY user_id ORDER BY event_at) > interval '30 min'
            THEN 1 ELSE 0
       END AS new_session
FROM events;

A cumulative sum on the new_session column gives you a session ID per user.

Pattern 4: Percentile and quartile by group

ntile(n) divides each partition into n buckets:

SELECT customer_id, amount,
       ntile(4) OVER (PARTITION BY customer_id ORDER BY amount) AS quartile
FROM orders;

For statistical percentiles, use percentile_cont or percent_rank:

-- p95 latency per endpoint.
SELECT endpoint,
       percentile_cont(0.95) WITHIN GROUP (ORDER BY duration_ms) AS p95
FROM api_logs
GROUP BY endpoint;

percentile_cont is technically an ordered-set aggregate, not a window function — but it lives in the same family. For per-row percentile rank:

SELECT id, duration_ms,
       percent_rank() OVER (PARTITION BY endpoint ORDER BY duration_ms) AS pct
FROM api_logs;

percent_rank() = 0.99 means “this row is in the top 1% slowest.”

Pattern 5: First / last per group

The “first” and “last” version of min/max:

SELECT customer_id,
       first_value(id) OVER (PARTITION BY customer_id ORDER BY created_at) AS first_order_id,
       last_value(id)  OVER (PARTITION BY customer_id ORDER BY created_at
                             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_order_id
FROM orders;

The annoying gotcha: last_value() defaults to a frame that ends at the current row, so without the explicit ROWS BETWEEN ..., you get the current row, not the actual last. Use nth_value(col, n) for the Nth element.

Frames: the under-the-hood part

Every window function operates over a frame. The default frame for ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. That is why running totals work — but it is also why last_value() returns the current row.

Frame syntax:

sum(x) OVER (
  ORDER BY t
  ROWS BETWEEN 6 PRECEDING AND CURRENT ROW   -- last 7 rows
)

A 7-row sliding window. Useful for moving averages:

SELECT day, daily,
       avg(daily) OVER (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma7
FROM daily_summary;

RANGE BETWEEN '7 days' PRECEDING AND CURRENT ROW (with ORDER BY some_date_col) gives you a time-based window.

Performance: the indexes that matter

A window function reads all rows in the partition, sorts them by the window’s ORDER BY, and computes. Indexing matters:

An index on (partition_col, order_col) lets Postgres skip the sort entirely. Big win for large tables.
The query planner sometimes does not notice this without a hint — EXPLAIN ANALYZE and check whether the plan has a Sort node above the WindowAgg.

For “top N per group” specifically, LATERAL joins are sometimes faster than row_number():

SELECT c.id, o.id, o.amount
FROM customers c
CROSS JOIN LATERAL (
  SELECT id, amount FROM orders
   WHERE customer_id = c.id
   ORDER BY amount DESC
   LIMIT 3
) o;

This pushes the per-customer sort into a sub-query the planner can handle with the appropriate index. For very wide partitions, LATERAL wins.

Common mistakes

Forgetting the OVER () parentheses. sum(x) OVER (without parens) is a syntax error. sum(x) OVER () is a window function over the entire result set — different from sum(x) (an aggregation that collapses rows).

Using WHERE to filter window-function results. The window function is computed after WHERE. To filter on the result, wrap in a subquery or CTE:

-- Wrong:
SELECT id, row_number() OVER (...) AS rn FROM t WHERE rn <= 3;
-- Right:
WITH ranked AS (SELECT id, row_number() OVER (...) AS rn FROM t)
SELECT id FROM ranked WHERE rn <= 3;

Mixing GROUP BY and window functions. Allowed, but confusing. The window function operates on the post-group rows. Most of the time, it is cleaner to use one or the other.

When to push it back to the application

Window functions are not always the right answer:

If the data set is small (<1000 rows), the difference is invisible. Use whatever code is clearer.
If the computation requires arbitrary code (regex, complex string transforms), application code is fine.
If the result needs to be combined with non-DB data, the application layer is where the join happens anyway.

For everything that is “compute aggregates within groups” or “compare rows to their neighbors,” window functions are dramatically faster than the round-trip-to-application version.

The takeaway

Window functions cover a category of work that most developers do in application code by reflex. Top-N per group, running totals, lead/lag comparisons, percentile rankings, first/last — these are five patterns that account for most of the “pull the data, process it” code in a typical API.

Spend an hour rewriting one such query and the value is obvious. The next time someone says “we’ll loop through the rows in JavaScript,” the answer is “or we use a window function and ship 100× less data.”

A note from Yojji

The kind of database fluency that pushes computation into SQL where it belongs — and out of application code that has to load gigabytes to do the same job — is the kind of senior backend skill Yojji’s teams bring to client work.

Yojji is an international custom software development company founded in 2016, with teams in Europe, the US, and the UK. They specialize in the JavaScript ecosystem, cloud platforms (AWS, Azure, GCP), and full-cycle product engineering — including the database work that decides whether a feature scales or stalls.