PostgreSQL Full-Text Search: Dropping Elasticsearch for 90% of Use Cases

The product catalog has 200,000 SKUs and the search box is the highest-traffic feature on the site. Six months ago, someone decided that LIKE '%query%' was too slow and added Elasticsearch. Now you run two datastores, a sync pipeline that breaks every other deploy, and a mapping version conflict that corrupted last Tuesday’s index rebuild. The search is fast, but the operational surface area doubled for a problem Postgres already solves natively.

PostgreSQL’s full-text search is not a toy feature. It has shipped in core since 8.3 and supports ranked relevance, phrase matching, highlighting, and multiple languages out of the box. For catalogs, documentation, support tickets, and internal admin search, it is often the only tool you need. This post shows the schema, indexing, and query patterns that turn a slow LIKE scan into a sub-10-millisecond ranked search, plus the exact limits where Elasticsearch still wins.

Why LIKE and ILIKE fall over

A query like SELECT * FROM products WHERE name ILIKE '%leather%' cannot use a B-tree index. Postgres must scan every row and pattern-match the column case-insensitively against the input. On 200,000 rows, that is 200,000 string comparisons. Add an OR description ILIKE '%leather%' and every row now costs two pattern matches, the second against a much longer column. On a busy site, these queries show up in pg_stat_statements with execution times in the hundreds of milliseconds and rapidly rising buffer reads.

A B-tree index (with text_pattern_ops) only helps left-anchored patterns such as LIKE 'leather%'. A pg_trgm GIN index goes further and can serve unanchored LIKE and ILIKE patterns, but it is still substring matching. Users expect word-stem matching, so “boot” matches “boots.” They expect relevance ranking, so an exact title match beats a passing mention in a description. They expect typo tolerance, which B-tree indexes cannot provide and trigram indexes provide only for mild misspellings. At some point, teams give up and reach for Elasticsearch. But Postgres has a dedicated full-text index type and query language that handles the first two of those expectations without leaving the database.

tsvector and tsquery: the core idea

Full-text search in Postgres works by converting text into a tsvector, a sorted list of lexemes (normalized word stems) with positional information. A query is converted into a tsquery, a structured search predicate. The operator @@ checks whether the tsvector matches the tsquery.

SELECT to_tsvector('english', 'The quick brown fox') @@ to_tsquery('english', 'quick & brown');
-- returns true

to_tsvector strips stop words (“the”), lowercases, and stems (“foxes” becomes “fox”). to_tsquery parses boolean operators: & for AND, | for OR, ! for NOT, and <-> for “followed by.” This is not string matching. It is linguistic indexing, and it is fast because the tsvector is precomputed and indexed with GIN.

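The key insight is that you do not call to_tsvector on the column at query time. You store the tsvector in a generated column, index it, and query against the index. That turns search into an index lookup that fetches only the matching rows, not a scan of the whole table. (GIN does not support index-only scans, so the matching rows are still read from the heap.)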
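
To see what the parser actually produces, it can help to inspect a tsvector and a tsquery directly. The following is a minimal sketch using only built-in functions; the sample sentences are invented for illustration.

```sql
-- Lexemes with their positions. The stop word "the" is dropped,
-- but it still occupies position 1, so "quick" lands at position 2.
SELECT to_tsvector('english', 'The quick brown foxes jumped');

-- The parsed query: each word is stemmed the same way as document text.
SELECT to_tsquery('english', 'foxes & !cats');

-- "foxes" in the query and "fox" in the text reduce to the same lexeme,
-- so the match operator compares stems rather than raw strings.
SELECT to_tsvector('english', 'A fox ran') @@ to_tsquery('english', 'foxes');
```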

Building the search column and index

Start with a products table:

CREATE TABLE products (
  id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  name text NOT NULL,
  description text,
  category text,
  search_vector tsvector GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(category, '')), 'C')
  ) STORED
);

setweight tags each source field with a priority: A (highest) for name, B for description, C for category. When ranking, a match in the name scores higher than a match in the description. The coalesce prevents NULL inputs from nullifying the whole vector.
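
A quick way to confirm the weights landed where you expect is to insert a row and read the generated column back. This is a sketch; the product values are made up.

```sql
-- Sample row; search_vector is filled in by Postgres, not by the INSERT.
INSERT INTO products (name, description, category)
VALUES ('Leather Boots', 'Waterproof hiking boots', 'Footwear');

-- Each lexeme is listed with its positions and a weight label (A, B, or C).
SELECT name, search_vector
FROM products
WHERE name = 'Leather Boots';
```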

Now add the GIN index:

CREATE INDEX idx_products_search ON products USING GIN (search_vector);

GIN (Generalized Inverted Index) is the right index type for full-text search. It stores each lexeme with a posting list of row IDs that contain it. A search for “leather” jumps directly to the posting list for the lexeme “leather,” then intersects it with other lexeme lists if the query has multiple terms. The index is not small (as a rough guide, expect 30-50% of the table size, depending on the text), but finding a posting list is logarithmic in the number of unique lexemes, not the number of rows.
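
To check that the planner actually uses the index, run the filter under EXPLAIN. The sketch below assumes the table and index defined above. On a populated table you would typically expect a Bitmap Index Scan on idx_products_search feeding a Bitmap Heap Scan, although on a very small table the planner may reasonably choose a sequential scan instead.

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, name
FROM products
WHERE search_vector @@ plainto_tsquery('english', 'leather');
```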

Querying with ranking

A basic ranked search looks like this:

SELECT
  id,
  name,
  ts_rank_cd(search_vector, query, 32) AS rank
FROM products,
  plainto_tsquery('english', 'leather boots') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

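plainto_tsquery converts plain user input into a tsquery, inserting & between words. ts_rank_cd computes a relevance score based on how close together the matching terms appear, how often they occur, and the weights you assigned. The optional third argument is a normalization bitmask. The value 32 rescales the score into the 0 to 1 range as rank / (rank + 1); it does not account for document length. To keep a short title match from losing to a long description that mentions the term ten times, use 2, which divides the rank by the document length, or combine flags with | (for example 2|32).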

This query uses the GIN index for the @@ filter, then sorts the results by rank. If your result set after filtering is small (under a few thousand), the sort is cheap. If it is large, add a WHERE clause on category or price to narrow the set before ranking.
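
As a sketch of both ideas together, the variant below narrows the candidate set with an ordinary filter and combines two normalization flags. The category value is hypothetical; flags 2 (divide by document length) and 32 (rank / (rank + 1)) come from the documented ts_rank_cd normalization options.

```sql
SELECT
  id,
  name,
  ts_rank_cd(search_vector, query, 2 | 32) AS rank
FROM products,
  plainto_tsquery('english', 'leather boots') query
WHERE search_vector @@ query
  AND category = 'Footwear'   -- hypothetical filter value
ORDER BY rank DESC
LIMIT 20;
```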

Highlighting results

Users want to see why a result matched. ts_headline extracts fragments with search terms highlighted:

SELECT
  id,
  name,
  ts_headline(
    'english',
    description,
    query,
    'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15'
  ) AS highlight
FROM products,
  plainto_tsquery('english', 'leather boots') query
WHERE search_vector @@ query
ORDER BY ts_rank_cd(search_vector, query) DESC
LIMIT 20;

ts_headline is powerful but not free. It re-parses the source text at query time. For high-traffic search, consider caching the highlighted snippet in application code or using it only on the top N results after ranking.
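
One way to restrict highlighting to the top N results is to rank and limit in a subquery, then call ts_headline only on the rows that survive. A sketch, reusing the query from above (the alias top_hits is arbitrary):

```sql
SELECT
  id,
  name,
  ts_headline(
    'english',
    description,
    query,
    'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15'
  ) AS highlight
FROM (
  -- Rank and trim first, so at most 20 descriptions are re-parsed.
  SELECT id, name, description, query,
         ts_rank_cd(search_vector, query) AS rank
  FROM products,
    plainto_tsquery('english', 'leather boots') query
  WHERE search_vector @@ query
  ORDER BY rank DESC
  LIMIT 20
) AS top_hits
ORDER BY rank DESC;
```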

Phrase and proximity search

plainto_tsquery treats input as an AND of independent words. If the user searches for “database index,” it matches a product named “index cards for your database.” To require the words in order, use phraseto_tsquery:

SELECT * FROM products
WHERE search_vector @@ phraseto_tsquery('english', 'database index');

For custom proximity, tsquery supports the distance operator <N>:

SELECT * FROM products
WHERE search_vector @@ to_tsquery('english', 'database <2> index');

This matches “database” followed by “index” exactly two positions later, that is, with one word in between; <-> is shorthand for <1>. It is useful for disambiguating compound terms without requiring an exact phrase.
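
Because <N> is an exact distance rather than an upper bound, a "within N words" search has to be written as an OR of distances. A small sketch with invented sample text:

```sql
-- Adjacent words are one position apart, so <2> does not match here.
SELECT to_tsvector('english', 'database index')
       @@ to_tsquery('english', 'database <2> index');        -- false

-- One intervening word puts the lexemes exactly two positions apart.
SELECT to_tsvector('english', 'database table index')
       @@ to_tsquery('english', 'database <2> index');        -- true

-- "Within two positions" spelled out as an OR of the two distances.
SELECT * FROM products
WHERE search_vector @@ to_tsquery('english',
  '(database <-> index) | (database <2> index)');
```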

Prefix matching for autocomplete

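Full-text search does not do prefix matching by default. “boot” matches “boots” because stemming normalizes both to “boot,” but “boot” will not match “bootstrap” because the stem is different. A tsquery can request a prefix explicitly with the :* suffix (boot:*), but that matches lexeme prefixes only and offers no typo tolerance. For autocomplete, combine full-text search with pg_trgm: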

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_products_name_trgm ON products USING GIN (name gin_trgm_ops);

SELECT id, name
FROM products
WHERE name % 'boot'
ORDER BY name <-> 'boot'
LIMIT 10;

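The % operator is trigram similarity, and <-> is the distance operator for ordering by closest match. The GIN index accelerates the % filter; the ORDER BY is then an ordinary sort of the filtered rows, because only a GiST trigram index can return rows in distance order. Both operators compare the search term against the whole name, so a short prefix against a long product name can fall below the default similarity threshold of 0.3; pg_trgm's word_similarity function and <% operator exist for that case. This is a separate index and query path from the full-text search, but they complement each other. Use trigram for autocomplete suggestions and tsvector for the final ranked search.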
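
If typo tolerance is not needed for a given input box, a prefix tsquery against the existing search_vector index is a lighter alternative. A minimal sketch: it assumes the input has already been reduced to a single clean word, because to_tsquery raises a syntax error on arbitrary user text.

```sql
-- 'boot:*' matches any lexeme that starts with "boot", including "bootstrap".
SELECT to_tsvector('english', 'bootstrap toolkit')
       @@ to_tsquery('english', 'boot:*');

-- The same prefix query against the indexed column.
SELECT id, name
FROM products
WHERE search_vector @@ to_tsquery('english', 'boot:*')
LIMIT 10;
```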

Keeping the index current

Because search_vector is a generated column, it updates automatically when name, description, or category changes. There is no application-level sync to maintain. If you batch-load data with COPY, the generated column is computed during the load, which is slower than loading plain text. For large bulk imports, consider:

Dropping the GIN index before bulk load.
Loading the data.
Re-creating the index with CREATE INDEX CONCURRENTLY.

This can reduce load time by 60-80% on multi-million-row imports.
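
The three steps above can be sketched as follows. The file path and CSV layout are hypothetical, and a server-side COPY needs the appropriate file-read privileges (psql's \copy is the client-side equivalent).

```sql
-- 1. Drop the index so the load does not pay for incremental index updates.
DROP INDEX IF EXISTS idx_products_search;

-- 2. Load the data. search_vector is omitted: Postgres computes it per row.
COPY products (name, description, category)
FROM '/tmp/products.csv' WITH (FORMAT csv, HEADER true);

-- 3. Rebuild the index. CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_products_search
  ON products USING GIN (search_vector);
```

If nothing reads the table during the import, a plain CREATE INDEX is usually quicker than the CONCURRENTLY variant, at the cost of blocking writes while it runs.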

Performance and scaling limits

On a table with 500,000 products, the ranked search query above typically executes in 2-8 milliseconds with a warm cache. The GIN index size is roughly 40% of the table size. Here is what to watch:

Index bloat. GIN indexes can degrade under heavy updates because Postgres appends new entries to a pending list for fast insertion, then merges it into the main index structure during vacuum or when the list outgrows gin_pending_list_limit. If autovacuum cannot keep up, searches slow down as they scan the pending list. Monitor pgstatginindex('idx_products_search') from the pgstattuple extension and ensure pending_pages stays low. If you have a write-heavy workload, run VACUUM more aggressively. A lower gin_pending_list_limit keeps the list short at the cost of slower inserts; a higher one speeds up inserts but leaves more for each search to scan.
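
A sketch of the relevant knobs, assuming the index name used in this post. The limit of 512 kB is an illustrative value, not a recommendation.

```sql
-- pgstatginindex lives in the pgstattuple extension.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- pending_pages and pending_tuples show the size of the unflushed pending list.
SELECT * FROM pgstatginindex('idx_products_search');

-- Merge the pending list into the main index without waiting for VACUUM.
SELECT gin_clean_pending_list('idx_products_search'::regclass);

-- Favor search latency over insert speed: cap the pending list (value in kB).
ALTER INDEX idx_products_search SET (gin_pending_list_limit = 512);

-- Or stop using the pending list for new entries altogether.
ALTER INDEX idx_products_search SET (fastupdate = off);
```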

Ranking large result sets. If a common term matches 100,000 rows, ranking all of them is expensive. Add mandatory filters (category, price range, in-stock flag) to shrink the candidate set before ts_rank_cd runs. If you cannot filter, consider using a materialized view for precomputed top-N results per category.

Multi-language content. to_tsvector('english', ...) only handles English. If you store multilingual text, add a language column of type regconfig (a text column will not work in an index expression) and build an expression index on it:

CREATE INDEX idx_products_search_multilang ON products USING GIN (
  to_tsvector(coalesce(language, 'english'), coalesce(name, '') || ' ' || coalesce(description, ''))
);

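The query must repeat the indexed expression exactly for the planner to use the index, and the tsquery should be built with the configuration that matches the language of the search input.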
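
A matching query might look like the sketch below. It assumes products.language is a regconfig column holding values such as 'german'; the search term is made up.

```sql
SELECT id, name
FROM products
WHERE to_tsvector(coalesce(language, 'english'),
                  coalesce(name, '') || ' ' || coalesce(description, ''))
      @@ plainto_tsquery('german', 'lederstiefel');
```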

When Elasticsearch still wins

Postgres full-text search is not a universal replacement. Reach for Elasticsearch when:

You need fuzzy matching with edit distance (“lether” matching “leather”). Postgres trigrams handle mild typos, but not Levenshtein distance at scale.
You run complex aggregations (faceted search with 20+ dimensions and counts). Postgres GROUP BY works for simple facets, but Elasticsearch aggregations are purpose-built for this.
Your search volume exceeds what a single Postgres primary can serve. A read replica helps, but Elasticsearch is designed to shard search horizontally.
You need geo-spatial search combined with text relevance. Postgres has PostGIS, but combined geo-text ranking is smoother in Elasticsearch.
Your document count is in the tens of millions and growing fast. GIN indexes are fast, but they are not distributed.

For everything else, Postgres search removes a moving part, eliminates sync pipelines, and keeps your data in one transactional system.

Migrating from LIKE queries

The safest migration path is additive:

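Add the generated search_vector column and GIN index to the existing table. Generated columns require Postgres 12+, and adding a STORED one rewrites the table under an exclusive lock, so schedule it for a quiet window. Building the index with CREATE INDEX CONCURRENTLY does not block writes.
Skip the manual backfill: a generated column is populated for every existing row when it is created, and it cannot be written with UPDATE. If the table is too large to rewrite in one go, use a plain tsvector column maintained by a trigger instead, and backfill that with UPDATE in batches of 10,000 rows to avoid long-held locks.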
Update application code to query search_vector when a text_search parameter is present, falling back to the old LIKE query if the parameter is absent.
Run both paths in parallel for a week, comparing result quality and latency.
Remove the old LIKE path once metrics confirm the new path is faster and produces better results.

Because the old LIKE path stays in place until the final step, this approach gives you an instant rollback if ranking behavior surprises your users.
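
For an existing table, the first step could be sketched as below. It assumes Postgres 12 or later and the column and index names used earlier in this post.

```sql
-- Adding a STORED generated column rewrites the table under an
-- ACCESS EXCLUSIVE lock; existing rows are populated as part of the rewrite.
ALTER TABLE products
  ADD COLUMN search_vector tsvector GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(category, '')), 'C')
  ) STORED;

-- Build the index without blocking writes (not allowed inside a transaction).
CREATE INDEX CONCURRENTLY idx_products_search
  ON products USING GIN (search_vector);
```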

Monitoring what matters

Add these checks to your observability stack:

-- Average search latency by query pattern
-- (mean_exec_time is named mean_time before Postgres 13)
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
WHERE query LIKE '%search_vector%'
ORDER BY mean_exec_time DESC;

-- GIN index size and pending-list backlog (requires the pgstattuple extension;
-- pgstatindex only works on B-tree indexes, so use pgstatginindex here)
SELECT pg_size_pretty(pg_relation_size('idx_products_search')) AS index_size,
       s.pending_pages,
       s.pending_tuples
FROM pgstatginindex('idx_products_search') AS s;

Alert if average search latency crosses 50 ms or if pending_pages keeps growing between vacuum runs. Both indicate that vacuum is falling behind or the query pattern needs stricter filtering.

The takeaway

Elasticsearch is a powerful search engine, but it is also a second database with its own replication, monitoring, backup, and schema migration story. For product catalogs, documentation search, support ticket lookup, and most internal admin tools, PostgreSQL full-text search is fast enough, feature-rich enough, and operationally simpler by an order of magnitude.

The migration is not a rewrite. It is one generated column, one GIN index, and a query that uses ts_rank_cd instead of LIKE. Start there. Measure latency, relevance, and index size. If you hit the scaling limits (fuzzy matching, complex facets, or horizontal sharding), then you will have concrete evidence that Elasticsearch is justified. Until then, you are probably operating two databases because no one checked whether the first one could already do the job.

A note from Yojji

Simplifying infrastructure by using the tools you already own is one of the fastest ways to reduce operational risk. Yojji’s engineering teams regularly help clients evaluate whether their current stack can handle new requirements before adding complexity.

Yojji is an international custom software development company founded in 2016, with offices in Europe, the US, and the UK. Their 50+ senior engineers specialize in Postgres, Node.js, and cloud-native architecture, building systems that stay maintainable as they scale.