Excellent! We'll continue the series with a very practical problem in almost every application: optimizing search functionality.
Case 20: Supercharging Search: LIKE vs. Full-Text Search
The Story
- System: A knowledge-sharing website or a technical blog.
- Table:
articles, containing 2,000,000 articles. - Columns:
id,title,content(of typeTEXT). - The Problem: The application needs a powerful search function that allows users to find articles containing a set of keywords, for example: "optimize database performance".
The Common but Inefficient Approach: Using ILIKE
The simplest approach is to use multiple ILIKE (or case-insensitive LIKE) clauses.
sql
EXPLAIN ANALYZE
SELECT id, title
FROM articles
WHERE
content ILIKE '%optimize%'
AND content ILIKE '%database%'
AND content ILIKE '%performance%';The Bottleneck
- Inevitable
Seq Scan: Because the search condition starts with a wildcard (%), the database cannot use a standard B-Tree index. It is forced to perform aSeq Scan—sequentially reading all 2 million articles. - Not Linguistically Smart:
LIKEis just a simple string comparison.- No Stemming: A search for "optimize" will not find an article containing the word "optimizing."
- No Stop Words: A search for "how to do" will waste resources looking for extremely common words like "how" and "to."
- No Ranking: There is no way to know which article is more relevant. An article that mentions "database" once is treated the same as an in-depth article about "database."
The Solution: Use Full-Text Search (FTS) ✅
The Logic: PostgreSQL provides a powerful, built-in full-text search system. It works in two steps:
- Indexing: It converts text into a special data type called a
tsvector. This process splits words, reduces them to their root form (stemming), and removes stop words. Thistsvectorcan then be efficiently indexed using aGINindex. - Querying: It converts the user's search string into a
tsquery, which can understand operators likeAND(&),OR(|), andNOT(!).
- Indexing: It converts text into a special data type called a
The Steps:
sql-- Step 1 (do this once): Add a tsvector column to store the processed content ALTER TABLE articles ADD COLUMN content_tsv tsvector; -- Step 2: Populate the new column UPDATE articles SET content_tsv = to_tsvector('english', title || ' ' || content); -- Note: Choose the appropriate language. -- Step 3: Create a GIN index to speed up searches CREATE INDEX idx_articles_content_tsv ON articles USING GIN(content_tsv); -- Step 4: Create a trigger to automatically update the tsv column on changes -- (This is a crucial step in a real application to keep data fresh)The Optimized Query:
sqlEXPLAIN ANALYZE SELECT id, title, -- Rank the relevance of the results ts_rank(content_tsv, to_tsquery('english', 'optimize & database & performance')) as relevance FROM articles WHERE content_tsv @@ to_tsquery('english', 'optimize & database & performance') ORDER BY relevance DESC LIMIT 50;
Analyzing the Result
- Superior Performance: Instead of a
Seq Scan, the query will use theGINindex for a nearly instantaneous search, even on millions of rows. The execution time drops from seconds (or minutes) to just a few milliseconds. - Smarter Search Results:
- Thanks to stemming, a search for "optimize" will also return results for "optimizing" and "optimization."
- The
&operator ensures that all keywords are present.
- Ranking is Available:
ts_rankallows you to order the results by relevance, bringing the best articles to the top and significantly improving the user experience.
Conclusion:
- For text search functionality,
LIKEandILIKEare the wrong tools for the job. - PostgreSQL's Full-Text Search is a specialized, powerful solution that not only provides superior performance but also delivers much higher-quality search results.