Skip to content

Excellent! We'll continue the series with a very practical problem in almost every application: optimizing search functionality.


The Story

  • System: A knowledge-sharing website or a technical blog.
  • Table: articles, containing 2,000,000 articles.
  • Columns: id, title, content (of type TEXT).
  • The Problem: The application needs a powerful search function that allows users to find articles containing a set of keywords, for example: "optimize database performance".

The Common but Inefficient Approach: Using ILIKE

The simplest approach is to use multiple ILIKE (or case-insensitive LIKE) clauses.

sql
EXPLAIN ANALYZE
SELECT id, title
FROM articles
WHERE
    content ILIKE '%optimize%'
AND content ILIKE '%database%'
AND content ILIKE '%performance%';

The Bottleneck

  1. Inevitable Seq Scan: Because the search condition starts with a wildcard (%), the database cannot use a standard B-Tree index. It is forced to perform a Seq Scan—sequentially reading all 2 million articles.
  2. Not Linguistically Smart: LIKE is just a simple string comparison.
    • No Stemming: A search for "optimize" will not find an article containing the word "optimizing."
    • No Stop Words: A search for "how to do" will waste resources looking for extremely common words like "how" and "to."
    • No Ranking: There is no way to know which article is more relevant. An article that mentions "database" once is treated the same as an in-depth article about "database."

The Solution: Use Full-Text Search (FTS) ✅

  • The Logic: PostgreSQL provides a powerful, built-in full-text search system. It works in two steps:

    1. Indexing: It converts text into a special data type called a tsvector. This process splits words, reduces them to their root form (stemming), and removes stop words. This tsvector can then be efficiently indexed using a GIN index.
    2. Querying: It converts the user's search string into a tsquery, which can understand operators like AND (&), OR (|), and NOT (!).
  • The Steps:

    sql
    -- Step 1 (do this once): Add a tsvector column to store the processed content
    ALTER TABLE articles ADD COLUMN content_tsv tsvector;
    
    -- Step 2: Populate the new column
    UPDATE articles SET content_tsv = to_tsvector('english', title || ' ' || content);
    -- Note: Choose the appropriate language.
    
    -- Step 3: Create a GIN index to speed up searches
    CREATE INDEX idx_articles_content_tsv ON articles USING GIN(content_tsv);
    
    -- Step 4: Create a trigger to automatically update the tsv column on changes
    -- (This is a crucial step in a real application to keep data fresh)
  • The Optimized Query:

    sql
    EXPLAIN ANALYZE
    SELECT
        id,
        title,
        -- Rank the relevance of the results
        ts_rank(content_tsv, to_tsquery('english', 'optimize & database & performance')) as relevance
    FROM articles
    WHERE
        content_tsv @@ to_tsquery('english', 'optimize & database & performance')
    ORDER BY
        relevance DESC
    LIMIT 50;

Analyzing the Result

  1. Superior Performance: Instead of a Seq Scan, the query will use the GIN index for a nearly instantaneous search, even on millions of rows. The execution time drops from seconds (or minutes) to just a few milliseconds.
  2. Smarter Search Results:
    • Thanks to stemming, a search for "optimize" will also return results for "optimizing" and "optimization."
    • The & operator ensures that all keywords are present.
  3. Ranking is Available: ts_rank allows you to order the results by relevance, bringing the best articles to the top and significantly improving the user experience.

Conclusion:

  • For text search functionality, LIKE and ILIKE are the wrong tools for the job.
  • PostgreSQL's Full-Text Search is a specialized, powerful solution that not only provides superior performance but also delivers much higher-quality search results.