Skip to content

Case 2: Text Search with a Leading Wildcard (LIKE '%...')

The Story

  • Table: logs (stores system logs)
  • Data: The table has 5,000,000 rows. The message column contains log messages, which can be very long.
  • Current State: To speed things up, an administrator has already created a standard B-Tree index on the message column: CREATE INDEX idx_logs_message ON logs(message);
  • The Problem: A developer needs to find all logs that contain a specific error ID string, for example, error_id=ABC-987-XYZ, which could appear anywhere in the message string. They use LIKE '%...%'.

The Problematic SQL Query

sql
EXPLAIN ANALYZE
SELECT *
FROM logs
WHERE message LIKE '%error_id=ABC-987-XYZ%';

The Query Plan (BEFORE a fix)

Even with the B-Tree index, this is the result you will see:

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Seq Scan on logs  (cost=0.00..87543.00 rows=250 width=350) (actual time=0.045..985.123 rows=12 loops=1)
   Filter: (message LIKE '%error_id=ABC-987-XYZ%')
   Rows Removed by Filter: 4999988
 Planning Time: 0.120 ms
 Execution Time: 985.201 ms

Why is this a problem?

  1. Look at this line: Again, it's a Seq Scan. Even though an index exists, PostgreSQL completely ignored it.

  2. Why didn't the B-Tree index work?

    • A B-Tree index works like the index of a dictionary, sorted alphabetically from left to right. It is only effective when you know the beginning of the string you are looking for (e.g., message LIKE 'User login failed...').
    • When you put a wildcard character (%) at the beginning (LIKE '%...'), you are telling the database: "Find a string, but I don't know what it starts with." The database cannot use the B-Tree "dictionary" to look it up and must fall back to the worst-case scenario: reading the entire table.
  3. The Warning Signs:

    • Execution Time: **985.201 ms**: Taking almost 1 second for a search query is unacceptable in most applications.
    • Rows Removed by Filter: **4999988**: The system read 5 million rows and threw away almost all of them.
  4. The Root Cause:

    • Using the wrong type of index for the query. A B-Tree index is not designed for searching for a substring in the middle of a text field.

The Solution

  • Action: Use a special type of index designed for text searching: a Trigram Index. This index breaks the text down into chunks of 3 characters (trigrams) and indexes them, which makes searching for substrings very fast.

  • Commands:

    sql
    -- Step 1: Enable the extension (only needs to be done once per database)
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    
    -- Step 2: Create a Trigram index on the `message` column
    -- (We use a GIN index for optimal read performance)
    CREATE INDEX idx_logs_message_trgm ON logs USING GIN (message gin_trgm_ops);

    (After this step, you can delete the old B-Tree index if it's no longer needed for other queries to save space.)

The Query Plan (AFTER the fix)

Run the EXPLAIN ANALYZE query again:

                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on logs  (cost=324.50..1280.65 rows=250 width=350) (actual time=1.850..2.345 rows=12 loops=1)
   Recheck Cond: (message LIKE '%error_id=ABC-987-XYZ%')
   Heap Blocks: exact=12
   ->  Bitmap Index Scan on idx_logs_message_trgm  (cost=0.00..324.44 rows=250 width=0) (actual time=1.821..1.821 rows=12 loops=1)
         Index Cond: (message LIKE '%error_id=ABC-987-XYZ%')
 Planning Time: 0.250 ms
 Execution Time: 2.401 ms

Analyzing the Result

  1. The Big Change: The plan switched to a Bitmap Index Scan and used the new Trigram index (idx_logs_message_trgm).
  2. Performance Comparison:
    • Execution Time: Dropped dramatically from 985.201 ms down to just 2.401 ms. That's about 410 times faster!

Conclusion

This is a very important lesson. Just having an index doesn't guarantee a fast query. You must use the right type of index for the right type of query. For flexible text searches, especially LIKE patterns with a leading wildcard, a Trigram Index (pg_trgm) is an extremely powerful tool.