Case 2: Text Search with a Leading Wildcard (LIKE '%...')
The Story
- Table:
logs(stores system logs) - Data: The table has 5,000,000 rows. The
messagecolumn contains log messages, which can be very long. - Current State: To speed things up, an administrator has already created a standard B-Tree index on the
messagecolumn:CREATE INDEX idx_logs_message ON logs(message); - The Problem: A developer needs to find all logs that contain a specific error ID string, for example,
error_id=ABC-987-XYZ, which could appear anywhere in themessagestring. They useLIKE '%...%'.
The Problematic SQL Query
EXPLAIN ANALYZE
SELECT *
FROM logs
WHERE message LIKE '%error_id=ABC-987-XYZ%';The Query Plan (BEFORE a fix)
Even with the B-Tree index, this is the result you will see:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Seq Scan on logs (cost=0.00..87543.00 rows=250 width=350) (actual time=0.045..985.123 rows=12 loops=1)
Filter: (message LIKE '%error_id=ABC-987-XYZ%')
Rows Removed by Filter: 4999988
Planning Time: 0.120 ms
Execution Time: 985.201 msWhy is this a problem?
Look at this line: Again, it's a
Seq Scan. Even though an index exists, PostgreSQL completely ignored it.Why didn't the B-Tree index work?
- A B-Tree index works like the index of a dictionary, sorted alphabetically from left to right. It is only effective when you know the beginning of the string you are looking for (e.g.,
message LIKE 'User login failed...'). - When you put a wildcard character (
%) at the beginning (LIKE '%...'), you are telling the database: "Find a string, but I don't know what it starts with." The database cannot use the B-Tree "dictionary" to look it up and must fall back to the worst-case scenario: reading the entire table.
- A B-Tree index works like the index of a dictionary, sorted alphabetically from left to right. It is only effective when you know the beginning of the string you are looking for (e.g.,
The Warning Signs:
Execution Time: **985.201 ms**: Taking almost 1 second for a search query is unacceptable in most applications.Rows Removed by Filter: **4999988**: The system read 5 million rows and threw away almost all of them.
The Root Cause:
- Using the wrong type of index for the query. A B-Tree index is not designed for searching for a substring in the middle of a text field.
The Solution
Action: Use a special type of index designed for text searching: a Trigram Index. This index breaks the text down into chunks of 3 characters (trigrams) and indexes them, which makes searching for substrings very fast.
Commands:
sql-- Step 1: Enable the extension (only needs to be done once per database) CREATE EXTENSION IF NOT EXISTS pg_trgm; -- Step 2: Create a Trigram index on the `message` column -- (We use a GIN index for optimal read performance) CREATE INDEX idx_logs_message_trgm ON logs USING GIN (message gin_trgm_ops);(After this step, you can delete the old B-Tree index if it's no longer needed for other queries to save space.)
The Query Plan (AFTER the fix)
Run the EXPLAIN ANALYZE query again:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on logs (cost=324.50..1280.65 rows=250 width=350) (actual time=1.850..2.345 rows=12 loops=1)
Recheck Cond: (message LIKE '%error_id=ABC-987-XYZ%')
Heap Blocks: exact=12
-> Bitmap Index Scan on idx_logs_message_trgm (cost=0.00..324.44 rows=250 width=0) (actual time=1.821..1.821 rows=12 loops=1)
Index Cond: (message LIKE '%error_id=ABC-987-XYZ%')
Planning Time: 0.250 ms
Execution Time: 2.401 msAnalyzing the Result
- The Big Change: The plan switched to a
Bitmap Index Scanand used the new Trigram index (idx_logs_message_trgm). - Performance Comparison:
- Execution Time: Dropped dramatically from 985.201 ms down to just 2.401 ms. That's about 410 times faster!
Conclusion
This is a very important lesson. Just having an index doesn't guarantee a fast query. You must use the right type of index for the right type of query. For flexible text searches, especially LIKE patterns with a leading wildcard, a Trigram Index (pg_trgm) is an extremely powerful tool.