Case 1: Searching on a Column Without an Index
The Story
- Table:
orders - Data: The table has 2,000,000 rows.
- Scenario: The
customer_idcolumn stores the ID of the customer. Developers often need to find all orders for a specific customer. - The Problem: Currently, there is no index on the
customer_idcolumn.
The Problematic SQL Query
sql
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE customer_id = 12345;The Query Plan (BEFORE a fix)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Seq Scan on orders (cost=0.00..38455.00 rows=55 width=102) (actual time=0.025..289.431 rows=55 loops=1)
Filter: (customer_id = 12345)
Rows Removed by Filter: 1999945
Planning Time: 0.098 ms
Execution Time: 289.512 msWhy is this a problem?
Look at this line:
Seq Scan on orders.- A
Seq Scan(Sequential Scan) means that PostgreSQL had to read all 2 million rows of theorderstable, one by one, from start to finish.
- A
The Warning Signs:
Execution Time: **289.512 ms**: The query took a very long time (about 0.3 seconds) just to find one customer's orders.Rows Removed by Filter: **1999945**: This is the clearest sign of a problem! The system had to read 2 million rows and then throw away almost all of them to keep only the 55 rows it needed. This is a huge waste of resources.
The Root Cause:
- Because there is no "table of contents" (an index) for the
customer_idcolumn, the database has no choice but to "read the whole book" (scan the entire table) to find the pages it needs.
- Because there is no "table of contents" (an index) for the
The Solution
- Action: Create an index on the
customer_idcolumn to speed up searches. - Command:sql
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
The Query Plan (AFTER the fix)
Now, let's run the same EXPLAIN ANALYZE query again:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on orders (cost=15.25..359.85 rows=55 width=102) (actual time=0.065..0.123 rows=55 loops=1)
Recheck Cond: (customer_id = 12345)
Heap Blocks: exact=52
-> Bitmap Index Scan on idx_orders_customer_id (cost=0.00..15.24 rows=55 width=0) (actual time=0.045..0.045 rows=55 loops=1)
Index Cond: (customer_id = 12345)
Planning Time: 0.155 ms
Execution Time: 0.151 msAnalyzing the Result
The Big Change: The plan changed from a
Seq Scanto aBitmap Heap Scancombined with aBitmap Index Scan.Bitmap Index Scan: The system used our new index,idx_orders_customer_id, to quickly find the locations of all 55 rows wherecustomer_id = 12345.Bitmap Heap Scan: After getting the list of locations, it efficiently went to the table to get the data for only those rows.
Performance Comparison:
- Execution Time: Dropped from 289.512 ms down to just 0.151 ms. That's 1900 times faster!
- Cost (Estimated Cost): The maximum estimated cost dropped from
38455.00to just359.85.
Conclusion
By adding a simple index to a column that is frequently used for filtering, we completely fixed the bottleneck, making the query thousands of times faster.