Skip to content

Case 1: Searching on a Column Without an Index

The Story

  • Table: orders
  • Data: The table has 2,000,000 rows.
  • Scenario: The customer_id column stores the ID of the customer. Developers often need to find all orders for a specific customer.
  • The Problem: Currently, there is no index on the customer_id column.

The Problematic SQL Query

sql
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE customer_id = 12345;

The Query Plan (BEFORE a fix)

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Seq Scan on orders  (cost=0.00..38455.00 rows=55 width=102) (actual time=0.025..289.431 rows=55 loops=1)
   Filter: (customer_id = 12345)
   Rows Removed by Filter: 1999945
 Planning Time: 0.098 ms
 Execution Time: 289.512 ms

Why is this a problem?

  1. Look at this line: Seq Scan on orders.

    • A Seq Scan (Sequential Scan) means that PostgreSQL had to read all 2 million rows of the orders table, one by one, from start to finish.
  2. The Warning Signs:

    • Execution Time: **289.512 ms**: The query took a very long time (about 0.3 seconds) just to find one customer's orders.
    • Rows Removed by Filter: **1999945**: This is the clearest sign of a problem! The system had to read 2 million rows and then throw away almost all of them to keep only the 55 rows it needed. This is a huge waste of resources.
  3. The Root Cause:

    • Because there is no "table of contents" (an index) for the customer_id column, the database has no choice but to "read the whole book" (scan the entire table) to find the pages it needs.

The Solution

  • Action: Create an index on the customer_id column to speed up searches.
  • Command:
    sql
    CREATE INDEX idx_orders_customer_id ON orders(customer_id);

The Query Plan (AFTER the fix)

Now, let's run the same EXPLAIN ANALYZE query again:

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on orders  (cost=15.25..359.85 rows=55 width=102) (actual time=0.065..0.123 rows=55 loops=1)
   Recheck Cond: (customer_id = 12345)
   Heap Blocks: exact=52
   ->  Bitmap Index Scan on idx_orders_customer_id  (cost=0.00..15.24 rows=55 width=0) (actual time=0.045..0.045 rows=55 loops=1)
         Index Cond: (customer_id = 12345)
 Planning Time: 0.155 ms
 Execution Time: 0.151 ms

Analyzing the Result

  1. The Big Change: The plan changed from a Seq Scan to a Bitmap Heap Scan combined with a Bitmap Index Scan.

    • Bitmap Index Scan: The system used our new index, idx_orders_customer_id, to quickly find the locations of all 55 rows where customer_id = 12345.
    • Bitmap Heap Scan: After getting the list of locations, it efficiently went to the table to get the data for only those rows.
  2. Performance Comparison:

    • Execution Time: Dropped from 289.512 ms down to just 0.151 ms. That's 1900 times faster!
    • Cost (Estimated Cost): The maximum estimated cost dropped from 38455.00 to just 359.85.

Conclusion

By adding a simple index to a column that is frequently used for filtering, we completely fixed the bottleneck, making the query thousands of times faster.