Case 3: Joining on Large Datasets Before Filtering
The Story
- Table 1:
event_logs(records every user event), very large with 20,000,000 rows. Columns:user_id,event_type,event_timestamp,details. - Table 2:
users(user information), with 500,000 rows. Columns:id,registration_date,country. - Current State: Indexes exist on
event_logs(user_id),event_logs(event_timestamp), andusers(id). - The Problem: The marketing team wants to analyze 'purchase' events for users who registered in July 2025 and are from 'Vietnam'.
The Problematic SQL Query
A natural but potentially inefficient way to write the query:
EXPLAIN ANALYZE
SELECT count(*)
FROM event_logs el
JOIN users u ON el.user_id = u.id
WHERE el.event_type = 'purchase'
AND u.country = 'Vietnam'
AND u.registration_date >= '2025-07-01'
AND u.registration_date < '2025-08-01';The Query Plan (BEFORE a fix)
In some cases, the query planner might choose a non-optimal plan that joins the large dataset first.
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=450123.45..450123.46 rows=1 width=8) (actual time=5123.456..5123.457 rows=1 loops=1)
-> Hash Join (cost=15821.50..450098.21 rows=10100 width=0) (actual time=350.123..5110.987 rows=15234 loops=1)
Hash Cond: (el.user_id = u.id)
-> Seq Scan on event_logs el (cost=0.00..425432.00 rows=250000 width=4) (actual time=0.025..4567.890 rows=250000 loops=1)
Filter: (event_type = 'purchase'::text)
-> Hash (cost=15800.25..15800.25 rows=1050 width=4) (actual time=3.456..3.456 rows=1050 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 48kB
-> Seq Scan on users u (cost=0.00..15800.25 rows=1050 width=4) (actual time=0.015..2.987 rows=1050 loops=1)
Filter: ((country = 'Vietnam'::text) AND (registration_date >= '2025-07-01'::date) AND (registration_date < '2025-08-01'::date))
Planning Time: 0.450 ms
Execution Time: 5123.678 msWhy is this a problem?
Look at the steps from the inside out:
Seq Scan on users u: First, it filters theuserstable. Let's say it finds 1,050 users from 'Vietnam' who registered in July. This step is quite fast.Seq Scan on event_logs el: It filters theevent_logstable to get all 'purchase' events. Let's say there are 250,000 such events. This step has to read a large part of the table and is quite slow (actual time=...4567.890).Hash Join: This is the main bottleneck. It performs a join between 250,000 rows fromevent_logsand 1,050 rows fromusers. Even though 1,050 is a small number, having to process 250,000 rows in the join is very expensive.
The Root Cause:
- The order of operations is not optimal. The plan filtered both tables but still left a very large dataset (250,000 'purchase' events) to be used in the
JOIN. - A better approach would be to find the small list of users (1,050 people) first, and then find the 'purchase' events for only those specific users in the
event_logstable.
- The order of operations is not optimal. The plan filtered both tables but still left a very large dataset (250,000 'purchase' events) to be used in the
The Solution
- Action: Rewrite the query to force the database to filter the
userstable first, and then use that result (the list of user IDs) to efficiently query theevent_logstable using its index. Using a CTE (Common Table Expression) or a Subquery works well. - Command with CTE:sql
EXPLAIN ANALYZE WITH filtered_users AS ( SELECT id FROM users WHERE country = 'Vietnam' AND registration_date >= '2025-07-01' AND registration_date < '2025-08-01' ) SELECT count(*) FROM event_logs el JOIN filtered_users fu ON el.user_id = fu.id WHERE el.event_type = 'purchase';
The Query Plan (AFTER the fix)
The new plan will be much smarter.
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=18975.34..18975.35 rows=1 width=8) (actual time=245.812..245.813 rows=1 loops=1)
-> Nested Loop (cost=0.57..18950.10 rows=10100 width=0) (actual time=0.067..240.123 rows=15234 loops=1)
-> Seq Scan on users u (cost=0.00..15800.25 rows=1050 width=4) (actual time=0.021..2.899 rows=1050 loops=1)
Filter: ((country = 'Vietnam'::text) AND (registration_date >= '2025-07-01'::date) AND (registration_date < '2025-08-01'::date))
-> Index Scan using idx_event_logs_user_id on event_logs el (cost=0.57..2.98 rows=10 width=4) (actual time=0.025..0.225 rows=15 loops=1050)
Index Cond: (user_id = u.id)
Filter: (event_type = 'purchase'::text)
Planning Time: 0.512 ms
Execution Time: 246.015 msAnalyzing the Result
The Big Change: The plan switched to a
Nested Loop.- Outer Loop: It scans the
userstable to find the 1,050 matching users. This step is the same and is still fast. - Inner Loop: This is the magic. Instead of scanning the
event_logstable, it performs 1,050Index Scans on theidx_event_logs_user_idindex. EachIndex Scanto find events for a singleuser_idis an extremely fast operation.
- Outer Loop: It scans the
Why is it efficient?: Instead of joining two large datasets, we take one small set (1,050 users) and perform 1,050 small, high-speed lookups on the large table. The total cost of these 1,050 small operations is much cheaper than one giant
Hash Join.Performance Comparison:
- Execution Time: Dropped from 5123.678 ms (~5.1 seconds) down to 246.015 ms (~0.25 seconds). That's about 20 times faster!
Conclusion
When joining a large table and a small table, try to filter the small table as much as possible first. Then, use that result to query the large table via an index. Rewriting the query with a CTE or subquery can "guide" the planner to this optimal path when it can't figure it out on its own.