Case 7: OR on Different Indexed Columns vs. UNION ALL
The Story
- Table:
notifications, a massive table with 50,000,000 rows logging all user notifications. - Columns:
id,actor_id(who did the action),recipient_id(who got the notification),message,created_at. - Current State: To speed things up, there are two separate indexes:
- One on
actor_id. - One on
recipient_id.
- One on
- The Problem: We need to build an "Activity Feed" that shows all notifications where a user (e.g.,
user_id = 555) was either the one who acted OR the one who received the notification.
The Problematic SQL Query
The most natural way to write this is with an OR condition.
EXPLAIN ANALYZE
SELECT id, message, created_at
FROM notifications
WHERE actor_id = 555 OR recipient_id = 555
ORDER BY created_at DESC
LIMIT 50;The Query Plan (BEFORE a fix)
When PostgreSQL sees an OR on different columns, it can get confused. It might:
- Use only one of the two indexes and then filter the other condition manually.
- Do a full
Seq Scanif it thinks combining indexes is too expensive. - Use a
BitmapOrto combine the results of two separate index scans. This can be okay, but it's not always the fastest.
Let's assume the planner chooses the BitmapOr strategy:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=15345.67..15345.80 rows=50 width=45) (actual time=185.34..185.45 rows=50 loops=1)
-> Sort (cost=15345.67..15395.21 rows=19816 width=45) (actual time=185.32..185.39 rows=50 loops=1)
Sort Key: created_at DESC
-> Bitmap Heap Scan on notifications (cost=465.80..14850.12 rows=19816 width=45) (actual time=12.45..160.78 rows=19816 loops=1)
Recheck Cond: ((actor_id = 555) OR (recipient_id = 555))
-> BitmapOr (cost=465.80..465.80 rows=19816 width=0) (actual time=11.98..11.99 rows=0 loops=1)
-> Bitmap Index Scan on idx_notifications_actor_id (cost=0.00..85.45 rows=3540 width=0) (actual time=2.34..2.34 rows=3540 loops=1)
Index Cond: (actor_id = 555)
-> Bitmap Index Scan on idx_notifications_recipient_id (cost=0.00..380.12 rows=16276 width=0) (actual time=9.64..9.64 rows=16276 loops=1)
Index Cond: (recipient_id = 555)
Planning Time: 0.35 ms
Execution Time: 185.62 ms185ms might not seem terrible, but we can do much better. The problem is that it has to fetch two sets of row locations (bitmaps), merge them, then go to the table to get the actual data, then sort it, and only then can it apply the LIMIT.
The Solution: Use UNION ALL
The Logic: Divide and conquer. Instead of making the planner handle a complex
OR, we split the query into two simpleSELECTstatements. EachSELECTis optimized to use one index perfectly. Then, we combine the results.The Optimized Query:
EXPLAIN ANALYZE
(
-- Get notifications where user 555 was the actor
SELECT id, message, created_at
FROM notifications
WHERE actor_id = 555
)
UNION ALL
(
-- Get notifications where user 555 was the recipient
SELECT id, message, created_at
FROM notifications
WHERE recipient_id = 555
)
ORDER BY created_at DESC
LIMIT 50;UNION ALLvs.UNION:UNIONremoves duplicate rows, which requires an expensive sort or hash step.UNION ALLsimply sticks the two result sets together, which is much faster. In our case, if a user sends a notification to themselves, it will appear twice, but this is often a rare and acceptable edge case.
The Query Plan (AFTER the fix)
The new plan will involve two very fast, independent index scans, whose results are then combined, sorted, and limited.
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=215.84..215.97 rows=50 width=45) (actual time=4.35..4.45 rows=50 loops=1)
-> Merge Append (cost=0.57..230.45 rows=570 width=45) (actual time=0.05..3.98 rows=420 loops=1)
Sort Key: created_at DESC
-> Index Scan on notifications (actor_id=555) ... (very fast)
-> Index Scan on notifications (recipient_id=555) ... (very fast)
Planning Time: 0.45 ms
Execution Time: 4.65 ms(The exact plan might vary, but the core idea is appending two efficient index scans.)
Analyzing the Result
- The Big Change: The plan is now a simple
Appendof two highly efficient, separateIndex Scanoperations. Each scan does one simple job that it's perfectly designed for. - Performance Comparison:
- Execution Time: Dropped from 185.62 ms to just 4.65 ms. A massive improvement.
- The new plan is simpler and more predictable for the database.
Conclusion
When you have a WHERE clause with an OR condition across different columns, and each of those columns has its own index, consider rewriting the query using UNION ALL. Breaking a complex problem into simpler ones often helps the database optimizer find a much more efficient path.