Skip to content

Case 7: OR on Different Indexed Columns vs. UNION ALL

The Story

  • Table: notifications, a massive table with 50,000,000 rows logging all user notifications.
  • Columns: id, actor_id (who did the action), recipient_id (who got the notification), message, created_at.
  • Current State: To speed things up, there are two separate indexes:
    • One on actor_id.
    • One on recipient_id.
  • The Problem: We need to build an "Activity Feed" that shows all notifications where a user (e.g., user_id = 555) was either the one who acted OR the one who received the notification.

The Problematic SQL Query

The most natural way to write this is with an OR condition.

sql
EXPLAIN ANALYZE
SELECT id, message, created_at
FROM notifications
WHERE actor_id = 555 OR recipient_id = 555
ORDER BY created_at DESC
LIMIT 50;

The Query Plan (BEFORE a fix)

When PostgreSQL sees an OR on different columns, it can get confused. It might:

  • Use only one of the two indexes and then filter the other condition manually.
  • Do a full Seq Scan if it thinks combining indexes is too expensive.
  • Use a BitmapOr to combine the results of two separate index scans. This can be okay, but it's not always the fastest.

Let's assume the planner chooses the BitmapOr strategy:

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=15345.67..15345.80 rows=50 width=45) (actual time=185.34..185.45 rows=50 loops=1)
   ->  Sort  (cost=15345.67..15395.21 rows=19816 width=45) (actual time=185.32..185.39 rows=50 loops=1)
         Sort Key: created_at DESC
         ->  Bitmap Heap Scan on notifications  (cost=465.80..14850.12 rows=19816 width=45) (actual time=12.45..160.78 rows=19816 loops=1)
               Recheck Cond: ((actor_id = 555) OR (recipient_id = 555))
               ->  BitmapOr  (cost=465.80..465.80 rows=19816 width=0) (actual time=11.98..11.99 rows=0 loops=1)
                     ->  Bitmap Index Scan on idx_notifications_actor_id  (cost=0.00..85.45 rows=3540 width=0) (actual time=2.34..2.34 rows=3540 loops=1)
                           Index Cond: (actor_id = 555)
                     ->  Bitmap Index Scan on idx_notifications_recipient_id  (cost=0.00..380.12 rows=16276 width=0) (actual time=9.64..9.64 rows=16276 loops=1)
                           Index Cond: (recipient_id = 555)
 Planning Time: 0.35 ms
 Execution Time: 185.62 ms

185ms might not seem terrible, but we can do much better. The problem is that it has to fetch two sets of row locations (bitmaps), merge them, then go to the table to get the actual data, then sort it, and only then can it apply the LIMIT.

The Solution: Use UNION ALL

  • The Logic: Divide and conquer. Instead of making the planner handle a complex OR, we split the query into two simple SELECT statements. Each SELECT is optimized to use one index perfectly. Then, we combine the results.

  • The Optimized Query:

sql
EXPLAIN ANALYZE
(
    -- Get notifications where user 555 was the actor
    SELECT id, message, created_at
    FROM notifications
    WHERE actor_id = 555
)
UNION ALL
(
    -- Get notifications where user 555 was the recipient
    SELECT id, message, created_at
    FROM notifications
    WHERE recipient_id = 555
)
ORDER BY created_at DESC
LIMIT 50;
  • UNION ALL vs. UNION: UNION removes duplicate rows, which requires an expensive sort or hash step. UNION ALL simply sticks the two result sets together, which is much faster. In our case, if a user sends a notification to themselves, it will appear twice, but this is often a rare and acceptable edge case.

The Query Plan (AFTER the fix)

The new plan will involve two very fast, independent index scans, whose results are then combined, sorted, and limited.

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=215.84..215.97 rows=50 width=45) (actual time=4.35..4.45 rows=50 loops=1)
   ->  Merge Append  (cost=0.57..230.45 rows=570 width=45) (actual time=0.05..3.98 rows=420 loops=1)
         Sort Key: created_at DESC
         ->  Index Scan on notifications (actor_id=555) ... (very fast)
         ->  Index Scan on notifications (recipient_id=555) ... (very fast)
 Planning Time: 0.45 ms
 Execution Time: 4.65 ms

(The exact plan might vary, but the core idea is appending two efficient index scans.)

Analyzing the Result

  1. The Big Change: The plan is now a simple Append of two highly efficient, separate Index Scan operations. Each scan does one simple job that it's perfectly designed for.
  2. Performance Comparison:
    • Execution Time: Dropped from 185.62 ms to just 4.65 ms. A massive improvement.
    • The new plan is simpler and more predictable for the database.

Conclusion

When you have a WHERE clause with an OR condition across different columns, and each of those columns has its own index, consider rewriting the query using UNION ALL. Breaking a complex problem into simpler ones often helps the database optimizer find a much more efficient path.