Case 13: The NULL Trap: NOT IN vs. NOT EXISTS vs. LEFT JOIN
This case covers one of the most subtle and dangerous traps in SQL, where a query that looks correct can return the wrong result or perform terribly. We'll compare NOT IN, NOT EXISTS, and LEFT JOIN / IS NULL.
The Story
- Table 1:
customers, with 5,000,000 customers. - Table 2:
orders, with 100,000,000 orders. Thecustomer_idcolumn in this table can beNULL(for example, for guest checkouts). - The Problem: We need to find all customers who have never placed an order.
The Different Query Methods and Their Analysis
Method 1: Using NOT IN (Dangerous and Not Recommended)
This is the most natural way to write the query, but it hides a deadly trap.
sql
-- THIS QUERY CAN RETURN AN EMPTY RESULT!
EXPLAIN ANALYZE
SELECT name
FROM customers
WHERE id NOT IN (SELECT customer_id FROM orders);- The
NULLTrap:- The logic of
NOT INis interpreted as a series ofANDconditions:id != value1 AND id != value2 AND id != value3 AND .... - If the subquery
(SELECT customer_id FROM orders)returns even oneNULLvalue, the entire condition becomesid != value1 AND id != NULL. - In SQL, the result of any comparison with
NULL(except forIS NULL) isUNKNOWN(nottrueorfalse). - Because of this, the entire
WHEREclause will never evaluate totruefor any customer.
- The logic of
- The Result: The query will return 0 rows, which is completely wrong and makes you think that all customers have placed an order.
Method 2: Using NOT EXISTS (Safe and Efficient) ✅
NOT EXISTS is designed to handle these cases correctly and efficiently.
sql
EXPLAIN ANALYZE
SELECT name
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.id
);- Why it works:
EXISTSandNOT EXISTShandleNULLs intuitively. The subquery is executed for each customer.- If the subquery finds an order that matches the
customer_id(even if other orders areNULL), it returnstrue, andNOT EXISTSbecomesfalse. - If the subquery doesn't find any matching orders, it returns
false, andNOT EXISTSbecomestrue. - It is never affected by
NULLvalues in thecustomer_idcolumn of unrelated rows.
Method 3: Using LEFT JOIN / IS NULL (Also Very Safe and Efficient) ✅
This is another common pattern that often has similar, and sometimes even better, performance than NOT EXISTS.
- The Logic:
LEFT JOINall customers to their orders.- Customers who have never placed an order will have
NULLvalues for all columns from theorderstable. - Filter for the rows where a key from the
orderstable isNULL.
sql
EXPLAIN ANALYZE
SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.customer_id IS NULL;Execution Plan Comparison
NOT INPlan: Can be very bad, or ifNULLis present, it returns nothing.NOT EXISTSandLEFT JOIN / IS NULLPlans: PostgreSQL is very smart. It will often translate both of these queries into the same, highly efficient execution plan using an operator called aHash Anti Joinor aNested Loop Anti Join.- An
Anti Joinis a highly optimized operation designed specifically to find rows in the left table that do not have a match in the right table.
- An
| Method | NULL Handling | Performance | Recommendation |
|---|---|---|---|
NOT IN | Wrong, Dangerous ☠️ | Poor, Unreliable | Never use if the subquery can contain NULLs |
NOT EXISTS | Correct, Safe | Very Good | Very clear intent, a top choice |
LEFT JOIN / IS NULL | Correct, Safe | Very Good, Sometimes Fastest | An excellent alternative, sometimes better optimized |
Conclusion:
- Stay away from
NOT INwhen your subquery might returnNULLvalues. - Both
NOT EXISTSandLEFT JOIN / IS NULLare correct, safe, and efficient ways to find records that do not have a match. Choose the one you find clearer and more readable; the PostgreSQL planner will usually handle both of them optimally.