Skip to content

Case 13: The NULL Trap: NOT IN vs. NOT EXISTS vs. LEFT JOIN

This case covers one of the most subtle and dangerous traps in SQL, where a query that looks correct can return the wrong result or perform terribly. We'll compare NOT IN, NOT EXISTS, and LEFT JOIN / IS NULL.

The Story

  • Table 1: customers, with 5,000,000 customers.
  • Table 2: orders, with 100,000,000 orders. The customer_id column in this table can be NULL (for example, for guest checkouts).
  • The Problem: We need to find all customers who have never placed an order.

The Different Query Methods and Their Analysis

This is the most natural way to write the query, but it hides a deadly trap.

sql
-- THIS QUERY CAN RETURN AN EMPTY RESULT!
EXPLAIN ANALYZE
SELECT name
FROM customers
WHERE id NOT IN (SELECT customer_id FROM orders);
  • The NULL Trap:
    • The logic of NOT IN is interpreted as a series of AND conditions: id != value1 AND id != value2 AND id != value3 AND ....
    • If the subquery (SELECT customer_id FROM orders) returns even one NULL value, the entire condition becomes id != value1 AND id != NULL.
    • In SQL, the result of any comparison with NULL (except for IS NULL) is UNKNOWN (not true or false).
    • Because of this, the entire WHERE clause will never evaluate to true for any customer.
  • The Result: The query will return 0 rows, which is completely wrong and makes you think that all customers have placed an order.

Method 2: Using NOT EXISTS (Safe and Efficient) ✅

NOT EXISTS is designed to handle these cases correctly and efficiently.

sql
EXPLAIN ANALYZE
SELECT name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.id
);
  • Why it works:
    • EXISTS and NOT EXISTS handle NULLs intuitively. The subquery is executed for each customer.
    • If the subquery finds an order that matches the customer_id (even if other orders are NULL), it returns true, and NOT EXISTS becomes false.
    • If the subquery doesn't find any matching orders, it returns false, and NOT EXISTS becomes true.
    • It is never affected by NULL values in the customer_id column of unrelated rows.

Method 3: Using LEFT JOIN / IS NULL (Also Very Safe and Efficient) ✅

This is another common pattern that often has similar, and sometimes even better, performance than NOT EXISTS.

  • The Logic:
    1. LEFT JOIN all customers to their orders.
    2. Customers who have never placed an order will have NULL values for all columns from the orders table.
    3. Filter for the rows where a key from the orders table is NULL.
sql
EXPLAIN ANALYZE
SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.customer_id IS NULL;

Execution Plan Comparison

  • NOT IN Plan: Can be very bad, or if NULL is present, it returns nothing.
  • NOT EXISTS and LEFT JOIN / IS NULL Plans: PostgreSQL is very smart. It will often translate both of these queries into the same, highly efficient execution plan using an operator called a Hash Anti Join or a Nested Loop Anti Join.
    • An Anti Join is a highly optimized operation designed specifically to find rows in the left table that do not have a match in the right table.
MethodNULL HandlingPerformanceRecommendation
NOT INWrong, Dangerous ☠️Poor, UnreliableNever use if the subquery can contain NULLs
NOT EXISTSCorrect, SafeVery GoodVery clear intent, a top choice
LEFT JOIN / IS NULLCorrect, SafeVery Good, Sometimes FastestAn excellent alternative, sometimes better optimized

Conclusion:

  • Stay away from NOT IN when your subquery might return NULL values.
  • Both NOT EXISTS and LEFT JOIN / IS NULL are correct, safe, and efficient ways to find records that do not have a match. Choose the one you find clearer and more readable; the PostgreSQL planner will usually handle both of them optimally.