Skip to content

Of course. We'll continue the series with a very common mistake, not due to complex techniques but to a lack of attention: The Unnecessary JOIN.


Case 21: The Unnecessary JOIN

The Story

  • System: An online learning platform.
  • Table 1: courses, containing 50,000 courses (id, title, author_id).
  • Table 2: enrollments, a very large table recording student enrollments, with 50,000,000 rows (id, user_id, course_id, enrolled_at).
  • The Problem: We need to get a list of all enrollments for a specific course, for example, course_id = 101.

The Problematic SQL Query

A developer might write this query out of habit, starting from the courses table and then JOINing to enrollments.

sql
EXPLAIN ANALYZE
SELECT
    e.user_id,
    e.enrolled_at
FROM courses c
JOIN enrollments e ON c.id = e.course_id
WHERE c.id = 101;

The Bottleneck

  1. Extra Work: The database has to perform a JOIN. Even if it has indexes and the planner chooses an efficient plan (like a Nested Loop), it still has to perform these steps:
    • a. Find the course with id = 101 in the courses table (using the courses index).
    • b. Take that id and use the enrollments table's index to find the corresponding enrollments.
    • Although fast, it had to work with two tables and two indexes.
  2. Redundant Logic:
    • If you look closely, we don't need any information from the courses table in the final SELECT list (e.user_id, e.enrolled_at).
    • The condition WHERE c.id = 101 can be completely replaced by an equivalent condition directly on the enrollments table: WHERE e.course_id = 101.
    • This JOIN is entirely redundant.

The Solution: Remove the Unnecessary JOIN

  • The Logic: If your query doesn't SELECT any columns from a joined table, and the WHERE condition can be applied directly to the original table via a foreign key, then that JOIN is unnecessary.

  • The Optimized Query:

    sql
    EXPLAIN ANALYZE
    SELECT
        e.user_id,
        e.enrolled_at
    FROM enrollments e
    WHERE e.course_id = 101;

Execution Plan Comparison

  • Before Optimization: The plan will have a Nested Loop Join (or Hash Join) node, showing that two tables are being processed.
    ->  Nested Loop ...
          ->  Index Scan on courses c ... (WHERE id = 101)
          ->  Index Scan on enrollments e ... (Index Cond: course_id = c.id)
  • After Optimization: The plan will be extremely simple, just a single Index Scan on the enrollments table.
    ->  Index Scan on enrollments e ... (Index Cond: course_id = 101)

Analyzing the Result

  1. Less Work: The database only has to do one thing: scan the index on the enrollments table. It doesn't need to touch the courses table at all.
  2. Higher Performance: By eliminating an operation (even a small one), the query will run faster, reduce I/O, and lower the CPU load. The difference might not be huge in a single query, but when called thousands of times per minute, it has a significant impact.
  3. Simpler Code: The new query is shorter, easier to read, and more clearly expresses the intent.

Conclusion:

  • Before writing a JOIN, always ask yourself: "Do I actually need data from this joined table in my final result?"
  • Eliminating unnecessary JOINs is one of the simplest yet most effective optimizations. It's a "low-hanging fruit" during code reviews and performance tuning.