BanTool Documentation

Of course. We'll continue the series with a very common mistake, not due to complex techniques but to a lack of attention: The Unnecessary JOIN.

Case 21: The Unnecessary `JOIN`

The Story

System: An online learning platform.
Table 1: courses, containing 50,000 courses (id, title, author_id).
Table 2: enrollments, a very large table recording student enrollments, with 50,000,000 rows (id, user_id, course_id, enrolled_at).
The Problem: We need to get a list of all enrollments for a specific course, for example, course_id = 101.

The Problematic SQL Query

A developer might write this query out of habit, starting from the courses table and then JOINing to enrollments.

sql

EXPLAIN ANALYZE
SELECT
    e.user_id,
    e.enrolled_at
FROM courses c
JOIN enrollments e ON c.id = e.course_id
WHERE c.id = 101;

The Bottleneck

Extra Work: The database has to perform a JOIN. Even if it has indexes and the planner chooses an efficient plan (like a Nested Loop), it still has to perform these steps:
- a. Find the course with id = 101 in the courses table (using the courses index).
- b. Take that id and use the enrollments table's index to find the corresponding enrollments.
- Although fast, it had to work with two tables and two indexes.
Redundant Logic:
- If you look closely, we don't need any information from the courses table in the final SELECT list (e.user_id, e.enrolled_at).
- The condition WHERE c.id = 101 can be completely replaced by an equivalent condition directly on the enrollments table: WHERE e.course_id = 101.
- This JOIN is entirely redundant.

The Solution: Remove the Unnecessary `JOIN` ✅

The Logic: If your query doesn't SELECT any columns from a joined table, and the WHERE condition can be applied directly to the original table via a foreign key, then that JOIN is unnecessary.

The Optimized Query:

sql

EXPLAIN ANALYZE
SELECT
    e.user_id,
    e.enrolled_at
FROM enrollments e
WHERE e.course_id = 101;

Execution Plan Comparison

Before Optimization: The plan will have a Nested Loop Join (or Hash Join) node, showing that two tables are being processed.

->  Nested Loop ...
      ->  Index Scan on courses c ... (WHERE id = 101)
      ->  Index Scan on enrollments e ... (Index Cond: course_id = c.id)

After Optimization: The plan will be extremely simple, just a single Index Scan on the enrollments table.
```
->  Index Scan on enrollments e ... (Index Cond: course_id = 101)
```

Analyzing the Result

Less Work: The database only has to do one thing: scan the index on the enrollments table. It doesn't need to touch the courses table at all.
Higher Performance: By eliminating an operation (even a small one), the query will run faster, reduce I/O, and lower the CPU load. The difference might not be huge in a single query, but when called thousands of times per minute, it has a significant impact.
Simpler Code: The new query is shorter, easier to read, and more clearly expresses the intent.

Conclusion:

Before writing a JOIN, always ask yourself: "Do I actually need data from this joined table in my final result?"
Eliminating unnecessary JOINs is one of the simplest yet most effective optimizations. It's a "low-hanging fruit" during code reviews and performance tuning.

Case 21: The Unnecessary JOIN ​

The Story ​

The Problematic SQL Query ​

The Bottleneck ​

The Solution: Remove the Unnecessary JOIN ✅ ​

Execution Plan Comparison ​