Of course. We'll continue the series with a very common mistake, not due to complex techniques but to a lack of attention: The Unnecessary JOIN.
Case 21: The Unnecessary JOIN
The Story
- System: An online learning platform.
- Table 1:
courses, containing 50,000 courses (id,title,author_id). - Table 2:
enrollments, a very large table recording student enrollments, with 50,000,000 rows (id,user_id,course_id,enrolled_at). - The Problem: We need to get a list of all enrollments for a specific course, for example,
course_id = 101.
The Problematic SQL Query
A developer might write this query out of habit, starting from the courses table and then JOINing to enrollments.
sql
EXPLAIN ANALYZE
SELECT
e.user_id,
e.enrolled_at
FROM courses c
JOIN enrollments e ON c.id = e.course_id
WHERE c.id = 101;The Bottleneck
- Extra Work: The database has to perform a
JOIN. Even if it has indexes and the planner chooses an efficient plan (like aNested Loop), it still has to perform these steps:- a. Find the course with
id = 101in thecoursestable (using thecoursesindex). - b. Take that
idand use theenrollmentstable's index to find the corresponding enrollments. - Although fast, it had to work with two tables and two indexes.
- a. Find the course with
- Redundant Logic:
- If you look closely, we don't need any information from the
coursestable in the finalSELECTlist (e.user_id,e.enrolled_at). - The condition
WHERE c.id = 101can be completely replaced by an equivalent condition directly on theenrollmentstable:WHERE e.course_id = 101. - This
JOINis entirely redundant.
- If you look closely, we don't need any information from the
The Solution: Remove the Unnecessary JOIN ✅
The Logic: If your query doesn't
SELECTany columns from a joined table, and theWHEREcondition can be applied directly to the original table via a foreign key, then thatJOINis unnecessary.The Optimized Query:
sqlEXPLAIN ANALYZE SELECT e.user_id, e.enrolled_at FROM enrollments e WHERE e.course_id = 101;
Execution Plan Comparison
- Before Optimization: The plan will have a
Nested Loop Join(orHash Join) node, showing that two tables are being processed.-> Nested Loop ... -> Index Scan on courses c ... (WHERE id = 101) -> Index Scan on enrollments e ... (Index Cond: course_id = c.id) - After Optimization: The plan will be extremely simple, just a single
Index Scanon theenrollmentstable.-> Index Scan on enrollments e ... (Index Cond: course_id = 101)
Analyzing the Result
- Less Work: The database only has to do one thing: scan the index on the
enrollmentstable. It doesn't need to touch thecoursestable at all. - Higher Performance: By eliminating an operation (even a small one), the query will run faster, reduce I/O, and lower the CPU load. The difference might not be huge in a single query, but when called thousands of times per minute, it has a significant impact.
- Simpler Code: The new query is shorter, easier to read, and more clearly expresses the intent.
Conclusion:
- Before writing a
JOIN, always ask yourself: "Do I actually need data from this joined table in my final result?" - Eliminating unnecessary
JOINs is one of the simplest yet most effective optimizations. It's a "low-hanging fruit" during code reviews and performance tuning.