BanTool Documentation

I'm glad you're finding the series helpful! We'll continue with a more architectural topic, where we intentionally "break" standard design rules to achieve extreme performance: Denormalization.

Case 15: Denormalization for Extreme Read Performance

The Story

System: A social network with massive traffic, similar to Twitter or Facebook.
Normalized Schema:
- users (100 million users): id, username, avatar_url, ...
- posts (billions of posts): id, user_id (foreign key to users.id), content, created_at.
The Problem: The "news feed" is the most frequently accessed page. To display 50 posts on the feed, the system has to run a query that JOINs the posts and users tables to get the author's username and avatar_url.

The SQL Query in a Normalized World

sql

SELECT
    p.content,
    p.created_at,
    u.username,
    u.avatar_url
FROM posts p
JOIN users u ON p.user_id = u.id
ORDER BY p.created_at DESC
LIMIT 50;

The Bottleneck

JOIN at Massive Scale: Although a single JOIN is very fast, when it's executed millions of times per minute by all online users, it creates a huge load on the database. The users table becomes a "hotspot," constantly being accessed.
Latency: Every JOIN has a certain cost. At a large scale, this small cost adds up to noticeable latency.

The Solution: Denormalization ✅

The Logic: Instead of JOINing to get the data every time we read, we will duplicate the less frequently changed data and store it directly where it's needed.
The New Schema Design: Modify the posts table to include the author's information.
- posts: id, user_id, content, created_at, author_username, author_avatar_url.
How it Works:
- On Write: When a user creates a new post, the application fetches their username and avatar_url and saves them directly into the new columns in the posts table.
- On Read: The query to get the news feed now becomes incredibly simple and fast.
sql
```
-- NO JOIN NEEDED!
SELECT
    content,
    created_at,
    author_username,
    author_avatar_url
FROM posts
ORDER BY created_at DESC
LIMIT 50;
```
This query only needs to scan the created_at index of a single table, completely eliminating the burden of the JOIN.

The Trade-off

This is a significant architectural decision with a clear trade-off.

The Gain:
- Extremely Fast Reads: The system's most critical query (reading the feed) becomes as fast as possible.
- Reduced Database Load: Significantly reduces the number of JOINs, helping the database handle more read requests.
The Cost:
- More Complex Writes:
  - The logic for INSERTing a new post now has to do the extra work of copying data.
  - The Biggest Problem: When a user changes their username or avatar, you must find all of their old posts to update the copied information. This can be a very heavy and complex background job.
- Redundant and Potentially Inconsistent Data:
  - You are storing the same data in multiple places, which uses more disk space.
  - There is a risk of data becoming inconsistent if the background update job fails (e.g., old posts showing the old username).

Conclusion

Denormalization is a high-level optimization technique used when read performance is the absolute priority and a specific JOIN has been identified as the main system bottleneck under high load.
It is a conscious trade-off: making writes more complex to make reads much, much simpler and faster.
This is not something you should do by default, but rather a solution for performance problems at a massive scale.

Case 15: Denormalization for Extreme Read Performance ​

The Story ​

The SQL Query in a Normalized World ​

The Bottleneck ​

The Solution: Denormalization ✅ ​

The Trade-off ​