Chắc chắn rồi! Chúng ta sẽ tiếp tục series với một kỹ thuật tối ưu hóa liên quan đến việc lựa chọn đúng loại index cho các bảng siêu lớn có dữ liệu được sắp xếp tự nhiên: BRIN Index.
Case 26: BRIN Indexes for Massive, Naturally Ordered Tables
The Scenario 📝
- System: A centralized logging system that collects billions of events every day.
- Table:
log_events, an extremely large table with 10 billion rows. This table is "append-only," meaning new data is always inserted at the end. - Columns:
event_id,event_timestamp(TIMESTAMPTZ),log_level,message. - Key Characteristic: Because data is always inserted in chronological order, the physical order of the rows on the disk almost perfectly matches the values in the
event_timestampcolumn. - The Problem: Engineers frequently need to run queries to investigate events within a small time range (a few minutes or hours) in the past.
The Standard Approach and Its Bottleneck
The usual approach is to create a standard B-Tree index on the timestamp column.
sql
CREATE INDEX idx_events_btree_timestamp ON log_events(event_timestamp);- Analyzing the Bottleneck:
- Huge Index Size: A
B-Treeindex on a 10-billion-row table would be enormous, potentially hundreds of gigabytes or even terabytes. It has to store a pointer for every single row. - High Maintenance Cost: Every
INSERToperation has to update this massive B-Tree structure, which reduces the system's write throughput. - High I/O: Even reading from such a large index requires a significant amount of I/O.
- Huge Index Size: A
The Solution: Use a BRIN Index ✅
The Logic:
BRINstands for Block Range Index. Instead of indexing every row, aBRINindex works at a much higher level. It only stores the minimum and maximum values for a range of adjacent data blocks/pages on the disk.- For example, it might record a single entry like: "In the next 1000 physical pages, the
event_timestampis between 10:00:00 and 10:15:00."
- For example, it might record a single entry like: "In the next 1000 physical pages, the
The Index Creation Command:
sqlCREATE INDEX idx_events_brin_timestamp ON log_events USING BRIN(event_timestamp);How It Works: When you run a query like
WHERE event_timestamp BETWEEN '10:05:00' AND '10:10:00', PostgreSQL will:- Scan the extremely small
BRINindex. - Quickly identify which "block ranges" could possibly contain data in this time frame (based on their min/max values). For example, it will immediately skip the ranges for 9 AM or 11 AM.
- Then, it only performs a scan on these targeted blocks, instead of the entire table.
- Scan the extremely small
Analyzing the Results ✨
- Tiny Size: A
BRINindex can be 100 to 1000 times smaller than aB-Treeindex. This saves a massive amount of disk space. - Faster Writes (
INSERT): Updating a compactBRINindex is much less expensive, which speeds up data writing. - Read Speed:
- A
BRINindex is a "lossy" index. It isn't precise down to the row like a B-Tree; it just narrows down the search area. - For finding a few specific rows, a B-Tree will be faster.
- However, for analytical queries over a small range of a huge table,
BRINprovides a massive performance improvement over aSequential Scanat a tiny fraction of the cost (in terms of space and maintenance) of a B-Tree.
- A
| Criteria | B-Tree Index | BRIN Index |
|---|---|---|
| Size | Very Large | Extremely Small |
| Write Speed | Medium | Very Fast |
| Read Speed | Fastest (Precise) | Fast (Narrows Down) |
| Best For | Most use cases | Massive, naturally ordered tables |
Conclusion:
- A
BRINindex is not a replacement for aB-Treein every situation. - It is a specialized tool designed for extremely large tables where the data has a high correlation between its physical order and the values in a column (classic examples are time-series data and logs).
- In these scenarios,
BRINoffers a fantastic trade-off: huge savings in storage and write costs in exchange for still-excellent read performance.