Skip to content

Chắc chắn rồi! Chúng ta sẽ tiếp tục series với một kỹ thuật tối ưu hóa liên quan đến việc lựa chọn đúng loại index cho các bảng siêu lớn có dữ liệu được sắp xếp tự nhiên: BRIN Index.


Case 26: BRIN Indexes for Massive, Naturally Ordered Tables

The Scenario 📝

  • System: A centralized logging system that collects billions of events every day.
  • Table: log_events, an extremely large table with 10 billion rows. This table is "append-only," meaning new data is always inserted at the end.
  • Columns: event_id, event_timestamp (TIMESTAMPTZ), log_level, message.
  • Key Characteristic: Because data is always inserted in chronological order, the physical order of the rows on the disk almost perfectly matches the values in the event_timestamp column.
  • The Problem: Engineers frequently need to run queries to investigate events within a small time range (a few minutes or hours) in the past.

The Standard Approach and Its Bottleneck

The usual approach is to create a standard B-Tree index on the timestamp column.

sql
CREATE INDEX idx_events_btree_timestamp ON log_events(event_timestamp);
  • Analyzing the Bottleneck:
    1. Huge Index Size: A B-Tree index on a 10-billion-row table would be enormous, potentially hundreds of gigabytes or even terabytes. It has to store a pointer for every single row.
    2. High Maintenance Cost: Every INSERT operation has to update this massive B-Tree structure, which reduces the system's write throughput.
    3. High I/O: Even reading from such a large index requires a significant amount of I/O.

The Solution: Use a BRIN Index

  • The Logic: BRIN stands for Block Range Index. Instead of indexing every row, a BRIN index works at a much higher level. It only stores the minimum and maximum values for a range of adjacent data blocks/pages on the disk.

    • For example, it might record a single entry like: "In the next 1000 physical pages, the event_timestamp is between 10:00:00 and 10:15:00."
  • The Index Creation Command:

    sql
    CREATE INDEX idx_events_brin_timestamp ON log_events USING BRIN(event_timestamp);
  • How It Works: When you run a query like WHERE event_timestamp BETWEEN '10:05:00' AND '10:10:00', PostgreSQL will:

    1. Scan the extremely small BRIN index.
    2. Quickly identify which "block ranges" could possibly contain data in this time frame (based on their min/max values). For example, it will immediately skip the ranges for 9 AM or 11 AM.
    3. Then, it only performs a scan on these targeted blocks, instead of the entire table.

Analyzing the Results ✨

  • Tiny Size: A BRIN index can be 100 to 1000 times smaller than a B-Tree index. This saves a massive amount of disk space.
  • Faster Writes (INSERT): Updating a compact BRIN index is much less expensive, which speeds up data writing.
  • Read Speed:
    • A BRIN index is a "lossy" index. It isn't precise down to the row like a B-Tree; it just narrows down the search area.
    • For finding a few specific rows, a B-Tree will be faster.
    • However, for analytical queries over a small range of a huge table, BRIN provides a massive performance improvement over a Sequential Scan at a tiny fraction of the cost (in terms of space and maintenance) of a B-Tree.
CriteriaB-Tree IndexBRIN Index
SizeVery LargeExtremely Small
Write SpeedMediumVery Fast
Read SpeedFastest (Precise)Fast (Narrows Down)
Best ForMost use casesMassive, naturally ordered tables

Conclusion:

  • A BRIN index is not a replacement for a B-Tree in every situation.
  • It is a specialized tool designed for extremely large tables where the data has a high correlation between its physical order and the values in a column (classic examples are time-series data and logs).
  • In these scenarios, BRIN offers a fantastic trade-off: huge savings in storage and write costs in exchange for still-excellent read performance.