Case 11: Data Type Mismatch Disables Index Usage

The Story

Table 1: users, a modern table with 2,000,000 users. The id column is a BIGINT (a large number) and is the primary key (which is indexed).
Table 2: user_actions_legacy, a very old and large table with 80,000,000 rows. Due to a mistake in the past, the user_id column here was stored as a VARCHAR(255) (a string of text). This column also has an index on it.
The Problem: We need to get the 10 most recent actions for a specific user, which requires joining these two tables.

The Problematic SQL Query

The JOIN statement looks completely normal.

sql

EXPLAIN ANALYZE
SELECT
    u.name,
    e.action,
    e.created_at
FROM users u
JOIN user_actions_legacy e ON u.id = e.user_id
WHERE u.id = 987654
ORDER BY e.created_at DESC
LIMIT 10;

The Query Plan (BEFORE a fix)

You would expect the database to find the user with id = 987654 and then use the index on user_actions_legacy to find their actions. But here is what really happens:

                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1450000.85..145000.87 rows=10 width=55) (actual time=9850.12..9850.15 rows=10 loops=1)
   ->  Sort  (cost=1450000.85..145000.90 rows=20 width=55) (actual time=9850.10..9850.12 rows=10 loops=1)
         Sort Key: e.created_at DESC
         ->  Nested Loop  (cost=0.43..1450000.15 rows=20 width=55) (actual time=0.05..9845.50 rows=40 loops=1)
               ->  Index Scan using users_pkey on users u  (cost=0.43..8.45 rows=1 width=25) (actual time=0.02..0.03 rows=1 loops=1)
                     Index Cond: (id = 987654)
               ->  Seq Scan on user_actions_legacy e  (cost=0.00..1449991.50 rows=20 width=38) (actual time=0.02..9845.30 rows=40 loops=1)
                     Filter: (987654 = (user_id)::bigint)
 Planning Time: 0.28 ms
 Execution Time: 9850.25 ms

The Bottleneck

Look at this: The Seq Scan on user_actions_legacy line, and especially the Filter: (987654 = (user_id)::bigint) part.
The Core Problem:
- To compare u.id (a BIGINT) with e.user_id (a VARCHAR), PostgreSQL must convert one of them to match the other's type. Following its data type precedence rules, it chooses the safer option: it converts the VARCHAR to a BIGINT.
- It applied a casting function (::bigint) to the e.user_id column for every single row in the user_actions_legacy table.
- As we learned in Case 5, applying a function to an indexed column completely disables that index.
The Consequence: Instead of using the index for a fast lookup, PostgreSQL was forced to perform a Seq Scan—reading all 80 million rows of the user_actions_legacy table. A performance disaster.

The Solution

Method 1: Fix the Schema (The best long-term solution)
- Change the data type of the user_actions_legacy.user_id column to BIGINT so it matches users.id.
- ALTER TABLE user_actions_legacy ALTER COLUMN user_id TYPE BIGINT USING (user_id::bigint);
- Warning: This is the correct fix, but it can be a risky and slow operation on a live production system and requires careful planning.
Method 2: Fix the Query (The immediate workaround)
- If you can't change the schema, you have to "guide" the planner by casting in the other direction: cast the single value, not the column from the giant table.
sql
```
EXPLAIN ANALYZE
SELECT
    u.name,
    e.action,
    e.created_at
FROM users u
-- Cast u.id (just one value) to TEXT to match the type of e.user_id
JOIN user_actions_legacy e ON u.id::text = e.user_id
WHERE u.id = 987654
ORDER BY e.created_at DESC
LIMIT 10;
```
Now, the comparison is between two text strings, and the index on the e.user_id (VARCHAR) column can be used.

The Query Plan (AFTER the fix - with Method 2)

                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10.50..10.52 rows=10 width=55) (actual time=0.08..0.10 rows=10 loops=1)
   ->  Sort  (cost=10.50..10.55 rows=20 width=55) (actual time=0.07..0.08 rows=10 loops=1)
         Sort Key: e.created_at DESC
         ->  Nested Loop  (cost=0.86..10.05 rows=20 width=55) (actual time=0.04..0.06 rows=40 loops=1)
               ->  Index Scan using users_pkey on users u  (cost=0.43..8.45 rows=1 width=25) (actual time=0.01..0.01 rows=1 loops=1)
                     Index Cond: (id = 987654)
               ->  Index Scan using idx_user_actions_legacy_user_id on user_actions_legacy e  (cost=0.43..1.55 rows=20 width=38) (actual time=0.02..0.04 rows=40 loops=1)
                     Index Cond: (user_id = (987654)::text)
 Planning Time: 0.35 ms
 Execution Time: 0.15 ms

Analyzing the Result

The Big Change: The Seq Scan on the 80-million-row table is gone, replaced by a super-fast Index Scan on idx_user_actions_legacy_user_id.
Performance Comparison:
- Execution Time: Dropped from 9850.25 ms (~9.8 seconds) to just 0.15 ms. An exponential improvement in performance.

Conclusion

Data type consistency between columns used in JOIN or WHERE clauses is critically important for performance.
An "invisible" mismatch can cause the database to silently cast a value and disable your index.
When you have a query that is unexpectedly slow and should be using an index, immediately check the data types of the columns involved.

Case 11: Data Type Mismatch Disables Index Usage ​

The Story ​

The Problematic SQL Query ​

The Query Plan (BEFORE a fix) ​

The Bottleneck ​

The Solution ​

The Query Plan (AFTER the fix - with Method 2) ​

Analyzing the Result ​

Conclusion ​