C# Core Optimization - Part 4: The "Performance Trap" of Boxing and Unboxing
This time, we'll dissect a classic performance trap in C# that relates directly to our earlier lessons on the Stack, Heap, `struct`, and `class`: Boxing and Unboxing.
The Scenario
- System: A piece of code needs to process a large list of integers (`int`). Due to old habits or working with legacy code, a developer uses `ArrayList` instead of `List<int>`.
- The Problem: This code runs surprisingly slowly and puts a lot of pressure on the Garbage Collector (GC).
What Are Boxing and Unboxing?
Boxing and unboxing are the conversions between a value type (like `int`, `double`, or any `struct`) and a reference type (`object`, or an interface that the value type implements).
Boxing:
- This is the process of converting a value type (stored inline, which for a local variable typically means on the Stack) into a reference type (`object`).
- Analogy: You have a small diamond (`int`). To send it through a shipping service that only accepts standard packages (`object`), you first have to put the diamond inside a box.
- The Cost:
  - Heap Allocation: This "box" is a new `object` that gets allocated on the Heap.
  - Data Copying: The value of the diamond is copied into the box.
  - GC Pressure: This box later becomes garbage that the GC has to clean up.
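The copy semantics described above can be observed directly. The following is a minimal sketch; the class and method names (`BoxingDemo`, `ValueHeldByBoxAfterMutation`, `TwoBoxesShareIdentity`) are made up for illustration and are not part of any library.

```csharp
using System;

public static class BoxingDemo
{
    // Boxes a local int, changes the local, and returns what the box still holds.
    public static int ValueHeldByBoxAfterMutation()
    {
        int value = 42;
        object boxed = value; // boxing: a new heap object receives a copy of 42
        value = 99;           // changes only the local variable, not the box
        return (int)boxed;    // the box still holds the copied value
    }

    // Boxing the same value twice produces two separate heap objects.
    public static bool TwoBoxesShareIdentity()
    {
        int value = 42;
        object first = value;
        object second = value;
        return ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(ValueHeldByBoxAfterMutation()); // prints 42
        Console.WriteLine(TwoBoxesShareIdentity());       // prints False
    }
}
```

Because each box is an independent copy, every boxing conversion in a loop means one more allocation, which is where the cost in the scenario comes from.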
Unboxing:
- This is the reverse process: converting the `object` (the box on the Heap) back into a value type.
- The Cost:
  - Type Checking: The runtime has to check that the thing inside the box is actually a diamond, that is, that the box holds exactly the requested value type.
  - Data Copying: The value is copied from the Heap back into the value-type variable (typically on the Stack).
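The type check is easiest to see when it fails. In this small sketch (the names `UnboxingDemo`, `UnboxExact`, `UnboxToLongThrows`, and `UnboxThenWiden` are hypothetical), a boxed `int` can only be unboxed as `int`. Casting the box straight to `long` throws `InvalidCastException`, even though a plain `int` converts to `long` implicitly.

```csharp
using System;

public static class UnboxingDemo
{
    // Unboxing to the exact boxed type succeeds.
    public static int UnboxExact(object boxed) => (int)boxed;

    // Returns true when unboxing to long throws; a boxed int is not a boxed long.
    public static bool UnboxToLongThrows(object boxed)
    {
        try
        {
            _ = (long)boxed; // runtime type check fails for a boxed int
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Unbox to the exact type first, then let the int widen to long.
    public static long UnboxThenWiden(object boxed) => (int)boxed;

    public static void Main()
    {
        object boxed = 42; // boxes an int
        Console.WriteLine(UnboxExact(boxed));        // prints 42
        Console.WriteLine(UnboxToLongThrows(boxed)); // prints True
        Console.WriteLine(UnboxThenWiden(boxed));    // prints 42
    }
}
```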
The Problematic Code (Using ArrayList)
`ArrayList` is an old, non-generic collection that stores every element as `object`.
```csharp
using System.Collections;

// ArrayList stores items of type `object`.
var list = new ArrayList();
for (int i = 0; i < 1_000_000; i++)
{
    // BOXING happens here!
    // Each `int` (a value type) must be "boxed" into an `object`
    // to be added to the ArrayList.
    // -> This creates 1 million objects on the HEAP.
    list.Add(i);
}

long sum = 0;
foreach (object item in list)
{
    // UNBOXING happens here!
    // Each `object` must be "unboxed" to get the `int` value out.
    // -> This requires 1 million type checks and data copies.
    sum += (int)item;
}
```
The Solution: Always Use Generic Collections
The Logic: Generic collections (like `List<T>`), introduced in .NET 2.0, were created in large part to solve this problem. They are strongly typed and know exactly what kind of data they are storing, so a `List<int>` keeps its `int` values directly in an internal `int[]` instead of holding references to boxes.
The Optimized Code:
```csharp
using System.Collections.Generic;

// List<int> knows it is storing `int` values.
var list = new List<int>();
for (int i = 0; i < 1_000_000; i++)
{
    // NO BOXING. The `int` is stored directly.
    list.Add(i);
}

long sum = 0;
foreach (int item in list)
{
    // NO UNBOXING. The `item` is already an `int`.
    sum += item;
}
```
Analyzing the Results
- No Per-Item Heap Allocations: The generic version does not create 1 million "boxes" on the heap. Its only allocations are the list's internal `int[]` buffers as it grows.
- Far Less GC Pressure: Because almost no garbage is created, the GC has very little to clean up, which reduces the CPU time spent on collections.
- Faster Execution: The code runs faster because it eliminates the per-item cost of allocating, copying, and type-checking.
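To check these claims on your own machine, a rough comparison can be sketched as follows. This is an illustration, not a rigorous benchmark: the names `BoxingCost`, `SumWithArrayList`, `SumWithGenericList`, and `AllocatedBytes` are made up for this example, and it assumes a runtime that provides `GC.GetAllocatedBytesForCurrentThread()` (modern .NET does). The exact byte counts and timings depend on the runtime and hardware, so none are quoted here.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

public static class BoxingCost
{
    const int Count = 1_000_000;

    public static long SumWithArrayList()
    {
        var list = new ArrayList();
        for (int i = 0; i < Count; i++) list.Add(i);    // boxes every int
        long sum = 0;
        foreach (object item in list) sum += (int)item; // unboxes every item
        return sum;
    }

    public static long SumWithGenericList()
    {
        var list = new List<int>();
        for (int i = 0; i < Count; i++) list.Add(i);    // stored directly
        long sum = 0;
        foreach (int item in list) sum += item;
        return sum;
    }

    // Returns the number of bytes allocated on this thread while `work` runs.
    public static long AllocatedBytes(Func<long> work)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        work();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        var sw = Stopwatch.StartNew();
        long boxedBytes = AllocatedBytes(SumWithArrayList);
        long boxedMs = sw.ElapsedMilliseconds;

        sw.Restart();
        long genericBytes = AllocatedBytes(SumWithGenericList);
        long genericMs = sw.ElapsedMilliseconds;

        Console.WriteLine($"ArrayList: {boxedBytes:N0} bytes allocated, {boxedMs} ms");
        Console.WriteLine($"List<int>: {genericBytes:N0} bytes allocated, {genericMs} ms");
    }
}
```

A single `Stopwatch` run includes JIT warm-up and is noisy, so treat the timings as a rough indication only. A dedicated benchmarking tool such as BenchmarkDotNet gives more reliable numbers.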
Conclusion:
- The Golden Rule: "Avoid boxing and unboxing in performance-sensitive code."
- The easiest way to do this is to always prefer generic collections (`List<T>`, `Dictionary<TKey, TValue>`) over their old, non-generic counterparts (`ArrayList`, `Hashtable`).
- Be wary of any API that requires you to pass a value type into a parameter of type `object`, as this is where boxing will silently occur.
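The last point applies to your own methods as well. Below is a minimal sketch with two hypothetical helpers (`AreEqualBoxed` and `AreEqualGeneric` are invented for this example). The `object`-typed version boxes both `int` arguments on every call, while the generic version, constrained to `IEquatable<T>`, calls `Equals(T)` without boxing. It assumes `GC.GetAllocatedBytesForCurrentThread()` is available on your runtime.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class SilentBoxing
{
    // Hypothetical helper with `object` parameters: value-type arguments are boxed.
    // NoInlining keeps the JIT from optimizing the boxes away in this tiny demo.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool AreEqualBoxed(object a, object b) => a.Equals(b);

    // Generic alternative: T stays an int, and Equals(T) is called without boxing.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool AreEqualGeneric<T>(T a, T b) where T : IEquatable<T> => a.Equals(b);

    // Bytes allocated on this thread by `iterations` calls to the object-based helper.
    public static long BytesForBoxedCalls(int iterations)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++) AreEqualBoxed(i, i); // two boxes per call
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Bytes allocated on this thread by `iterations` calls to the generic helper.
    public static long BytesForGenericCalls(int iterations)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++) AreEqualGeneric(i, i); // no boxes
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        Console.WriteLine($"object parameters:  {BytesForBoxedCalls(100_000):N0} bytes");
        Console.WriteLine($"generic parameters: {BytesForGenericCalls(100_000):N0} bytes");
    }
}
```

The same pattern shows up in framework APIs that take `object` or `params object[]`, for example `string.Format("{0}", someInt)`. When a generic overload exists, preferring it avoids the hidden allocation.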