C# Core Optimization - Part 2: Choosing the Right Collection
After understanding memory management, let's dive into another fundamental topic in C# that greatly affects performance: choosing the right collection.
The Scenario 📝
- System: A service needs to process data. It loads a list of 10,000 products from a database.
- Problem: After loading the list, the service needs to perform many lookups to find products by their ID. The developer stored the products in a List<Product>, and these lookups are slowing down the entire process.
Comparing Common Collections
To understand why this happens, we need to know about Big O Notation. It's a way to measure how the performance of an operation changes as the amount of data grows.
1. List<T> (A Simple List)
- Structure: An ordered list of items.
- Strength:
  - Getting an item by its index: myList[100] -> O(1) (extremely fast, constant time).
- Weakness (The Bottleneck):
  - Searching for an item by its value: myList.FirstOrDefault(x => x.Id == 123) -> O(n) (linear time).
  - To find a product, it has to check the list from the beginning until it finds a match. In the worst case, it has to look through all 10,000 products. If the list had 1 million products, it could take up to 1 million checks.
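The difference between the two operations can be made visible by counting how many items a search inspects. This is a minimal sketch, not the service's actual code: a named tuple stands in for the Product class, and the product names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for List<Product>: 10,000 items with Ids 1..10,000.
List<(int Id, string Name)> products = Enumerable.Range(1, 10_000)
    .Select(i => (Id: i, Name: $"Product {i}"))
    .ToList();

// Index access goes straight to the slot: O(1).
var byIndex = products[100];

// Searching by value scans from the front: O(n).
// The counter records how many items the predicate inspected.
int checks = 0;
var byId = products.FirstOrDefault(p =>
{
    checks++;
    return p.Id == 10_000; // worst case: the match is the last item
});

Console.WriteLine($"Index 100 holds Id {byIndex.Id}");          // Index 100 holds Id 101
Console.WriteLine($"Found Id {byId.Id} after {checks} checks"); // Found Id 10000 after 10000 checks
```

The index access does the same amount of work no matter how long the list is, while the search cost grows with the position of the match.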
2. Dictionary<TKey, TValue> (A Dictionary)
- Structure: A collection of key-value pairs, optimized for lookups using the key. It works based on a hash table.
- Strength:
  - Finding an item by its key: myDict[123] -> O(1) on average (extremely fast, constant time).
  - It can "jump" directly to the location of the value almost instantly, whether the Dictionary has 100 or 10 million items.
- Weakness:
  - Uses more memory than a List<T> holding the same items, because it also stores the hash table structure.
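A short sketch of the lookup behavior described above may help, including what happens when a key is missing or duplicated. The keys and product names here are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data: product names keyed by product ID.
var productNames = new Dictionary<int, string>
{
    [123] = "Keyboard",
    [456] = "Mouse",
};

// The indexer returns the value for an existing key.
string name = productNames[123];

// TryGetValue is the safe form: it returns false for a missing key
// instead of throwing.
bool found = productNames.TryGetValue(999, out var missing);

// The indexer throws KeyNotFoundException when the key is absent.
bool threw = false;
try { _ = productNames[999]; }
catch (KeyNotFoundException) { threw = true; }

// Keys must be unique: Add throws ArgumentException on a duplicate key.
bool duplicateRejected = false;
try { productNames.Add(123, "Another keyboard"); }
catch (ArgumentException) { duplicateRejected = true; }

Console.WriteLine(name);              // Keyboard
Console.WriteLine(found);             // False
Console.WriteLine(threw);             // True
Console.WriteLine(duplicateRejected); // True
```

When a key might be absent, TryGetValue is usually the better choice because it avoids the cost and noise of an exception.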
3. HashSet<T> (A Set)
- Structure: A collection of unique values, with no specific order. It is also based on a hash table.
- Strength:
  - Checking if an item exists: mySet.Contains(123) -> O(1) on average (extremely fast, constant time). This is the fastest way to answer the question, "Is this item in the collection?".
- Weakness:
  - It does not allow duplicate items.
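The uniqueness rule and the membership check can be seen in a few lines. This is an illustrative sketch with made-up IDs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical set of product IDs that have already been processed.
var seenIds = new HashSet<int>();

// Add returns true when the value was inserted...
bool firstAdd = seenIds.Add(123);
// ...and false when the value was already present (the set is unchanged).
bool secondAdd = seenIds.Add(123);
seenIds.Add(456);

// Contains answers the membership question with a hash lookup.
bool has123 = seenIds.Contains(123);
bool has999 = seenIds.Contains(999);

Console.WriteLine($"{firstAdd} {secondAdd}"); // True False
Console.WriteLine($"{has123} {has999}");      // True False
Console.WriteLine(seenIds.Count);             // 2
```

Because Add reports whether the value was new, a single call can both test for a duplicate and record the item.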
Applying This to Our Problem
The Problematic Code (using List<T>):
using System.Collections.Generic;
using System.Linq; // needed for FirstOrDefault

// Assume products is a List<Product> with 10,000 items
List<Product> products = GetAllProductsFromDb();

// This operation is slow because it has to scan the list (O(n)).
// It returns null when no product has the given ID.
Product findProduct(int id) {
    return products.FirstOrDefault(p => p.Id == id);
}
The Optimized Code (using Dictionary<TKey, TValue>):
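using System.Collections.Generic;
using System.Linq; // needed for ToDictionary

// Pay a small, one-time cost to build the Dictionary.
// ToDictionary throws if two products share the same Id.
Dictionary<int, Product> productMap = GetAllProductsFromDb()
    .ToDictionary(p => p.Id);

// All future lookups are fast (O(1) on average).
// It returns null when no product has the given ID.
Product findProduct(int id) {
    productMap.TryGetValue(id, out Product product);
    return product;
}
Conclusion (A Simple Rule):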
- Need to access items by index and loop through them in order? ➡️ Use List<T>.
- Need to quickly look up a value based on a unique key? ➡️ Use Dictionary<TKey, TValue>.
- Need to check if an item exists and ensure all items are unique? ➡️ Use HashSet<T>.
Choosing the right data structure is one of the most basic yet powerful optimization techniques. It can turn a lookup that scans the whole collection (O(n)) into one that takes roughly constant time (O(1)), and the savings add up quickly when that lookup runs thousands of times.
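To get a feel for the gap on the scenario above, the two approaches can be timed side by side. This is a rough sketch only: a named tuple stands in for the Product class, the data is generated rather than loaded from a database, and a single Stopwatch run is not a rigorous benchmark, so the numbers will vary by machine and runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical stand-in for the 10,000 products loaded from the database.
var products = Enumerable.Range(1, 10_000)
    .Select(i => (Id: i, Name: $"Product {i}"))
    .ToList();

// One-time cost: build the lookup table keyed by Id.
var productMap = products.ToDictionary(p => p.Id);

// 2,000 IDs to look up, spread across the whole range (5, 10, ..., 10,000).
int[] idsToFind = Enumerable.Range(1, 2_000).Select(i => i * 5).ToArray();

// List<T>: every lookup scans from the front of the list.
var listTimer = Stopwatch.StartNew();
int listHits = 0;
foreach (int id in idsToFind)
{
    if (products.FirstOrDefault(p => p.Id == id).Id == id) listHits++;
}
listTimer.Stop();

// Dictionary<TKey, TValue>: every lookup is a single hash probe on average.
var mapTimer = Stopwatch.StartNew();
int mapHits = 0;
foreach (int id in idsToFind)
{
    if (productMap.TryGetValue(id, out _)) mapHits++;
}
mapTimer.Stop();

Console.WriteLine($"List:       {listHits} hits in {listTimer.Elapsed.TotalMilliseconds:F2} ms");
Console.WriteLine($"Dictionary: {mapHits} hits in {mapTimer.Elapsed.TotalMilliseconds:F2} ms");
```

Both loops find the same products; only the time taken should differ, and the difference should widen as the collection or the number of lookups grows.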