C# Core Optimization - Part 2: Choosing the Right Collection

After understanding memory management, let's dive into another fundamental topic in C# that greatly affects performance: choosing the right collection.


The Scenario 📝

  • System: A service needs to process data. It loads a list of 10,000 products from a database.
  • Problem: After loading the list, the service needs to perform many lookups to find products by their ID. The developer stored this list in a List<Product>, and these lookups are slowing down the entire process.

Comparing Common Collections

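To understand why this happens, we need Big O notation. It describes how the cost of an operation grows as the amount of data (n) grows: O(1) means the cost stays roughly constant no matter how many items there are, while O(n) means the cost grows in proportion to the number of items.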
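
To make the idea concrete, here is a minimal sketch that counts how many items a linear search inspects as the input grows. The helper CountLinearSearchSteps is hypothetical and written only for this illustration; it assumes C# 9 or later so that top-level statements are available.

```csharp
using System;

// Hypothetical helper: counts how many elements a linear search inspects
// before it finds `target` (or runs out of items).
static int CountLinearSearchSteps(int[] items, int target)
{
    int steps = 0;
    foreach (int item in items)
    {
        steps++;
        if (item == target) break;
    }
    return steps;
}

foreach (int n in new[] { 10, 1_000, 100_000 })
{
    int[] items = new int[n];
    for (int i = 0; i < n; i++) items[i] = i;

    // Worst case: the target is the last element, so every item is inspected.
    int steps = CountLinearSearchSteps(items, n - 1);
    Console.WriteLine($"n = {n}: {steps} steps");
}
```

In the worst case the step count equals n, which is exactly what O(n) expresses: ten times more data means ten times more work.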

1. List<T> (A Simple List)

  • Structure: An ordered, index-addressable list of items, backed by an array that grows as needed.
  • Strength:
      • Getting an item by its index: myList[100] -> O(1) (constant time, a direct array read).
  • Weakness (The Bottleneck):
      • Searching for an item by its value: myList.FirstOrDefault(x => x.Id == 123) -> O(n) (linear time). To find a product, it checks the list from the beginning until it finds a match. In the worst case (the product is last or missing), it looks through all 10,000 products. With 1 million products, that would be up to 1 million checks per lookup.
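
The two access patterns above can be sketched as follows. This is only an illustration: the sample data is made up, and a named tuple stands in for the Product type so that the snippet is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; the tuple stands in for the Product type.
var products = new List<(int Id, string Name)>
{
    (101, "Keyboard"),
    (102, "Mouse"),
    (103, "Monitor"),
};

// Index access: a direct read from the backing array, O(1).
var second = products[1];

// Search by value: scans from the start until the predicate matches, O(n).
var byId = products.FirstOrDefault(p => p.Id == 103);

// No match: FirstOrDefault returns the default value of the element type.
// For this tuple that is (0, null); for a class such as Product it would be null.
var missing = products.FirstOrDefault(p => p.Id == 999);

Console.WriteLine(second.Name);   // Mouse
Console.WriteLine(byId.Name);     // Monitor
Console.WriteLine(missing.Id);    // 0
```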

2. Dictionary<TKey, TValue> (A Dictionary)

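  • Structure: A collection of key-value pairs, optimized for lookups by key. It is built on a hash table.
  • Strength:
      • Finding an item by its key: myDict[123] -> O(1) on average (constant time). The key's hash code points almost directly at the value's location, so lookup time stays roughly the same whether the Dictionary has 100 or 10 million items. Note that the indexer throws KeyNotFoundException if the key is absent; use TryGetValue when the key may be missing.
  • Weakness:
      • Uses more memory per item than a List<T>, keys must be unique, and the enumeration order is not guaranteed.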
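
As a small sketch of the lookup behavior described above, the snippet below compares the indexer with TryGetValue. The dictionary contents are hypothetical sample data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data: product names keyed by product ID.
var namesById = new Dictionary<int, string>
{
    [101] = "Keyboard",
    [102] = "Mouse",
};

// Indexer: fine when the key is known to exist.
string keyboard = namesById[101];

// TryGetValue: one hash lookup, and no exception when the key is missing.
bool found = namesById.TryGetValue(999, out var name);

// The indexer throws KeyNotFoundException for a missing key.
bool threw = false;
try
{
    _ = namesById[999];
}
catch (KeyNotFoundException)
{
    threw = true;
}

Console.WriteLine(keyboard); // Keyboard
Console.WriteLine(found);    // False
Console.WriteLine(threw);    // True
```

TryGetValue is usually the safer choice for lookups driven by external input, because a missing key is an expected outcome there rather than an error.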

3. HashSet<T> (A Set)

  • Structure: A collection of unique values, with no guaranteed order. It is also based on a hash table.
  • Strength:
      • Checking if an item exists: mySet.Contains(123) -> O(1) on average (constant time). This is usually the most efficient way to answer the question, "Is this item in the collection?".
  • Weakness:
      • It cannot store duplicates (Add returns false for a value that is already present), and items cannot be accessed by index.
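
A minimal sketch of the membership test and the uniqueness guarantee, using hypothetical product IDs:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data: IDs of products currently in stock.
var inStockIds = new HashSet<int> { 101, 102, 103 };

// Membership test: a hash lookup, O(1) on average.
bool has102 = inStockIds.Contains(102);
bool has999 = inStockIds.Contains(999);

// Add returns false and leaves the set unchanged when the value already exists.
bool addedDuplicate = inStockIds.Add(101);
bool addedNew = inStockIds.Add(104);

Console.WriteLine(has102);           // True
Console.WriteLine(has999);           // False
Console.WriteLine(addedDuplicate);   // False
Console.WriteLine(addedNew);         // True
Console.WriteLine(inStockIds.Count); // 4
```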

Applying This to Our Problem

The Problematic Code (using List<T>):

csharp
using System.Collections.Generic;
using System.Linq; // provides FirstOrDefault

// Assume products is a List<Product> with 10,000 items
List<Product> products = GetAllProductsFromDb();

// Every call scans the list from the start: O(n) per lookup.
// Returns null when no product has the given ID.
Product FindProduct(int id)
{
    return products.FirstOrDefault(p => p.Id == id);
}

The Optimized Code (using Dictionary<TKey, TValue>):

csharp
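using System.Collections.Generic;
using System.Linq; // provides ToDictionary

// Pay a one-time O(n) cost to build the Dictionary.
// ToDictionary throws ArgumentException if two products share the same Id.
Dictionary<int, Product> productMap = GetAllProductsFromDb()
    .ToDictionary(p => p.Id);

// Each later lookup is a single hash lookup: O(1) on average.
// Returns null when no product has the given ID.
Product FindProduct(int id)
{
    productMap.TryGetValue(id, out Product product);
    return product;
}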
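
One practical caveat is that ToDictionary requires unique keys. The sketch below shows what happens when the loaded rows contain a repeated ID, and one possible way to handle it. The sample rows are hypothetical, a tuple stands in for Product, and the "keep the first row" policy is an assumption for illustration, not a recommendation from the original scenario. It also assumes a modern .NET runtime where Dictionary.TryAdd is available.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows in which ID 101 appears twice.
var rows = new List<(int Id, string Name)>
{
    (101, "Keyboard"),
    (102, "Mouse"),
    (101, "Keyboard (duplicate row)"),
};

// ToDictionary throws ArgumentException as soon as it meets a repeated key.
bool threw = false;
try
{
    rows.ToDictionary(p => p.Id);
}
catch (ArgumentException)
{
    threw = true;
}

// One possible policy (an assumption, not the only choice): keep the first row per ID.
var productMap = new Dictionary<int, (int Id, string Name)>();
foreach (var row in rows)
{
    productMap.TryAdd(row.Id, row); // returns false and keeps the existing entry
}

Console.WriteLine(threw);                // True
Console.WriteLine(productMap.Count);     // 2
Console.WriteLine(productMap[101].Name); // Keyboard
```

If IDs come from a primary-key column, duplicates should not occur and the exception is a useful signal that something upstream is wrong.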

Conclusion (A Simple Rule):

  • Need to access items by index and loop through them in order? ➡️ Use List<T>.
  • Need to quickly look up a value based on a unique key? ➡️ Use Dictionary<TKey, TValue>.
  • Need to check if an item exists and ensure all items are unique? ➡️ Use HashSet<T>.

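Choosing the right data structure is one of the most basic yet powerful optimization techniques. In this scenario, it turns each lookup from a scan of up to 10,000 products (O(n)) into a single hash lookup (O(1) on average), at the cost of one O(n) pass to build the Dictionary. The more lookups the service performs, the more that one-time cost pays off.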