Careful with those enumerables

Posted by Sergio on 2010-05-09

Ever since .Net 2.0 introduced the yield keyword for creating enumerators, and even more after the introduction of LINQ in C# 3.0, I've seen more and more APIs return IEnumerable<T> instead of IList<T> or ICollection<T> or their older cousins the ArrayList and the array object.

That makes sense most of the time, especially for collections that aren't meant to be modified, and choosing between those different return types is not what I'm about to discuss here. You can find endless articles and threads about that.

The caller's perspective

What has caused me some trouble recently was being caught off guard by some unexpected performance penalties when using one of those IEnumerable<T>.

Not that IEnumerable<T> has any performance issue by itself but the way we deliver it can misguide the caller. Let me try to make that statement a little clearer with a small example.

Consider this ProductCatalog class that basically wraps a collection of Product objects.

public class ProductCatalog
{
  private readonly IInventoryService _inventoryService;
  private List<Product> _products;

  public ProductCatalog(IInventoryService inventoryService)
  {
    _inventoryService = inventoryService;
    _products = new List<Product>();
  }

  public void Populate()
  {
    //imagine this will populate from a database
    _products.Add(new Product {Id = 1, Price = 12.34m});
    _products.Add(new Product {Id = 2, Price = 11.22m});
    _products.Add(new Product {Id = 3, Price = 7.99m});
    _products.Add(new Product {Id = 4, Price = 3.49m});
	//...
    _products.Add(new Product {Id = 10000, Price = 75.99m});
  }

  public IEnumerable<Product> Products { get { return _products; } }

  public IEnumerable<Product> AvailableProducts
  {
    get
    {
      return Products
        .Where(product => _inventoryService.IsProductInStock(product));
    }
  }
}

And here's the code using it.

var catalog = new ProductCatalog(new InventoryService());
catalog.Populate();

var available = catalog.AvailableProducts;

foreach (var product in available)
{
  Console.Out.WriteLine("Product Id " + product.Id + " is available.");
}

var priceSum = 0m;

priceSum = available.Sum(product => product.Price);
Console.Out.WriteLine(
  "If I buy one of each product I'll pay: " + priceSum.ToString("c"));

The InventoryService is something like this:

public class InventoryService : IInventoryService
{
  public bool IsProductInStock(Product product)
  {
    Console.Out.WriteLine("Expensive verification for prod id: " + product.Id);
    //imagine something a little more complex and lengthy is happening here.
    return true;
  }
}

It seems trivial enough, but let's look at what the output gives us:

Expensive verification for prod id: 1
Product Id 1 is available.
Expensive verification for prod id: 2
Product Id 2 is available.
Expensive verification for prod id: 3
Product Id 3 is available.
Expensive verification for prod id: 4
Product Id 4 is available.
Expensive verification for prod id: 10000
Product Id 10000 is available.
Expensive verification for prod id: 1
Expensive verification for prod id: 2
Expensive verification for prod id: 3
Expensive verification for prod id: 4
Expensive verification for prod id: 10000
If I buy one of each product I'll pay: $111.03

See? If I didn't know how ProductCatalog.AvailableProducts was implemented I would be stumped by this behavior. Looking at this from the caller's point of view, I was just trying to use an object's collection property twice, probably thinking the return value would be the same collection of objects each time.

Well, they were the same individual Product objects in each call but they most definitely were not packaged in the same collection structure, and I don't know about you but I would never, in a million years, think that accessing that property would cause the collection to be rebuilt each time.

What's an API designer to do?

My suggestion in situations like the above, where the collection is closer to an object feed than a fixed list, is to implement that member as a method, not a property. Callers are more likely to associate a method call to something expensive than in the property access case.

Properties carry a historical expectation of being cheap to use and consistent. If your property's design includes the chance of not honoring this expectation, like in my example which depended on an injected IInventoryService, then use a method instead.

If you can't use a method for whatever reason, try to at least lazy load or cache the returned collection.

The last thing you want is requiring your callers to know implementation details of all your collection-returning members to decide how they should use the collection.

So this is not about the return type, huh?

As you may have noticed, the problem I went into had nothing to do with IEnumerable<T>, it could have happened with any of the other mentioned types — they're all enumerable anyway. I could certainly convert my example to IList<T> and still have the same problem.

The reason I called out IEnumerable<T> more explicitly is that it seems to happen more with it than the other ones. Maybe that's because the LINQ extension methods return IEnumerable<T> or because an IList<T> property backed by a fixed List<T> collection is a very common implementation choice.