Careful with those enumerables
Ever since .Net 2.0 introduced the yield
keyword
for creating enumerators, and even more after the introduction
of LINQ in C# 3.0, I've seen more and more APIs return
IEnumerable<T>
instead of IList<T>
or ICollection<T>
or their older cousins
the ArrayList
and the array object.
That makes sense most of the time, especially for collections that aren't meant to be modified, and choosing between those different return types is not what I'm about to discuss here. You can find endless articles and threads about that.
The caller's perspective
What has caused me some trouble recently was being caught
off guard by some unexpected performance penalties when
using one of those IEnumerable<T>
.
Not that IEnumerable<T>
has any performance
issue by itself but the way we deliver it can misguide the caller.
Let me try to make that statement a little clearer with a
small example.
Consider this ProductCatalog
class that
basically wraps a collection of Product
objects.
public class ProductCatalog { private readonly IInventoryService _inventoryService; private List<Product> _products; public ProductCatalog(IInventoryService inventoryService) { _inventoryService = inventoryService; _products = new List<Product>(); } public void Populate() { //imagine this will populate from a database _products.Add(new Product {Id = 1, Price = 12.34m}); _products.Add(new Product {Id = 2, Price = 11.22m}); _products.Add(new Product {Id = 3, Price = 7.99m}); _products.Add(new Product {Id = 4, Price = 3.49m}); //... _products.Add(new Product {Id = 10000, Price = 75.99m}); } public IEnumerable<Product> Products { get { return _products; } } public IEnumerable<Product> AvailableProducts { get { return Products .Where(product => _inventoryService.IsProductInStock(product)); } } }
And here's the code using it.
var catalog = new ProductCatalog(new InventoryService()); catalog.Populate(); var available = catalog.AvailableProducts; foreach (var product in available) { Console.Out.WriteLine("Product Id " + product.Id + " is available."); } var priceSum = 0m; priceSum = available.Sum(product => product.Price); Console.Out.WriteLine( "If I buy one of each product I'll pay: " + priceSum.ToString("c"));
The InventoryService
is something like this:
public class InventoryService : IInventoryService { public bool IsProductInStock(Product product) { Console.Out.WriteLine("Expensive verification for prod id: " + product.Id); //imagine something a little more complex and lengthy is happening here. return true; } }
It seems trivial enough, but let's look at what the output gives us:
Expensive verification for prod id: 1 Product Id 1 is available. Expensive verification for prod id: 2 Product Id 2 is available. Expensive verification for prod id: 3 Product Id 3 is available. Expensive verification for prod id: 4 Product Id 4 is available. Expensive verification for prod id: 10000 Product Id 10000 is available. Expensive verification for prod id: 1 Expensive verification for prod id: 2 Expensive verification for prod id: 3 Expensive verification for prod id: 4 Expensive verification for prod id: 10000 If I buy one of each product I'll pay: $111.03
See? If I didn't know how ProductCatalog.AvailableProducts
was implemented I would be stumped by this behavior. Looking at
this from the caller's point of view, I was just trying to use an
object's collection property twice, probably thinking the return
value would be the same collection of objects each time.
Well, they were the same individual Product
objects
in each call but they most definitely were not packaged in the
same collection structure, and I don't know about you but I
would never, in a million years, think that accessing that
property would cause the collection to be rebuilt each time.
What's an API designer to do?
My suggestion in situations like the above, where the collection is closer to an object feed than a fixed list, is to implement that member as a method, not a property. Callers are more likely to associate a method call to something expensive than in the property access case.
Properties carry a historical expectation of being
cheap to use and consistent. If your property's design
includes the chance of not honoring this expectation, like
in my example which depended on an injected IInventoryService
,
then use a method instead.
If you can't use a method for whatever reason, try to at least lazy load or cache the returned collection.
The last thing you want is requiring your callers to know implementation details of all your collection-returning members to decide how they should use the collection.
So this is not about the return type, huh?
As you may have noticed, the problem I went into had
nothing to do with IEnumerable<T>
, it
could have happened with any of the other mentioned types
— they're all enumerable anyway. I could
certainly convert my example to IList<T>
and still have the same problem.
The reason I called out IEnumerable<T>
more
explicitly is that it seems to happen more with it than the
other ones. Maybe that's because the LINQ extension methods
return IEnumerable<T>
or because an
IList<T>
property backed by a fixed
List<T>
collection is a very common implementation
choice.