Sergio and the sigil

Events do not a community make

Posted by Sergio on 2010-06-15

It's a recurring theme. The relationship between the .Net developer community, the community leaders, Microsoft products, and Microsoft itself is the subject of endless debate.

These last couple of weeks I've seen it come back again, strong, with the usual rants, departure notices, and rebuttals (which are getting fewer and further between).

The Microsoft .Net User Group Leaders

I run a small .Net UG in Chicago and when I have a chance to talk with other user group organizers, no matter how truly dedicated and valuable they are, I invariably come to the same sad conclusion.

The majority of UGs act as mere outlets for introductory talks on whatever the shiny new technology from Microsoft happens to be. I wonder how many user groups have a clear (or even an unclear) mission statement.

There's nothing intrinsically wrong with providing 100-level content about new products. The problem I see is that this doesn't really contribute much to building any kind of community, and we can't let our UGs be restricted to just that.

In many UGs there's a lot of emphasis put on "Microsoft" and ".Net" but almost none on "User" or "Group." I think this is completely backwards. UG leaders should have their focus on their members 100% of the time, thinking about how the UG can truly benefit them, making all of us better developers, not just better .Net developers (Microsoft or ALT), and much less merely better MS developers.

The .Net Developer Community

We all know this community isn't exactly like other developer communities. I'd even go out on a limb and say that the use of the word community is a bit of a stretch; it's more like a demographic.

We grew up simply waiting for whatever came out of One Microsoft Way and attending MS-produced events and conferences. I think there are very few examples of organically formed community initiatives in .Net, like the Code Camps and ALT.NET.

That's something that can't be changed by just talking about it. As community members there are things we could do.

  • Look for developer events even if you weren't forwarded one of those MSDN emails (there's Community Megaphone, Meetup, EventBrite, just to name a few).
  • Attend non-MS user group meetings. Seriously. If you're a web developer, look for a JavaScript or Flash UG. If you're curious about the Windows Phone, check out the iPhone and Android users groups too. There's a lot to learn and much networking to be done.
  • Attend Code Camps, even if it costs you a small road trip. There's no way you'll ever regret doing this.

We will only be a community when we start acting as such.

Microsoft

At least in my region, Microsoft does very commendable work in trying to connect .Net developers with non-.Net ones. More and more I see MS interested in leaving the community organization to the community itself, providing some level of infrastructure, access to good speakers, and sponsorship when possible.

I'm not convinced MS knows how to measure the health of .Net communities. For me, even looking at it from MS's standpoint, a successful community is one where novices have easy access to experts and where knowledge and opportunities are exchanged. It doesn't matter that you have tens of thousands of .Net developers in your region if you don't know where to find them for a question or a business proposition.

Microsoft Products

Like any other technology company, Microsoft releases products that range from tremendously successful to niche applications to complete failures.

We can never set aside our critical eye when analyzing a new product. Community members invest their time attending meetings and events. We should always demand perspective in addition to purely technical content.

Here I want to draw a line and separate developer tools from core technologies. As a developer I'm OK with being seen as a consumer of developer tools like Visual Studio and Blend, but when it comes to platform technologies like IIS, SQL Server, Azure, Silverlight, etc., I'd much rather be treated as a coworker who is trying to create high-quality software with these products. When we talk about core technologies, forget that there's money to be made by both of us in the process.

Can we fix this?

I used to be more optimistic about this situation. I still hope we can stir the will to participate, produce, and consume all things .Net.

The one thing I know for sure is that I won't just sit and watch, waiting for it to happen. I try to do the tiny bit I can, but I like this stuff too much and I'd have no problem carrying my energy over to another platform.

But the question stands — Is there a way to make it work?

Careful with those enumerables

Posted by Sergio on 2010-05-09

Ever since .Net 2.0 introduced the yield keyword for creating enumerators, and even more after the introduction of LINQ in C# 3.0, I've seen more and more APIs return IEnumerable<T> instead of IList<T> or ICollection<T> or their older cousins the ArrayList and the array object.

That makes sense most of the time, especially for collections that aren't meant to be modified, and choosing between those different return types is not what I'm about to discuss here. You can find endless articles and threads about that.

The caller's perspective

What has caused me some trouble recently was being caught off guard by unexpected performance penalties when consuming one of those IEnumerable<T> values.

Not that IEnumerable<T> has any performance issue by itself, but the way we deliver it can mislead the caller. Let me try to make that statement a little clearer with a small example.

Consider this ProductCatalog class that basically wraps a collection of Product objects.

public class ProductCatalog
{
  private readonly IInventoryService _inventoryService;
  private List<Product> _products;

  public ProductCatalog(IInventoryService inventoryService)
  {
    _inventoryService = inventoryService;
    _products = new List<Product>();
  }

  public void Populate()
  {
    //imagine this will populate from a database
    _products.Add(new Product {Id = 1, Price = 12.34m});
    _products.Add(new Product {Id = 2, Price = 11.22m});
    _products.Add(new Product {Id = 3, Price = 7.99m});
    _products.Add(new Product {Id = 4, Price = 3.49m});
    //...
    _products.Add(new Product {Id = 10000, Price = 75.99m});
  }

  public IEnumerable<Product> Products { get { return _products; } }

  public IEnumerable<Product> AvailableProducts
  {
    get
    {
      return Products
        .Where(product => _inventoryService.IsProductInStock(product));
    }
  }
}

And here's the code using it.

var catalog = new ProductCatalog(new InventoryService());
catalog.Populate();

var available = catalog.AvailableProducts;

foreach (var product in available)
{
  Console.Out.WriteLine("Product Id " + product.Id + " is available.");
}

var priceSum = 0m;

priceSum = available.Sum(product => product.Price);
Console.Out.WriteLine(
  "If I buy one of each product I'll pay: " + priceSum.ToString("c"));

The InventoryService is something like this:

public class InventoryService : IInventoryService
{
  public bool IsProductInStock(Product product)
  {
    Console.Out.WriteLine("Expensive verification for prod id: " + product.Id);
    //imagine something a little more complex and lengthy is happening here.
    return true;
  }
}

It seems trivial enough, but let's look at what the output gives us:

Expensive verification for prod id: 1
Product Id 1 is available.
Expensive verification for prod id: 2
Product Id 2 is available.
Expensive verification for prod id: 3
Product Id 3 is available.
Expensive verification for prod id: 4
Product Id 4 is available.
Expensive verification for prod id: 10000
Product Id 10000 is available.
Expensive verification for prod id: 1
Expensive verification for prod id: 2
Expensive verification for prod id: 3
Expensive verification for prod id: 4
Expensive verification for prod id: 10000
If I buy one of each product I'll pay: $111.03

See? If I didn't know how ProductCatalog.AvailableProducts was implemented I would be stumped by this behavior. Looking at this from the caller's point of view, I was just trying to use an object's collection property twice, probably thinking the return value would be the same collection of objects each time.

Well, they were the same individual Product objects in each call, but they most definitely were not packaged in the same collection structure. I don't know about you, but I would never, in a million years, expect that accessing that property would cause the collection to be rebuilt, and the inventory checks re-run, each time.
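Once you know (or suspect) what's going on, there is a quick defensive move on the caller's side: materialize the sequence once and reuse it. Here's a minimal sketch of that, using the same example (nothing here is new API, just ToList() from LINQ):

var catalog = new ProductCatalog(new InventoryService());
catalog.Populate();

// Force a single enumeration up front; the inventory checks run exactly once.
var available = catalog.AvailableProducts.ToList();

foreach (var product in available)
{
  Console.Out.WriteLine("Product Id " + product.Id + " is available.");
}

// Summing over the materialized list does not re-run the Where() filter.
var priceSum = available.Sum(product => product.Price);

That helps the caller, but it still requires the caller to know too much, which brings us to the next point.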

What's an API designer to do?

My suggestion in situations like the above, where the collection is closer to an object feed than a fixed list, is to implement that member as a method, not a property. Callers are more likely to associate a method call with potentially expensive work than they are a property access.

Properties carry a historical expectation of being cheap to use and consistent. If your property's design includes the chance of not honoring this expectation, like in my example, which depended on an injected IInventoryService, then use a method instead.

If you can't use a method for whatever reason, try to at least lazy load or cache the returned collection.
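To make that concrete, here's a minimal sketch of both options applied to the ProductCatalog example. The _availableCache field and the caching policy are my own assumptions; when (or whether) to invalidate the cache would depend on how often inventory status changes.

private List<Product> _availableCache;

// Option 1: a method hints to callers that some work may happen on each call.
public IList<Product> GetAvailableProducts()
{
  return Products
    .Where(product => _inventoryService.IsProductInStock(product))
    .ToList();
}

// Option 2: keep the property, but lazy-load and cache the filtered list
// so repeated accesses don't re-run the inventory checks.
public IEnumerable<Product> AvailableProducts
{
  get
  {
    if (_availableCache == null)
    {
      _availableCache = Products
        .Where(product => _inventoryService.IsProductInStock(product))
        .ToList();
    }
    return _availableCache;
  }
}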

The last thing you want is requiring your callers to know implementation details of all your collection-returning members to decide how they should use the collection.

So this is not about the return type, huh?

As you may have noticed, the problem I ran into had nothing to do with IEnumerable<T> specifically; it could have happened with any of the other types mentioned (they're all enumerable anyway). I could certainly convert my example to IList<T> and still have the same problem, as the sketch below shows.
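Nothing stops an IList<T> property from rebuilding its list on every access, with exactly the same hidden cost. A contrived sketch:

// Same trap, different return type: a brand new List<Product> is built,
// and every inventory check re-run, each time the property is read.
public IList<Product> AvailableProducts
{
  get
  {
    return Products
      .Where(product => _inventoryService.IsProductInStock(product))
      .ToList();
  }
}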

The reason I called out IEnumerable<T> more explicitly is that it seems to happen more often with it than with the others. Maybe that's because the LINQ extension methods return IEnumerable<T>, or because an IList<T> property backed by a fixed List<T> collection is a very common implementation choice.

ANN: The Second Chicago Code Camp

Posted by Sergio on 2010-03-22

After a successful first Chicago Code Camp last year, we're back to announce the second edition of this unique technical event.

The Chicago Code Camp 2 will happen on May 1st. In this event we are addressing one obvious and recurring piece of feedback: make it closer to the city.

We're thrilled to announce that our Code Camp will take place at the IIT Campus, just South of downtown Chicago, easily accessible by car and public transportation.

What is the Chicago Code Camp?

Just like last year, we want to host an event where any platform or programming language can have its space, as long as there's community interest in talking and hearing about it.

The code camp is a great opportunity to learn and to network with developers from different walks of life and technologies. Last year we had diverse topics such as .NET, Python, iPhone, Ruby, XUL, JavaScript, Scala, etc. We hope to have even more this time around.

To ensure the numerous technical communities are fairly represented, we are inviting all local technical community leaders to get involved and provide speakers and attendees.

The event is free to attend but everyone needs to register. Registration is open and it's limited due to the venue capacity.

Call for Speakers

The Chicago Code Camp website is up and ready to receive talk proposals.

The Code Camp Manifesto calls for sessions that privilege code over theory or slides, but that doesn't mean a good presentation will be immediately turned down because of that.

Stay tuned

We will have more exciting news and announcements to share about this event. We will do so as soon as they are confirmed.

Keep an eye on the website (or this blog) to learn about registrations, volunteering, and getting involved.

Code coverage reports with NCover and MSBuild

Posted by Sergio on 2010-02-09

I've been doing a lot of static analysis on our projects at work lately. As part of that task we added NCover to our automated build process. Our build runs on Team Build (TFS) and is specified in an MSBuild file.

We wanted to take code metrics very seriously and we purchased the complete version of the product to take full advantage of its capabilities.

Getting NCover to run in your build is very simple and the online documentation will be enough to figure it out. The problem comes when you begin needing to create more and more variations of the reports. The online documentation is a little short on this aspect, especially on how to use the MSBuild or NAnt custom tasks. I hear they plan to update the site with better docs for the next version of the product.

NCover Complete comes with 23 different types of reports and a ton of parameters that can be configured to produce far more helpful reports than just sticking to the defaults.

For example, we are working on a new release of our product and we are pushing ourselves to produce more testable code and write more unit tests for all the new code. The problem is that the new code is just a tiny fraction of the existing code, and the metrics get averaged down by the older code.

The key is to separate the code coverage profiling (which is done by NCover while it runs all the unit tests with NUnit) from the rendering of the reports. That way we only run the code coverage once, which matters because producing the coverage data can take a good chunk of time. Rendering the reports is much quicker, since the NCover reporting engine can feed off the saved coverage data as many times as we need.

Once we have the coverage data we can choose which report types we want to create, the thresholds for sufficient coverage (or to fail the build), which assemblies/types/methods we want to include/exclude from each report and where to save each of them.

Example

To demonstrate what I just described in practice, I decided to take an existing open source project and add NCover reporting to it. The project I selected was AutoMapper mostly because it's not very big and has decent test coverage.

I downloaded the project's source code from the repository and added a file named AutoMapper.msbuild to its root directory. You can download this entire file but I'll go over it piece by piece.

We start by importing the MSBuild tasks that ship with NCover into our script and declaring a few targets, including one to collect coverage data and one to generate the reports. I added the NCover tasks DLL to the project under tools/NCoverComplete.

<Project DefaultTargets="RebuildReports" 
  xmlns="http://schemas.microsoft.com/developer/msbuild/2003" >
  <UsingTask  TaskName="NCover.MSBuildTasks.NCover" 
        AssemblyFile="$(ProjectDir)tools\NCoverComplete\NCover.MSBuildTasks.dll"/>
  <UsingTask  TaskName="NCover.MSBuildTasks.NCoverReporting" 
        AssemblyFile="$(ProjectDir)tools\NCoverComplete\NCover.MSBuildTasks.dll"/>

  <PropertyGroup>
    <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
    <BuildDir>$(MSBuildProjectDirectory)\build\$(Configuration)</BuildDir>
    <NUnitBinDirectoryPath>$(MSBuildProjectDirectory)\tools\NUnit</NUnitBinDirectoryPath>
  </PropertyGroup>

  <Target Name="RebuildReports" DependsOnTargets="RunCoverage;ExportReports" >
    <Message Text="We will rebuild the coverage data than refresh the reports." 
          Importance="High" />
  </Target>

  <Target Name="RunCoverage" >
    <!-- snip -->
  </Target>

  <Target Name="ExportReports" >
    <!-- snip -->
  </Target>
</Project>

Now let's look closely at the target that gathers the coverage data. All it does is tell NCover (NCover console, really) to run NUnit over the AutoMapper.UnitTests.dll and save all the output to well-known locations.

<Target Name="RunCoverage" >
  <Message Text="Starting Code Coverage Analysis (NCover) ..." Importance="High" />
  <PropertyGroup>
    <NCoverOutDir>$(MSBuildProjectDirectory)\build\NCoverOut</NCoverOutDir>
    <NUnitResultsFile>build\NCoverOut\automapper-nunit-result.xml</NUnitResultsFile>
    <NUnitOutFile>build\NCoverOut\automapper-nunit-Out.txt</NUnitOutFile>
    <InputFile>$(BuildDir)\UnitTests\AutoMapper.UnitTests.dll</InputFile>
  </PropertyGroup>

  <NCover ToolPath="$(ProgramFiles)\NCover"
    ProjectName="$(Scenario)"
    WorkingDirectory="$(MSBuildProjectDirectory)"   
    TestRunnerExe="$(NUnitBinDirectoryPath)\nunit-console.exe"

    TestRunnerArgs="$(InputFile) /xml=$(NUnitResultsFile) /out=$(NUnitOutFile)"

    AppendTrendTo="$(NCoverOutDir)\automapper-coverage.trend"
    CoverageFile="$(NCoverOutDir)\automapper-coverage.xml"
    LogFile="$(NCoverOutDir)\automapper-coverage.log"
    IncludeTypes="AutoMapper\..*"
    ExcludeTypes="AutoMapper\.UnitTests\..*;AutoMapper\.Tests\..*"
    SymbolSearchLocations="Registry, SymbolServer, BuildPath, ExecutingDir"
  />
</Target>

Of special interest in the NCover task above are the output files named automapper-coverage.xml and automapper-coverage.trend, which contain the precious coverage data and the historical trend data, respectively. In case you're curious, the trend file is actually a SQLite3 database file that you can report from directly or export to other database formats if you want.

Also note the IncludeTypes and ExcludeTypes parameters, which guarantee that we are not tracking coverage on code that we don't care about.

Now that we have our coverage and trend data collected and saved to files we know, we can run as many reports as we want without needing to execute the whole set of tests again. That's in the next target.

<Target Name="ExportReports" >
  <Message Text="Starting Producing NCover Reports..." Importance="High" />
  <PropertyGroup>
    <Scenario>AutoMapper-Full</Scenario>
    <NCoverOutDir>$(MSBuildProjectDirectory)\build\NCoverOut</NCoverOutDir>
    <RptOutFolder>$(NCoverOutDir)\$(Scenario)Coverage</RptOutFolder>
    <Reports>
      <Report>
        <ReportType>FullCoverageReport</ReportType>
        <OutputPath>$(RptOutFolder)\Full\index.html</OutputPath>
        <Format>Html</Format>
      </Report>
      <Report>
        <ReportType>SymbolModuleNamespaceClass</ReportType>
        <OutputPath>$(RptOutFolder)\ClassCoverage\index.html</OutputPath>
        <Format>Html</Format>
      </Report>
      <Report>
        <ReportType>Trends</ReportType>
        <OutputPath>$(RptOutFolder)\Trends\index.html</OutputPath>
        <Format>Html</Format>
      </Report>
    </Reports>
    <SatisfactoryCoverage>
      <Threshold>
        <CoverageMetric>MethodCoverage</CoverageMetric>
        <Type>View</Type>
        <Value>80.0</Value>
      </Threshold>
      <Threshold>
        <CoverageMetric>SymbolCoverage</CoverageMetric>
        <Value>80.0</Value>
      </Threshold>
      <Threshold>
        <CoverageMetric>BranchCoverage</CoverageMetric>
        <Value>80.0</Value>
      </Threshold>
      <Threshold>
        <CoverageMetric>CyclomaticComplexity</CoverageMetric>
        <Value>8</Value>
      </Threshold>
    </SatisfactoryCoverage>

  </PropertyGroup>

  <NCoverReporting 
    ToolPath="$(ProgramFiles)\NCover"
    CoverageDataPaths="$(NCoverOutDir)\automapper-coverage.xml"
    LoadTrendPath="$(NCoverOutDir)\automapper-coverage.trend"
    ProjectName="$(Scenario) Code"
    OutputReport="$(Reports)"
    SatisfactoryCoverage="$(SatisfactoryCoverage)"
  />
</Target>

What you can see in this target is that we are creating three different reports, represented by the Report elements, and that we are changing the satisfactory threshold to 80% code coverage (down from the default of 95%) and the maximum cyclomatic complexity to 8. These two blocks of configuration are passed to the NCoverReporting task via the OutputReport and SatisfactoryCoverage parameters, respectively.

The above reports are shown in the images below.


Focus on specific areas

Let's now say that, in addition to the reports for the entire source code, we also want to keep a closer eye on the classes under the AutoMapper.Mappers namespace. We can get that going with another reporting target, filtering the reported data down to just the code we are interested in:

<Target Name="ExportReportsMappers" >
  <Message Text="Reports just for the Mappers" Importance="High" />
  <PropertyGroup>
    <Scenario>AutoMapper-OnlyMappers</Scenario>
    <NCoverOutDir>$(MSBuildProjectDirectory)\build\NCoverOut</NCoverOutDir>
    <RptOutFolder>$(NCoverOutDir)\$(Scenario)Coverage</RptOutFolder>
    <Reports>
      <Report>
        <ReportType>SymbolModuleNamespaceClass</ReportType>
        <OutputPath>$(RptOutFolder)\ClassCoverage\index.html</OutputPath>
        <Format>Html</Format>
      </Report>
      <!-- add more Report elements as desired -->
    </Reports>
    <CoverageFilters>
      <Filter>
        <Pattern>AutoMapper\.Mappers\..*</Pattern>
        <Type>Class</Type>
        <IsRegex>True</IsRegex>
        <IsInclude>True</IsInclude>
      </Filter>
      <!-- include/exclude more classes, assemblies, namespaces, 
      methods, files as desired -->
    </CoverageFilters>

  </PropertyGroup>

  <NCoverReporting 
    ToolPath="$(ProgramFiles)\NCover"
    CoverageDataPaths="$(NCoverOutDir)\automapper-coverage.xml"
    ClearCoverageFilters="true"
    CoverageFilters="$(CoverageFilters)"
    LoadTrendPath="$(NCoverOutDir)\automapper-coverage.trend"
    ProjectName="$(Scenario) Code"
    OutputReport="$(Reports)"
  />
</Target>

Now that we have this basic template our plan is to identify problem areas in the code and create reports aimed at them. The URLs of the reports will be included in the CI build reports and notification emails.
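By the way, because the coverage and trend data are already saved to disk, re-rendering any of these reports doesn't require running the test suite again; it's enough to invoke the reporting targets directly. Something along these lines should work from a Visual Studio command prompt (a sketch, assuming msbuild.exe is on the PATH and the script is named AutoMapper.msbuild as above):

rem Full run: collect coverage with NCover/NUnit, then render the reports
msbuild AutoMapper.msbuild /t:RebuildReports

rem Re-render the reports only, reusing the saved coverage and trend data
msbuild AutoMapper.msbuild /t:ExportReports;ExportReportsMappers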

It's so easy to add more reports that some of them will live for only a single release cycle, or even less if we need it.

I hope this is helpful to others, because it did take a good amount of time to get it all sorted out. Even if you're using NAnt instead of MSBuild, the syntax is similar and I'm sure you can port the idea easily.

How to detect the text encoding of a file

Posted by Sergio on 2010-01-26

Today I needed a way to identify ANSI (Windows-1252) and UTF-8 files in a directory filled with files of these two types. I was surprised not to find a simple way of doing this via a property or method somewhere under the System.IO namespace.

Not that it's that hard to identify the encoding programmatically, but it's always better when you don't need to write a method yourself. Anyway, here's what I came up with. It detects UTF-8 based on the encoding signature (the byte order mark) added to the beginning of the file.

The code below is specific to UTF-8, but it shouldn't be too hard to extend the example to detect more encodings (one way to do that is sketched after the example).

public static bool IsUtf8(string fname){
  var preamble = Encoding.UTF8.GetPreamble(); // the UTF-8 signature: EF BB BF
  using(var f = File.Open(fname, FileMode.Open, FileAccess.Read)){
    var sig = new byte[preamble.Length];
    var read = f.Read(sig, 0, sig.Length);
    // a file shorter than the signature can't possibly start with it
    return read == preamble.Length && sig.SequenceEqual(preamble);
  }
}
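And here's a sketch of one way to generalize the idea: compare the start of the file against the preambles (BOMs) of a few candidate encodings. The DetectEncoding name is mine, not from any framework API; it assumes the same using directives as above (System.IO, System.Linq, System.Text).

public static Encoding DetectEncoding(string fname)
{
  // UTF-32 LE must be tested before UTF-16 LE because its BOM (FF FE 00 00)
  // starts with the UTF-16 LE BOM (FF FE).
  var candidates = new[]
  {
    Encoding.UTF32, Encoding.UTF8, Encoding.Unicode, Encoding.BigEndianUnicode
  };

  var header = new byte[4];
  int read;
  using (var f = File.Open(fname, FileMode.Open, FileAccess.Read))
  {
    read = f.Read(header, 0, header.Length);
  }

  foreach (var enc in candidates)
  {
    var preamble = enc.GetPreamble();
    if (preamble.Length > 0 && read >= preamble.Length &&
        header.Take(preamble.Length).SequenceEqual(preamble))
    {
      return enc;
    }
  }
  return null; // no signature found; in my case that meant ANSI (Windows-1252)
}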

Maybe I just looked in the wrong places. Does anyone know a simpler way in the framework to accomplish this?