Sergio and the sigil

Excel is the business format, automate it

Posted by Sergio on 2008-06-09

One thing I realized when working in large companies is that Excel is the true data exchange file format. That what the business types exchange among themselves and how they like to persist any kind of lists or anything that needs to be formatted like a table (be it tabular data or not.)

All too often one of this documents is forwarded to me containing some form of business data that will become all sorts of things: lookup data, configuration values, updates to an existing table, etc.

I dread these conversion tasks when I have to do them manually so I tend to automate the task as soon as it smells like a candidate for recurrence. That way I'll take a hit on the first time and reap the benefits from then on.

For some reason I like writing these automation scripts in Ruby. I could probably almost as easily write them in VBScript, C#, PowerShell or even VBA but my experience with Ruby is that it tends to be shorter (as in more concise) and easier to develop. Of course that varies with your level of familiarity with the Ruby libraries, but the learning curve isn't bad at all.

Consider the following hypothetical layout for an Excel file dropped in your inbox for import.

DescriptionPriceCategory
1DVD Player76.49Electronics
2Rain Coat35.10Men's apparel
3Code Complete49.99Books
............
12024" Monitor499.99Computer Accessories

My typical script to process such file would be the obvious line-by-line read, maybe processing each cell before ultimately converting the row to its final format (or skipping that row.) The skeleton of that kind of script is shown below.

require 'win32ole'

begin 
  excel_app = WIN32OLE.new('Excel.Application')
  book = excel_app.Workbooks.Open("c:\\project\\data\\datafile.xls")
  sheet = book.Worksheets(1)
  #first row contains only the headers
  row = 2 #start at the 2nd line (excel is 1-based)

  #stop when we find an empty row
  while sheet.Cells(row, 1).Value
    data = []
    (1..4).each {|c| data[c] = sheet.Cells(row, c).Value }
    #at this point the 'data' array contains the entire row
    #use it as desired *********
    
    row += 1
  end
  
rescue 
  #oops...
  puts "There was an error: #{$!}"
ensure
  #cleanup
  excel_app.DisplayAlerts = 0
  book.Close if book
  excel_app.Quit
end

But, no matter if you use Ruby or any other language, the value in automating these tasks can be tremendous. Taking the time to learn the basics of Excel automation is, in my opinion, very important and will let your users continue to use the tool that they like.

In a kind of reverse example, I have had occasions when one of the business users was repeatedly requesting for some of the data in Excel format during the application development (while we didn't have a web page to show that data yet) so, after he asked for a updated report for the second time, I created a script to read the data from the database, create an Excel file, and send the file attached in an email message every Monday morning. Easy way to gain a happy user.

Update: As noted in the comments, one nice reference for automating with Ruby is Ruby on Windows

Unescape my Strings

Posted by Sergio on 2008-05-07

Since I started using Visual Studio 2005 and now 2008 it bugs me that strings in the debug windows (Immediate, Watch, Locals, Autos, etc) are automatically escaped in C# syntax (well, at least in C# projects.)

Let's assume we are writing some code that produces HTML (or XML, or SQL, or JSON, etc) where the text identation is important for understanding the contents. If we find ourselves in the middle of debugging that code and check the value of a variable containing such type of text, here's what we would see.

System.Diagnostics.Debug.WriteLine(html);
Console.WriteLine(html); // BREAKPOINT HERE
Console.ReadLine();

The Immediate window becomes less useful. The line breaks below do not exist in the Immediate window, I added them here for clarity.

?html
"<html>\r\n\t<head>\r\n\t\t<title>Monthly Report</title>\r\n\t\t<style>
\r\n\t\t\t.results td {color:#999;}\r\n\t\t\t/*etc*/\r\n\t\t</style>\r
\n\t</head>\r\n\t<body>\r\n\t\t<h1>Report for April 2008</h1>\r\n\t\t<d
iv id=\"content\">\r\n\t\t\t<p>\r\n\t\t\t\ttext goes here...\r\n\t\t\t<p
>\r\n\t\t</div>\r\n\t</body>\r\n</html>"

Same problem in the Locals or Autos windows, but there at least you can use the visualizers.

Too many clicks for me, though. Besides, the visualizers are modal windows and that makes it harder when you want to compare things.

If you happen to know how to revert to the old behavior of keeping the text format, please let me know. Until I find the right way (or at least a better way) I'm using a little macro to output any selected escaped text, unescaped to a new output pane.

To create the macro:

  • View » Other Windows » Macro Explorer
  • Right-click "My Macros" » New Module
  • Call the new module Unescape
  • Add the following sub
Public Module Unescape
  Public Sub UnescapeText()
    Dim winName As String = "Unescaped"
    Dim win As Window
    Dim output As OutputWindow
    Dim pane As OutputWindowPane = Nothing

    If DTE.ActiveWindow IsNot Nothing AndAlso _
      DTE.ActiveWindow.Selection IsNot Nothing Then

      Dim text As String = DTE.ActiveWindow.Selection.Text

      If Not String.IsNullOrEmpty(text) Then

        text = text.Replace("\t", vbTab) _
          .Replace("\r\n", vbCrLf) _
          .Replace("\n", vbCrLf) _
          .Replace("\r", vbCrLf) _
          .Replace("\""", """")

        win = DTE.Windows.Item(EnvDTE.Constants.vsWindowKindOutput)
        output = win.Object

        For Each p As OutputWindowPane In output.OutputWindowPanes
          If p.Name = winName Then
            pane = p
            Exit For
          End If
        Next
        If pane Is Nothing Then _
          pane = output.OutputWindowPanes.Add(winName)

        win.Activate()
        pane.Activate()
        pane.OutputString(text)
      End If
    End If
  End Sub
End Module

I made a shortcul for this macro:

  • Tools » Options » Environment » Keyboard
  • Show commands containing: unescape
  • Select the macro when it gets listed
  • Add a Global shortcut, I used Ctrl+Alt+U

Now if I select the escaped text in the Immediate window and press the macro shortcut, I get the following:

If I keep selecting other escaped texts and unescaping them, they'll just be appended to that output, making it possible to do some visual inspection or comparisons.

XHTML Validation Script using Ruby

Posted by Sergio on 2008-04-02

In one of the projects I'm working on we produce a number of XHTML documents and we want these documents to be valid XHTML 1.0 Strict. As an example of how automation will set you free, I promptly thought that there was no way I would be submitting several dozens of documents to the W3C XHTML validator.

Instead of go looking for an web service or something that could provide that validation, I thought it would be more interesting and educational for me to try to automate the usage of the W3C validator using some Ruby goodness. I know, there are probably quicker ways of doing that, but I want to get better at Ruby, so sue me.

I'll do my best to explain how the script works, I don't think it turned out too complicated. If anyone has tips for improving it, I'll be glad to hear about and learn even more.

The W3C Markup Validation Service page as of this writing offers the option of uploading a file and have it validated. It's a simple HTTP form POST to the URL http://validator.w3.org/check . The only not-so-trivial task is how to post a file form field. This is when I though I will probably need this type of code again in the future, so let's just write it in a separate file to reuse later. After some research and trials I ended up with the following helper file, called form_post.rb (I'll dissect it in the sequence).

require 'rubygems'
require 'mime/types'
require 'net/http'
require 'CGI'
 
class FormField
  attr_accessor :name, :value
  def initialize( name, value )
    @name = name
    @value = value
  end
 
  def to_form_data
    field = CGI::escape(@name)
    "Content-Disposition: form-data; name=\"#{field}\"" + 
      "\r\n\r\n#{@value}\r\n"
  end
end
 
class FileField
  attr_accessor :name, :path, :content
  def initialize( name, path, content )
    @name = name
    @path = path
    @content = content
  end
 
  def to_form_data
    "Content-Disposition: form-data; " + 
    "name=\"#{CGI::escape(@name)}\"; " + 
    "filename=\"#{@path}\"\r\n" +
    "Content-Transfer-Encoding: binary\r\n" +
    "Content-Type: #{MIME::Types.type_for(@path)}" + 
    "\r\n\r\n#{@content}\r\n"
  end
end

class MultipartPost
  SEPARATOR = 'willvalidate-aaaaaabbbb0000'
  REQ_HEADER = {
      "Content-type" => "multipart/form-data, boundary=#{SEPARATOR} "
   }
 
  def self.build_form_data ( form_fields )
    fields = []
    form_fields.each do |key, value|
      if value.instance_of?(File)
        fields << FileField.new(key.to_s, value.path, value.read)
      else
        fields << FormField.new(key.to_s, value)
      end
    end
    fields.collect {|f| "--#{SEPARATOR}\r\n#{f.to_form_data}" }.join("") + 
         "--#{SEPARATOR}--"
  end
end

Right at the top, we see.

require 'rubygems'
require 'mime/types'
require 'net/http'
require 'CGI'

This is roughly equivalent to assembly references you have in your Visual Studio projects. We are just saying that we will need each of the listed libraries. Just like the .Net Framework, Ruby comes with a wealth of core and utility classes, organized in libraries. The rest of the code in this file will use classes and modules defined in these libraries.

Then comes the FormField class, which represents one simple form field, a name/value pair basically.

class FormField
  attr_accessor :name, :value
  def initialize( name, value )
    @name = name
    @value = value
  end
 
  def to_form_data
    field = CGI::escape(@name)
    "Content-Disposition: form-data; name=\"#{field}\"\r\n\r\n#{@value}\r\n"
  end
end

I won't explain the details of the class declaration syntax because I think Joe Ocampo already did a good job at that (link). Our FormField class has two properties FormField#name and FormField#value (see how we refer to the instance properties and methods in Ruby? We use the Class#method notation.), which represent a form field with name and its value, but only for simple input fields, not a file field yet.

The FormField#to_form_data method (again, note the Ruby convention to have methods in lower case, words separated by underscores). This method will convert the name/value pair into the appropriate HTTP form data POST format. The CGI::escape is simply a class method (static method in C# terms) that will escape any especial characters in the field name.

After that we just return a string with the expected form data layout. In Ruby, the return value of a method does not need to be provided by the return statement, it is optional. If no return statement is used, the return value will be the last evaluated expression — the string in our case. When But wait, there's something interesting in this string. Do you see #{field} and #{@value}? These will be automatically substituted by name and @value, respectively. You can use anything that is in scope and the substitution will be done via a process that is called String Interpolation. This only works with double-quoted strings (other delimiters can be used in Ruby to denote string literals.)

OK, now on to the next class, FileField.

class FileField
  attr_accessor :name, :path, :content
  def initialize( name, path, content )
    @name = name
    @path = path
    @content = content
  end
 
  def to_form_data
    "Content-Disposition: form-data; " + 
    "name=\"#{CGI::escape(@name)}\"; " + 
    "filename=\"#{@path}\"\r\n" +
    "Content-Transfer-Encoding: binary\r\n" +
    "Content-Type: #{MIME::Types.type_for(@path)}" + 
     "\r\n\r\n#{@content}\r\n"
  end
end

After seeing the FormField class, the FileField class becomes easier to understand. It represents one file that we want to include in the form posting as a file input field. It has the field name, the file path, and the file contents. The FileField#to_form_data also converts the file information to the appropriate posting format.

This leads us to the last class in this file.