XHTML Validation Script using Ruby
In one of the projects I'm working on, we produce a number of XHTML documents, and we want them to be valid XHTML 1.0 Strict. Automation will set you free, so I promptly decided there was no way I would be submitting several dozen documents to the W3C XHTML validator by hand.
Instead of looking for a web service or some other tool that could provide that validation, I thought it would be more interesting and educational to automate the W3C validator with some Ruby goodness. I know, there are probably quicker ways of doing this, but I want to get better at Ruby, so sue me.
I'll do my best to explain how the script works; I don't think it turned out too complicated. If anyone has tips for improving it, I'll be glad to hear them and learn even more.
As of this writing, the W3C Markup Validation Service page offers the option of uploading a file and having it validated. It's a simple HTTP form POST to the URL http://validator.w3.org/check . The only not-so-trivial task is posting a file form field, which requires a multipart/form-data request body. This is when I thought I would probably need this type of code again in the future, so I decided to write it in a separate file to reuse later. After some research and trials I ended up with the following helper file, called form_post.rb (I'll dissect it below).
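To see why a file field is the tricky part, it helps to look at what a multipart/form-data body (RFC 7578) actually contains: each field becomes a "part" that starts with a boundary line, carries its own headers, and the whole body ends with the boundary followed by "--". The sketch below builds such a body by hand; the boundary string, field names, and file contents are hypothetical examples, not what the validator expects.

```ruby
# A minimal sketch of the multipart/form-data wire format (RFC 7578).
# The boundary and field names below are hypothetical examples.
BOUNDARY = 'example-boundary-1234'

# One part for a plain name/value field.
def text_part(name, value)
  "--#{BOUNDARY}\r\n" \
  "Content-Disposition: form-data; name=\"#{name}\"\r\n" \
  "\r\n" \
  "#{value}\r\n"
end

# One part for a file field: the file name travels in the header,
# followed by the file's media type and then the raw contents.
def file_part(name, filename, content, type)
  "--#{BOUNDARY}\r\n" \
  "Content-Disposition: form-data; name=\"#{name}\"; filename=\"#{filename}\"\r\n" \
  "Content-Type: #{type}\r\n" \
  "\r\n" \
  "#{content}\r\n"
end

body = text_part('comment', 'hello') +
       file_part('upload', 'page.html', '<p>hi</p>', 'text/html') +
       "--#{BOUNDARY}--\r\n" # the closing delimiter ends the body

puts body
```

The only extra work for a file field compared to a simple field is the filename parameter and the Content-Type header; everything else is the same boundary-delimited layout.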
require 'rubygems'
require 'mime/types'
require 'net/http'
require 'cgi'

# A simple name/value form field.
class FormField
  attr_accessor :name, :value

  def initialize(name, value)
    @name = name
    @value = value
  end

  # Renders the field as one part of a multipart/form-data body.
  def to_form_data
    field = CGI::escape(@name)
    "Content-Disposition: form-data; name=\"#{field}\"" +
      "\r\n\r\n#{@value}\r\n"
  end
end

# A file upload field: field name, file path and the file's contents.
class FileField
  attr_accessor :name, :path, :content

  def initialize(name, path, content)
    @name = name
    @path = path
    @content = content
  end

  # Renders the file as one part of a multipart/form-data body.
  def to_form_data
    # type_for returns an array of matches; take the first or fall back.
    mime = MIME::Types.type_for(@path).first || 'application/octet-stream'
    "Content-Disposition: form-data; " +
      "name=\"#{CGI::escape(@name)}\"; " +
      "filename=\"#{@path}\"\r\n" +
      "Content-Transfer-Encoding: binary\r\n" +
      "Content-Type: #{mime}" +
      "\r\n\r\n#{@content}\r\n"
  end
end

class MultipartPost
  SEPARATOR = 'willvalidate-aaaaaabbbb0000'
  # The boundary parameter follows a semicolon, not a comma.
  REQ_HEADER = { "Content-type" => "multipart/form-data; boundary=#{SEPARATOR}" }

  # Builds the request body from a hash of field names to values.
  # File objects become file fields; everything else is a simple field.
  def self.build_form_data(form_fields)
    fields = []
    form_fields.each do |key, value|
      if value.instance_of?(File)
        fields << FileField.new(key.to_s, value.path, value.read)
      else
        fields << FormField.new(key.to_s, value)
      end
    end
    fields.collect { |f| "--#{SEPARATOR}\r\n#{f.to_form_data}" }.join("") +
      "--#{SEPARATOR}--\r\n"
  end
end
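The helper only builds the request body; it still has to be attached to an HTTP POST. Here is a minimal sketch, assuming Ruby's standard net/http library, of how a body and a multipart Content-Type header could be placed on a Net::HTTP::Post request. In the real script the body would come from MultipartPost.build_form_data; here a tiny hand-built body with a hypothetical field name ("fragment") stands in for it, and the request is built but not sent.

```ruby
require 'net/http'
require 'uri'

# Boundary taken from the helper above; it must never appear inside the body.
BOUNDARY = 'willvalidate-aaaaaabbbb0000'

# Builds (but does not send) a multipart POST request for the given URL.
def build_request(url, body)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri.request_uri)
  # The boundary parameter is separated from the media type by a semicolon.
  request['Content-Type'] = "multipart/form-data; boundary=#{BOUNDARY}"
  request.body = body
  request
end

# Stand-in body; "fragment" is a hypothetical field name for illustration.
body = "--#{BOUNDARY}\r\n" \
       "Content-Disposition: form-data; name=\"fragment\"\r\n\r\n" \
       "<p>hello</p>\r\n" \
       "--#{BOUNDARY}--\r\n"

request = build_request('http://validator.w3.org/check', body)
puts request.method # POST
puts request.path   # /check
puts request['Content-Type']

# Sending it would require network access, e.g.:
# uri = URI.parse('http://validator.w3.org/check')
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```

Keeping request construction separate from sending makes it easy to inspect the outgoing headers and body before talking to the real validator.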
Right at the top, we see:
require 'rubygems'
require 'mime/types'
require 'net/http'
require 'cgi'
This is roughly equivalent to the assembly references you have in your Visual Studio projects: we are just declaring that we will need each of the listed libraries. Just like the .NET Framework, Ruby comes with a wealth of core and utility classes, organized in libraries (mime/types, however, is a third-party gem, which is why rubygems is required first). The rest of the code in this file uses classes and modules defined in these libraries.
Then comes the FormField class, which represents one simple form field, basically a name/value pair.
class FormField
  attr_accessor :name, :value

  def initialize(name, value)
    @name = name
    @value = value
  end

  def to_form_data
    field = CGI::escape(@name)
    "Content-Disposition: form-data; name=\"#{field}\"\r\n\r\n#{@value}\r\n"
  end
end
I won't explain the details of the class declaration syntax because I think Joe Ocampo has already done a good job of that. Our FormField class has two properties, FormField#name and FormField#value, which hold a form field's name and its value, but only for simple input fields, not a file field yet. Both are created by attr_accessor, which generates a reader and a writer method for each name. (Notice the Class#method notation: it's the convention Ruby documentation uses to refer to instance methods and properties.)
The FormField#to_form_data method (again, note the Ruby convention of lower-case method names with words separated by underscores) converts the name/value pair into the appropriate HTTP form data POST format. CGI::escape is simply a class method (a static method in C# terms) that URL-encodes any special characters in the field name.
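A quick sketch of what CGI::escape does, and of how you would declare your own class method with the self. prefix. FieldNames and escape_all are hypothetical names used only for illustration; CGI.escape and CGI::escape call the same method.

```ruby
require 'cgi'

# CGI.escape is called on the class itself, like a C# static method.
puts CGI.escape('first name') # first+name
puts CGI.escape('a&b=c')      # a%26b%3Dc

# Our own class method: the self. prefix attaches the method to the
# class rather than to its instances. FieldNames is hypothetical.
class FieldNames
  def self.escape_all(names)
    names.map { |n| CGI.escape(n) }
  end
end

puts FieldNames.escape_all(['user id', 'q']).inspect # ["user+id", "q"]
```

MultipartPost.build_form_data in the helper is declared the same way, which is why it is called on the class without creating an instance.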
After that we just return a string with the expected form data layout. In Ruby, the return value of a method does not need to be provided by the return statement; it is optional. If no return statement is used, the return value is the last evaluated expression, which is the string in our case. But wait, there's something interesting in this string. Do you see #{field} and #{@value}? These are automatically replaced by the values of field and @value, respectively. You can use any expression that is in scope, and the substitution is done through a process called string interpolation. It only works in double-quoted strings and their equivalents, such as %Q{...}; single-quoted strings are taken literally.
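Both ideas fit in a few lines. The sketch below uses a hypothetical disposition method that mirrors FormField#to_form_data: it has no return statement, so its value is the interpolated string on its last line. The last three lines show which string literals interpolate.

```ruby
# The method's value is its last evaluated expression; no return needed.
# disposition is a hypothetical helper mirroring FormField#to_form_data.
def disposition(field, value)
  "Content-Disposition: form-data; name=\"#{field}\"\r\n\r\n#{value}\r\n"
end

result = disposition('title', 'Hello')
puts result.inspect
# "Content-Disposition: form-data; name=\"title\"\r\n\r\nHello\r\n"

field = 'title'
puts "name=#{field}"   # name=title    (double quotes interpolate)
puts 'name=#{field}'   # name=#{field} (single quotes do not)
puts %Q{name=#{field}} # name=title    (%Q behaves like double quotes)
```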
OK, now on to the next class, FileField.
class FileField
  attr_accessor :name, :path, :content

  def initialize(name, path, content)
    @name = name
    @path = path
    @content = content
  end

  def to_form_data
    # type_for returns an array of matches; take the first or fall back.
    mime = MIME::Types.type_for(@path).first || 'application/octet-stream'
    "Content-Disposition: form-data; " +
      "name=\"#{CGI::escape(@name)}\"; " +
      "filename=\"#{@path}\"\r\n" +
      "Content-Transfer-Encoding: binary\r\n" +
      "Content-Type: #{mime}" +
      "\r\n\r\n#{@content}\r\n"
  end
end
After seeing the FormField class, the FileField class is easier to understand. It represents one file that we want to include in the form post as a file input field. It holds the field name, the file path, and the file contents. FileField#to_form_data also converts the file information to the appropriate posting format, adding a filename and a Content-Type header that MIME::Types derives from the file extension.
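With the mime-types gem, MIME::Types.type_for(path) returns an array of MIME::Type matches, so you usually want the first entry and a fallback for unknown extensions. The dependency-free sketch below shows that lookup idea with a small, hypothetical extension table standing in for the gem; SimpleFileField and MIME_BY_EXTENSION are illustrative names. Unlike the helper above, it sends only the file's base name, so local directory names don't end up in the request; that is a choice of this sketch, not of the original code.

```ruby
# Hypothetical extension table standing in for MIME::Types.type_for;
# unknown extensions fall back to application/octet-stream.
MIME_BY_EXTENSION = {
  '.html'  => 'text/html',
  '.xhtml' => 'application/xhtml+xml',
  '.css'   => 'text/css'
}.freeze

class SimpleFileField
  attr_reader :name, :path, :content

  def initialize(name, path, content)
    @name = name
    @path = path
    @content = content
  end

  # Looks up the media type from the (lower-cased) file extension.
  def content_type
    MIME_BY_EXTENSION.fetch(File.extname(@path).downcase, 'application/octet-stream')
  end

  # Renders one multipart part, sending only the base name of the file.
  def to_form_data
    "Content-Disposition: form-data; name=\"#{@name}\"; " \
    "filename=\"#{File.basename(@path)}\"\r\n" \
    "Content-Type: #{content_type}\r\n" \
    "\r\n#{@content}\r\n"
  end
end

field = SimpleFileField.new('file', 'docs/index.xhtml', '<html/>')
puts field.content_type # application/xhtml+xml
puts field.to_form_data
```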