
XSLT Workflow

Duncan Cockburn asked in a comment on a recent post what output format I used for my XSLT transformation of a FrameMaker document. I dump the results of the transformation into a format that is close to the final output used in my workflow (i.e. I don't save it in a structured XML format first). Then I post-process the whole thing with a Perl script to clean up some pieces that were difficult to handle with XSLT. I know that you can work with the XML tree directly from Perl, but I ran into two issues with that approach:

  1. I ended up using some XSLT 2.0 constructs (or at least was playing with them) that required me to use the Saxon XSLT processor. If you know how to use XSLT 2.0 from Perl directly, let me know.
  2. Time. This was a side project. The actual task was to manually copy several registers' worth of information from the FrameMaker document to another, more structured, file. I saved quite a bit of time (and was able to do the task much more accurately) by automating it instead. To make sure I didn't get stuck and waste time learning all the ins and outs of how Perl and XML/XSLT parsers work, I decided to separate the extraction and final cleanup tasks.
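
To make the "extract first, clean up afterwards" split concrete, here is a minimal sketch of what a cleanup pass over the near-final text can look like. It is written in Python rather than the Perl I actually used, and the cleanup rules (collapsing whitespace, dropping empty lines) and the names `clean_line` and `post_process` are assumptions for illustration only, not the real rules from my script.

```python
import re


def clean_line(line: str) -> str:
    """Collapse runs of spaces/tabs into one space and trim both ends."""
    return re.sub(r"[ \t]+", " ", line).strip()


def post_process(text: str) -> str:
    """Clean every line of the XSLT output and drop lines that end up empty.

    These rules are placeholders; a real cleanup pass would encode whatever
    was awkward to express in the stylesheet.
    """
    cleaned = (clean_line(line) for line in text.splitlines())
    return "\n".join(line for line in cleaned if line) + "\n"
```

The point of the design is that the stylesheet only has to get the content roughly right; anything fiddly is handled by ordinary string processing afterwards, e.g. `post_process(open("xslt_output.txt").read())` with a hypothetical file name.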


Duncan, do you have a lot of experience generating XML? My understanding is that it can be a bit of a pain to get right, even if you use an API. Depending on your application, you may find it easier to skip generating a structured XML file and instead dump the output in a form that is easier for you to work with. One suggestion I've heard is to use YAML. Here's the description from the YAML homepage:

YAML(tm) (rhymes with "camel") is a straightforward machine parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python. YAML is optimized for data serialization, configuration settings, log files, Internet messaging and filtering. YAML(tm) is a balance of the following design goals:

  • YAML documents are very readable by humans.
  • YAML interacts well with scripting languages.
  • YAML uses host languages' native data structures.
  • YAML has a consistent information model.
  • YAML enables stream-based processing.
  • YAML is expressive and extensible.
  • YAML is easy to implement.
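
As a rough illustration of why YAML can be easier than XML for this kind of job, here is a small Python sketch that writes register data out as YAML-style text by hand. The register layout (`name`, `offset`, `fields` with `msb`/`lsb`) and the function name `to_yaml` are hypothetical, and the emitter only copes with plain names and integers; for anything richer, a real YAML library (for example PyYAML's `yaml.safe_dump`) would be the safer choice.

```python
def to_yaml(registers):
    """Render a list of register dicts as a small YAML document.

    Sketch only: assumes names are plain words (no colons, quotes, or
    leading/trailing spaces) and that all other values are integers.
    """
    lines = []
    for reg in registers:
        lines.append(f"- name: {reg['name']}")
        lines.append(f"  offset: {reg['offset']}")
        if not reg["fields"]:
            # An explicit empty list avoids emitting a null value.
            lines.append("  fields: []")
            continue
        lines.append("  fields:")
        for field in reg["fields"]:
            lines.append(f"    - name: {field['name']}")
            lines.append(f"      msb: {field['msb']}")
            lines.append(f"      lsb: {field['lsb']}")
    return "\n".join(lines) + "\n"


example = [
    {"name": "CTRL", "offset": 0,
     "fields": [{"name": "ENABLE", "msb": 0, "lsb": 0}]},
]
print(to_yaml(example), end="")
```

The result is something you can read, diff, and hand-edit, and that a script can load straight into lists and dictionaries.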

However, I would recommend the XML step if you plan to use other tools that were built to handle XML. For example, say you were going to move to a structured documentation flow and had created a stylesheet for FrameMaker that understood registers defined in a structured way. Alternatively, you may want to make it possible for others to write stylesheets to take your XML to C++/Vera/e/whatever. Of course, you don't need an XML intermediary step for that; you just need to come up with a predefined register file format. As I've mentioned before, Denali has just such a format (RDL). Might be worth a look.
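
If you do go the XML route, letting an API build the tree takes care of escaping and well-formedness for you. Here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The element and attribute names (`registers`, `register`, `field`, `offset`, `msb`, `lsb`) are made up for illustration; they are not RDL or any other real register schema.

```python
import xml.etree.ElementTree as ET


def registers_to_xml(registers):
    """Build a small XML document from a list of register dicts.

    The schema here is hypothetical. ElementTree handles escaping of
    special characters in attribute values.
    """
    root = ET.Element("registers")
    for reg in registers:
        reg_el = ET.SubElement(
            root, "register", name=reg["name"], offset=hex(reg["offset"])
        )
        for field in reg["fields"]:
            ET.SubElement(
                reg_el,
                "field",
                name=field["name"],
                msb=str(field["msb"]),
                lsb=str(field["lsb"]),
            )
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")
```

A stylesheet written by someone else could then target that structure without knowing anything about how it was extracted.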