XSLT for the Rest of Us
The eXtensible Stylesheet Language is a W3C standard for transforming arbitrary XML documents in a number of useful ways. Because it is a standard, many tools have been written for it and many software applications incorporate it. An XSLT programmer can feel confident that her XSLT stylesheets can be run on almost any operating system, within a multitude of frameworks and applications, and, best of all, without proprietary or costly software packages.
So why are they so hard for ordinary people to use? The answer is, they needn't be, provided the application's author takes just a bit of care and a little extra effort. Nowadays, when I look at the variety of interesting and useful XSLT stylesheets written for the EAD community by archivists and programmers, I become distressed when I try to sort through the multitude of system requirements, Java environments, the depressing array of subsidiary programs and parsers that must be installed, and the uncomfortable configuration contortions users must attempt before they can run even the simplest of transformations. It is not surprising that so many archivists think XSLT is "too hard," or that it is intended for experienced Java programmers.
Many people are surprised to learn that Java isn't even necessary to create or to run an XSLT application. The two are not as wedded as many people are led to believe. By removing Java from the equation, XSLT applications on Windows can be as easy to install and run as the simplest of Windows programs. Luckily, XSLT is language neutral. Most popular programming languages now accommodate XSLT, so why not deploy XSLT applications using a Windows-friendly language when your target audience uses Windows? There are a number of ways to deploy "easy" XSLT to a Windows audience. There are excellent Python packages and XSLT packages written in Visual Basic, all extraordinarily easy to install and use. The common denominator is that none of them is Java.
This tutorial demonstrates how to package your XSLT applications using perl. Such applications can be packaged in simple, entirely self-contained install programs of a kind familiar to us all. Double-click the setup program to install and you're ready to go. No need to install parsers and XSLT processors. You don't have to struggle to update your Java environment, twiddle with class paths, fret over your jar files, or worry about having the correct version of Saxon, Xalan, etc. You don't have to open clumsy command prompt "DOS boxes" and type in long complicated options. You don't even have to install perl: it can all be packaged in one simple, easy-to-use script!
XML::LibXSLT
XML::LibXSLT, written by Matt Sergeant, is a perl wrapper around the gnome libxslt library. At its simplest, a basic LibXSLT script consists of the following:
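A minimal sketch of such a script, modeled on the synopsis in the XML::LibXSLT documentation. The subroutine name transform_file and the stylesheet name mystyle.xsl are assumptions made for illustration, not names taken from the sample package:

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Transform one XML file with one stylesheet and return the result as a string.
# (transform_file is a hypothetical helper name.)
sub transform_file {
    my ($xml_file, $xsl_file) = @_;

    my $parser = XML::LibXML->new();
    my $xslt   = XML::LibXSLT->new();

    my $source     = $parser->parse_file($xml_file);        # document to convert
    my $style_doc  = $parser->parse_file($xsl_file);        # the stylesheet is XML too
    my $stylesheet = $xslt->parse_stylesheet($style_doc);   # compile the stylesheet
    my $results    = $stylesheet->transform($source);       # run the transformation

    return $stylesheet->output_string($results);
}

# Command-line use: perl simple.pl myfile.xml
# 'mystyle.xsl' is a placeholder stylesheet name.
print transform_file($ARGV[0], 'mystyle.xsl') if @ARGV;
```

Run from a command prompt, this prints the transformed document to STDOUT, which is exactly the behaviour the rest of the article works to replace.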
This article will start with this basic 9-line script and adapt it to work in a typical Windows environment. We'll begin by supporting a "drag and drop" interface, where users drag the file they wish to convert from their Windows file menu onto a desktop icon (no opening messy black "DOS prompt" windows for us!). We'll then adapt it to convert a large number of files in a single directory. Then I'll show you how to pass user-configured options via a text file rather than typing them in as a confusing set of command line parameters. Finally, I'll demonstrate how you can deploy your application in an easy-to-install, double-click style setup file.
Licensing Issues
The technique employed here is to include, as a part of the installation package, a copy of perl itself. This allows people to install and use your program without going through the added steps of downloading, installing, and configuring perl and the proper packages, setting the PATH, learning to work on the command line, etc. I also usually pare down the standard perl distribution by removing modules, files, documentation, example programs, etc., which are not necessary for the application to function. For the sample setup package included here, the installation script is a mere 1.3 MB, small enough to fit on a floppy disk.
However, when distributing applications such as this outside your organization, you must be sensitive to the various licensing issues which may apply to embedded programs. Even free applications are usually governed by restrictions on where and how they can be distributed. ActiveState perl, the most widely known perl for Windows, cannot be distributed outside of one's own organization. Other versions of perl for Windows can be distributed freely, but only as "verbatim" copies; that is, they cannot be pared down or changed in any way. Still other versions limit how they can be distributed as part of commercial products.
The version of perl I typically include is the one made available by the Apache Foundation (see http://www.apache.org/dyn/closer.cgi/perl/win32-bin). Apache places no restrictions on how its version of perl can be distributed and allows modifications such as removing unnecessary packages and files. This is a good thing since the complete perl installation weighs in at over 20 MB. The only requirement is that the various copyright notices, licenses and disclaimers be included, and that any modifications be clearly documented.
The same techniques can be employed for other scripting and programming languages.
Technically, you can include minimal Python and even Java distributions ("technically" meaning it will work, but may violate the license that governs your particular version of Python or Java for Windows). The application "skeleton" I am making available here can be safely customized and modified (as long as the licenses and disclaimers remain intact) and deployed outside of your institution. However, if you include other programs or interpreters with your application, be sure you are in compliance with all of the licensing conditions which govern them.
A simple "drag and drop" interface
Creating an interface that lets users drag XML files from any location on their hard drive onto a desktop icon presents some challenges for programmers accustomed to working exclusively in a Unix or Windows command-line style interface. With a command line, you always know your current directory. You can type in a program name or filename parameter using either a path relative to the current working directory, or an absolute path completely independent of where you happen to be sitting in your current shell. Working in a Windows-only interface, we must be careful to always specify program names and file parameters using absolute paths. This can present special problems for applications that users can install anywhere on their computer they wish. What is the absolute path to the program you are creating if you have no idea beforehand where the user will choose to install it? You could mandate an installation directory and warn users that they must install the application to "c:\Program Files\My Application", but why don't we consider that a crass last resort and try to work out something more palatable? The trick is simple: by examining and parsing the value of $0, the perl variable containing the name of the program, we always know where the program has been installed. From this we can determine the absolute path of the XSLT files we are running. Similarly, if we are providing a text file where users can type in configuration parameters, we can determine its absolute path by playing with $0.
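The $0 trick can be sketched with two core modules, File::Basename and File::Spec. The helper name install_dir and the stylesheet name mystyle.xsl are illustrative assumptions; Options.cfg is the configuration file introduced later in this article:

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Return the absolute directory that contains the given script.
# (install_dir is a hypothetical helper name.)
sub install_dir {
    my ($script) = @_;
    # rel2abs resolves a relative $0 against the current directory,
    # dirname then strips the script's own file name.
    return dirname(File::Spec->rel2abs($script));
}

my $home = install_dir($0);

# Files that live beside the script can now be addressed absolutely.
my $xsl_file = File::Spec->catfile($home, 'mystyle.xsl');   # placeholder name
my $cfg_file = File::Spec->catfile($home, 'Options.cfg');

print "stylesheet: $xsl_file\n";
print "options:    $cfg_file\n";
```

Because everything is derived from $0 at run time, the same script works no matter which folder the user picked during installation.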
The last thing we'll need to fix is the final "print" statement. As it is, it sends the output to STDOUT, which is completely useless when transforming a file unless the user can pipe it to some output filename on the command line, e.g., transform.pl myfile.xml > myfile_new.xml. Since command lines are taboo here, we'll have to decide on a reasonable location ourselves (or allow the user to specify it for herself in a configuration file, but we'll get to that a little later). We could modify the input filename a bit by tacking on some extension, e.g., myfile.xml.transformed, and save it in the same directory as the original file. That would be easiest, but it's also obnoxious. Why should your users go through the hundreds of files they have transformed and name them back to what they originally were? There are many options available, but one I have relied on is to create a new subdirectory in the same directory as the file to be transformed, and then store the new transformed files there. So we parse apart the input filename from $ARGV[0] a bit to determine its directory and create our new subdirectory there. Let's call it transformed/. Perl won't write to a directory that doesn't exist, so we create one with mkpath() from the File::Path module. (Of course you could demand that your users first create this directory themselves within every folder containing files they wish to transform, but remember, we're using perl, not Java!)
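A minimal sketch of that step, using only core modules. The helper name output_path is an assumption for illustration; the transformed/ directory name and mkpath() come from the text above:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Path qw(mkpath);
use File::Spec;

# Given the input file, return the path of its transformed copy inside a
# "transformed" subdirectory beside it, creating that directory if needed.
# (output_path is a hypothetical helper name.)
sub output_path {
    my ($input) = @_;

    # Split the absolute path into file name and containing directory.
    my ($name, $dir) = fileparse(File::Spec->rel2abs($input));

    my $out_dir = File::Spec->catdir($dir, 'transformed');
    mkpath($out_dir) unless -d $out_dir;

    # Same file name as the original, so nothing has to be renamed later.
    return File::Spec->catfile($out_dir, $name);
}

print output_path($ARGV[0]), "\n" if @ARGV;
```

The script would then open the returned path for writing and print the output of the transformation there instead of to STDOUT.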
Running the Program
The program above presents only one piece of the puzzle. In a typical command-line environment, a user would invoke a perl program by calling the perl interpreter, giving it the name of the perl file to run as the first parameter, then following that with a variety of command line options:
C:\Program Files\My XSLT Application>perl simple.pl myfile.xml
Here, the user must have installed perl on their PC and ensured that the interpreter is in their PATH. Not only that, but they must also fetch and install the LibXSLT perl packages, since they are not included in the standard perl distribution! But that's no better than the Java solutions we're struggling with now.
Our solution is to include perl, with all necessary modules installed and configured, right in with our installation, where it will be copied to the installation directory on the user's PC along with all the other scripts and files, completely invisibly and with no extra effort required by the user. The example install programs available here all include a small, minimal perl installation consisting only of the files and modules necessary to run LibXSLT. Our installation script will create a desktop icon batch file containing a command that looks something like this (all on one line):
c:\Program Files\My XSLT Application\bin\perl\bin\perl
The first parameter is the absolute path to the minimal perl interpreter included with the installation, the -I parameter tells this perl where to find its libraries and modules, the third parameter is the absolute path to the script perl should execute, and the %1 parameter at the end is how the batch file passes the name of the file dragged onto the icon to the script as $ARGV[0].
All of those absolute paths! How do we know what the absolute paths to these files are before the user has even installed them? We don't, so we create this batch file and the desktop icon dynamically during the installation process.
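To make the four parts of that command concrete, here is a rough sketch of how an install-time script could assemble it once the installation directory is known. This is not the mkbat.pl shipped with the sample package: the helper name batch_command, the lib directory, and the script location are all assumptions for illustration.

```perl
use strict;
use warnings;

# Build the one-line command for the desktop batch file.
# $install_dir is whatever folder the user chose during setup; the lib
# directory and script location below are hypothetical.
sub batch_command {
    my ($install_dir, $script) = @_;

    my $perl = "$install_dir\\bin\\perl\\bin\\perl";   # bundled interpreter
    my $lib  = "$install_dir\\bin\\perl\\lib";         # assumed module directory

    # Every path is quoted because installation folders often contain spaces.
    # %1 is filled in by Windows with the file dragged onto the icon.
    return qq{"$perl" -I"$lib" "$install_dir\\$script" "%1"};
}

print batch_command('c:\\Program Files\\My XSLT Application', 'simple.pl'), "\n";
```

Quoting each path is a design choice of this sketch: without the quotes, a folder such as "Program Files" would be split into two arguments by the command interpreter.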
The sample XSLT Template package available below takes care of all of this for you, so don't let those confusing full-path commands discourage you.
Transforming Multiple Files
Our next task is to add the ability to easily transform more than one file at a time. Having to drag and drop file after file—hundreds of files for large repositories—is a recipe for disgruntlement, not to mention carpal tunnel syndrome! There are a number of ways to accomplish this. One way might be to include a graphical interface package such as PerlTk, or GUI.pm with your minimal perl installation. The user could open an attractive application interface, select a file or a directory from a standard file browsing menu, check some options, then click "Run" to begin the transformation process. Hey, we're perl programmers, not saints. I think it's safe to consider something like that as beyond the scope of this article, though you are encouraged to explore an option like this if it interests you. Our path will be considerably more low-tech but hopefully not too burdensome for our users. We'll include an extra small batch file with our program. Users can copy this file and paste it into the folder containing the XML files they wish to transform. When they double-click it to run it, our program will transform every XML file in that folder. The program will be nearly identical to what we have so far, except the transformation process will be enclosed in a loop when the batch process option is selected. Note, we'll also move the XSLT processing portion into a subroutine for clarity and to avoid unnecessary duplication of code. When the batch option is not executed, it will transform a single file dragged on top of the desktop icon as usual. Our new "batch" batch file will be similar to the original, but with a little added difference: c:\Program Files\My XSLT Application\bin\perl\bin\perl As before, with all of these absolute paths, we'll create this second batch file dynamically when our application is installed, in case the user decides to install to some other folder than c:\Program Files. Our new program will look for a -dir command line option. 
Finding it will trigger batch mode. %0 is the fully qualified pathname of the batch file itself. This will be our very sneaky trick to discover which folder the batch file has been copied to, the folder that contains all of those juicy XML files to transform. Our new perl program is presented here. In addition to the check for a -dir command line option and the foreach loop, another added feature is a setting at the top of the script for the file extension the transformed files should be saved with. Thus XML files which are transformed to HTML should have a .html extension, while XML->XML transformations should retain the original .xml extension.
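The control flow just described can be sketched with core modules only. This is an illustration rather than the program distributed with the package: the helper name files_to_transform is an assumption, and the XSLT call itself is replaced by a print statement.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use File::Basename qw(dirname);
use File::Spec;

# Extension for the transformed files; use '.html' for XML->HTML stylesheets.
my $out_ext = '.xml';

# Decide which files to transform from the command-line arguments.
# With "-dir <path of the batch file>", every .xml file in the batch file's
# folder is selected; otherwise the dragged file(s) are returned unchanged.
# (files_to_transform is a hypothetical helper name.)
sub files_to_transform {
    my @args = @_;
    my $batch_file;
    GetOptionsFromArray(\@args, 'dir=s' => \$batch_file);
    return @args unless defined $batch_file;

    my $folder = dirname(File::Spec->rel2abs($batch_file));
    opendir my $dh, $folder or die "Cannot open $folder: $!";
    my @xml = sort grep { /\.xml\z/i } readdir $dh;
    closedir $dh;
    return map { File::Spec->catfile($folder, $_) } @xml;
}

foreach my $file (files_to_transform(@ARGV)) {
    # Swap the input extension for the configured output extension.
    (my $out_name = $file) =~ s/\.[^.\\\/]+\z/$out_ext/;
    # A real program would call its XSLT subroutine here.
    print "$file -> $out_name\n";
}
```

Keeping the file selection in its own subroutine means the single-file and batch cases share the same loop, so the transformation code is written only once.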
Passing User-Defined Options
The last thing we will do to make our task complete is to provide some method of allowing users to specify certain options. Since we have implemented a command-line-free drag and drop interface, the usual methods will not work. If you are going full service by creating a visual user interface, say through Tk or the far simpler Win32::GUI, this could be done by creating a custom dialog box where users can select choices or fill in text-box style fields. The method I will describe is the use of a text file called Options.cfg, which users can open in a text editor to change whichever options they like. Even here, there are a great many strategies one can employ. I usually use my own custom configuration file code (it's a very simple bit of code actually, only a few lines), but for this example we'll avail ourselves of the Config::IniFiles perl module (which is also included in the minimal perl installation packaged with the setup script). This module allows us to read Windows .ini style files.
The method by which user-defined runtime parameters are passed to the XSLT processor differs based on which processing software you are using. XML::LibXSLT uses the gnome libxslt library internally. Parameters are passed as a second argument to the stylesheet object's transform method. There are a couple of different ways to format this, all of them fairly ugly I'm afraid. Let's use this method (read the documentation for XML::LibXSLT if you want to know more):
$stylesheet->transform($source, XML::LibXSLT::xpath_to_string(
countrycode => "us",
mainagencycode => "CU-BANC"
));
Of course the example values "us" and "CU-BANC" given above would actually be pulled from the configuration file when we write our actual program. For this example we'll use the EAD version 1.0 to EAD 2002 conversion stylesheet used by the Online Archive of California, which was itself adapted from the 2002 conversion stylesheet hosted by the Society of American Archivists (see: http://www.archivists.org/saagroups/ead/resources/ead2002conv). This stylesheet is included here for illustrative purposes only. Non-OAC repositories should use the official stylesheets provided at the link above. Like the SAA conversion stylesheet, the OAC stylesheet provides for a number of runtime parameters. We'll place these in our .ini-style configuration file, Options.cfg so that our users can adjust them before they run the transformation:
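As a rough illustration of what such a file might contain and how its values can be read, the sketch below embeds hypothetical Options.cfg contents in a here-document and parses them with a few lines of hand-written code instead of Config::IniFiles, so that it runs with core Perl alone. The section name [parameters], the date values, and the helper name parse_ini are invented for the example; the four parameter names are the ones mentioned in this article.

```perl
use strict;
use warnings;

# Parse .ini-style text into a hash of sections. This is a few-line stand-in
# for Config::IniFiles, not that module's API. (parse_ini is hypothetical.)
sub parse_ini {
    my ($text) = @_;
    my %cfg;
    my $section = '_';                                # keys before any [section]
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*(?:[;#].*)?$/;          # skip blank lines and comments
        if ($line =~ /^\s*\[([^\]]+)\]\s*$/) {        # [section] header
            $section = $1;
            next;
        }
        if ($line =~ /^\s*([^=]+?)\s*=\s*(.*?)\s*$/) {  # key = value
            $cfg{$section}{$1} = $2;
        }
    }
    return \%cfg;
}

# Hypothetical Options.cfg contents; section name and dates are invented.
my $sample = <<'END';
; Edit these values before running the transformation
[parameters]
countrycode = us
mainagencycode = CU-BANC
convdate = April 30, 2006
isoconvdate = 20060430
END

my $params = parse_ini($sample)->{parameters};
print "$_ = $params->{$_}\n" for sort keys %$params;
```

The resulting hash can be handed to the processor as XML::LibXSLT::xpath_to_string(%$params), in the same position as the literal values shown in the transform call above.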
Note, it might be convenient for perl to derive the convdate and isoconvdate rather than make the user supply this. That is left as an exercise for the reader. Now on to the code!
This is our (almost) complete code. Adapting this to any specific XSLT application is very simple:
We still lack a means to actually run this program, that is, some batch file copied to the user's desktop. As we mentioned, since we don't know where the user will install this program until they actually install it, the creation of the batch file or files must be done as part of the installation process.
Creating an Install Package
Without a basic installation program, installing our perl application would be every bit as difficult as what we have to deal with now. If you are already familiar with creating Windows installs, you should use your favorite program. The program I use is called Inno Setup and this is what will be described here. Inno Setup is free of charge, enjoys immense popularity, and is blessed with an active and helpful user community, eager to offer advice to new users. I prefer Inno Setup because it is extremely easy to use. It rather follows the perl philosophy I think: Make simple things simple and difficult things possible. Setting up simple installations is trivially easy. But users can do more complex tasks by availing themselves of Inno Setup's built in Pascal scripting functionality. Our installs are simple though, so a diversion into Pascal will be unnecessary. Download and install the latest version of Inno Setup from http://www.jrsoftware.org/isinfo.php. Inno Setup uses a configuration file which you fill in to create your install. These files have a .iss extension. A sample file is included in the example program you can download at the bottom of this article. This .iss file is designed to be easily modified for your particular XSLT application. The bits listed in green below are the parts you should modify to match your application:
Thus, "My XSLT Application" should be changed to the name of your application, e.g., "EAD version 1.0 to 2002".
DefaultDirName: This is where, by default, Inno Setup will install the application. Because of differences in the versions of Windows and the security privileges different users will run under, a directory directly in C:\ is a good default value. Users are free to specify a different (and more reasonable) installation directory during setup, e.g., d:\xml\xslt\EAD Version 1 to 2002. Don't set C:\Program Files as a default because many users are unable to install there unless they are running as administrators.
OutputBaseFilename: This is simply the name Inno Setup will give to the finished install program. It will tack on a ".exe" extension. You can set a more reasonable name here, e.g., ead2html_setup, or simply rename it after it is generated by Inno Setup.
Source: Be careful here. Source tells Inno Setup where it should find the files to be included in the install when you are actually compiling your new setup script. The user doesn't see this; it matches what is set up on your own computer as you are creating your setup file. When you install the example setup package at the end of this article, it will create, by default, a folder called "c:\My XSLT Application". This folder contains the example .iss script for you to customize and a subfolder called "myxsltapp". "myxsltapp" is where you will put your XSLT transformation file and where you will customize the accompanying perl script, Options.cfg file, etc. You don't have to rename this folder, as your users will never see it. What they see is whatever you put for DefaultDirName. If you do decide to rename this folder, say to create multiple XSLT applications with a different folder for each application, then create multiple .iss files and change their Source parameters accordingly. For example, consider the following setup for maintaining multiple XSLT applications.
At the top level are four Inno Setup .iss script files. These correspond to four folders containing the specific files for each of these applications. There's also a custom setup image if you want to include one as part of your install. Each time you want to recreate an install program for your application, you'd simply double-click the appropriate .iss file and compile it in Inno Setup. In each of these .iss files, the value in Source should be the folder containing the application files for that particular application.
[Icons]: Specifies any icons which should be created on the user's desktop. The first parameter, after the {userdesktop}, should be changed to whatever label you want to display beneath this icon.
Generating the Batch Files During the Installation Process
The [Run] section tells the installer to execute the mkbat.bat batch file, which in turn executes a mkbat.pl perl script. This script silently creates two batch files before the installation program is complete. The first is for transforming single XML files. This is the batch file for which a shortcut will also be placed on the user's desktop. The second is for batch transforming an entire folder of XML files. You don't ever need to modify mkbat.bat or mkbat.pl if you don't want to. They should always generate your two batch files correctly for each XSLT application you create.
Windows 95 and Windows 98
The final program distributed here has one additional enhancement, and that is a call to Win32::GetLongPathName(). Windows 95 and 98 both pass the Windows short name to applications through %1 when you drag a file onto a batch file, while later versions of Windows pass the long name. Perl, LibXSLT, etc. can read either long path names or short ones with equal ease. However, because we derive the output filename from the input filename, these two earlier versions of Windows would write files with the short path name. Your users will be irritated if they transform a file called mss_00000622.xml and end up with one named MSS_00~1.XML. So at an appropriate point when composing the name of the output file, we make a call to Win32::GetLongPathName() to guarantee the name of the output file is based on the long filename.
What about Macintosh?
Luckily, XML::LibXSLT is available for MacPerl, the version of perl designed to run on Macs. To avoid confusion, my example code here is Windows-specific, but only a few minor changes need to be made to allow the same code to run on Macintosh (and Unix, Linux, etc.). For example, Win32::GetLongPathName() is for Windows only. You'd get a compilation error if you tried to run it on Macintosh. You can make two simple modifications to your code to ensure this bit doesn't spoil your plans for cross-platform glory:
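One way to guard the Windows-only call, offered as a sketch rather than as the exact pair of modifications intended above: load the Win32 module at run time instead of with a compile-time use statement, and only call the function when perl's $^O variable reports Windows. The helper name long_path is an assumption for illustration.

```perl
use strict;
use warnings;

# Return the long form of a path on Windows; on any other platform return
# the path unchanged. (long_path is a hypothetical helper name.)
sub long_path {
    my ($path) = @_;
    if ($^O eq 'MSWin32') {
        # "require" loads Win32 at run time, so other platforms never
        # try to compile it.
        require Win32;
        my $long = Win32::GetLongPathName($path);
        return $long if defined $long && length $long;
    }
    return $path;
}

print long_path($ARGV[0]), "\n" if @ARGV;
```

On Windows 95 and 98 this turns a dragged short name back into its long form; everywhere else it is a harmless pass-through.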
You may need to hunt through the code (luckily it's very short) for other Windows-specific bits, mostly involving filenames and directory separators. For more information on XML::LibXSLT for the Macintosh, visit "Libxml & Libxslt ports to Mac OSX available".
The Installation Package
You can download a ready-to-customize installation package here:
Download: xsltapptemplate_setup.exe
To use this package, download and install it wherever you like. The first task is to simply run it to create a sample setup and test it to get a feeling for how it works:
Now customize it for your specific XSLT application:
The Perl Installation
To save space and create the smallest installation program possible, most of the files, examples, documentation, and modules were removed from the standard installation of perl. If you want to do more with your perl script than the simple steps described here, you may find yourself attempting to make use of modules which have been removed (odd error messages mentioning "@INC" are a dead giveaway). If this happens, you can write down which modules are missing and copy them back in one by one from a complete perl installation. You can also simply delete the perl folder included here and copy in a complete perl installation. After all, disk space is cheap nowadays, and most people have fast internet connections. Still, if you deploy a lot of XSLT applications, each with its own perl installation included, you may begin to try the patience of even those users with lots of disk space.
The perl module Module::ScanDeps is the easiest way to determine which modules your perl program requires, but you still need to copy, delete, and generally manage them by hand. My technique is to maintain a small perl distribution which I copy into each of my application folders. When I test these applications, I record any error messages and add in those modules that are required for that particular application.
There are other techniques for distributing small perls. perl2exe is an application which converts a perl script and all necessary modules into a single, self-contained executable. I haven't had a chance to work with this application very much, but it holds the potential of making installation scripts even smaller, and possibly simplifying certain application tasks. Applications created using perl2exe can also be distributed freely without the need for written permission from perl2exe's creators. Finally, another option is to use the Perl Archive Toolkit (PAR). This package is very similar to perl2exe but with many more options.
Of the two, PAR seems the more useful, particularly since PAR can generate executables which will run on the Macintosh while perl2exe cannot. I still just manage my small perls by hand, mostly because I have become very fast at it and have become familiar with perl's module structure, but also because I simply haven't had time to investigate the other methods described above. If there is a demand perhaps I'll update this article someday and make a script or two available.
April 30 2006 update: I've had reason to test both perl2exe and PAR to package standalone executables that do not require perl to be installed on the target system. Both are excellent systems and I intend to use both in the future. With both systems I was able to create complete Windows GUI applications, indistinguishable from applications written in Visual C++ or Visual Basic, packaged into a single executable that did not require an installation step (as opposed to the xsltapps perl package available here, which requires an initial installation step). That is, the user downloads a single executable and double-clicks it to run the program.
perl2exe is a commercial application which you must purchase. Its executables are about half the size of PAR applications, which is a big advantage (though I have yet to fully explore PAR optimizations which may bring executable size down somewhat). However, perl scripts must usually be modified in complicated ways before they will successfully compile and run, and many types of scripts may simply never work as perl2exe executables. I was able to get many scripts to work as perl2exes but a few others I was never able to run. perl2exe applications also seem to start up much faster than PAR applications. This may have something to do with the size of the executable the loader must deal with, but I think not. As far as I can tell, by default pp packs an entire perl installation into the executable, including modules not even used by your program.
There are various filtering options to prevent the loading of unneeded modules which I haven't yet explored. Purchasing a perl2exe license supposedly entitles the user to technical support (though your results may vary!).
PAR seems to be able to compile any script into a self-contained executable without any need to modify the original perl script before compilation. Unlike perl2exe, it is able to hunt down all dependencies automatically without the need to add a lot of extraneous use statements. But as noted before, it also seems to pack a lot more than is needed by the program, resulting in the larger executable. A word of clarification is in order. The application which converts perl scripts into executables is called 'pp'. PAR by itself does not do this. PAR is a utility for packaging modules into special "archive files" (like JAR files if you're familiar with Java). pp is installed automatically when you install PAR.
For applications which can successfully compile and run under perl2exe, I will use perl2exe for its smaller executables and faster loading time. PAR I will reserve for those applications which cannot be made to work under perl2exe. If you decide to use perl2exe, you should definitely test the trial version and attempt to compile and run your completed perl script before you decide to buy. Really, you should wait until your script is done before deciding. You may find, as I did, that as you add features to your program, you will eventually add something that breaks perl2exe.
For me, installing and using PAR was trivially easy. Provided perl is installed on your system and is in your PATH:
More information about PAR can be found on CPAN. Perl2exe is available from IndigoStar.