XML, the Perl Way

Creating XML Using Perl

by Michel Rodriguez
Boardwatch Magazine

My previous columns looked at ways to design the structure of XML documents, through homemade or off-the-shelf DTDs. Now it's time to get down-and-dirty about it, and to look a little closer at how to create this XML, using everybody's favorite language: Perl.

One of the goals of XML was actually that a desperate Perl hacker would find it easy to deal with, even without tools or previous experience. Two years later, lots of tools have been written, and creating XML from various sources can be done using some of the resources Perl offers to less and less desperate users.

Installing a Perl module is usually as simple as downloading it, untarring it, then chanting the magic lines in the appropriate directory:

perl Makefile.PL
make
make test
su
make install

Once that's done, the module can be used by simply inserting the use Module; line in the code.
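
Before relying on a module, it can help to confirm that the installation actually worked. The following is a minimal sketch in plain Perl; the helper name module_available is an illustrative assumption, not part of any of the modules discussed here. It simply tries to load each module and reports the result:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the named module can be loaded, 0 otherwise.
# (module_available is a hypothetical helper written for this sketch)
sub module_available {
    my ($module) = @_;
    # Turn Some::Module into Some/Module.pm, the form require expects
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Report on the modules used in this column
for my $module (qw(XML::Writer DBIx::XML_RDB XML::CGI XML::Twig)) {
    printf "%-15s %s\n", $module,
        module_available($module) ? "installed" : "missing";
}
```

Any module reported as missing needs the installation steps above before its use line will work.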

Creating XML can be done just as easily as creating HTML: using the XML::Writer module and its CGI.pm-like interface; from a relational table, using DBIx-XML-RDB (don't you love those names?); or from a Web form, using XML::CGI.

XML::Writer

XML::Writer uses an interface similar to the famous CGI.pm module, used by countless CGI scripts all over the World Wide Web. It nicely prints XML in a file, checking as much as it can, for example, that the document has only one root. It will also take care of encoding characters that could interfere with the XML syntax, such as '<' and '&', encoding them as XML entities:

#!/bin/perl -w
use strict;

use XML::Writer;

my $doc = new XML::Writer();

# print the open tag, including the attribute
$doc->startTag("doc", class => "simple");

  # print an element containing only text
  $doc->dataElement( title => "Simple Doc");
  $doc->startTag( "section");
    $doc->dataElement( title => "Introduction", no => 1, type => "intro");
    $doc->startTag( "para");
      $doc->characters( "a text with");
      $doc->dataElement( bold => "bold");
      $doc->characters( " words.");
    $doc->endTag( "para");
  $doc->endTag(); # close section
$doc->endTag(); # close doc
$doc->end(); # final checks

Which will output the following XML:

<doc class="simple"><title>Simple Doc</title><section><title no="1" type="intro">Introduction</title><para>a text with<bold>bold</bold> words.</para></section></doc>
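
To make the escaping step concrete, here is a minimal sketch in plain Perl of the kind of substitution involved. It is not XML::Writer's actual implementation; the helper name escape_xml and the exact set of characters handled are assumptions made for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Characters that would otherwise be read as markup, and their replacements
my %entity = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );

# Escape text so it is safe inside element content or a quoted attribute.
# (escape_xml is a hypothetical helper, not an XML::Writer method)
sub escape_xml {
    my ($text) = @_;
    # a single pass, so an '&' produced by a replacement is never re-escaped
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

print escape_xml('AT&T: 1 < 2'), "\n";   # prints: AT&amp;T: 1 &lt; 2
```

With XML::Writer you never call anything like this yourself: passing raw text to characters or dataElement is enough.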

DBIx-XML-RDB

A lot of times XML documents are created from relational tables, be it to exchange them between two incompatible database management systems (DBMSs) or just so the data can then be displayed or processed easily using XML tools.

DBIx-XML-RDB can help there. Based on Perl's wonderful generic database interface (DBI), it allows extraction of tables to an XML format from any database accessible by DBI, which includes CSV (comma-separated values), Oracle, Sybase and Microsoft Access (through the ODBC driver).

Using DBIx-XML-RDB is really easy:

#!/bin/perl -w
use strict;

use DBI;
use DBIx::XML_RDB;

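# connection parameters: placeholder values, replace them with your own
my ($datasource, $userid, $password, $dbname) = ("mydsn", "user", "secret", "mydb");

my $xmlout = DBIx::XML_RDB->new($datasource, "ODBC", $userid, $password, $dbname)
  or die "Failed to make new xmlout";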

$xmlout->DoSql("select * FROM MyTable");
print $xmlout->GetData;

This would produce an XML file such as:

<?xml version="1.0"?>
<DataSource>
<RESULTSET statement="select * FROM MyTable">
<ROW><Name>Mr Hanky</Name>
<Email>hanky679@aol.com</Email>
</ROW>
<ROW><Name>Eric Cardman</Name>
<Email>dude@boardwatch.com</Email>
</ROW>
</RESULTSET>
</DataSource>

Of course, instead of just selecting the whole table, you can use a query, the DoSql line then becoming, for example:

$xmlout->DoSql( qq{
SELECT last_name, first_name, expiration, email FROM customer
       WHERE expiration < DATE_ADD(CURRENT_DATE, INTERVAL 30 DAY)
       ORDER BY expiration}
              );

DBIx-XML-RDB also comes with a simple database extraction tool, called sql2xml. This tool simply dumps a table in a database to an XML file.
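
To show how rows map onto the layout above, here is a minimal sketch in plain Perl that needs no database. It is not how DBIx-XML-RDB is implemented; the helpers rows_to_xml and escape_xml and the sample data are all made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values
sub escape_xml {
    my ($text) = @_;
    my %entity = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Build a RESULTSET document from a statement, a column list and row hashes.
# (rows_to_xml is a hypothetical helper written for this sketch)
sub rows_to_xml {
    my ($statement, $columns, $rows) = @_;
    my $xml = qq{<?xml version="1.0"?>\n<DataSource>\n};
    $xml .= sprintf qq{<RESULTSET statement="%s">\n}, escape_xml($statement);
    for my $row (@$rows) {
        $xml .= "<ROW>";
        # one element per column, in the order given in the column list
        for my $column (@$columns) {
            $xml .= sprintf "<%s>%s</%s>\n",
                $column, escape_xml($row->{$column}), $column;
        }
        $xml .= "</ROW>\n";
    }
    return $xml . "</RESULTSET>\n</DataSource>\n";
}

# Made-up sample data, standing in for what a query would return
my @rows = (
    { Name => 'Ann Example', Email => 'ann@example.com' },
    { Name => 'Bob & Co',    Email => 'bob@example.com' },
);
print rows_to_xml('select * FROM MyTable', [qw(Name Email)], \@rows);
```

The column list is passed separately because a Perl hash does not preserve the order in which the query listed its columns.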

XML::CGI

In a lot of cases, XML is also created from user input, through a Web form. XML::CGI can then be used to save the form data to an XML file. It is a subclass of CGI.pm that adds two methods, toXML and toCGI, to the CGI module.

It is used this way:

use XML::CGI;

my $q = XML::CGI->new;

# convert all the form variables to XML
my $xml = $q->toXML;
# this wraps the variables in a single element 'root'
$xml = $q->toXML( 'root');
# convert XML to CGI.pm variables
$q->toCGI($xml);

XML::Twig

To do some serious processing and to integrate data from a database into an XML document, you need serious XML processing tools. Perl offers a vast choice of XML processing modules: XML::PYX and XML::Simple can be used for simple processing, XML::DOM and XML::XSLT implement the W3C-approved recommendations, and the one presented here, XML::Twig, is probably the most Perlish of all (full disclosure: it is also written and supported by the author of this column).

From a simple HTML-like file like:

<html>
<head><title>Data Base Example</title></head>
<body>
<h1>Data Base Example</h1>
<p>The <plan field="name" code="P001"/> is
$<plan field="price" code="P001"/> a month,
a much better deal than <plan field="name" code="P002"/>,
at $<plan field="price" code="P002"/> per month.</p>
</body>
</html>

A simple script like the following one will create an XHTML file by replacing the plan elements with the appropriate values from the database:

#!/bin/perl -w
use strict;

use DBI;
use XML::Twig;

my $file= shift;
my $dbh= connect_to_db();
my $twig= XML::Twig->new(
            TwigRoots => { plan => \&plan }, # only process plan elements
            TwigPrintOutsideRoots => 1       # output the rest unchanged
                        );
# process the file
$twig->parsefile( $file);

$dbh->disconnect();
exit;

# connect to the data base
sub connect_to_db
  { my $driver = "mysql";
    my $dsn = "DBI:$driver:database=test;";
    # connect loads the driver itself, so no separate install_driver call is needed
    my $dbh = DBI->connect($dsn, 'test', '', {AutoCommit=>1})
      or die "Cannot connect to $dsn: $DBI::errstr";
    return( $dbh);
  }

sub plan
  { my( $twig, $plan)= @_;
    my $field= $plan->att( 'field');
    my $code= $plan->att( 'code');
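    # the field name comes from the document, so only accept a plain identifier
    die "invalid field name: $field" unless $field =~ /^\w+$/;
    my $query= "SELECT $field FROM plan WHERE code= ?";
    # prepare then execute the select, binding the code to the placeholder
    my $sth= $dbh->prepare( $query);
    $sth->execute( $code);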
    # there will be only one row
    my $row= $sth->fetchrow_arrayref();
    # and one field in the row
    print $row->[0];
  }

The same could be done with other XML modules, but this gives you a good example of the ease with which Perl combines its advanced database access features with its powerful XML processing abilities.
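
To try the same technique without setting up a database, the lookup can be replaced by an in-memory hash. The following is a minimal sketch under that assumption: the %plan data, its values and the fill_plans helper are all hypothetical, and the output is collected in a string rather than printed directly so that it can be inspected. It uses the lowercase spelling of the two XML::Twig options.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical plan data, standing in for the plan table
my %plan = (
    P001 => { name => 'Basic Plan',   price => '19.95' },
    P002 => { name => 'Premium Plan', price => '29.95' },
);

# Replace every plan element in $xml and return the resulting document.
# (fill_plans is a helper written for this sketch)
sub fill_plans {
    my ($xml) = @_;
    my $result = '';
    open my $out, '>', \$result or die "cannot open in-memory handle: $!";
    my $twig = XML::Twig->new(
        twig_roots => {
            # print the looked-up value in place of the plan element
            plan => sub {
                my ($t, $elt) = @_;
                print {$out} $plan{ $elt->att('code') }{ $elt->att('field') };
            },
        },
        twig_print_outside_roots => $out,    # copy everything else as-is
    );
    $twig->parse($xml);
    close $out;
    return $result;
}

print fill_plans('<p>The <plan field="name" code="P001"/> is $<plan field="price" code="P001"/> a month.</p>'), "\n";
```

Swapping the hash lookup back for a DBI query gives the script shown earlier.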

How to get Perl tools

This has been a brief overview of some Perl resources that may help you. There are plenty of resources on the Web to learn about XML in Perl, but probably the most important one is the Comprehensive Perl Archive Network (CPAN, at http://cpan.org). There you will find modules for everything from processing documents in all sorts of ways to implementations of various XML-related standards such as XPath.

Other resources include:

In any case, Perl seems to be a natural choice for programmatically creating XML.


Note: this article was published in 2000 in Boardwatch magazine. More recent articles about XML and especially Perl & XML can be found on www.xmltwig.com