XML, the Perl Way

6. Other features

Now let's see some other features of XML::Twig, beyond the basic examples.

6.1 Using the finish and finish_print methods

Sometimes all we need is to extract or update part of the document. In that case there is no reason to build the twig for the rest of the document: we just want to be done with it, either exiting right away or going through the rest of the document and simply outputting it. That's what the finish and finish_print methods provide.

finish calls Expat's finish method. It unsets all handlers (including the internal ones that set context), but Expat continues parsing to the end of the document or until it finds an error. It should finish up a lot faster than with the handlers set.

finish_print stops the twig processing, flushes the twig and then prints the rest of the document as fast as possible.

So here is ex3_1.pl, which just displays a stat for a player and then finishes parsing. Note that the document is still checked for well-formedness: the script will exit with an error if the document is not well-formed XML.

#!/bin/perl -w

#########################################################################
#                                                                       #
#  This example displays the information for a single player            #
#  It uses the finish method                                            #
#                                                                       #
#########################################################################

use strict;
use XML::Twig;

my $name= shift;
my $stat= shift;

my $twig= new XML::Twig( twig_handlers => 
            { player => sub { player(@_, $name, $stat); } } # pass the additional args
                       );                                   # just to be extra clean

$twig->parsefile( "nba.xml");    # process the twig
exit;

sub player
  { my( $twig, $player, $name, $stat)= @_;
    my $player_name= $player->first_child( 'name')->text;
    if( $player_name=~ /$name/i)
      { my $stat_value= $player->first_child( $stat)->text;
        print "$player_name: $stat_value $stat\n"; 
        $twig->finish;
      }
    else
      { $twig->purge; }                                      # keep a low profile
  }
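
To see the effect of finish without the nba.xml file, here is a minimal sketch that parses a small in-memory document. The sample data, the find_stat function and the call counter are made up for this illustration; only finish, purge, first_child and text come from XML::Twig. The counter shows that the handler is no longer called once finish has been invoked.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# hypothetical sample data, standing in for nba.xml
my $xml = '<league>'
        . '<player><name>Smith</name><ppg>10.1</ppg></player>'
        . '<player><name>Jones</name><ppg>22.5</ppg></player>'
        . '<player><name>Brown</name><ppg>7.3</ppg></player>'
        . '</league>';

# returns ( $stat_value, $number_of_handler_calls) for the first player
# whose name is exactly $name
sub find_stat
  { my( $xml, $name, $stat)= @_;
    my( $value, $calls)= ( undef, 0);
    my $twig= XML::Twig->new(
      twig_handlers =>
        { player => sub
            { my( $t, $player)= @_;
              $calls++;
              if( $player->first_child( 'name')->text eq $name)
                { $value= $player->first_child( $stat)->text;
                  $t->finish;      # handlers are unset from here on
                }
              else
                { $t->purge; }     # free the players already seen
            },
        },
    );
    $twig->parse( $xml);           # still dies if $xml is not well-formed
    return( $value, $calls);
  }

my( $value, $calls)= find_stat( $xml, 'Jones', 'ppg');
print "Jones: $value ppg (handler called $calls times)\n";
```

The third player is still read by Expat, so a well-formedness error there would still be reported, but no twig is built for it.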

Probably more interesting is ex3_2.pl which updates the stats for a player.

#!/bin/perl -w

#########################################################################
#                                                                       #
#  This example updates the information for a single player             #
#  It uses the finish_print method                                      #
#                                                                       #
#########################################################################

use strict;
use XML::Twig;

my $name= shift;
my $stat_name= shift;
my $stat_value= shift;

my $twig= new XML::Twig( twig_handlers => { player => \&player } );
if( $ARGV[0]) { $twig->parsefile( $ARGV[0]); }        # parse a file
else          { $twig->parse( \*STDIN);      }        # parse the standard input
exit;

sub player
  { my( $twig, $player)= @_;
    my $player_name= $player->first_child( 'name')->text;
    if( $player_name=~ /$name/i)
      { my $stat= $player->first_child( $stat_name);
        $stat->set_text( $stat_value);
        $twig->finish_print;                               # this is it 
      }
    else
      { $twig->flush; }                                    # print players before the right one
  }

6.2 Using set_id and elt_id methods

For some applications, especially when the whole document is loaded in memory, it can be very convenient to get direct access to elements through an ID attribute. XML::Twig provides such a feature. By default, if an element has an attribute named id, an entry id => element is added to a hash kept by the twig. This hash is maintained through the id, set_id and del_id methods on an element, and an element can be retrieved from a twig using the elt_id method on the twig.

The name of the ID attribute can be changed when the twig is created by using the Id option.

The id attribute can still be accessed through the att, set_att and del_att methods on the element but in this case the id hash will not be updated.
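
As a quick illustration of the id hash, here is a small sketch on an in-memory document. The sample XML and the name_for_id helper are assumptions made for this example, not part of the tutorial's files; elt_id and set_id are the XML::Twig methods described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# hypothetical sample data
my $xml = '<team>'
        . '<player id="p1"><name>Smith</name></player>'
        . '<player id="p2"><name>Jones</name></player>'
        . '</team>';

my $twig= XML::Twig->new;          # the ID attribute defaults to 'id'
$twig->parse( $xml);

# returns the name of the player with the given id, or undef if the id is unknown
sub name_for_id
  { my( $twig, $id)= @_;
    my $player= $twig->elt_id( $id);
    return undef unless $player;
    return $player->first_child( 'name')->text;
  }

print "p2: ", name_for_id( $twig, 'p2'), "\n";

# set_id changes the attribute and registers the new id in the hash
$twig->elt_id( 'p2')->set_id( 'captain');
print "captain: ", name_for_id( $twig, 'captain'), "\n";
```

Going through set_id rather than set_att is what keeps elt_id usable after the change.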

ex3_3.pl is an example of the set_id method.

#!/bin/perl -w

#########################################################################
#                                                                       #
#  This example adds an id to each player                               #
#  It uses the set_id method, by default the id attribute will be 'id'  #
#                                                                       #
#########################################################################

use strict;
use XML::Twig;

my $id="player001";

my $twig= new XML::Twig( twig_handlers => { player => \&player } );
$twig->parsefile( "nba.xml");    # process the twig
$twig->flush;
exit;

  sub player
    { my( $twig, $player)= @_;
      $player->set_id( $id++);
      $twig->flush;
    }

ex3_4.pl uses elt_id on the updated XML document to display the name of a player with a given id. perl ex3_3.pl | perl ex3_4.pl 050 will then display player050: Stojakovic, Predrag.

#!/bin/perl -w

#########################################################################
#                                                                       #
#  This example displays the name of a player whose id is given         #
#  It uses the elt_id method, by default the id attribute will be 'id'  #
#                                                                       #
#########################################################################

use strict;
use XML::Twig;

my $id= "player" . shift;

my $twig= new XML::Twig();

if( $ARGV[0]) { $twig->parsefile( $ARGV[0]); }        # parse a file
else          { $twig->parse( \*STDIN);      }        # parse the standard input

my $player= $twig->elt_id( $id);                      # this gets the element

print "$id: " . $player->first_child( 'name')->text . "\n";

6.3 Comparing the order of 2 elements

XML::Twig also offers methods to compare the order of 2 elements in the document. before and after are based on the cmp method. An element is before another one if its opening tag comes before the opening tag of the other element; otherwise it is after. The 2 elements are equal only if they are the same element.

ex3_5.pl shows how to use those methods. You can run it on an ordered and "id'ed" document this way: perl ex1_1.pl blk | perl ex3_3.pl | perl ex3_5.pl 001 015.

#!/bin/perl -w

#########################################################################
#                                                                       #
#  This example displays whether an element appears before another one  #
#  in the document                                                      #
#  It uses the elt_id method, by default the id attribute will be 'id'  #
#                                                                       #
#########################################################################

use strict;
use XML::Twig;

my $id1= "player" . shift;
my $id2= "player" . shift;

my $twig= new XML::Twig();

if( $ARGV[0]) { $twig->parsefile( $ARGV[0]); }        # parse a file
else          { $twig->parse( \*STDIN);      }        # parse the standard input

my $player1= $twig->elt_id( $id1);                    # get the players
my $player2= $twig->elt_id( $id2);                     

my $name1= $player1->first_child( 'name')->text;
my $name2= $player2->first_child( 'name')->text;

if( $player1->before( $player2) )
  { print "$name1 is before $name2\n" }      # print, not printf: a name could include a %
elsif( $player1->after( $player2) )
  { print "$name1 is after $name2\n" }
else
  { print "$name1 is equal to $name2\n" }
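
The same comparisons can be tried on a tiny in-memory document. This is only a sketch: the sample XML and the order helper are invented for the illustration, while before, after, cmp, id and elt_id are XML::Twig methods. It also shows that an ancestor counts as being before its descendants, since its opening tag comes first, and that cmp can be used with sort to put elements back in document order.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $twig= XML::Twig->new;
$twig->parse( '<doc><a id="a1"><b id="b1"/></a><c id="c1"/></doc>');

my( $a1, $b1, $c1)= map { $twig->elt_id( $_) } qw( a1 b1 c1);

# describes the relative order of 2 elements, using before and after
sub order
  { my( $x, $y)= @_;
    return $x->before( $y) ? 'before'
         : $x->after( $y)  ? 'after'
         :                   'equal';
  }

print "a1 is ", order( $a1, $c1), " c1\n";
print "c1 is ", order( $c1, $b1), " b1\n";   # b1 opens inside a1, so before c1
print "a1 is ", order( $a1, $b1), " b1\n";   # an ancestor opens before its descendants
print "a1 is ", order( $a1, $a1), " a1\n";

# cmp can be used as a sort comparison to get document order
my @sorted= map { $_->id } sort { $a->cmp( $b) } ( $c1, $b1, $a1);
print "document order: @sorted\n";
```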

6.4 The next_elt method

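Although the next_sibling and first_child methods are often the most convenient way to navigate, there are some cases where another method is easier to use: the next_elt method makes it easier to go through all the elements in a sub-tree.

The next_elt of an element is the first element opened after the open tag of the element. This is either the first child of the element, or its next sibling, or the next sibling of one of its ancestors. Note that as usual PCDATA is considered an element.

This method has 2 forms:

next_elt( $optional_gi): returns the next element, or the next element whose gi (tag name) is $optional_gi when one is given
next_elt( $subtree_root, $optional_gi): same, but only the elements in the subtree of $subtree_root are considered, and undef is returned once that subtree has been traversed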

ex3_6.pl shows how to use next_elt to list all the methods in the html_plus.xml document.

#!/bin/perl -w

#########################################################################
#                                                                       #
#  This example displays the text of all method elements                #
#  It uses the next_elt method                                          #
#                                                                       #
#########################################################################

use strict;
use XML::Twig;

my $twig= new XML::Twig();

$twig->parsefile( "html_plus.xml");         # parse the file

my $root= $twig->root;

my $method= $root;                          # loop through the document
while( $method= $method->next_elt( $root, 'method'))
     { print "method: " . $method->text . "\n"; }
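
For readers who do not have html_plus.xml at hand, here is a short sketch of the same loop on an in-memory document. The sample XML and the subtree_tags helper are assumptions made for this example; next_elt and tag are XML::Twig methods. Called without a tag name, the loop also returns the text nodes, which XML::Twig reports with the tag #PCDATA.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $twig= XML::Twig->new;
$twig->parse( '<doc><p>one<b>two</b></p><p>three</p></doc>');

# lists the tags of the elements found below $root, in document order,
# optionally keeping only those named $gi
sub subtree_tags
  { my( $root, $gi)= @_;
    my @tags;
    my $elt= $root;
    while( $elt= $elt->next_elt( $root, $gi))   # stops at the end of the subtree
      { push @tags, $elt->tag; }
    return @tags;
  }

my $root=    $twig->root;
my $first_p= $root->first_child( 'p');

print "whole document: ", join( ' ', subtree_tags( $root)),      "\n";
print "first p only:   ", join( ' ', subtree_tags( $first_p)),   "\n";
print "only p:         ", join( ' ', subtree_tags( $root, 'p')), "\n";
```

Passing the subtree root as the first argument is what prevents the walk from escaping into the second p when it starts from the first one.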

6.5 Pretty printing

By popular demand I have included a number of pretty printing options, both for documents and for data.

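The useful options to pretty print a document are:

none: the default, the document is output without any added line break
nsgmls: line breaks are added inside the tags only, in the style of nsgmls output
nice: a line break is added before tags that are not part of mixed content (NOT SAFE)
indented: same as nice, plus the tags are indented (NOT SAFE)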

The NOT SAFE options can produce invalid XML (that would not conform to the original DTD) in some cases. I have included them anyway because it rarely happens with simple DTDs and they look good!

The ex3_7.pl example shows the pretty printer.

#!/bin/perl -w

#########################################################################
#                                                                       #
#  This example prints a document using various pretty print options    #
#                                                                       #
#########################################################################
use strict;
use XML::Twig;

my $string=
'<doc><elt><subelt>text<inline>text</inline>text</subelt><subelt>text<inline><subinline/></inline></subelt></elt><elt att="val"><subelt>text<subinline/></subelt><subelt></subelt></elt></doc>';

my $t= new XML::Twig;
$t->parse( $string);

print "normal:\n";
$t->set_pretty_print( 'none');     # this is the default
$t->print;
print "\n\n";

print "nice:\n";
$t->set_pretty_print( 'nice');     # \n before tags not part of mixed content
$t->print;
print "\n\n";

print "indented:\n";               # nice + tags are indented
$t->set_pretty_print( 'indented');
$t->print;
print "\n\n";
                                   # alternate way to set the style
my $t2= new XML::Twig( PrettyPrint => 'nsgmls');
$t2->parse( $string);
print "nsgmls:\n";
$t2->print;
print "\n\n";

The output is ex3_7.res.

normal:
<doc><elt><subelt>text<inline>text</inline>text</subelt><subelt>text<inline><subinline/></inline></subelt></elt><elt att="val"><subelt>text<subinline/></subelt><subelt></subelt></elt></doc>

nice:

<doc>
<elt>
<subelt>text<inline>text</inline>text</subelt>
<subelt>text<inline><subinline/></inline></subelt>
</elt>
<elt att="val">
<subelt>text<subinline/></subelt>
<subelt>
</subelt>
</elt>
</doc>

indented:

<doc>
  <elt>
    <subelt>text<inline>text</inline>text</subelt>
    <subelt>text<inline><subinline/></inline></subelt>
  </elt>
  <elt att="val">
    <subelt>text<subinline/></subelt>
    <subelt>
    </subelt>
  </elt>
</doc>

nsgmls:
<doc
><elt
><subelt
>text<inline
>text<
/inline>text<
/subelt><subelt
>text<inline
><subinline
/><
/inline><
/subelt><
/elt><elt
att="val"
><subelt
>text<subinline
/><
/subelt><subelt
><
/subelt><
/elt><
/doc>

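To pretty print tables 2 options can be used (besides the faithful none):

record: each record is output with one field per line
record_c: compact style, each record is output on a single line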

The ex3_8.pl example shows the pretty printer.

#!/bin/perl -w

#########################################################################
#                                                                       #
#  This example prints a table using various pretty print options       #
#                                                                       #
#########################################################################
use strict;
use XML::Twig;

my $string=
'<table><record><field1>value1</field1><field2>value2</field2></record><record><field1>value1(2)</field1><field2>value2(2)</field2></record><record><field1>value1(3)</field1><field2>value2(3)</field2></record></table>';

my $t= new XML::Twig;
$t->parse( $string);

print "normal:\n";
$t->set_pretty_print( 'none');       # this is the default
$t->print;
print "\n\n";

print "record:\n";              
$t->set_pretty_print( 'record');     # one field per line
$t->print;
print "\n\n";

print "record_c:\n";               
$t->set_pretty_print( 'record_c');   # one record per line
$t->print;
print "\n\n";

print "record:\n";                   
$t->print( PrettyPrint => 'record'); # alternate way to set the style
print "\n\n";

The output is ex3_8.res.

normal:
<table><record><field1>value1</field1><field2>value2</field2></record><record><field1>value1(2)</field1><field2>value2(2)</field2></record><record><field1>value1(3)</field1><field2>value2(3)</field2></record></table>

record:

<table>
  <record>
    <field1>value1</field1>
    <field2>value2</field2>
  </record>
  <record>
    <field1>value1(2)</field1>
    <field2>value2(2)</field2>
  </record>
  <record>
    <field1>value1(3)</field1>
    <field2>value2(3)</field2>
  </record>
</table>

record_c:

<table>
  <record><field1>value1</field1><field2>value2</field2></record>
  <record><field1>value1(2)</field1><field2>value2(2)</field2></record>
  <record><field1>value1(3)</field1><field2>value2(3)</field2></record>
</table>

record:

<table>
  <record>
    <field1>value1</field1>
    <field2>value2</field2>
  </record>
  <record>
    <field1>value1(2)</field1>
    <field2>value2(2)</field2>
  </record>
  <record>
    <field1>value1(3)</field1>
    <field2>value2(3)</field2>
  </record>
</table>

These options can be set in 3 ways: when creating the twig, using the PrettyPrint option; when printing, using the PrettyPrint option of the print method (on a twig or on an element); or at any time, using the set_pretty_print method (on a twig or on an element). Note that the setting is actually global at the moment, so changing it for one twig or element changes it for all of them.
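
Because the setting is global, one cautious way to use it is to set the style, produce the output, and put the default back right away. The sketch below does this with a hypothetical render helper that returns the document as a string through sprint, an XML::Twig method that is not used elsewhere in this chapter. The sample table is made up, and the exact whitespace produced may differ between XML::Twig versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# hypothetical sample data
my $table= '<table>'
         . '<record><field1>a</field1><field2>b</field2></record>'
         . '<record><field1>c</field1><field2>d</field2></record>'
         . '</table>';

# returns $xml rendered in the given pretty print style,
# then restores the default style since the setting is global
sub render
  { my( $xml, $style)= @_;
    my $twig= XML::Twig->new;
    $twig->parse( $xml);
    $twig->set_pretty_print( $style);
    my $out= $twig->sprint;            # the document as a string
    $twig->set_pretty_print( 'none');  # back to the default
    return $out;
  }

foreach my $style ( qw( none record record_c))
  { print "$style:\n", render( $table, $style), "\n\n"; }
```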

