XML, the Perl Way


7. Advanced features

or "I hope you don't need those"

7.1 Using StartTagHandlers

Sometimes you want to change a tag name, or store some attributes, BEFORE the whole tree for the element is built. This is often the case when you need to flush the twig while still inside the element. Changing the element name at that point only changes the end tag, because the start tag has already been output by the time you try to change it.

In that case you can use the StartTagHandlers option (spelled start_tag_handlers in the script below) when you create the twig. It calls a handler as soon as the start tag of the element is found. The arguments passed to the handler are the twig and the element. The element is empty at that point, but its attributes are already set.

ex4_0.pl demonstrates the use of StartTagHandlers to change the tags in an XML document.

#!/bin/perl -w

#########################################################################
#                                                                       #
#  This example shows how to use the start_tag_handlers option          #
#  It changes all tag names                                             #
#                                                                       #
#########################################################################

use strict;
use XML::Twig;

# the old_tag => new_tag table
my %change=
  ( stats => 'statistics',
    g     => 'games',
    ppg   => 'points_per_game',
    rpg   => 'rebounds_per_game',
    apg   => 'assists_per_game',
    blk   => 'blocks',
  );

# let's build the start_tag_handlers
my $handlers;
while( my( $old_tag, $new_tag)= each( %change) )                    # each handler
  { $handlers->{$old_tag}= sub { $_[1]->set_gi( $new_tag); }; }     # changes a tag


my $twig= XML::Twig->new( start_tag_handlers => $handlers,
                          twig_handlers      => 
                              { '_all_' => sub { $_[0]->flush; } },  # flush all elements
                        ); 

$twig->parsefile( "nba.xml");    # process the twig

The other new feature used in this script is the _all_ keyword in the twig_handlers option. It calls the handler (which in this case just flushes the twig) for every single element in the document. Another keyword, _default_, calls a handler for each element that does not have a handler of its own. Both _all_ and _default_ can be used with StartTagHandlers as well as with the twig_handlers option.
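
To see what each kind of handler has to work with, here is a minimal sketch, separate from ex4_0.pl. The inline document and the @seen list are made up for the illustration. It simply records what a start tag handler, a regular handler and a _default_ handler observe.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# made-up sample document (not nba.xml)
my $xml= '<team><player id="p1"><name>Ann</name></player><coach id="c1"/></team>';

my @seen;                                   # what each handler observed

my $twig= XML::Twig->new(
  start_tag_handlers =>
    { # called as soon as the player start tag has been parsed
      player => sub { my( $t, $elt)= @_;
                      my $content= $elt->first_child() ? 'has children' : 'no children yet';
                      push @seen, "start player id=" . $elt->att( 'id') . " ($content)";
                    },
    },
  twig_handlers =>
    { # called once the player end tag has been parsed: the subtree is complete
      player    => sub { my( $t, $elt)= @_;
                         push @seen, "end player name=" . $elt->first_child( 'name')->text;
                       },
      # called for the elements that have no handler of their own
      _default_ => sub { push @seen, "default " . $_[1]->tag; },
    },
);

$twig->parse( $xml);
print "$_\n" foreach @seen;
```

The start tag handler can read the id attribute but finds no child yet, while the regular player handler can reach the name element.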

7.2 Purging part of the tree

Sometimes, especially when converting an XML file to several HTML ones, it is convenient to purge the twig only up to the previous sibling, not up to the current element. Hence the purge_up_to and flush_up_to methods.

Here is an example that uses purge_up_to to list the difference in a given stat between 2 consecutive players. ex4_1.pl takes the name of the stat as its first argument and can read the output of ex1_1.pl, either from a file or from the standard input.

#!/bin/perl -w

use strict;
use XML::Twig;

my $field= shift;

my $twig= XML::Twig->new( twig_handlers => { player => \&player } );

if( $ARGV[0]) { $twig->parsefile( $ARGV[0]); }        # parse a file
else          { $twig->parse( \*STDIN);      }        # parse the standard input

sub player
  { my( $t, $player)= @_;
    my $prev_player= $player->prev_sibling || return; # no previous player
    my $player_name= field( $player, 'name');         # get players info
    my $prev_player_name= field( $prev_player, 'name');
    my $player_ppg= field( $player, $field);
    my $prev_player_ppg= field( $prev_player, $field);
    my $diff= $prev_player_ppg - $player_ppg;         # compute the stat difference
    print "$field difference $prev_player_name - $player_name: $diff\n";
    $t->purge_up_to( $prev_player);                   # keep the current player
  }

sub field                                             # get a field for a player
  { my( $player, $field)= @_;
    return $player->first_child( $field)->text;
  }
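
The script above only demonstrates purge_up_to. As a rough sketch of its flush_up_to counterpart, the following annotates each player with its lead over the next one before outputting it. The data, the lead attribute and the in-memory filehandle are all made up for the illustration, and the sketch assumes that flush_up_to outputs the given element and keeps what follows it, in the same way that purge_up_to keeps the current player above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# made-up data, loosely shaped like the players file
my $xml= '<players>'
       . '<player><name>Ann</name><ppg>30</ppg></player>'
       . '<player><name>Bob</name><ppg>25</ppg></player>'
       . '<player><name>Cat</name><ppg>21</ppg></player>'
       . '</players>';

# collect the output in a string instead of printing it directly
open( my $out, '>', \my $output) or die "cannot open in-memory file: $!";

my $twig= XML::Twig->new( twig_handlers => { player => \&player } );
$twig->parse( $xml);
$twig->flush( $out);                        # output what is still in memory
close $out;

print $output, "\n";

sub player
  { my( $t, $player)= @_;
    my $prev= $player->prev_sibling( 'player') or return;   # no previous player
    my $lead= $prev->first_child_text( 'ppg') - $player->first_child_text( 'ppg');
    $prev->set_att( lead => $lead);         # hypothetical attribute, set before output
    $t->flush_up_to( $prev, $out);          # output up to $prev, keep the current player
  }
```

The previous player can still be modified in the handler because it has not been output yet. That is the point of flushing one sibling late.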
    

7.3 Fun with overloading

I just thought I'd mention, because I think it's cool, that you can overload the comparison operators to use the cmp method to compare elements in a twig.

So just insert these lines in your script:

package XML::Twig::Elt;

use overload  cmp  => \&cmp,
             'lt'  => \&lt,
             'le'  => \&le,
             'gt'  => \&gt,
             'ge'  => \&ge,
             '+='  => \&suffix,
             '-='  => \&prefix,
             '>>'  => \&suffix,
             '<<'  => \&prefix,
             fallback => 1,
;

Then you will be able to write if( $elt1 le $elt2) { print "$elt1 is before $elt2\n"; }. As an added bonus you get 2 new ways to prefix or suffix an element:

$elt += "suffix";
$elt -= "prefix";
$elt << "prefix";
$elt >> "suffix";

This is just syntactic sugar, and IMHO pretty useless (hence it is not included in the module), plus it slows the module down by a good 30%. It's cute though, and if you don't care about speed and need to do a lot of comparisons of elements it can be handy.
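
If the slowdown matters, the cmp method can also be called directly, with no overloading at all. A minimal sketch on a made-up document, assuming only the cmp method mentioned above, which returns -1, 0 or 1 for 2 elements of the same twig:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $twig= XML::Twig->new;
$twig->parse( '<doc><first/><middle><inner/></middle><last/></doc>');  # made-up document

my $root=  $twig->root;
my $first= $root->first_child( 'first');
my $inner= $root->first_child( 'middle')->first_child( 'inner');
my $last=  $root->first_child( 'last');

# compare 2 elements by their position in the document
my $order= $first->cmp( $last);

# cmp can be used as is in a sort block to put elements back in document order
my @sorted= sort { $a->cmp( $b) } ( $last, $inner, $first);

print "first cmp last: $order\n";
print "document order: ", join( ' ', map { $_->tag } @sorted), "\n";
```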

