XML, the Perl Way

Simple Perl XML Benchmark

This article presents a simple benchmark using various Perl XML modules (and some non-Perl solutions).

You will need a whole bunch of modules installed to test all of them. The test framework itself needs Getopt::Long, Benchmark, Text::Diff, XML::SemanticDiff and XML::Twig (and probably some more I have forgotten!). In order to run the XSLT examples you will need libxml2 installed (which you will also need to run all the XML::LibXML-based examples).
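A simple way to check which of these modules are installed, and in which version, is to try loading each one. Here is a minimal sketch using only core Perl; the module list comes from the paragraph above, and module_status is an illustrative helper, not part of the benchmark.

```perl
use strict;
use warnings;

# Modules needed by the test framework itself (list taken from the text);
# append the XML modules you want to benchmark.
my @modules = qw(Getopt::Long Benchmark Text::Diff XML::SemanticDiff XML::Twig);

# Load a module at run time and return its version, 'not installed' if it
# cannot be loaded, or a note if it does not declare a $VERSION
sub module_status {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # XML::Twig => XML/Twig.pm
    return 'not installed' unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? $version : 'has no version in module';
}

printf "%-20s %s\n", $_, module_status($_) for @modules;
```

Modules that load but declare no $VERSION (as was the case for XML::LibXML when the results below were produced) are reported as such rather than as missing.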

Remember that performance is just one (small) part of why you choose a solution: ease of use and power are also very important (for example, the regexp-based solutions are not generic and would not work on XML files with a different structure).

The tests

Four different tests are performed:

  1. nothing: just parse the document and output it back,
  2. extract: extract the content (text only) of all elements with a given name (message),
  3. replace: prefix the content of all elements with a given name (message) with some text (including the element number), then output the document back,
  4. complex: each element with a given name (process) has an action attribute that specifies what to do with the element: delete, duplicate, change_tag (to a new, fixed, tag name), erase (the tags, not their content), prefix (with a new element) or add_att (add a fixed new attribute); the document is then output back as XML.
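The extract and replace tests are simple enough that a couple of regexps can handle them, which is the approach of the regexp entries in the results. Below is a minimal sketch of that approach, not the benchmark's actual code; extract_messages and prefix_messages are hypothetical names, and the "message N: " prefix format is made up.

```perl
use strict;
use warnings;

# extract: return the text of every <message> element, in document order
sub extract_messages {
    my ($xml) = @_;
    my @texts = $xml =~ m{<message\b[^>]*>(.*?)</message>}gs;
    return @texts;
}

# replace: prefix the content of each <message> element with its number
# (the "message N: " format is a hypothetical choice)
sub prefix_messages {
    my ($xml) = @_;
    my $n = 0;
    $xml =~ s{(<message\b[^>]*>)}{$1 . 'message ' . ++$n . ': '}ge;
    return $xml;
}

# Both subs only work on very simple XML: no comments, CDATA sections,
# entities, nested or empty message elements, or '>' in attribute values.
```

For example, prefix_messages('<message>hi</message>') returns '<message>message 1: hi</message>'.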

Not all modules are used for all tests, but hey... there's already a good number of them.

The size is computed by taking the process size (VmSize, read from /proc/$$/status in an END block).
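A minimal sketch of that measurement, assuming Linux (where /proc/PID/status has a VmSize line); the vm_size helper and the report printed on STDERR are illustrative choices, not the benchmark's own code.

```perl
use strict;
use warnings;

# Return the VmSize of process $pid (e.g. "8444 kB"), or 'na' when
# /proc/$pid/status cannot be read (no /proc filesystem, e.g. non-Linux)
sub vm_size {
    my ($pid) = @_;
    open(my $fh, '<', "/proc/$pid/status") or return 'na';
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmSize:\s*(\d+\s*kB)/;
    }
    return 'na';
}

# Runs when the script exits, so the size reflects the process after the
# test itself has run
END { print STDERR 'size: ', vm_size($$), "\n" }
```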

Conclusions

No conclusion yet... but you can draw your own ;--)

Just note that I am not advocating any of the tested solutions. They all work for the problems at hand (except that XML::Simple properly extracts all messages but does not return them in the right order), but that's all. In particular, a number of them (mostly those based on regexps, but some of the SAX and XML::Parser ones too) depend on the original XML file being very simple: no entities, no non-7-bit-ASCII characters, no comments, PIs or CDATA sections, no DTD, no namespaces, no > in attribute values (which is legal!), no nested message or process elements...
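To make the point concrete: a naive regexp extraction is fooled by a commented-out message element and by a CDATA section, both perfectly legal XML. The document and pattern below are made up for illustration.

```perl
use strict;
use warnings;

# A naive extraction pattern, similar in spirit to the regexp-based tests
my $naive = qr{<message>(.*?)</message>}s;

# Legal XML that defeats it: a commented-out message and a CDATA section
my $xml = '<doc><!-- <message>old</message> -->'
        . '<message><![CDATA[a < b]]></message></doc>';

my @found = $xml =~ /$naive/g;

# @found is ('old', '<![CDATA[a < b]]>'): the commented-out element is picked
# up, and the CDATA markup is returned instead of the text "a < b"
print "$_\n" for @found;
```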

Contributors

I would like to thank Alberto Simões for contributing the XML::DT examples (in record time!) and Robin Berjon, Matt Sergeant and Barrie Slaymaker for their comments.


Instructions

The test file (test.xml by default) is generated by gen_benchmark; its size is about 3 MB. You can tweak gen_benchmark to generate other sizes and other types of documents.
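gen_benchmark itself is not shown here. To give an idea of what such a generator involves, here is a hypothetical sketch: the message and process element names and the list of actions come from the test descriptions, while the root element, the id attributes, the text and the way actions are cycled are invented. The script name in the usage line is made up too.

```perl
use strict;
use warnings;

# Actions understood by the "complex" test
my @actions = qw(delete duplicate change_tag erase prefix add_att);

# Build a document with $nb_records message/process pairs.
# Hypothetical layout: not the real output of gen_benchmark.
sub gen_doc {
    my ($nb_records) = @_;
    my $xml = qq{<?xml version="1.0"?>\n<doc>\n};
    for my $i (1 .. $nb_records) {
        my $action = $actions[ $i % @actions ];    # cycle through the actions
        $xml .= qq{  <message id="m$i">text of message $i</message>\n};
        $xml .= qq{  <process action="$action" id="p$i">content $i</process>\n};
    }
    return $xml . "</doc>\n";
}

# Usage: perl gen_sketch.pl 40000 > test.xml   (more records, bigger file)
print gen_doc($ARGV[0] || 1_000);
```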

Note that you will need a whole bunch of modules and libraries installed to run all the tests (the module versions used for the results are listed below).

Results

Results on my machine:

Module versions: XML::DOM 1.43 - XML::DT 0.24 - XML::Filter::BufferText 1.01 - XML::Filter::Dispatcher 0.52 - XML::LibXML has no version in module - XML::LibXML::SAX 1.00 - XML::Parser 2.34 - XML::Parser::Lite 0.55 - XML::SAX::Base 1.04 - XML::SAX::Expat 0.35 - XML::SAX::Machines 0.4 - XML::SAX::PurePerl 0.90 - XML::SAX::Writer 0.44 - XML::SemanticDiff 0.95 - XML::Simple 2.12 - XML::TreeBuilder 3.08 - XML::Twig 3.16

XML document test.xml (3.03M)

read and output the document

test               result time                                                   size (VmSize)
perl               OK      0 wallclock secs ( 0.06 cusr  0.06 csys =  0.12 CPU)  8 444 kB
xmllint            OK      0 wallclock secs ( 0.28 cusr  0.03 csys =  0.32 CPU)  na
xslt               OK      1 wallclock secs ( 0.51 cusr  0.10 csys =  0.61 CPU)  na
libxml             OK      1 wallclock secs ( 0.55 cusr  0.08 csys =  0.64 CPU)  19 404 kB
parser             OK      1 wallclock secs ( 0.67 cusr  0.04 csys =  0.71 CPU)  6 788 kB
parser_lite        OK      1 wallclock secs ( 0.94 cusr  0.09 csys =  1.03 CPU)  8 636 kB
parser_stream      OK      1 wallclock secs ( 0.99 cusr  0.05 csys =  1.04 CPU)  6 792 kB
tree_builder       OK      2 wallclock secs ( 2.45 cusr  0.07 csys =  2.53 CPU)  23 392 kB
twig               OK      3 wallclock secs ( 2.88 cusr  0.09 csys =  2.98 CPU)  22 040 kB
dt                 OK      3 wallclock secs ( 2.83 cusr  0.22 csys =  3.06 CPU)  43 180 kB
sax_base_libxml    OK      6 wallclock secs ( 5.51 cusr  0.10 csys =  5.62 CPU)  10 964 kB
sax_base_expat     OK      7 wallclock secs ( 6.78 cusr  0.14 csys =  6.93 CPU)  9 064 kB
xml_pp             OK      9 wallclock secs ( 8.19 cusr  0.18 csys =  8.37 CPU)  na
sax_base_pureperl  OK     33 wallclock secs (30.38 cusr  0.31 csys = 30.69 CPU)  10 376 kB

extracting the text of all message elements

test               result time                                                   size (VmSize)
regexp             OK      0 wallclock secs ( 0.10 cusr  0.04 csys =  0.14 CPU)  8 356 kB
xslt               OK      1 wallclock secs ( 0.24 cusr  0.06 csys =  0.31 CPU)  na
parser             OK      1 wallclock secs ( 0.37 cusr  0.03 csys =  0.40 CPU)  6 788 kB
libxml             OK      1 wallclock secs ( 0.37 cusr  0.06 csys =  0.43 CPU)  16 708 kB
parser_lite        OK      1 wallclock secs ( 0.85 cusr  0.04 csys =  0.89 CPU)  8 636 kB
parser_stream      OK      1 wallclock secs ( 0.89 cusr  0.05 csys =  0.94 CPU)  6 796 kB
sax_libxml         OK      1 wallclock secs ( 1.17 cusr  0.01 csys =  1.18 CPU)  9 856 kB
sax_base_libxml    OK      1 wallclock secs ( 1.17 cusr  0.02 csys =  1.19 CPU)  9 908 kB
tree_builder       OK      1 wallclock secs ( 1.20 cusr  0.05 csys =  1.26 CPU)  16 244 kB
twig               OK      2 wallclock secs ( 1.50 cusr  0.05 csys =  1.56 CPU)  11 108 kB
xml_grep           OK      1 wallclock secs ( 1.66 cusr  0.06 csys =  1.73 CPU)  na
sax_base_expat     OK      3 wallclock secs ( 2.36 cusr  0.09 csys =  2.45 CPU)  8 104 kB
sax_expat          OK      3 wallclock secs ( 2.41 cusr  0.06 csys =  2.47 CPU)  8 100 kB
filter_dispatcher  OK      2 wallclock secs ( 2.45 cusr  0.05 csys =  2.50 CPU)  na
simple             NOK     3 wallclock secs ( 2.53 cusr  0.09 csys =  2.62 CPU)  22 116 kB
dt                 OK      3 wallclock secs ( 2.73 cusr  0.15 csys =  2.88 CPU)  37 772 kB
dom                OK      3 wallclock secs ( 3.60 cusr  0.12 csys =  3.72 CPU)  41 140 kB
sax_base_pureperl  OK     23 wallclock secs (22.80 cusr  0.09 csys = 22.89 CPU)  9 236 kB

prefixing the text of all message elements with the message number

test               result time                                                   size (VmSize)
regexp             OK      1 wallclock secs ( 0.16 cusr  0.06 csys =  0.22 CPU)  14 680 kB
libxml             OK      1 wallclock secs ( 0.65 cusr  0.09 csys =  0.75 CPU)  19 528 kB
parser_lite        OK      1 wallclock secs ( 0.94 cusr  0.09 csys =  1.03 CPU)  8 636 kB
parser_stream      OK      2 wallclock secs ( 1.11 cusr  0.05 csys =  1.16 CPU)  6 796 kB
twig               OK      2 wallclock secs ( 2.00 cusr  0.18 csys =  2.19 CPU)  11 228 kB
tree_builder       OK      3 wallclock secs ( 2.61 cusr  0.05 csys =  2.67 CPU)  23 480 kB
dt                 OK      3 wallclock secs ( 2.91 cusr  0.14 csys =  3.06 CPU)  43 680 kB
sax_base_libxml    OK      5 wallclock secs ( 5.57 cusr  0.15 csys =  5.73 CPU)  10 968 kB
xslt               OK      6 wallclock secs ( 5.97 cusr  0.12 csys =  6.10 CPU)  na
dom                OK      6 wallclock secs ( 6.01 cusr  0.20 csys =  6.22 CPU)  44 176 kB
filter_dispatcher  OK      9 wallclock secs ( 9.20 cusr  0.18 csys =  9.38 CPU)  na

complex transformation

test               result time                                                   size (VmSize)
regexp             OK      0 wallclock secs ( 0.22 cusr  0.16 csys =  0.39 CPU)  27 072 kB
xslt               OK      1 wallclock secs ( 0.58 cusr  0.04 csys =  0.63 CPU)  na
libxml             OK      1 wallclock secs ( 0.62 cusr  0.08 csys =  0.70 CPU)  20 172 kB
tree_builder       OK      3 wallclock secs ( 2.57 cusr  0.11 csys =  2.68 CPU)  23 308 kB
twig_small         OK      3 wallclock secs ( 2.97 cusr  0.10 csys =  3.07 CPU)  18 968 kB
dt_compact         OK      3 wallclock secs ( 2.94 cusr  0.16 csys =  3.11 CPU)  43 804 kB
twig_smallest      OK      4 wallclock secs ( 3.07 cusr  0.05 csys =  3.12 CPU)  10 200 kB
dt                 OK      3 wallclock secs ( 2.98 cusr  0.18 csys =  3.16 CPU)  44 296 kB
twig               OK      4 wallclock secs ( 3.39 cusr  0.10 csys =  3.49 CPU)  22 156 kB
libxml_by_step     OK      4 wallclock secs ( 3.37 cusr  0.44 csys =  3.81 CPU)  30 200 kB
sax_base_libxml    OK      7 wallclock secs ( 6.85 cusr  0.09 csys =  6.94 CPU)  12 728 kB

Notes:
[1] In the extraction test XML::Simple returns all the messages, but not in the right order.
[2] The version of XML::SAX::PurePerl is the CVS version, which performs about 3 times faster than the CPAN one.
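The timing columns closely resemble the output of timestr from Perl's core Benchmark module when only child-process times are reported (its 'nop' style): cusr and csys are the user and system CPU times of child processes, which suggests each test ran as a separate process. A minimal sketch of timing one child process that way; the command is a placeholder, not one of the actual test scripts.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Placeholder for one benchmark test: a separate perl process doing some work.
# Its CPU time is charged to the children (cusr/csys), not to this script.
my @command = ($^X, '-e', 'my $s = 0; $s += $_ for 1 .. 200_000');

# Run the command once and collect wallclock and CPU times
my $t = timeit(1, sub { system(@command) == 0 or die "command failed: $?\n" });

# 'nop' style: report only the children's CPU time
print timestr($t, 'nop'), "\n";
```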


updated Mon Nov 15 14:18:58 2004 Copyright © 2003, Michel Rodriguez