Re: Benchmarks of XML Parsers

A while ago I did a similar benchmark comparing (older versions of) XML::Bare (v0.11) and XML::LibXML (I can't remember the version; whatever was current in summer 2007). I had been looking for a parser for small, simply structured XML that would be about as fast as XML::LibXML but easier to install and distribute, and XML::Bare explicitly claimed to be very fast.

Back then, XML::Bare was in fact more than twice as fast as XML::LibXML, so I was curious how the two compare these days. Here are the results:

info: Parsing with XML::Bare 0.43 appears successful
info: Parsing with XML::LibXML 1.69 appears successful

             Rate   XML::Bare XML::LibXML
XML::Bare   655/s          --        -31%
XML::LibXML 953/s         45%          --

In other words, either XML::LibXML has gotten significantly faster since then, or XML::Bare has gotten slower...
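The table appears to be in the format printed by the core Benchmark module's cmpthese(): each cell is the row's rate divided by the column's rate, minus one, expressed as a rounded percentage. That's why the two cells aren't mirror images (45% faster in one direction, 31% slower in the other). A small sketch reproducing them; relative_pct() is a hypothetical helper, not part of Benchmark:

```perl
use strict;
use warnings;

# Hypothetical helper: how much faster (or slower) the row rate is than
# the column rate, computed the way the cells of a cmpthese() table are.
sub relative_pct {
    my ($row_rate, $col_rate) = @_;
    return sprintf '%.0f%%', 100 * $row_rate / $col_rate - 100;
}

# Rates (parses per second) taken from the table above.
my %rate = ('XML::Bare' => 655, 'XML::LibXML' => 953);

print "LibXML vs Bare: ", relative_pct($rate{'XML::LibXML'}, $rate{'XML::Bare'}), "\n";
print "Bare vs LibXML: ", relative_pct($rate{'XML::Bare'}, $rate{'XML::LibXML'}), "\n";
```

This prints "LibXML vs Bare: 45%" and "Bare vs LibXML: -31%", matching the two cells.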

(It might be worth noting that, without a clear idea of what data to extract, this is kind of comparing apples and oranges: XML::Bare creates a ready-to-use Perl data structure similar to what XML::Simple produces, while the document object returned by XML::LibXML still has to be traversed with a variety of dedicated method calls. Also, XML::LibXML is definitely a lot richer in features, which makes the two modules hard to compare in any case.)

For the record, here's the modified find_parsers() routine I used (otherwise I left ikegami's code as is):

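sub find_parsers {
   # Each entry: [ module name, description, code ref that parses $xml ].
   # $xml, load_module() and get_parser_desc_name() come from ikegami's script.
   my @parsers;

   if (!load_module('XML::Bare')) {
      warn("warn: XML::Bare not available\n");
   } else {
      push @parsers, [
         'XML::Bare',
         get_parser_desc_name('XML::Bare'),
         # A fresh XML::Bare object is needed for every parse.
         sub { XML::Bare->new(text => $xml)->parse() }
      ];
   }

   if (!load_module('XML::LibXML')) {
      warn("warn: XML::LibXML not available\n");
   } else {
      push @parsers, [
         'XML::LibXML',
         get_parser_desc_name('XML::LibXML'),
         # Recreate the parser here too, so both sides pay the same setup cost.
         sub { XML::LibXML->new()->parse_string($xml) }
      ];
   }
   return \@parsers;
}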

(The XML::Bare object needs to be recreated for every parse, so to be fair I did the same on the XML::LibXML side. That doesn't make a huge difference for XML::LibXML, btw: just 3%.)
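ikegami's driver code isn't reproduced here. Purely as an illustration of how a list of [name, description, code ref] entries like the one find_parsers() returns can be fed to the core Benchmark module, here is a minimal sketch. load_module() and run_benchmarks() below are hypothetical stand-ins, not ikegami's code, and two toy subs take the place of the real parsers so it runs without either XML module installed:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for the load_module() helper find_parsers() calls;
# ikegami's real version may differ. Returns 1 if the module loads, else 0.
sub load_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Hypothetical driver: takes the array ref returned by find_parsers(),
# where each entry is [ module name, description, code ref ].
sub run_benchmarks {
    my ($parsers, $count) = @_;
    # cmpthese() wants label => code ref; a negative $count means
    # "run each sub for at least that many CPU seconds".
    my %subs = map { $_->[0] => $_->[2] } @$parsers;
    # Prints the Rate table and returns it as an array ref of rows,
    # the first row being the column headers.
    return cmpthese($count, \%subs);
}

# Toy entries standing in for the real XML parsers.
my $xml = '<book><title>Perl</title></book>';
my @toy = (
    [ 'regex',  'toy tag scanner',  sub { my @tags = $xml =~ /<(\w+)>/g } ],
    [ 'length', 'toy length check', sub { my $n = length $xml } ],
);
run_benchmarks(\@toy, -1);
```

With the real find_parsers(), the call would presumably be run_benchmarks(find_parsers(), -3) or similar; any module that failed to load simply doesn't show up in the table.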

As XML input for the above results I used the book.xml file (23K, simple structure) from this collection of sample files. The choice of input doesn't seem to be crucial, though, as tests with other files showed a similar trend.

Re^2: Benchmarks of XML Parsers by ikegami (Patriarch) on Apr 28, 2009 at 21:43 UTC
I was going to add XML::Bare to the benchmark until I noticed it was returning garbage (as shown here). I think it expects to be handed decoded XML, which is odd, since you need to parse the XML document to figure out which encoding was used. Anyway, to make the benchmark fair, you'd have to include the necessary step of decoding the XML for XML::Bare.
Re^3: Benchmarks of XML Parsers by almut (Canon) on Apr 28, 2009 at 22:34 UTC
"I think it expects to be handed decoded XML."

I've personally never used it with anything but ISO-Latin-1 (and haven't run into any problems in that regard so far). But I think it's true that it doesn't properly handle Unicode, at least not multibyte encodings like UTF-16. OTOH, I just converted an ISO-Latin-1 XML file to UTF-8 (and changed the `encoding="..."` declaration accordingly, of course, though it simply appears to be ignored), and it seems to "work", at least in the sense that when I dump the created object with Data::Dumper, the non-ASCII characters are passed through unmodified, i.e. still encoded. That's probably because it doesn't do any decoding at all and simply treats everything as bytes (part of the less-features-for-speed concept, I guess).
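To make the comparison fair in the way ikegami describes, the XML::Bare side would need an explicit decoding step. Here is a minimal sketch using the core Encode module; decode_xml() is a hypothetical helper, and it only handles the common cases (a UTF-16 or UTF-8 byte-order mark, or an `encoding="..."` pseudo-attribute in the XML declaration, defaulting to UTF-8). It ignores BOM-less UTF-16 and other edge cases, and Encode dies on an encoding name it doesn't know:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn raw XML bytes into a Perl character string,
# based on the BOM or the encoding="..." declaration, defaulting to UTF-8.
sub decode_xml {
    my ($bytes) = @_;
    # UTF-16 with a byte-order mark; Encode's 'UTF-16' reads and drops the BOM.
    return decode('UTF-16', $bytes) if $bytes =~ /\A(?:\xFE\xFF|\xFF\xFE)/;
    # UTF-8 with a BOM: strip the three BOM bytes before decoding.
    return decode('UTF-8', substr($bytes, 3)) if $bytes =~ /\A\xEF\xBB\xBF/;
    # Otherwise look for encoding="..." inside the XML declaration.
    my ($enc) = $bytes =~ /\A<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z0-9._-]+)["']/;
    return decode($enc // 'UTF-8', $bytes);
}
```

In the benchmark, the decoding would then go inside the timed closure, e.g. `sub { XML::Bare->new(text => decode_xml($xml))->parse() }`. Note that the decoded string still carries its original `encoding="..."` declaration (which XML::Bare apparently ignores anyway), and whether XML::Bare does the right thing with a character string is a separate question this sketch doesn't answer.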

