http://www.perlmonks.org?node_id=760629

Below are benchmarks of a variety of XML parsers using real-world data.

All parsers were used to build a tree, either using XML::Simple or directly in the case of XML::LibXML. No testing was performed on the speed of extracting data from the tree.

Testing aircan (Airfare search results)

info: Parsing with XML::LibXML 1.69 appears successful
info: Parsing with XML::Parser 2.34 (via XML::Simple 2.18) appears successful
info: Parsing with XML::LibXML::SAX 1.69 (via XML::Simple 2.18) appears successful
info: Parsing with XML::LibXML::SAX::Parser 1.69 (via XML::Simple 2.18) appears successful
info: Parsing with XML::SAX::ExpatXS 1.31 (via XML::Simple 2.18) appears successful
info: Parsing with XML::SAX::PurePerl 0.96 (via XML::Simple 2.18) appears successful

                            Rate XML::SAX::PurePerl XML::LibXML::SAX XML::LibXML::SAX::Parser XML::SAX::ExpatXS XML::Parser XML::LibXML
XML::SAX::PurePerl       0.440/s*                --             -51%                     -76%              -88%        -91%        -99%
XML::LibXML::SAX         0.898/s               104%               --                     -50%              -75%        -82%        -98%
XML::LibXML::SAX::Parser  1.80/s               309%             101%                       --              -50%        -64%        -96%
XML::SAX::ExpatXS         3.57/s               712%             298%                      98%                --        -29%        -93%
XML::Parser               5.04/s              1045%             461%                     180%               41%          --        -90%
XML::LibXML               50.8/s             11440%            5555%                    2720%             1321%        907%          --
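
For anyone reading these tables for the first time: each percentage says how much faster (positive) or slower (negative) the row's parser is than the column's parser. The snippet below is a small sketch of that arithmetic, not code from the benchmark itself. `pct_faster` is a hypothetical helper name, and the formula is my understanding of what Benchmark's `cmpthese` prints.

```perl
use strict;
use warnings;

# Percentage in one table cell: how much faster (positive) or slower
# (negative) the row entry is than the column entry.
# Assumption: this mirrors the calculation behind cmpthese's output.
sub pct_faster {
    my ($row_rate, $col_rate) = @_;
    return 100 * ($row_rate / $col_rate - 1);
}

# Rates (parses per second) copied from the aircan table.
my %rate = (
    'XML::SAX::ExpatXS' => 3.57,
    'XML::Parser'       => 5.04,
);

# Row XML::Parser, column XML::SAX::ExpatXS
printf "%.0f%%\n", pct_faster($rate{'XML::Parser'}, $rate{'XML::SAX::ExpatXS'});   # prints 41%

# Row XML::SAX::ExpatXS, column XML::Parser
printf "%.0f%%\n", pct_faster($rate{'XML::SAX::ExpatXS'}, $rate{'XML::Parser'});   # prints -29%
```

When a table lists s/iter instead of a rate, the rate is the reciprocal of the listed time, so the same formula gives 0.563/0.417 - 1, or about 35%, for XML::Parser against XML::SAX::ExpatXS in the sysrates table.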


Testing sysrates (Airfare search results)

info: Parsing with XML::LibXML 1.69 appears successful
info: Parsing with XML::Parser 2.34 (via XML::Simple 2.18) appears successful
info: Parsing with XML::LibXML::SAX 1.69 (via XML::Simple 2.18) appears successful
info: Parsing with XML::LibXML::SAX::Parser 1.69 (via XML::Simple 2.18) appears successful
info: Parsing with XML::SAX::ExpatXS 1.31 (via XML::Simple 2.18) appears successful
info: Parsing with XML::SAX::PurePerl 0.96 (via XML::Simple 2.18) appears successful

                           s/iter XML::LibXML::SAX XML::SAX::PurePerl XML::LibXML::SAX::Parser XML::SAX::ExpatXS XML::Parser XML::LibXML
XML::LibXML::SAX             2.67*              --                -7%                     -61%              -79%        -84%        -99%
XML::SAX::PurePerl           2.47*              8%                 --                     -58%              -77%        -83%        -99%
XML::LibXML::SAX::Parser     1.04             156%               137%                       --              -46%        -60%        -98%
XML::SAX::ExpatXS           0.563             374%               338%                      85%                --        -26%        -96%
XML::Parser                 0.417             541%               493%                     151%               35%          --        -95%
XML::LibXML                0.0223           11874%             10977%                    4582%             2426%       1769%          --


Testing jacob (Some small UTF-16le file)

info: Parsing with XML::LibXML 1.69 appears successful
info: Parsing with XML::Parser 2.34 (via XML::Simple 2.18) appears successful
info: Parsing with XML::LibXML::SAX 1.69 (via XML::Simple 2.18) appears successful
info: Parsing with XML::LibXML::SAX::Parser 1.69 (via XML::Simple 2.18) appears successful
info: Parsing with XML::SAX::ExpatXS 1.31 (via XML::Simple 2.18) appears successful
error: Unable to parse with XML::SAX::PurePerl 0.96 (via XML::Simple 2.18)! UTF-16:Unrecognised BOM 3f3e**

                           Rate XML::LibXML::SAX::Parser XML::LibXML::SAX XML::SAX::ExpatXS XML::Parser XML::LibXML
XML::LibXML::SAX::Parser  108/s                       --             -30%              -47%        -61%        -96%
XML::LibXML::SAX          153/s                      42%               --              -25%        -44%        -95%
XML::SAX::ExpatXS         206/s                      90%              34%                --        -25%        -93%
XML::Parser               275/s                     155%              79%               34%          --        -91%
XML::LibXML              3080/s                    2750%            1908%             1396%       1019%          --

* — Too few iterations for a reliable count. Suffice it to say they were slooooow.

** — Not only is XML::SAX::PurePerl very slow, it has some encoding-related bugs. Do yourself a favour and remove it from your $perl5lib/XML/SAX/ParserDetails.ini file!!
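
If editing ParserDetails.ini is not an option, a single script can also pin the backend that XML::Simple uses. The following is a minimal sketch, assuming XML::Simple and XML::Parser are installed. `parse_with_xml_parser` is a hypothetical helper name, while `$XML::Simple::PREFERRED_PARSER` is the documented XML::Simple setting.

```perl
use strict;
use warnings;

# Hypothetical helper: parse with XML::Simple but force the XML::Parser
# backend, so XML::SAX (and with it XML::SAX::PurePerl) is not consulted.
sub parse_with_xml_parser {
    my ($xml) = @_;

    # Return nothing if the modules are not installed.
    return unless eval { require XML::Simple; require XML::Parser; 1 };

    no warnings 'once';    # the variable belongs to a module loaded at run time
    local $XML::Simple::PREFERRED_PARSER = 'XML::Parser';

    return XML::Simple::XMLin($xml);
}

my $tree = parse_with_xml_parser('<a><b>1</b></a>');
print defined $tree ? "b = $tree->{b}\n" : "XML::Simple or XML::Parser not available\n";
```

This only affects code that goes through XML::Simple. Anything else that asks XML::SAX for a parser still gets whatever ParserDetails.ini lists.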

Benchmark code

Update: Added XML::SAX::ExpatXS.

Re: Benchmarks of XML Parsers
by almut (Canon) on Apr 28, 2009 at 20:55 UTC

    A while ago I did a similar benchmark comparing older versions of XML::Bare (v0.11) and XML::LibXML (I can't remember the version; whatever was current in summer 2007), because I was looking for a parser for tiny, simply structured XML that would be about as fast as XML::LibXML but easier to install and distribute. And XML::Bare explicitly claimed to be very fast.

    The results at the time were that XML::Bare was in fact more than twice as fast as XML::LibXML.  So I was interested in how they would compare these days.  Here are the results:

    info: Parsing with XML::Bare 0.43 appears successful
    info: Parsing with XML::LibXML 1.69 appears successful

                 Rate XML::Bare XML::LibXML
    XML::Bare   655/s        --        -31%
    XML::LibXML 953/s       45%          --

    In other words, either XML::LibXML has gotten significantly faster since then, or XML::Bare slower...

    (It might be worth noting that - without a clear idea of what data to extract - this is kind of comparing apples and oranges, as XML::Bare creates a 'ready-to-use' Perl data structure similar to XML::Simple, while the doc object returned by XML::LibXML would need to be traversed using a variety of dedicated method calls. Similarly, both modules are hard to compare in that XML::LibXML is definitely a lot richer in features.)

    For the record, here's the modified find_parsers() routine I used (otherwise I left ikegami's code as is):

    sub find_parsers {
        my @parsers;

        if (!load_module('XML::Bare')) {
            warn("warn: XML::Bare not available\n");
        } else {
            push @parsers, [
                'XML::Bare',
                get_parser_desc_name('XML::Bare'),
                sub { XML::Bare->new(text => $xml)->parse() }
            ];
        }

        if (!load_module('XML::LibXML')) {
            warn("warn: XML::LibXML not available\n");
        } else {
            push @parsers, [
                'XML::LibXML',
                get_parser_desc_name('XML::LibXML'),
                sub { XML::LibXML->new()->parse_string($xml) }
            ];
        }

        return \@parsers;
    }

    (The XML::Bare object needs to be recreated for every parse, so to be fair I did the same on the XML::LibXML side, which doesn't make a huge difference for XML::LibXML, btw: just 3%.)

    As XML input for the above results I used the book.xml file (23K, simple structure) from this collection of sample files.  This doesn't seem to be crucial, though, as tests with other input did show a similar trend.

      I was going to add XML::Bare to the benchmark until I noticed it was returning garbage (as shown here). I think it expects to be handed decoded XML. That's odd, since you need to parse the XML doc to figure out the encoding that was used. Anyway, to make the benchmark fair, you'd have to include the necessary step of decoding the XML for XML::Bare.
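
A decoding step of the kind described above might look roughly like this. It is a minimal sketch using only the core Encode module. `xml_bytes_to_utf8` is a hypothetical helper that looks at the byte-order mark only and assumes UTF-8 when there is none; it does not read the encoding="..." declaration, which a complete solution would have to do. That the parser should then be handed UTF-8 bytes is also an assumption.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn raw XML bytes into UTF-8 bytes, guessing the
# source encoding from a byte-order mark only.
sub xml_bytes_to_utf8 {
    my ($bytes) = @_;
    my $chars;

    if ($bytes =~ /\A(?:\xFF\xFE|\xFE\xFF)/) {
        # Plain 'UTF-16' makes Encode consume the BOM and pick the byte order.
        $chars = decode('UTF-16', $bytes);
    }
    else {
        $bytes =~ s/\A\xEF\xBB\xBF//;    # drop a UTF-8 BOM, if present
        $chars = decode('UTF-8', $bytes);
    }

    return encode('UTF-8', $chars);
}

# Build a small UTF-16LE document with a BOM, then convert it.
my $utf16le = "\xFF\xFE" . encode('UTF-16LE', qq{<?xml version="1.0"?><a>\x{e9}</a>});
my $utf8    = xml_bytes_to_utf8($utf16le);
printf "%d bytes in, %d bytes out\n", length $utf16le, length $utf8;
```

For a fair comparison, the call to the helper would have to sit inside the timed sub for XML::Bare, since the other parsers do this work themselves.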

      Encoding tests code

        I think it expects to be handed decoded XML.

        I've personally never used it with anything but ISO-Latin-1 (and haven't encountered any problems so far in this regard).  But I think it's true it doesn't properly handle unicode, at least not multibyte encodings like UTF-16.

        OTOH, I just converted an ISO-Latin-1 XML file to UTF-8 (and changed the "encoding=...", of course — though that simply appears to be ignored), and it seems to "work" at least in that - when I Data::Dumper the created object - the appropriate chars are passed through unmodified (encoded) — which probably is because it doesn't do any decoding at all, and simply treats everything as bytes... (part of the less-features-for-speed concept, I guess)

Re: Benchmarks of XML Parsers
by Anonymous Monk on Apr 28, 2009 at 15:32 UTC
    Which versions of expat (for XML::Parser) and libxml2 (for XML::LibXML) did you use?
      How do I find that out? (Debian system)

      Update: I'm guessing libexpat 1.0.0 and libxml2 2.6.27 from the file names I found in /usr/lib:

      /usr/lib/libexpat.so.0 -> libexpat.so.1
      /usr/lib/libexpat.so.1 -> libexpat.so.1.0.0
      /usr/lib/libexpat.so.1.0.0
      /usr/lib/libxml2.so -> libxml2.so.2.6.27
      /usr/lib/libxml2.so.2 -> libxml2.so.2.6.27
      /usr/lib/libxml2.so.2.6.27

      Update: um.. The package manager says libexpat1 1.95.8(-3.4) and libxml2 2.6.27(-dfsg-6) were installed. Is libexpat1 something different?

        Not sure, I would check with ldd
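        #!/usr/bin/perl --
        use strict;
        use warnings;
        use XML::LibXML;
        use XML::Parser;

        # List every loaded shared object with "xml" in its path,
        # followed by the libraries it links against.
        for my $so ( grep /xml/i, @DynaLoader::dl_shared_objects ) {
            print "$so\n";
            print `ldd $so\n`;
        }
        __END__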
Re: Benchmarks of XML Parsers
by samtregar (Abbot) on Apr 28, 2009 at 18:25 UTC
    Thanks, good stuff. If you get a chance you might throw in XML::SAX::ExpatXS, which I seem to remember was pretty fast and also has some nice SAX features the others don't.

    I recently worked on a project where I replaced HTML::Parser with XML::LibXML using its HTML parsing mode. The speed improvement was something like 10x (not sure exactly since I made a lot of other changes too).

    -sam

      Added XML::SAX::ExpatXS.

        Cool. Looks like it's still the fastest SAX parser, but that doesn't help so much when you're trying to build a tree in memory. Much faster to do it directly.

        -sam