PerlMonks  

Is it wiser to move on from XML::Simple to XML::Compile

by mohan2monks (Sexton)
on Jan 27, 2014 at 07:46 UTC ( #1072179=perlquestion )
mohan2monks has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have been working with XML::Simple for quite some time now and am very happy using it. However, as things have moved on both in Perl and in my application, I need to change to some other Perl XML module. The XML documents that XML::Simple has to process are steadily growing in size (up to 10 MB) and in complexity, which makes the XML-to-hash conversion very slow.

Basically, my code connects to multiple vendors via web services with SOAP::Lite (another great Perl module) and converts the XML to JSON for the browser to display. (Maybe I could have used XML::XML2JSON for this, but all vendors have different XML formats.)
XML -> PERL -> JSON
As the results have to be displayed to browser performance is a big issue.
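
The XML -> Perl -> JSON pipeline described above can be sketched without building a hash of the whole document. This is only an illustration: the `<offers>` layout is invented, and it assumes XML::LibXML is installed (JSON::PP ships with Perl). It extracts just the fields the browser needs and serializes them.

```perl
use strict;
use warnings;
use XML::LibXML;
use JSON::PP;

# Hypothetical vendor response; the element names are made up for illustration.
my $xml = <<'XML';
<offers>
  <offer id="1"><name>Alpha</name><price>10.50</price></offer>
  <offer id="2"><name>Beta</name><price>7.25</price></offer>
</offers>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Pull out only the wanted fields, always as a list of hashes.
my @offers = map {
    +{
        id    => $_->getAttribute('id'),
        name  => $_->findvalue('./name'),
        price => $_->findvalue('./price'),
    }
} $dom->findnodes('/offers/offer');

# canonical() sorts the keys so the output is stable between runs.
my $json = JSON::PP->new->canonical->encode({ offers => \@offers });
print "$json\n";
```

Because only the selected nodes become Perl data, the cost of the hash-building step depends on how much is extracted rather than on the full document size.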

I have been going through CPAN, and of course PerlMonks, reading about a possible upgrade.
There are very good discussions on PerlMonks, and a great tutorial too, Stepping up from XML::Simple to XML::LibXML, on moving from XML::Simple to XML::LibXML. I also gathered from PerlMonks that XML::LibXML is the best way forward.

As I am looking for a switch that avoids a complete rewrite of my code, I need an XML-to-Perl-hash structure.
My reason for considering XML::Compile is that it can give me a Perl hash, and I also read in the XML::Compile documentation that it is based on XML::LibXML and complies with the XML standards.
Another possibility would be to create a template Perl hash in my own format and convert the XML directly into it.
I haven't figured out yet whether that is possible; it would be great if somebody has an idea about it.
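
One way the "template hash" idea might look is a per-vendor table that maps the keys of the common structure to XPath expressions. Everything vendor-specific here (`vendor_a`, `vendor_b`, the element names, the `xml_to_records` helper) is hypothetical; only the XML::LibXML calls are real.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical per-vendor templates: output key => XPath relative to a record node.
my %template_for = (
    vendor_a => {
        record => '/result/hotel',
        fields => { name => './title', city => './@city', rate => './rate/amount' },
    },
    vendor_b => {
        record => '//Property',
        fields => { name => './Name', city => './Address/City', rate => './Price' },
    },
);

# Convert one vendor document straight into the common structure.
sub xml_to_records {
    my ($xml, $template) = @_;
    my $dom = XML::LibXML->load_xml(string => $xml);
    my @records;
    for my $node ($dom->findnodes($template->{record})) {
        my %rec = map { ($_ => $node->findvalue($template->{fields}{$_})) }
                  keys %{ $template->{fields} };
        push @records, \%rec;
    }
    return \@records;
}

my $xml_a = '<result><hotel city="Pune"><title>Inn</title>'
          . '<rate><amount>40</amount></rate></hotel></result>';
my $xml_b = '<Reply><Property><Name>Lodge</Name><Address><City>Goa</City>'
          . '</Address><Price>55</Price></Property></Reply>';

my $a_records = xml_to_records($xml_a, $template_for{vendor_a});
my $b_records = xml_to_records($xml_b, $template_for{vendor_b});
```

Both vendors end up with the same keys (`name`, `city`, `rate`), so the JSON-producing code downstream does not need to know which vendor the data came from.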

As the conversion has to be fast, I benchmarked different Perl modules that convert XML to a hash structure (XML::LibXML is the exception in the benchmark; I included it just to see how fast it parses).
I found great info at Benchmarks of XML Parsers.
In my test I found that XML::Compile is slower than XML::Simple.
Please see the code below.

use XML::LibXML;
use XML::Fast;
use XML::Simple;
use XML::Bare;
use XML::Compile::Schema;
use Data::Dumper;
use Benchmark qw/cmpthese/;

$XML::Simple::PREFERRED_PARSER = 'XML::Parser';   # Found this great tip from perlmonks

my $doc = 'XML String ';

my $schema = XML::Compile::Schema->new('./myschema.xsd');
my $reader = $schema->compile(READER => '{myns}mytype');
# I have done this outside to compile schema once but not sure if it works like that

# cmpthese takes the count and the code hash directly; timethese is not
# imported by the use line above, so it is not called here.
cmpthese(-10, {
    libxml     => sub { XML::LibXML->new->parse_string($doc) },
    xmlfast    => sub { XML::Fast::xml2hash($doc) },
    xmlbare    => sub { XML::Bare->new(text => $doc)->parse },
    xmlsimple  => sub { XML::Simple->new(ForceArray => 0, KeyAttr => {})->XMLin($doc) },
    xmlcompile => sub { my $hash = $reader->($doc) },
});

Results on my machine:

             Rate xmlcompile xmlsimple xmlbare xmlfast libxml
xmlcompile 51.5/s         --      -66%    -97%    -97%   -97%
xmlsimple   149/s       190%        --    -91%    -92%   -92%
xmlbare    1651/s      3107%     1006%      --    -11%   -11%
xmlfast    1846/s      3487%     1137%     12%      --    -0%
libxml     1846/s      3487%     1137%     12%      0%     --

I want to know if i am doing things correctly and is XML::Compile really slow?
Please advice..

Found one more probable alternative.

  • XML::LibXML, as recommended, can be used for parsing efficiency and standards compliance.
  • Individual nodes can be converted to hashes with XML::Hash::LX for ease of use.
  • It cannot be used as a plug-in replacement for XML::Simple; a significant code rewrite is required.

Module doco says

use XML::Hash::LX;

# Usage with XML::LibXML
my $doc = XML::LibXML->new->parse_string($xml);
my $xp  = XML::LibXML::XPathContext->new($doc);
$xp->registerNs('rss', 'http://purl.org/rss/1.0/');

# then process xpath
for ($xp->findnodes('//rss:item')) {
    # and convert to hash concrete nodes
    my $item = xml2hash($_);
    print Dumper+$item
}

This module is a companion for XML::LibXML.
It operates with LibXML objects, could return or accept LibXML objects, and may be used for easy data transformations

It is faster in parsing than XML::Simple, XML::Hash, XML::Twig and of course much slower than XML::Bare ;)

It is faster in composing than XML::Hash, but slower than XML::Simple

Parse benchmark:

           Rate Simple  Hash  Twig Hash::LX  Bare
Simple   11.3/s     --   -2%  -16%     -44%  -97%
Hash     11.6/s     2%    --  -14%     -43%  -97%
Twig     13.5/s    19%   16%    --     -34%  -96%
Hash::LX 20.3/s    79%   75%   51%       --  -95%
Bare      370/s  3162% 3088% 2650%    1721%    --

Re: Is it wiser to move on from XML::Simple to XML::Compile ( XML::Hash::XS )
by Anonymous Monk on Jan 27, 2014 at 08:59 UTC

    I want to know if i am doing things correctly and is XML::Compile really slow?

    You appear to be doing things correctly ... if all the modules do what you want with the code you included

    And yes, XML::Compile is probably that slow :) OTOH it does more than the other modules (and does it in perl code), so the slowness isn't unexpected

    See http://www.xmltwig.org/article/simple_benchmark/

    As the results have to be displayed to browser performance is a big issue.

    Blah blah blah ... "loading data" spin cursors were invented for a reason :D

    Please advice..

    My advice (that I'm advising you with) is to try XML::Rules; it's XML::Simple on steroids :)

    Also, profile your code (nytprof) to identify other bottlenecks :)

    Also try XML::Hash::XS -- it's libxml2+xs -- should be as fast as you can possibly get

    See also http://xmlbench.sourceforge.net/results/benchmark200910/index.html

    Good luck

      Thanks for the reply.
      I did check XML::Rules first; I need to explore it further.
      I did profile and found two bottlenecks:

      1. Waiting for the web service to reply (if that can be called a bottleneck of the code; NYTProf does not count I/O waiting time)
      2. Time taken by XML::Simple:
        # spent 66.0s (147ms+65.9) within XML::Simple::XMLin which was called:
        #    once (147ms+65.9s) by main::RUNTIME

        XML::Rules might be slightly quicker than XML::Simple, because it doesn't try to guess what would be the best way to convert the XML into a data structure, but that's not the main reason you should switch. The problem with XML::Simple is that (even if you do specify some options) the data structure it generates is not consistent. Things like optional attributes and repeated tags may cause hard-to-handle inconsistencies. A well-configured XML::Rules will give you a consistent data structure and allow you to ignore and skip data you are not interested in. This may lessen the memory footprint and speed things up.

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.
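
The inconsistency Jenda describes is easy to reproduce with XML::Simple itself. The sketch below does not use XML::Rules; it only shows how a repeated tag changes the shape of the result, and how naming repeatable elements in `ForceArray` pins the shape down. The `<order>` documents are invented for the demonstration.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

my $one = '<order><item>apple</item></order>';
my $two = '<order><item>apple</item><item>pear</item></order>';

# Without ForceArray the shape depends on how many <item> elements arrive.
my $loose_one = XMLin($one, ForceArray => 0, KeyAttr => {});
my $loose_two = XMLin($two, ForceArray => 0, KeyAttr => {});
print ref($loose_one->{item}) || 'plain string', "\n";
print ref($loose_two->{item}), "\n";

# Naming the repeatable elements keeps the shape stable.
my $strict_one = XMLin($one, ForceArray => ['item'], KeyAttr => {});
print ref($strict_one->{item}), "\n";
```

Code that consumes the first form has to check `ref` on every access, which is the kind of hard-to-handle inconsistency mentioned above.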

Re: Is it wiser to move on from XML::Simple to XML::Compile
by markov (Scribe) on Jan 27, 2014 at 09:18 UTC

    XML::Compile does not belong in this list: it interprets the output of XML::LibXML, where the other modules are mainly only parsers.

    Have a look at XML::LibXML::Simple, which is XML::Simple based on XML::LibXML. How does that compare? And XML::Twig?

    It seems your benchmark hits some other resource limit for XML::Fast/XML::LibXML. You probably need a larger XML source in memory to compare the three fastest parsers.
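
    A rough sketch of that suggestion: build a large synthetic document in memory and rerun only the three fastest candidates. The `build_doc` helper and the `<item>` layout are invented for illustration; the parser calls are the ones from the original benchmark, and all three modules are assumed to be installed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use XML::LibXML;
use XML::Fast;
use XML::Bare;

# Build a synthetic document of roughly the requested size.
# The <item> layout is invented; a captured vendor response would be more realistic.
sub build_doc {
    my ($target_bytes) = @_;
    my $row = '<item id="%d"><name>name %d</name><price>%d.99</price></item>';
    my $xml = '<items>';
    my $i   = 0;
    while (length($xml) < $target_bytes) {
        $i++;
        $xml .= sprintf $row, $i, $i, $i % 100;
    }
    return $xml . '</items>';
}

my $doc = build_doc(1_000_000);    # about 1 MB; raise toward 10 MB for a real run

# Run each candidate for at least 2 CPU seconds and print the comparison table.
cmpthese(-2, {
    libxml  => sub { XML::LibXML->new->parse_string($doc) },
    xmlfast => sub { XML::Fast::xml2hash($doc) },
    xmlbare => sub { XML::Bare->new(text => $doc)->parse },
});
```

    With a document this size the per-call overhead matters less, so differences between the parsers themselves should show up more clearly than with a tiny input.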

      Thanks for the reply.
      I agree XML::Compile does not belong in this comparison.
      This should not be read as a comparison of the performance of different parsers.
      I was just trying to compare these for my case, and I really think XML::Compile is great code. It supports full schema validation and fits my case well too. I was looking for a possible plug-in replacement for XML::Simple with better performance.
      I had checked XML::LibXML::Simple too, but there is not much difference, as both spend much more time converting to a Perl hash than actually parsing.
      Here is the result for the two:

                      s/iter xmllibxmlsimple xmlsimple
      xmllibxmlsimple   22.8              --       -3%
      xmlsimple         22.2              3%        --

        XML::Fast is also creating HASHes. The difference might be that it constructs those in XS, whereas ::Simple needs to cross the expensive XS <-> Perl border for each node.
Re: Is it wiser to move on from XML::Simple to XML::Compile
by stonecolddevin (Vicar) on Jan 29, 2014 at 19:46 UTC

    https://metacpan.org/pod/XML::SAX::ExpatXS is super duper fast at the cost of some syntax sugar.

    Three thousand years of beautiful tradition, from Moses to Sandy Koufax, you're god damn right I'm living in the fucking past

      It is already covered here: XML::Simple Benchmarks with various backends

      Added to my list

      ---------- perl ----------
                         Rate xmlcompile xmlsimpleExpatXS xmlsimpleParser xmlbare xmlfast libxml
      xmlcompile       50.6/s         --             -50%            -65%    -97%    -97%   -97%
      xmlsimpleExpatXS  101/s       100%               --            -30%    -94%    -94%   -94%
      xmlsimpleParser   144/s       186%              43%              --    -91%    -92%   -92%
      xmlbare          1632/s      3127%            1516%           1029%      --    -10%   -11%
      xmlfast          1814/s      3487%            1696%           1156%     11%      --    -1%
      libxml           1835/s      3529%            1717%           1170%     12%      1%     --

      Normal Termination
      Output completed (1 min 15 sec consumed).
      Still searching....
