PerlMonks  

Fastest XML Parser for BIG files

by Doctrin (Beadle)
on Jul 21, 2013 at 13:30 UTC (#1045503, perlquestion)
Doctrin has asked for the wisdom of the Perl Monks concerning the following question:

Hello, dear Monks. Can anyone tell me which Perl module is the fastest for parsing a really big XML file (about 5 GB, 6 million nodes)? I mean node-by-node parsers, of course. I think XML::Twig would do the trick, but I'm not sure it is the fastest one... Thanks

Re: Fastest XML Parser for BIG files
by daxim (Chaplain) on Jul 21, 2013 at 13:34 UTC
Re: Fastest XML Parser for BIG files
by Preceptor (Chaplain) on Jul 21, 2013 at 19:10 UTC

    Can't comment on speed, but I find XML::Twig's ability to call $twig->purge and free memory as you go invaluable once you start parsing large files. I seem to recall the rule of thumb is that you should assume 10x memory overhead when parsing XML.
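A minimal sketch of the purge approach mentioned above, assuming XML::Twig is installed. The element and attribute names (`record`, `id`, `name`, `value`) and the inline sample data are hypothetical; a real run on a multi-gigabyte file would call `parsefile` with the file's path instead of `parse` on a string.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data standing in for the real 5 GB file.
my $xml = <<'XML';
<records>
  <record id="1"><name>alpha</name><value>10</value></record>
  <record id="2"><name>beta</name><value>20</value></record>
  <record id="3"><name>gamma</name><value>30</value></record>
</records>
XML

# Collected here only so the result can be inspected; with a huge file
# you would process each record inside the handler instead of storing it.
my @rows;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <record> element has been parsed.
        record => sub {
            my ($t, $elt) = @_;
            push @rows, {
                id    => $elt->att('id'),
                name  => $elt->first_child_text('name'),
                value => $elt->first_child_text('value'),
            };
            # Discard everything parsed so far, so memory use does not
            # grow with the size of the document.
            $t->purge;
        },
    },
);

$twig->parse($xml);

printf "%s %s %s\n", @{$_}{qw(id name value)} for @rows;
```

Because the handler fires once per `record` and purges afterwards, only one record's subtree needs to be held in memory at a time.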

Re: Fastest XML Parser for BIG files
by ambrus (Abbot) on Jul 21, 2013 at 20:07 UTC

    Could you tell us a bit more about your task besides the size of the file? Do you want to process all or most of the data in the XML file in some way, such as computing aggregate statistics or converting it to a different format? Or do you only want to find a few nodes that are easy to recognize in the XML without much extra processing?

      I need to do complex processing on each node, retrieving some sub-nodes' values and attributes and then making DB queries with them.
Re: Fastest XML Parser for BIG files
by Discipulus (Curate) on Jul 22, 2013 at 07:50 UTC
    I'm new to XML processing (really, I'm quite new to everything!), but I remember a factor of 13 when the twig is created.

    I also found this speed comparison, which you might find interesting.
    The summarized results are:
    If you want high performance: XML::Parser
    If you want relatively easy, memory efficient parsing of huge files: XML::Twig
    If you want easy-to-implement for small files: XML::Simple
    If you want to have a bad deal: XML::Smart
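For comparison, here is a rough sketch of what event-based parsing with XML::Parser looks like, assuming the module is installed. The sample data and the `name` element are hypothetical. Unlike XML::Twig, no tree is built, so the handlers have to track state themselves; it is only an illustration of the style, not a benchmark.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample data; use $parser->parsefile($path) for a real file.
my $xml = <<'XML';
<records>
  <record id="1"><name>alpha</name></record>
  <record id="2"><name>beta</name></record>
</records>
XML

my %count;          # how many times each tag was opened
my @names;          # text content of every <name> element
my $in_name = 0;    # true while we are inside a <name> element
my $text    = '';   # text accumulated for the current <name>

my $parser = XML::Parser->new(
    Handlers => {
        # Start tag: receives the tag name and its attributes.
        Start => sub {
            my ($expat, $tag, %attr) = @_;
            $count{$tag}++;
            if ($tag eq 'name') {
                $in_name = 1;
                $text    = '';
            }
        },
        # Character data may arrive in several chunks, so append.
        Char => sub {
            my ($expat, $string) = @_;
            $text .= $string if $in_name;
        },
        # End tag: the accumulated text is now complete.
        End => sub {
            my ($expat, $tag) = @_;
            if ($tag eq 'name') {
                push @names, $text;
                $in_name = 0;
            }
        },
    },
);

$parser->parse($xml);

print "records: $count{record}, names: @names\n";
```

The extra bookkeeping is the trade-off in the summary above: the handlers do very little work per event, but anything like "give me this node's children" has to be written by hand.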
    

    L*

    there are no rules, there are no thumbs..
