Fastest XML Parser for BIG files

by Doctrin (Beadle)
on Jul 21, 2013 at 13:30 UTC (#1045503)
Doctrin has asked for the wisdom of the Perl Monks concerning the following question:

Hello dear Monks. Can anyone tell me which Perl module is the fastest for parsing a really big XML file (about 5 GB, 6 million nodes)? I mean node-by-node parsers, of course. I think XML::Twig would do the trick, but I'm not sure it is the fastest one... Thanks

Replies are listed 'Best First'.
Re: Fastest XML Parser for BIG files
by daxim (Chaplain) on Jul 21, 2013 at 13:34 UTC
Re: Fastest XML Parser for BIG files
by ambrus (Abbot) on Jul 21, 2013 at 20:07 UTC

    Could you tell us a bit more about your task besides the size of the file? Do you want to process all or most of the data in the XML file in some way, such as computing aggregate statistics or converting it to a different format? Or do you instead want to find only a few nodes that are easy to recognize in the XML without too much extra processing?

      I need to do complex processing of each node, retrieving some sub-nodes' values and attributes and then making DB queries with them.
Re: Fastest XML Parser for BIG files
by Discipulus (Monsignor) on Jul 22, 2013 at 07:50 UTC
    I'm new to XML processing (really, I'm quite new to everything!) but I remember a factor of 13 when the twig is created.

    I also found this speed comparison, which you might find interesting.
    The summarized results are:
    If you want high performance: XML::Parser
    If you want relatively easy, memory efficient parsing of huge files: XML::Twig
    If you want easy-to-implement for small files: XML::Simple
    If you want to have a bad deal: XML::Smart
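
    For comparison, here is a minimal sketch of what event-style parsing with XML::Parser looks like. It only counts start tags by element name; the sub name count_elements and the sample document are made up for illustration, and for a file on disk you would call parsefile with a path instead of parse with a string.

```perl
use strict;
use warnings;
use XML::Parser;

# Count how often each element name occurs, using only a Start handler.
# Nothing is kept in memory except the counters.
sub count_elements {
    my ($xml) = @_;
    my %count;
    my $parser = XML::Parser->new(
        Handlers => {
            # Called for every start tag: parser object, element name, attributes.
            Start => sub {
                my ($expat, $name, %attrs) = @_;
                $count{$name}++;
            },
        },
    );
    $parser->parse($xml);    # use $parser->parsefile($path) for a real file
    return \%count;
}

# Hypothetical sample input.
my $counts = count_elements('<root><node id="1"/><node id="2"/></root>');
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

    The price of this style is that you track all state yourself (current element, collected text), which is why it tends to be more work than XML::Twig for anything beyond simple scans.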


    there are no rules, there are no thumbs..
Re: Fastest XML Parser for BIG files
by Preceptor (Deacon) on Jul 21, 2013 at 19:10 UTC

    Can't comment on speed, but I find XML::Twig's ability to call twig->purge to free memory as you go invaluable once you start parsing large files. I seem to recall the rule of thumb is to assume a 10x memory overhead when parsing XML.
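
    A rough sketch of the purge-as-you-go pattern, assuming each record is an element called "node" with an "id" attribute and a "name" child (those names, the sub process_nodes and the sample data are all hypothetical). The callback is where the per-node work, such as a DB query, would go.

```perl
use strict;
use warnings;
use XML::Twig;

# Call $callback once per <node> element, then purge what has been parsed
# so far, so memory use stays roughly flat however large the input is.
sub process_nodes {
    my ($xml, $callback) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Fires when a complete <node> element has been parsed.
            node => sub {
                my ($t, $elt) = @_;
                $callback->({
                    id   => $elt->att('id'),
                    name => $elt->first_child_text('name'),
                });
                $t->purge;    # free everything parsed up to this point
                return 1;
            },
        },
    );
    $twig->parse($xml);    # use $twig->parsefile($path) for a file on disk
    return;
}

# Hypothetical sample input; a real run would pass the 5 GB file to parsefile.
my $sample = '<root>'
           . '<node id="1"><name>alpha</name></node>'
           . '<node id="2"><name>beta</name></node>'
           . '</root>';
process_nodes($sample, sub { print "$_[0]{id}: $_[0]{name}\n" });
```

    Doing the work inside the handler rather than collecting results into an array is the point: with millions of nodes, anything you accumulate defeats the purge.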
