PerlMonks  

best xml parser to use

by ftumsh (Scribe)
on Jul 06, 2006 at 18:03 UTC (#559636)
ftumsh has asked for the wisdom of the Perl Monks concerning the following question:

Lo all,
Given that
1) the XML I have to use is quite simple (no CDATA, PIs, etc.)
2) but possibly large (< 5 MB)
3) I need to know a tag before its children are parsed
4) the parser must have a small memory footprint
5) it must be very fast
6) Linux only

Should I be using XML::Parser or XML::LibXML?

Thx
John
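
For readers weighing requirement 3, here is a minimal sketch of how an event-driven parse with XML::Parser reports an opening tag before any of its children are read. The sample document and the `@events` bookkeeping are assumptions made up for illustration, not anything from the original question.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample input; the real document structure is unknown.
my $xml = '<order id="1"><item sku="a">Widget</item><item sku="b">Gadget</item></order>';

my @events;       # records the order in which events arrive
my $depth = 0;    # current nesting depth, used only for indentation

my $parser = XML::Parser->new(
    Handlers => {
        # Called when an opening tag is seen, before any child is parsed.
        Start => sub {
            my ($expat, $tag, %attr) = @_;
            push @events, ('  ' x $depth) . "start $tag";
            $depth++;
        },
        # Called when the matching closing tag is seen.
        End => sub {
            my ($expat, $tag) = @_;
            $depth--;
            push @events, ('  ' x $depth) . "end $tag";
        },
    },
);

$parser->parse($xml);
print "$_\n" for @events;
```

Because nothing is kept beyond what the handlers store themselves, memory use stays roughly flat regardless of document size.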

Replies are listed 'Best First'.
Re: best xml parser to use
by Tanktalus (Canon) on Jul 06, 2006 at 18:07 UTC

    My general decision tree goes sorta like this (non-XML parts removed):

     /----------\           +-----------+
    < Parse XML? > --NO>--  | (removed) |
     \----------/           +-----------+
          |
         YES
          V
          |
    +---------------+
    | Use XML::Twig |
    +---------------+
    Hopefully this decision tree helps you decide what is best.

    ;-)

      heh :)
      I normally use XML::Twig, but I don't actually know the format of xml so I can't in this case. I've been googling and I'm going to try XML::SAX.

      I appear to have only XML::LibXML::SAX installed. Is this good enough? The POD mentions it might not be suitable for production use. What else could I use?

        What do you mean by "I don't actually know the format of xml"? How will switching parsers (though, technically, XML::Twig is a front-end, not a parser itself) fix that?
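
To illustrate the point, here is a rough sketch of XML::Twig handling a document whose element names are not known in advance. It relies on the `_all_` trigger, which fires for every element. The sample XML and the `@opened` and `%count` variables are hypothetical, chosen only for the demonstration.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input standing in for XML of unknown structure.
my $xml = '<log><entry level="warn">disk</entry><entry level="info">ok</entry></log>';

my (@opened, %count);

my $twig = XML::Twig->new(
    # Fires as soon as an opening tag is read, before its children exist.
    start_tag_handlers => {
        _all_ => sub {
            my ($t, $elt) = @_;
            push @opened, $elt->tag;
        },
    },
    # Fires once an element has been completely parsed.
    twig_handlers => {
        _all_ => sub {
            my ($t, $elt) = @_;
            $count{ $elt->tag }++;
            # Free fully parsed elements to keep memory low; skipped for
            # the root, whose end means the parse is finished anyway.
            $t->purge if $elt->parent;
        },
    },
);

$twig->parse($xml);
print "opened: @opened\n";
print "$_: $count{$_}\n" for sort keys %count;
```

The start-tag handler sees the tag name and attributes only, since the element is still empty at that point, which seems to match the "know a tag before its children are parsed" requirement.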
      if ($xml->is_simple() and $xml->is_table_like()) {
          use XML::RAX;
      } elsif (size($xml) < too_big()) {
          use XML::Simple;
      } else {
          use XML::Twig;
          ...
          my $Data = $DataObj->simplify(forcearray => [...], keyattr => {...}, group_tags => {...});
      }

      Update 2007-2-6: Tastes change. While I would probably still use XML::RAX for some tasks with table-like XML and XML::Simple for very simple XML, I'd most probably use my XML::Rules now. It may look a bit twisted at first, but it's convenient and powerful. IMHO of course ;-)

Re: best xml parser to use
by planetscape (Chancellor) on Jul 06, 2006 at 19:28 UTC

      Agreed, although I sometimes still use XML::TreeBuilder.

      Love Tanktalus's decision tree though. :)


      DWIM is Perl's answer to Gödel
Re: best xml parser to use
by xdg (Monsignor) on Jul 10, 2006 at 12:54 UTC

    I'm not sure how it stacks up against these criteria, but if you're not 100% sure that your incoming data is entirely valid, you might want to check out XML::Liberal (a stand-in for XML::LibXML). Avoiding re-parsing a large XML file because of a small nit might be a worthwhile form of efficiency.

    The author gave a nice Lightning Talk about it at YAPC::NA.

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re: best xml parser to use
by coreolyn (Parson) on Jul 06, 2006 at 19:01 UTC

    It seems to me that performance is so much better just parsing XML with regexes that I quit caring that it's XML. I might add some coding time to my development, but I really don't see the overall benefit of XML modules unless I have to provide XML output, and even then it's 'iffy'.
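
For what it's worth, a regex scan of this kind might look like the sketch below. It is only an illustration under strong assumptions: no comments, CDATA, processing instructions, or `>` characters inside attribute values (which happens to fit the "quite simple" XML described in the question). The sample input and the `@events` list are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample input. This approach assumes no comments, CDATA,
# processing instructions, or '>' characters inside attribute values.
my $xml = '<rows><row id="1">alpha</row><row id="2">beta</row><empty/></rows>';

my @events;
# Each match is either a tag (captures 1-4) or a run of text (capture 5).
while ($xml =~ m{<(/?)([\w:.-]+)([^>]*?)(/?)>|([^<]+)}g) {
    my ($closing, $tag, $attrs, $self_closing, $text) = ($1, $2, $3, $4, $5);
    if (defined $text) {
        push @events, "text $text" if $text =~ /\S/;   # skip pure whitespace
    }
    elsif ($closing) {
        push @events, "end $tag";
    }
    else {
        push @events, "start $tag";    # seen before any child is scanned
        push @events, "end $tag" if $self_closing;
    }
}

print "$_\n" for @events;
```

Note that nothing here checks well-formedness, so mismatched or malformed tags pass through silently. That is the trade-off against a real parser.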
