
XML::LibXML::Reader - wish it had a Simplify function...

by KalTorak (Initiate)
on Jul 28, 2009 at 16:29 UTC (#783965=perlquestion)
KalTorak has asked for the wisdom of the Perl Monks concerning the following question:

I've been parsing XML with XML::Simple. Loved it. But alas! I ran into XML that was too big to bring into memory that way.

So I've moved to XML::LibXML::Reader, as my task maps fairly well to pull-parsing. But there are a few elements I hit where it would be really convenient to have a function call that would return an XML::Simple-style hash for my element and everything below it.

Am I missing a way to do this - maybe by going through a DOM node? Or am I gonna have to roll my own if I want this behavior?

Re: XML::LibXML::Reader - wish it had a Simplify function...
by mirod (Canon) on Jul 28, 2009 at 17:20 UTC

    I can see a few ways to do this:

    • stringify your XML fragment and feed it to XML::Simple,
    • emit SAX events from your fragments and have XML::Simple read them,
    • use XML::Twig and call the simplify method on fragments,
    • "port" XML::Simple to XML::LibXML. That's what I did for XML::Twig, and it is about 200 lines of code.

        I am not sure that this would solve the problem of "XML that was too big to bring into memory that way."
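The first option in the list above (stringify the fragment and feed it to XML::Simple) might look roughly like this with XML::LibXML::Reader. This is only a sketch: the sample document, the `item` element name, and the `@items` array are made up for illustration, and a real run would open the large file with `location => $filename` instead of a string.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;
use XML::Simple qw(XMLin);

# Hypothetical sample data standing in for the large file.
my $xml = <<'XML';
<catalog>
  <item id="1"><name>foo</name><price>10</price></item>
  <item id="2"><name>bar</name><price>20</price></item>
</catalog>
XML

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader";

my @items;
# nextElement returns 1 while it finds another <item>, 0 at end of input.
while ($reader->nextElement('item') == 1) {
    # Serialize only the current element and its subtree...
    my $fragment = $reader->readOuterXml;
    # ...and let XML::Simple build its usual hash from that small string.
    push @items, XMLin($fragment);
}

# Collected here only for the demo; with a huge file you would process
# each hash inside the loop and let it go out of scope.
printf "%s: %s\n", $_->{name}, $_->{price} for @items;
```

Only one `item` subtree is turned into a string and a hash at a time, so memory use depends on the size of the largest fragment rather than the whole document. If a DOM node is preferred over a string, `$reader->copyCurrentNode(1)` returns a deep copy of the current element that can be serialized with `toString`. The sketch assumes `item` elements are not nested inside each other.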

Re: XML::LibXML::Reader - wish it had a Simplify function...
by Jenda (Abbot) on Jul 31, 2009 at 12:17 UTC
    1. Install XML::Rules
    2. Run XML::Rules->inferRulesFromExample() on one or a few examples to obtain a base set of rules (using them, you end up with a data structure that looks very similar to, or exactly the same as, the one you get from XML::Simple).
    3. Tweak the ruleset to skip the tags you are not interested in, or to convert a list of <field name="...">...value...</field> style tags into a hash, etc.
    4. If the data are still too big, change the rules for some tags to process the data at whatever level is convenient. The subroutines you specify as the rules for those tags will receive all the data from that tag and its children, "simplified" according to the rules of the child tags.

    It's "push" rather than "pull" as soon as you start using the subroutine rules, but it's pretty powerful once you get used to the style.
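A minimal sketch of step 4, assuming a made-up document whose repeating element is called `item`. The rules here are written by hand instead of being inferred, and the sample data and the `@items` array are purely illustrative.

```perl
use strict;
use warnings;
use XML::Rules;

# Hypothetical sample data standing in for the large file.
my $xml = <<'XML';
<catalog>
  <item id="1"><name>foo</name><price>10</price></item>
  <item id="2"><name>bar</name><price>20</price></item>
</catalog>
XML

my @items;
my $parser = XML::Rules->new(
    stripspaces => 7,    # drop whitespace-only text between tags
    rules       => {
        # Leaf tags: hand their text to the parent as tag => text.
        _default => 'content',
        # Called once per closing </item>; $attrs holds the attributes
        # plus the already-simplified children of this one element.
        item => sub {
            my ($tag, $attrs) = @_;
            push @items, { %$attrs };
            return;    # return nothing so the parent does not accumulate items
        },
    },
);
$parser->parse($xml);

printf "%s: %s\n", $_->{name}, $_->{price} for @items;
```

Because the `item` rule returns an empty list, nothing is stored in the enclosing `catalog` data, so each record can be handled and dropped as soon as its closing tag is seen. In real use the `push` would be replaced by whatever processing the record needs.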

    Enoch was right!
    Enjoy the last years of Rome.

Node Type: perlquestion [id://783965]
Approved by psini