
Re: XML::Parser::Expat and non conforming XML

by nicholasrperez (Monk)
on Nov 09, 2006 at 04:35 UTC ( [id://583040]=note )

in reply to XML::Parser::Expat and non conforming XML

It would probably make more sense to parse the file as a stream instead of trying to swallow the beast whole. I recommend setting up a SAX handler, feeding XML::Parser::Expat (X::P::E) a root tag, and then feeding it lines from the gzip'd file.

When I say "feeding it", I mean "use XML::Parser::ExpatNB and its parse_more() method." Then, in the SAX handler, you can build up your own data structure based on "depth" within the document. That way you keep data only at the depths you actually care about (i.e. when you want to process particular child nodes inside particular "top level" nodes) instead of filling up your RAM with a giant DOM.
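A rough sketch of what I mean, not a drop-in solution. The record/name layout, the `<root>` wrapper name, and the in-memory @lines array are all made up for illustration; your real lines would come from the gzip'd file. It gets the ExpatNB object via XML::Parser's parse_start(), and relies on depth() inside Start and End handlers counting only the ancestors of the element being handled.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical input: record fragments with no enclosing root element.
my @lines = (
    qq{<record id="1"><name>alpha</name></record>\n},
    qq{<record id="2"><name>beta</name></record>\n},
);

my @records;      # finished records, one small hash each
my $current;      # the record currently being built
my $text = '';    # character data collected for the current element

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element, %attr) = @_;
            # depth == 1 here means "direct child of the synthetic root".
            $current = { id => $attr{id} } if $expat->depth == 1;
            $text = '';
        },
        Char => sub {
            my ($expat, $string) = @_;
            $text .= $string;    # may be called several times per text node
        },
        End => sub {
            my ($expat, $element) = @_;
            if ($expat->depth == 2) {
                # A field inside a record: keep its text.
                $current->{$element} = $text;
            }
            elsif ($expat->depth == 1) {
                # A record just closed: store it and start fresh.
                push @records, $current;
                undef $current;
            }
        },
    },
);

my $nb = $parser->parse_start;     # returns an XML::Parser::ExpatNB object
$nb->parse_more("<root>\n");       # the root tag the data itself lacks
$nb->parse_more($_) for @lines;    # feed the document a line at a time
$nb->parse_more("</root>\n");
$nb->parse_done;

printf "%s: %s\n", $_->{id}, $_->{name} for @records;
```

For the real thing, the loop over @lines could be swapped for getline() calls on an IO::Uncompress::Gunzip handle, and each record could be processed inside the End handler rather than pushed onto @records, so memory use stays flat.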

What you are essentially describing is a Jabber IM session (an immense XML document), and this is a solved problem. Not to pimp my own code, but you could take a look at POE::Filter::XML for how this whole feed-the-parser thing can be implemented (with the usual caveats: YMMV, HTH, etc).