
Re: XML::Parser::Expat and non conforming XML

by nicholasrperez (Monk)
on Nov 09, 2006 at 04:35 UTC ( [id://583040] )


in reply to XML::Parser::Expat and non conforming XML

It would probably make more sense to parse the file as a stream instead of trying to swallow the beast whole. I recommend setting up a SAX-style handler, feeding XML::Parser::Expat an opening root tag, and then feeding it lines from the gzip'd file (with a matching closing root tag once the lines run out).

When I say "feeding it", that should read "use XML::Parser::ExpatNB and its parse_more() method." Then, in the handler, you can build up your own data structure based on "depth" within the document. By depth I mean the nesting level at which you actually want data (i.e. if you only want to process particular child nodes inside particular "top level" nodes), so you keep just those pieces instead of filling up your RAM with a giant DOM.
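To make the feeding idea concrete, here is a minimal sketch rather than a finished solution. The `<record>` and `<name>` elements, the `<root>` wrapper name, and the in-memory `@lines` array are all made up for illustration; in practice the lines would come from the decompressed file. Depth is tracked with a plain counter so that only the pieces of interest are kept.

```perl
use strict;
use warnings;
use XML::Parser::Expat;    # this file also defines XML::Parser::ExpatNB

# Hypothetical input: rootless XML arriving line by line.
# Note that the second record is split across two lines.
my @lines = (
    '<record id="1"><name>alpha</name></record>',
    '<record id="2"><name>be',
    'ta</name></record>',
);

my @records;        # finished records, the only data kept in memory
my $current;        # record currently being built
my $text  = '';     # character data collected for the current child
my $depth = 0;      # 1 = fake root, 2 = top-level node, 3 = its children

my $parser = XML::Parser::ExpatNB->new;
$parser->setHandlers(
    Start => sub {
        my ($expat, $tag, %attr) = @_;
        $depth++;
        $current = { id => $attr{id} } if $depth == 2 && $tag eq 'record';
        $text = '' if $depth == 3;
    },
    Char => sub {
        my ($expat, $chars) = @_;
        # May fire more than once per text node, so append.
        $text .= $chars if $depth == 3;
    },
    End => sub {
        my ($expat, $tag) = @_;
        $current->{$tag} = $text if $depth == 3 && $current;
        if ($depth == 2 && $current) {
            push @records, $current;
            undef $current;
        }
        $depth--;
    },
);

$parser->parse_more('<root>');          # supply the missing root element
$parser->parse_more($_) for @lines;     # feed the real data as it arrives
$parser->parse_more('</root>');         # close the fake root
$parser->parse_done;

print "$_->{id}: $_->{name}\n" for @records;
```

Because the parser is non-blocking, it does not matter where the line breaks fall; a tag or text node split across two parse_more() calls is stitched together by Expat. For a real gzip'd file, something like IO::Uncompress::Gunzip could supply the lines in place of `@lines`.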

What you are essentially describing is a Jabber IM session (one immense XML document that arrives piece by piece), and this is a solved problem. Not to pimp my own code, but you could take a look at POE::Filter::XML for how this whole feed-the-parser thing can be implemented (with the usual caveats: YMMV, HTH, etc).