Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Re: Memory errors while processing 2GB XML file with XML:Twig on Windows 2000

by Molt (Chaplain)
on May 16, 2005 at 15:25 UTC ( #457494=note: print w/ replies, xml ) Need Help??


in reply to Memory errors while processing 2GB XML file with XML:Twig on Windows 2000

Okay, for a start I'm not mentioning the 2GB file size issue, that's been covered well enough already. I'm just touching XML::Twig itself.

Looking at the docs for XML::Twig it looks like it is capable of handling very large XML files by not reading them into memory in one go. Unfortunately I don't think your code does this, you don't set up the handlers and hence it tries to load the entire XML tree into memory. Boom, that'd need 20GB of memory.

Reread the docs on XML::Twig, look at the bit on "Processing an XML document chunk by chunk". You need to guarantee you don't have too much in memory at any one time, I hope this is a document built up of lots of small chunks or you're in for an even larger challenge.

I'll admit that personally I'd be using a full SAX parser at this point in any case, from what I've seen from my cursory look at XML::Twig does it doesn't look much simpler than trying to do it that way. It's all just handlers and callbacks at the end of the day.

As for which SAX parser I'd use I really don't know. I'd normally use >XML::LibXML, but I'm not sure how that'll work on Windows so I can't comment there.


Comment on Re: Memory errors while processing 2GB XML file with XML:Twig on Windows 2000

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://457494]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (6)
As of 2014-09-16 22:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (51 votes), past polls