http://www.perlmonks.org?node_id=457494


in reply to Memory errors while processing 2GB XML file with XML:Twig on Windows 2000

Okay, for a start I'm not going to go into the 2GB file size issue; that's been covered well enough already. I'm just looking at XML::Twig itself.

Looking at the docs for XML::Twig, it looks like it can handle very large XML files by not reading them into memory in one go. Unfortunately I don't think your code does this: you don't set up any handlers, so it tries to build the entire XML tree in memory. Boom. The in-memory tree is a good deal bigger than the file itself, so that could need something like 20GB of memory.

Reread the docs on XML::Twig and look at the bit on "Processing an XML document chunk by chunk". You need to guarantee you never have too much in memory at any one time. I hope this document is built up of lots of small chunks, or you're in for an even larger challenge.
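Something along these lines is what I mean. Treat it as a rough sketch rather than a drop-in fix: the <record> and <name> element names and the inline sample XML are made up for illustration, since I don't know what your document looks like. It uses twig_handlers to get called once per complete element and purge to throw away what has already been handled.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data: a flat list of small <record> elements.
my $xml = <<'XML';
<records>
  <record id="1"><name>alpha</name></record>
  <record id="2"><name>beta</name></record>
  <record id="3"><name>gamma</name></record>
</records>
XML

my @seen;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <record> element has been parsed.
        record => sub {
            my ($t, $record) = @_;
            push @seen,
                $record->att('id') . ':' . $record->first_child_text('name');

            # Discard everything parsed so far, so the tree never grows
            # beyond roughly one record.
            $t->purge;
        },
    },
);

# For a real file you would call $twig->parsefile($filename) instead.
$twig->parse($xml);

print "$_\n" for @seen;
```

The purge call is the important part. Without it the handler still fires for every record, but the tree keeps growing and you end up back where you started.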

I'll admit that personally I'd be using a full SAX parser at this point in any case. From my cursory look at what XML::Twig does, it doesn't look much simpler than doing it that way. It's all just handlers and callbacks at the end of the day.

As for which SAX parser I'd use, I really don't know. I'd normally use XML::LibXML, but I'm not sure how well that works on Windows, so I can't comment there.
