updating big XML files
by dHarry (Abbot)
on Jul 18, 2008 at 15:02 UTC
dHarry has asked for the wisdom of the Perl Monks concerning the following question:
A tool used to validate scientific data sets (many GBs) spits out an XML file full of results. The XML file can get rather big (hundreds of MBs). Typically several sessions are needed to validate a data set, and everything is recorded in the XML file. When, for example, errors are fixed in the data set and the tool is rerun, the XML file gets updated; at least, that was the idea.
Most (if not all?) solutions use the DOM approach: slurp everything into an in-memory data structure, manipulate the data structure, and write it back to disk. With files this big, that is not workable.
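For concreteness, the DOM approach would look something like the sketch below (assuming XML::LibXML and a hypothetical `<report>`/`<error>` structure, with a tiny inline document standing in for the real file). The problem is the first line: `load_xml` parses the entire document into one in-memory tree, which is exactly what breaks down at hundreds of MBs.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical report structure; the real file would be hundreds of MBs.
my $xml = <<'XML';
<report>
  <error status="fixed">bad checksum</error>
  <error status="open">missing header</error>
</report>
XML

# DOM approach: the whole document is parsed into one in-memory tree.
my $doc = XML::LibXML->load_xml(string => $xml);

# Update step: drop error records already fixed since the last session.
$_->unbindNode for $doc->findnodes('//error[@status="fixed"]');

# Write the updated document back out (to STDOUT here; toFile in practice).
print $doc->toString;
```

Memory use here is proportional to the document size, which is the limitation the rest of this question is about.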
Some of the options mentioned/thought up:
Long ago, in the distant past, I built a Java-based solution that parsed large files with SAX, generating small DOM trees on the fly which were then manipulated. I must be getting senile, because the details seem to have vanished from my memory.
Does anybody know of a more memory-friendly (read: non-DOM), preferably XML-aware, solution? I would like to use an event-based parser and update the XML file as needed. Maybe I am asking for too much?
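Something along the lines of the SAX-plus-partial-DOM idea above can be sketched with XML::Twig, which builds a small tree per record, hands it to a handler, and lets you flush the processed part to output so memory stays bounded. This is only a sketch under assumptions: the `<report>`/`<error>` structure is hypothetical, and the tiny inline string stands in for the real multi-hundred-MB file (where you would `parsefile` and flush to a real output file instead).

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical structure: a <report> holding many <error> records.
my $xml = <<'XML';
<report>
  <error status="fixed">bad checksum</error>
  <error status="open">missing header</error>
</report>
XML

my $updated = '';
open my $out, '>', \$updated or die $!;  # in-memory handle; use a real file normally

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once per <error>, with only that subtree in memory.
        error => sub {
            my ($t, $elt) = @_;
            # Update step: drop records already fixed, keep the rest.
            $elt->delete if $elt->att('status') eq 'fixed';
            # Write out everything finished so far and free its memory.
            $t->flush($out);
        },
    },
);
$twig->parse($xml);
$twig->flush($out);   # emit whatever remains, including the closing root tag
close $out;

print $updated;
```

The key design point is `flush` inside the handler: the twig never holds more than roughly one record's subtree at a time, so memory use depends on record size rather than file size.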