good chemistry is complicated, and a little bit messy -LW |
|
PerlMonks |
comment on |
( [id://3333]=superdoc: print w/replies, xml ) | Need Help?? |
The easiest way is to just bring everything into memory and deal with it. In CB, you said that you don't have a TB of RAM, so I'm assuming these files are GB+ in size. At which point, I'm wondering WTF they're doing in XML :-) I also don't quite follow how you want to do the comparison. Is it just the text of certain nodes? The text of all nodes? XML::Twig allows you to flush the in-memory representation, freeing up all the memory used thus far, but whether you can do that really depends on how you're thinking of doing the comparison. With line-record-based text, it's fairly obvious. With XML, the definition of "record" is much less clear in general - only you know the specifics. As I said in CB, I'd consider turning XML::Twig on its head with Coro. It looks like you should be able to turn XML::Parser on its head, too. But, either way, you'll likely have to turn them on their heads. Warning, the following code is COMPLETELY untested. Channels may be required instead of rouse_wait'ing all the time. I'm not sure if this properly deals with end-of-files, but I think so. Like I said, UNTESTED. Be sure to have proper twig flushing (I think the [1] items will be the twig reference) so that you don't use all your RAM (if this isn't a problem, then don't use this at all - just suck the whole files in!). In reply to Re: Processing Two XML Files in Parallel
by Tanktalus
|
|