Re^6: How to Truncate Corrupt Document.xml Files?by socrtwo (Sexton)
on Feb 16, 2012 at 18:08 UTC
I constructed the beginnings of a script that is supposed to keep a running list of tags that have been opened but not yet closed, using XML::Parser. The problem is that XML::Parser errors out when the XML is defective, which is exactly when I want the rest of the script to work. So I'm assuming that I have to switch to SAX, so that the script runs as a stream and adds to and removes from the array until it hits the XML corruption, as I think you were originally suggesting.
So here's the script with XML::Parser, which doesn't run when validation problems exist. When there are none, it returns nothing for the @tags array, which should be correct.
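For anyone following along, here is a minimal sketch of the idea (not the original script). It assumes the only thing needed is the stack of open elements at the moment the parser gives up. XML::Parser calls its Start and End handlers as it reads, and dies on the first error, so wrapping the parse in eval may be enough to keep @tags around afterwards without moving to SAX. The sub name open_tags_at_failure and the sample string are made up for illustration.

```perl
use strict;
use warnings;
use XML::Parser;

# Return a reference to the stack of elements still open when parsing
# stopped, plus the parser's error text ('' if the XML was well formed).
sub open_tags_at_failure {
    my ($xml) = @_;
    my @tags;    # stack of currently open element names

    my $parser = XML::Parser->new(
        Handlers => {
            # Start handler receives ($expat, $element_name, %attributes)
            Start => sub { my ($expat, $name) = @_; push @tags, $name },
            # End handler: the innermost open element has just closed
            End   => sub { pop @tags },
        },
    );

    # parse() dies on the first well-formedness error; eval traps the die
    # so the handlers' work up to that point is preserved in @tags.
    my $ok    = eval { $parser->parse($xml); 1 };
    my $error = $ok ? '' : $@;

    return (\@tags, $error);
}

# Hypothetical truncated input: <doc>, <body> and the second <p> never close.
my ($open, $error) = open_tags_at_failure('<doc><body><p>text</p><p>broken');
print "Still open: @$open\n";
print "Parser said: $error" if $error;
```

Closing tags for a repair would then be the reverse of that list. Whether the parser stops exactly where the corruption begins depends on the kind of damage, so treat the stack as a best guess.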
On another subject that is crucial for me: why are arrays declared outside a subroutine, like the @tags above, available outside that subroutine in a script but not in a module like the one below?
The print @tags line doesn't print anything when it is outside the subroutine in the module, but it would if it were in a script.
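The module itself is not shown here, so this is only a guess at the cause. Two things commonly produce this symptom: a `my @tags` in a module is lexical to the module's file, so the calling script cannot see it, and statements at the top level of a module run once when the module is loaded, before the script has called any of its subs. A small sketch of the first point, using a hypothetical package name TagTracker and an inline package block instead of a separate .pm file:

```perl
use strict;
use warnings;

{
    package TagTracker;    # hypothetical module name, inlined for the demo

    our @tags;             # package variable: reachable as @TagTracker::tags
    my  @private;          # lexical: invisible outside this block (or file)

    # Record element names in both arrays; returns the lexical array's size.
    sub record {
        my (@names) = @_;
        push @tags,    @names;
        push @private, @names;
        return scalar @private;
    }
}

package main;

TagTracker::record('w:body', 'w:p');

# The package variable is visible from the script by its full name.
print "Open tags: @TagTracker::tags\n";

# @private cannot be named here at all; under 'use strict' writing
# "print @private" at this point would be a compile-time error.
```

If the print has to live in the module, putting it inside a sub that the script calls after parsing avoids the load-time ordering problem.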
It looks like I was reinventing the wheel. xmllint will reliably put the correct ending tags on corrupt XML with the --recover option. I did find a case, though, where its truncation and ending-tag solution didn't suit MS Word. So what I want to do now is figure out how to truncate an XML file a configurable number of characters before the first error, and then apply xmllint --recover on the command line.
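A rough sketch of that plan, under some assumptions: the first error is located by letting XML::Parser fail and reading the "byte N" figure from its error message, the margin is counted in bytes and not characters (so a cut could land inside a multi-byte UTF-8 character), and xmllint is on the PATH. The sub names and the ".truncated" temporary file name are invented for illustration.

```perl
use strict;
use warnings;
use XML::Parser;

# Byte offset of the first well-formedness error, or undef if none.
# Relies on XML::Parser's error text "... at line L, column C, byte B".
sub first_error_byte {
    my ($xml) = @_;
    return if eval { XML::Parser->new->parse($xml); 1 };
    return $@ =~ /\bbyte (\d+)/ ? $1 : undef;
}

# Cut the document $margin bytes before the first error.
# Well-formed input is returned unchanged.
sub truncate_before_error {
    my ($xml, $margin) = @_;
    my $byte = first_error_byte($xml);
    return $xml unless defined $byte;
    my $keep = $byte - $margin;
    $keep = 0 if $keep < 0;
    return substr($xml, 0, $keep);
}

# Truncate $in, then let xmllint --recover supply the missing end tags.
sub recover_file {
    my ($in, $out, $margin) = @_;

    open my $fh, '<:raw', $in or die "Cannot read $in: $!";
    my $xml = do { local $/; <$fh> };    # slurp the whole file as bytes
    close $fh;

    my $tmp = "$out.truncated";          # hypothetical temp file name
    open my $tfh, '>:raw', $tmp or die "Cannot write $tmp: $!";
    print {$tfh} truncate_before_error($xml, $margin);
    close $tfh;

    # xmllint reports the errors it recovers from, so its exit status
    # is returned to the caller and not treated as fatal here.
    return system('xmllint', '--recover', '--output', $out, $tmp);
}
```

Usage would be something like recover_file('document.xml', 'document.fixed.xml', 200), raising the margin until Word accepts the result.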