Re: How to Truncate Corrupt Document.xml Files?

by educated_foo (Vicar)
on Feb 16, 2012 at 01:27 UTC (#954117)

in reply to How to Truncate Corrupt Document.xml Files?

I would start by using a streaming (SAX) parser and maintaining a stack of unclosed tags. Have you tried that yet?
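
A minimal sketch of the stack idea, for illustration only. The thread contains no code, so the language is an assumption: this uses Python's standard `xml.sax` module as a stand-in for a Perl SAX module. The names `OpenTagTracker` and `unclosed_tags` are made up for this example.

```python
import xml.sax


class OpenTagTracker(xml.sax.ContentHandler):
    """Keeps a stack of elements that were opened but not yet closed."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def startElement(self, name, attrs):
        # Every start tag (including a self-closing one) is pushed.
        self.stack.append(name)

    def endElement(self, name):
        # The parser only reports an end tag that matches the innermost
        # open element, so popping the top of the stack is enough.
        self.stack.pop()


def unclosed_tags(data: bytes) -> list:
    """Return the elements still open when parsing stopped, outermost first."""
    handler = OpenTagTracker()
    try:
        xml.sax.parseString(data, handler)
    except xml.sax.SAXParseException:
        # Expected for a corrupt or truncated document; the stack now holds
        # whatever was open at the point where the parser gave up.
        pass
    return handler.stack
```

For a truncated input such as `b"<a><b>text</b><c>tru"`, the stack left behind is `['a', 'c']`; for a well-formed document it is empty.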

Re^2: How to Truncate Corrupt Document.xml Files?
by socrtwo (Sexton) on Feb 16, 2012 at 02:11 UTC
    I haven't tried that yet. Thanks for the heads-up. I'm looking at streaming SAX parsing now. The Ruby gem Nokogiri looks well suited to this, but I don't know any Ruby at the moment. I do know a little Perl, and there are a lot of SAX modules for it.
      I don't parse much XML (thank God), but XML::Parser (originally written by Larry Wall) has always been pretty straightforward to use -- just define Start() and End() handlers for a start.

        I read that a SAX parser is not well suited to rebuilding the XML document, which is what I want to do, unless I use two parsing passes: one with a SAX parser to analyze the document.xml file, and another with XML::Parser to actually add the missing end tags and rebuild document.xml.

        However, is there any real benefit to using SAX here? Can't I just define a start handler with XML::Parser that pushes each non-self-closing tag onto an array, and an end handler that removes tags from the same array? Then, when parsing stops, the array would hold only the tags the end handler never saw, and these could be appended to the end of the XML file as closing tags in reverse order (last in, first out).
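
The single-pass idea above can be sketched roughly as follows. This is an illustration under stated assumptions, not anyone's tested solution: it is written in Python with `xml.parsers.expat`, which wraps the same Expat library that XML::Parser is built on, and the function name `close_open_tags` is hypothetical. It also assumes the file is UTF-8 (as a .docx document.xml normally is) and that cutting the input at the parser's reported error position is acceptable.

```python
from xml.parsers import expat


def close_open_tags(data: bytes) -> bytes:
    """Cut the input at the first parse error and close whatever is still open."""
    stack = []  # names of elements opened but not yet closed

    parser = expat.ParserCreate()
    # Expat fires both handlers for a self-closing tag such as <b/>,
    # so such tags push and pop immediately and need no special case.
    parser.StartElementHandler = lambda name, attrs: stack.append(name)
    parser.EndElementHandler = lambda name: stack.pop()

    try:
        parser.Parse(data, True)
        return data  # already well-formed, nothing to repair
    except expat.ExpatError:
        # ErrorByteIndex is the byte offset at which the parser stopped;
        # everything before it was accepted.
        good = data[:parser.ErrorByteIndex]
        # Close the open elements innermost first (last in, first out).
        closers = "".join(f"</{name}>" for name in reversed(stack))
        return good + closers.encode("utf-8")  # assumption: UTF-8 document
```

With the truncated input `b"<a><b>text</b><c>tru"` this returns `b"<a><b>text</b><c>tru</c></a>"`. A separate SAX pass is not needed for this much. The sketch does not handle damage that occurs before the root element opens, since the stack is empty at that point and there is nothing to close.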
