http://www.perlmonks.org?node_id=1011780

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perlmonks,
I would like the script below to output all XML errors, not stop at the first one:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

my $xmlfile = shift @ARGV    # the file to parse
    or die "usage: $0 file.xml\n";

# initialize parser object and parse the file
my $parser = XML::Parser->new( ErrorContext => 2 );
eval { $parser->parsefile( $xmlfile ); };

# report any error that stopped parsing, or announce success
if ( $@ ) {
    $@ =~ s/at \/.*?$//s;    # remove module line number
    print STDERR "\nERROR in '$xmlfile':\n$@\n";
} else {
    print STDERR "'$xmlfile' is well-formed\n";
}

How could this be achieved? Thanks

Re: XML::Parser XML validation
by choroba (Cardinal) on Jan 05, 2013 at 11:32 UTC
    The parser cannot know how to fix the first error it finds, so any errors it reports after that may carry little useful information. Consider the following document:
    <r>
      <a>
        <b>
          <c>
        </b>
      </a>
    </r>
    The problem is the missing </c>. But the parser might tell you:
    Start and end tag mismatch: c and b, line 5.
    Start and end tag mismatch: b and a, line 6.
    Start and end tag mismatch: a and r, line 7.
    Unexpected end of the document, line 7.
    BTW, that's roughly what xmllint would tell you.
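    To make the cascade concrete, here is a toy sketch in Perl. It is NOT an XML parser: the hypothetical helper report_mismatches only tracks attribute-free start and end tags on a stack, and after a mismatch it assumes the end tag closed whatever element was innermost so that it can keep going. That single guess is what turns one missing </c> into four messages.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy illustration only, not an XML parser: handles plain <name> and
# </name> tags (no attributes, no self-closing tags). After a mismatch it
# pops the innermost open element anyway, i.e. it guesses that the end tag
# closed it, so it can continue. That guess produces the follow-on errors.
sub report_mismatches {
    my ($text) = @_;
    my ( @stack, @errors );
    my $line_no = 0;
    for my $line ( split /\n/, $text ) {
        $line_no++;
        while ( $line =~ m{<(/?)(\w+)>}g ) {
            my ( $is_end, $name ) = ( $1, $2 );
            if ( !$is_end ) {
                push @stack, $name;    # start tag: remember it
                next;
            }
            my $open = pop @stack;     # end tag: close the innermost element
            if ( !defined $open ) {
                push @errors, "Unexpected end tag: $name, line $line_no";
            }
            elsif ( $open ne $name ) {
                push @errors, "Start and end tag mismatch: $open and $name, line $line_no";
            }
        }
    }
    # anything still open when the text runs out is reported once
    push @errors, "Unexpected end of the document, line $line_no" if @stack;
    return @errors;
}

my $doc = join "\n", '<r>', '  <a>', '    <b>', '      <c>', '    </b>', '  </a>', '</r>';
print "$_\n" for report_mismatches($doc);
```

    Run as is, it prints the same four messages listed above, even though only one tag is actually missing.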
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Thanks for your quick answer. So basically, I will have to fix any error found by the script and then re-run, fix any new error and repeat the procedure until there are no more errors?
        Yes. All bets are off for an XML document with errors. The first error found may actually not be the first error in the document: it is just the place where the parser got stuck.

        Consider the following:

        <a> <d> <e> </e> </b> </a>
        Where is the error? Should <d> be <b>, or should </b> be </d>? Or perhaps they should both be <f> ... </f> tags? How is the parser to know? To be sure the XML is not only well-formed but also correct, you will have to validate it against a DTD or XML Schema definition.
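        A sketch of what schema validation can look like in Perl. XML::Parser is a non-validating parser, so this swaps in XML::LibXML (assumed to be installed), whose XML::LibXML::Schema class checks a parsed document against an XML Schema. The schema and the helper names example_schema and validate_against_xsd are hypothetical, written only to fit the <a>/<b>/<e> example above. A document has to be well-formed before validation can even start, so the third sample fails at parse time rather than at the schema check.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;    # assumption: XML::LibXML is installed

# Hypothetical schema: <a> holds exactly one <b>, which holds exactly one <e>.
sub example_schema {
    return <<'XSD';
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="b">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="e" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD
}

# Returns undef if $xml is well-formed AND valid against $xsd,
# otherwise the stringified error.
sub validate_against_xsd {
    my ( $xml, $xsd ) = @_;
    my $ok = eval {
        my $doc    = XML::LibXML->load_xml( string => $xml );       # dies if not well-formed
        my $schema = XML::LibXML::Schema->new( string => $xsd );    # dies if the schema is broken
        $schema->validate($doc);                                    # dies if the document is invalid
        1;
    };
    return $ok ? undef : "$@";
}

for my $xml ( '<a> <b> <e> </e> </b> </a>',     # well-formed and valid
              '<a> <d> <e> </e> </d> </a>',     # well-formed, but <d> is not allowed
              '<a> <d> <e> </e> </b> </a>' )    # not even well-formed
{
    my $err = validate_against_xsd( $xml, example_schema() );
    print $xml, ' => ', ( defined $err ? "INVALID: $err" : 'valid' ), "\n";
}
```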

        CountZero

        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        My blog: Imperial Deltronics
Re: XML::Parser XML validation
by Anonymous Monk on Jan 05, 2013 at 12:43 UTC
    Don't go looking to reinvent xmllint; go look up how to install xmllint.
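    A minimal sketch of calling xmllint from Perl instead of reimplementing it, assuming xmllint (part of libxml2) is installed and on the PATH. The helper name xmllint_ok is hypothetical. --noout stops xmllint from echoing the parsed document, so only its diagnostics (on STDERR) are printed, and a non-zero exit status means problems were found.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if xmllint accepts $file, false otherwise.
# Assumes xmllint is installed and on the PATH.
sub xmllint_ok {
    my ($file) = @_;
    # list form of system(): no shell, so odd filenames need no quoting
    my $rc = system( 'xmllint', '--noout', $file );
    die "could not run xmllint: $!\n" if $rc == -1;
    return $? == 0;    # non-zero exit status: xmllint reported errors
}

for my $file (@ARGV) {
    print STDERR xmllint_ok($file)
        ? "'$file' is well-formed\n"
        : "'$file' has errors (see xmllint output above)\n";
}
```

    The same call extends to validation by adding xmllint's --valid (DTD) or --schema file.xsd (XML Schema) options to the argument list.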
Re: XML::Parser XML validation
by karlgoethebier (Abbot) on Jan 05, 2013 at 17:25 UTC

    See also XML validation.

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»