PerlMonks  
If you are trying "to find if there are any mismatched tags", that sounds like you are looking for errors that would cause an XML parser to fail (and it appears that the sample XML data you posted has this kind of problem, so I understand your goal now).

But what that really means is that you can't use an XML parser at all to solve this problem, since a conforming parser stops at the first well-formedness error it finds. As pointed out above, it's easy enough to check for XML errors using xmllint, although its error reports can sometimes be difficult to interpret, and the actual problem can still be hard to spot.

I would be inclined to use a regex-based diagnosis, something like this:

#!/usr/bin/perl
use strict;
use warnings;

my $infile = shift or die "usage: $0 file.xml\n";   # get input file name from @ARGV
open( my $fh, "<:encoding(UTF-8)", $infile ) or die "$infile: $!\n";
local $/;                        # slurp the whole file in the next line
$_ = <$fh>;
close $fh;

s/^<\?.*?\?>\s*//;               # ditch the "<?xml...?>" line, if any
s/<!--.*?-->//gs;                # ignore tags inside comments
s/<!\[CDATA\[.*?\]\]>//gs;       # ... and inside CDATA sections

my %open_tags;
my %close_tags;
for my $tkn ( split /(?<=>)|(?=<)/ ) {   # split on look-behind | look-ahead for brackets
    next if $tkn =~ m{/\s*>$};          # a self-closing <tag/> needs no close tag
    if ( $tkn =~ m{^<(/?)(\w[\w:.-]*)} ) {
        if ( $1 eq '' ) {
            $open_tags{$2}++;
        }
        else {
            $close_tags{$2}++;
        }
    }
}

for my $tag ( sort keys %open_tags ) {
    if ( !exists $close_tags{$tag} ) {
        warn sprintf( "%s: open tag %s is never closed in %d occurrence(s)\n",
            $infile, $tag, $open_tags{$tag} );
    }
    else {
        if ( $close_tags{$tag} != $open_tags{$tag} ) {
            warn sprintf( "%s: element %s has %d open tag(s) but %d close tag(s)\n",
                $infile, $tag, $open_tags{$tag}, $close_tags{$tag} );
        }
        delete $close_tags{$tag};
    }
}
for my $tag ( sort keys %close_tags ) {
    warn sprintf( "%s: close tag %s has no open tags in %d occurrence(s)\n",
        $infile, $tag, $close_tags{$tag} );
}

That will at least give you a clear tally of imbalances (if any) in the open/close tag inventory for a given xml file. You should be able to use this information, together with the line numbers from the xmllint reports, to locate the problems.
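One limitation of a pure tally is that it can't see nesting errors: a file like <a><b></a></b> has one open and one close tag for each element, so the counts balance even though the file is not well-formed. A stack-based pass over the same tags catches that case and can report line numbers directly. The following is only a minimal sketch, not part of the script above: check_nesting is a hypothetical helper name, and like the tally it assumes that no attribute value contains a literal ">".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the tally script): walk the tags in
# document order with a stack, reporting each problem with a line number.
# Assumes no attribute value contains a literal ">".
sub check_nesting {
    my ($text) = @_;
    my ( @problems, @stack );    # stack entries: [ tag name, line opened ]

    # Blank out comments and CDATA sections, keeping only their newlines
    # so that the line numbers computed below stay correct.
    $text =~ s{(<!--.*?-->|<!\[CDATA\[.*?\]\]>)}{ my $c = $1; "\n" x ( $c =~ tr/\n// ) }gse;

    my ( $line, $last ) = ( 1, 0 );
    # <?xml ...?> and <!DOCTYPE ...> don't match: tag names must start with \w
    while ( $text =~ m{<(/?)(\w[\w:.-]*)[^>]*?(/?)>}g ) {
        my ( $is_close, $tag, $self_closing, $start ) = ( $1, $2, $3, $-[0] );
        my $gap = substr( $text, $last, $start - $last );
        $line += ( $gap =~ tr/\n// );    # newlines since the previous tag
        $last = $start;
        next if $self_closing;           # <tag/> needs no close tag

        if ( !$is_close ) {
            push @stack, [ $tag, $line ];
        }
        elsif ( @stack && $stack[-1][0] eq $tag ) {
            pop @stack;                  # properly nested close tag
        }
        elsif ( grep { $_->[0] eq $tag } @stack ) {
            # closes an element further down: everything above it was left open
            while ( $stack[-1][0] ne $tag ) {
                my $open = pop @stack;
                push @problems,
                  "line $open->[1]: <$open->[0]> is not closed before </$tag> on line $line";
            }
            pop @stack;
        }
        else {
            push @problems, "line $line: </$tag> has no matching open tag";
        }
    }
    push @problems, "line $_->[1]: <$_->[0]> is never closed" for @stack;
    return @problems;
}

# When run as a script, check each file named on the command line.
unless (caller) {
    for my $file (@ARGV) {
        open my $fh, '<:encoding(UTF-8)', $file or die "$file: $!\n";
        my $text = do { local $/; <$fh> };
        print "$file: $_\n" for check_nesting($text);
    }
}
```

Saved under any name and run with one or more XML files as arguments, it prints one line per problem. For <a><b></a></b> it reports both that <b> is not closed before </a> and that </b> has no matching open tag, which the tally alone would miss. When an element is closed too early, the stack approach blames the inner elements that were left open; that is a guess about intent, so the xmllint line numbers are still worth cross-checking.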

So, when you find these mismatched tags, isn't the next step to look at the process that is creating the xml files, and fix that? (These xml files aren't being created by manual editing, are they??)

(Update: BTW, I forgot to mention... this new information in your reply makes your OP even more egregiously obtuse. If you had said at the beginning, "I have this XML file that has an error in the tags, and I need to figure out how to find the problem," then the discussion would have been more effective. I know, you already feel bad about the OP, and I shouldn't pile on, but it needs to be said.)


In reply to Re^3: Bug in XML::Parser by graff
in thread Bug in XML::Parser by manunamu
