HTML::Parser and "Invalid foo tag"

by dragonchild (Archbishop)
on Oct 24, 2008 at 13:24 UTC (#719339)
dragonchild has asked for the wisdom of the Perl Monks concerning the following question:

I'm using HTML::Parser and am getting "HTML parser error : Tag foo invalid" errors. I am parsing non-HTML that's formatted as HTML. I can't find the place where it checks against a list of known elements. Help?

My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

Re: HTML::Parser and "Invalid foo tag"
by JavaFan (Canon) on Oct 24, 2008 at 13:56 UTC
    I don't think HTML::Parser validates against a DTD, or even a list of allowed tags. (Considering it's event based, and can parse chunks, it can't validate anyway - it would need the entire document for that).

    What's more, I can't find anything in HTML::Parser (or in 'strings Parser.so') that even remotely matches the error you're getting. Which suggests to me that the error isn't generated by HTML::Parser.
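A minimal sketch of the chunked, event-based parsing described above, assuming HTML::Parser is installed. The tag names `foo` and `bar` and the chunk boundaries are made up for illustration. The handler is called for each start tag, whatever its name, and nothing checks it against a list of known elements.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Collect every tag name reported by the start handler.
my @seen;
my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [ sub { push @seen, $_[0] }, 'tagname' ],
);

# Feed the input in arbitrary chunks; a tag may be split across calls
# because the parser buffers incomplete tokens until more data arrives.
$p->parse($_) for '<fo', 'o><ba', 'r>text</bar></foo>';

# Signal end of input so anything still buffered is flushed.
$p->eof;

print "saw: @seen\n";
```

Because each event fires as soon as its token is complete, the parser never holds the whole document at once, which is why validation against a DTD is out of scope for it.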

      I say he needs to prove it :)
      #!/usr/bin/perl --
      use strict;
      use warnings;
      use HTML::Parser;

      my $p = HTML::Parser->new(
          api_version => 3,
          default_h   => [ sub { print join ' | ', grep defined, @_, "\n" }, "event,tag,text," ],
          # strict_names => 1,
          xml_mode => 1,
      );
      $p->parse( '<boo><foo><shoo><Moo><COW></BOO> <html><body>hi <br> <a href="1"> hello </a> <boo><foo><shoo><Moo><COW></BOO> </body></html>' );
      __END__
      start_document | |
      start | boo | <boo> |
      start | foo | <foo> |
      start | shoo | <shoo> |
      start | Moo | <Moo> |
      start | COW | <COW> |
      end | /BOO | </BOO> |
      text | |
      start | html | <html> |
      start | body | <body> |
      text | hi |
      start | br | <br> |
      text | |
      start | a | <a href="1"> |
      text | hello |
      end | /a | </a> |
      text | |
      start | boo | <boo> |
      start | foo | <foo> |
      start | shoo | <shoo> |
      start | Moo | <Moo> |
      start | COW | <COW> |
      end | /BOO | </BOO> |
      text | |
      end | /body | </body> |
      end | /html | </html> |
      Google suggests it's xmllint that's complaining.
