PerlMonks  

HTML::Parser and "Invalid foo tag"

by dragonchild (Archbishop)
on Oct 24, 2008 at 13:24 UTC (#719339=perlquestion)
dragonchild has asked for the wisdom of the Perl Monks concerning the following question:

I'm using HTML::Parser and am getting "HTML parser error : Tag foo invalid" errors. I am parsing non-HTML that's formatted as HTML. I can't find the place where it checks against a list of known elements. Help?

My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

Re: HTML::Parser and "Invalid foo tag"
by JavaFan (Canon) on Oct 24, 2008 at 13:56 UTC
    I don't think HTML::Parser validates against a DTD, or even a list of allowed tags. (Considering it's event based, and can parse chunks, it can't validate anyway - it would need the entire document for that).

    What's more, I can't find anything in HTML::Parser (or in 'strings Parser.so') that even remotely matches the error you're getting. Which suggests to me that the error isn't generated by HTML::Parser.
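
    The point about event-based, chunked parsing can be sketched roughly as follows. This is a minimal illustration, not code from the original posts: the tag names, the chunk boundaries, and the @events array are made up for the example. It feeds unknown tags to HTML::Parser in arbitrary pieces and records the start and end events it reports; nothing in it checks tags against a list.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Collected events; the name @events is just for this sketch.
my @events;

my $p = HTML::Parser->new(
    api_version => 3,
    # Each handler receives only the tag name (argspec 'tagname').
    start_h => [ sub { push @events, "start:$_[0]" }, 'tagname' ],
    end_h   => [ sub { push @events, "end:$_[0]" },   'tagname' ],
);

# Feed the document in arbitrary chunks, splitting even inside tags.
$p->parse($_) for '<fo', 'o><ba', 'r></bar></foo>';
$p->eof;    # signal end of input so any buffered text is flushed

print "$_\n" for @events;
```

    Run as written, this should print start:foo, start:bar, end:bar and end:foo on separate lines, with no complaint about either tag.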

      I say he needs to prove it :)
      #!/usr/bin/perl --
      use strict;
      use warnings;
      use HTML::Parser;

      my $p = HTML::Parser->new(
          api_version => 3,
          default_h   => [
              sub { print join ' | ', grep defined, @_, "\n" },
              "event,tag,text,"
          ],
          # strict_names => 1,
          xml_mode => 1,
      );

      $p->parse(
          '<boo><foo><shoo><Moo><COW></BOO> <html><body>hi <br> <a href="1"> hello </a> <boo><foo><shoo><Moo><COW></BOO> </body></html>'
      );
      __END__
      start_document | |
      start | boo | <boo> |
      start | foo | <foo> |
      start | shoo | <shoo> |
      start | Moo | <Moo> |
      start | COW | <COW> |
      end | /BOO | </BOO> |
      text | |
      start | html | <html> |
      start | body | <body> |
      text | hi |
      start | br | <br> |
      text | |
      start | a | <a href="1"> |
      text | hello |
      end | /a | </a> |
      text | |
      start | boo | <boo> |
      start | foo | <foo> |
      start | shoo | <shoo> |
      start | Moo | <Moo> |
      start | COW | <COW> |
      end | /BOO | </BOO> |
      text | |
      end | /body | </body> |
      end | /html | </html> |
      Google suggests it's xmllint that's complaining.
