in reply to Fixing Bad HTML
Use a stack. When you see an open tag, push it onto the stack. When you see a close tag, compare it with the top element of the stack: if they match, pop it; otherwise, deal with the error. If the tag is self-closed, either don't push it, or push it and pop it straight away, depending on how you treat the content.
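A rough sketch of that idea, written in Python just for illustration. The names `TAG_RE` and `check_tags` are made up for this example, and the regex is a deliberate simplification (it ignores comments, CDATA, and attribute values containing `>`), so treat it as a starting point rather than a real HTML parser.

```python
import re

# Simplified tag pattern (an assumption for this sketch, not a full HTML grammar):
# group 1 = "/" for a close tag, group 2 = tag name, group 3 = "/" if self-closed.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>")

def check_tags(html):
    """Return a list of error messages; an empty list means the tags balance."""
    stack = []
    errors = []
    for m in TAG_RE.finditer(html):
        closing, name, self_closed = m.group(1), m.group(2).lower(), m.group(3)
        if self_closed:
            continue                      # self-closed tag: never pushed
        if not closing:
            stack.append(name)            # open tag: push
        elif stack and stack[-1] == name:
            stack.pop()                   # close tag matches top of stack: pop
        else:
            errors.append(f"unexpected </{name}>")   # "deal with the error"
    # Anything still on the stack was opened but never closed.
    errors.extend(f"unclosed <{name}>" for name in reversed(stack))
    return errors
```

Here "deal with the error" just means recording a message; what to actually do about a mismatch is left open.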
Re: Re: Fixing Bad HTML
by Cody Pendant (Prior) on Nov 17, 2002 at 01:25 UTC
Thanks for that. That's a structure at least. But what if the thing to be closed isn't the last item in the stack, like if someone's crossed over tags:
blah blah <B>blah blah<I> blah blah</B></I>
which is bad HTML, but not problematic in this context?
--
($_='jjjuuusssttt annootthheer
pppeeerrrlll haaaccckkeer')=~y/a-z//s;print;
|
__SIG__
use B;
printf "You are here %08x\n", unpack "L!", unpack "P4", pack
"L!", B::svref_2object(sub{})->OUTSIDE;
|
-sauoq
"My two cents aren't worth a dime.";
|
Though I'd be inclined to disallow sloppy markup like this (as others have suggested), one option I've used in the past is to backtrack up the stack looking for a matching tag and autoclosing any open tags I pass along the way.
In this case that would proceed something like this. You get to the </B> and look at the tag at the top of the stack. It's not a <B>, it's an <I>, so you generate a </I> yourself and pop that off the stack, then try again. This time it is a <B> so you can just pop it off the top and you move on.
The next closing tag is </I>. Since there's no matching open tag on the stack, you simply remove it.
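Something like the following would do that backtracking, sketched in Python purely as an illustration. `TAG_RE` and `repair_tags` are hypothetical names, the regex is a simplification that ignores comments and attribute values containing `>`, and tags still open at the end of the input are left alone since that case isn't covered above.

```python
import re

# Simplified tag pattern (an assumption for this sketch, not a full HTML grammar):
# group 1 = "/" for a close tag, group 2 = tag name, group 3 = "/" if self-closed.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>")

def repair_tags(html):
    """Autoclose crossed-over tags and drop close tags that have no open tag."""
    stack = []   # names of currently open tags, in their original case
    out = []     # pieces of the repaired output
    pos = 0
    for m in TAG_RE.finditer(html):
        out.append(html[pos:m.start()])   # copy the text before this tag
        pos = m.end()
        closing, name, self_closed = m.group(1), m.group(2), m.group(3)
        key = name.lower()
        if self_closed:
            out.append(m.group(0))        # self-closed: pass through, no push
        elif not closing:
            stack.append(name)
            out.append(m.group(0))
        elif any(t.lower() == key for t in stack):
            # Backtrack up the stack, generating a close tag for everything
            # we pass on the way to the matching open tag.
            while stack[-1].lower() != key:
                out.append(f"</{stack.pop()}>")
            stack.pop()
            out.append(m.group(0))
        # else: no matching open tag on the stack, so the close tag is removed
    out.append(html[pos:])
    return "".join(out)

print(repair_tags("blah blah <B>blah blah<I> blah blah</B></I>"))
# prints: blah blah <B>blah blah<I> blah blah</I></B>
```

Note that the autoclosed `<I>` is not reopened after the `</B>`, so any text that followed would lose its italics; whether that matters depends on how faithful the repair needs to be.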
$perlmonks{seattlejohn} = 'John Clyman';