The usual answer would be to use HTML::TokeParser if you want
a reliable way to parse HTML. For this *particular* task a
well-constructed regex should suffice. This one strips every <b>
</b> pair that is empty, allowing for whitespace and newlines inside
the tags and between them. I believe it covers all bases.
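use strict;
use warnings;

my $text = join '', <DATA>;   # slurp the sample HTML below
print $text;                  # before
# Remove <b>...</b> pairs that contain nothing but whitespace.
$text =~ s#<\s*b\s*>(?:[\s\n]| )*<\s*/\s*b\s*>##ig;
print $text;                  # after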
__DATA__
<p>test
<p>test<b></b>
<p>test<b
>
</b>
<p>test<b> </b>
<p>test<b
></b >
<p>test<B></B>
<p>test<B>
</B>
<p>test<B ></B>
<p>test<B
> </B >
<p>test<B> </B>
<p><b>I am not empty!</b>
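
For anyone who wants to see what each piece of the pattern is doing, here is roughly the same substitution spelled out with the /x modifier. Treat it as a sketch rather than a drop-in: the sub name strip_empty_bold is made up for illustration, and counting the &nbsp; entity as "empty" is my assumption, not something the pattern above does.

```perl
use strict;
use warnings;

# Hypothetical helper: remove <b>...</b> pairs that hold nothing visible.
sub strip_empty_bold {
    my ($html) = @_;
    $html =~ s{
        < \s* b \s* >          # opening tag, tolerating whitespace and newlines
        (?: \s | &nbsp; )*     # body: whitespace or, by assumption, &nbsp; entities
        < \s* / \s* b \s* >    # closing tag, same tolerance
    }{}gix;                    # g: every pair, i: <B> as well as <b>, x: layout
    return $html;
}

print strip_empty_bold("<p>test<B\n> </B >\n<p><b>I am not empty!</b>\n");
```

The first line of the sample loses its empty pair and the second is left alone, because its body contains something other than whitespace.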
| [reply] |
I'd also go with tachyon's suggestion of HTML Tidy, but if you are trying to do this quick and dirty somewhere in the middle of a script, I'd use this regex:
$text =~ s#<\s*([^>]*)\s*>[\s\n]*<\s*/\s*\1\s*>##ig;
It should remove any empty tag that has no attributes (not just bold tags), so it works on input such as:
__DATA__
<i> </ I>
< B ></b>
< em> < / eM >
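
One thing to watch with a generic pattern like this: a single pass leaves a newly emptied outer tag behind, so <b><i></i></b> comes out as <b></b>. Below is a minimal sketch that repeats the substitution until a pass removes nothing. The sub name strip_empty_tags is hypothetical, and narrowing the tag name to \w+ is my own assumption, made so the capture cannot swallow spaces or slashes.

```perl
use strict;
use warnings;

# Hypothetical helper: strip attribute-free empty tags, including nested ones.
sub strip_empty_tags {
    my ($html) = @_;
    # s///g returns the number of substitutions made (false when there were
    # none), so keep looping until a pass finds nothing left to remove.
    1 while $html =~ s{<\s*(\w+)\s*>\s*<\s*/\s*\1\s*>}{}gi;
    return $html;
}

print strip_empty_tags("<b><i> </ I></b>< em> < / eM ><p>kept</p>\n");
```

Tags with attributes are still left alone, since the closing tag would have to repeat the whole captured name and nothing else is allowed between the name and the closing >.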