http://www.perlmonks.org?node_id=34294


in reply to Big, bad, ugly regex problem

I admit I don't quite get the scope of the whole problem. But as for allowing only certain tags, you could start with:
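#!/usr/local/bin/perl -l -w
use strict;

my $str = "<bad tag><a good tag> hello there<br></bad tag></a>";

# Whitelisted tag names, stored as hash keys for fast lookup
my @good_tags = qw(p a font br h1 h2 h3 h4 h5 h6);
my %good_tags;
@good_tags{@good_tags} = ();

# $1 = the whole tag, $2 = its name: keep whitelisted tags, drop the rest
$str =~ s!(</?(\w*).*?>)!exists $good_tags{lc($2)} ? $1 : ''!eg;

print $str;    # prints: <a good tag> hello there<br></a>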
You can replace the '$1' with a call to a function that rewrites characters as you see fit, or capture the '.*?' as $3 (by writing '(.*?)') and pass it along with $2 to a function that checks whether that particular tag is allowed to carry those extra attributes. Either way, I'd do it in more than one step.
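Here is a minimal sketch of the second suggestion: capture the attribute text as $3 and hand it, together with the tag name, to a helper that rebuilds the tag. The %allowed_attrs policy table and the clean_tag and filter_html helpers are hypothetical names used only for illustration. The attribute parsing is deliberately simplified: it keeps only quoted name="value" or name='value' pairs and, like the one-liner above, it will misparse a '>' inside an attribute value.

```perl
use strict;
use warnings;

# Hypothetical policy: which attributes each whitelisted tag may keep.
my %allowed_attrs = (
    a    => [qw(href title)],
    font => [qw(color size face)],
);
# These tags are allowed, but all of their attributes are stripped
$allowed_attrs{$_} = [] for qw(p br h1 h2 h3 h4 h5 h6);

# Hypothetical helper: rebuild one tag from its slash, name and attribute text.
# Returns '' for tags that are not whitelisted.
sub clean_tag {
    my ($slash, $name, $attrs) = @_;    # copy before running any other regex
    $name = lc $name;
    return '' unless exists $allowed_attrs{$name};
    return "</$name>" if $slash;        # closing tags never keep attributes

    my %ok = map { $_ => 1 } @{ $allowed_attrs{$name} };
    my @kept;
    # Simplified attribute parser: only quoted name="value" / name='value'
    while ($attrs =~ /(\w+)\s*=\s*("[^"]*"|'[^']*')/g) {
        push @kept, lc($1) . "=$2" if $ok{ lc $1 };
    }
    return '<' . join(' ', $name, @kept) . '>';
}

# $1 = optional '/', $2 = tag name, $3 = everything up to the first '>'
sub filter_html {
    my ($str) = @_;
    $str =~ s!<(/?)(\w*)(.*?)>!clean_tag($1, $2, $3)!eg;
    return $str;
}

print filter_html(q{<bad tag><a href="x.html" onclick="evil()">hi</a><br clear="all">}), "\n";
```

With this input the unknown tag disappears, the 'a' tag keeps its href but loses its onclick, and the 'br' tag loses its clear attribute, leaving <a href="x.html">hi</a><br>. Keeping the policy in a data structure rather than in the regex is what makes the "more than one step" approach manageable: the regex only finds tags, and the per-tag rules live in one place.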