Small HTML parser

by Yohimbe (Pilgrim)
Yohimbe has asked for the wisdom of the Perl Monks concerning the following question:

Given an online commenting system similar to the Monastery, and wanting flexibility together with security, I need a small parser that can take a snippet of HTML and allow ONLY a short list of "approved" tags, i.e. character formatting and linking only: no tables, no blockquotes, no JavaScript.
What I'm talking about starts with something like this:
    sub cleanhtml ($) {
        my $dirty_html = shift;
        my @allowed_tags = qw(A B BR P I CODE PRE);
        ...
        return $safe_html;
    }
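A minimal sketch of one way to fill in that skeleton, assuming allowed tags may be rebuilt rather than copied. Each whitelisted tag is re-emitted by name with every attribute dropped, except an `HREF` on `<A>` whose value starts with `http:`, `https:` or `mailto:`. Any other tag is removed, and a stray `<` or `>` is escaped. The `rebuild_tag` helper, the `%allowed` and `%escape` hashes and the list of accepted URL schemes are illustrative choices, not part of the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative whitelist (upper-cased names) and escapes for stray brackets.
my %allowed = map { $_ => 1 } qw(A B BR P I CODE PRE);
my %escape  = ('<' => '&lt;', '>' => '&gt;');

# Hypothetical helper: turn the text between '<' and '>' into a safe tag,
# or into '' if the tag is not whitelisted. Tags are rebuilt from scratch,
# so attributes never pass through, except a vetted HREF on <A>.
sub rebuild_tag {
    my ($inside) = @_;    # copy first: @_ aliases the caller's $1
    my ($slash, $name, $attrs) = $inside =~ m{^\s*(/?)\s*(\w+)(.*)\z}s
        or return '';     # comments, <!DOCTYPE ...> and the like: drop
    $name = uc $name;
    return '' unless $allowed{$name};    # not whitelisted: drop
    return "</$name>" if $slash;         # closing tags carry no attributes
    if ($name eq 'A'
        && $attrs =~ m{\bhref\s*=\s*(["'])((?:https?|mailto):[^"'<>]*)\1}i) {
        return qq{<A HREF="$2">};        # only http, https and mailto links survive
    }
    return "<$name>";
}

sub cleanhtml ($) {
    my $dirty_html = shift;
    # Each match is either a complete tag (captured in $1) or a lone
    # '<' / '>' (captured in $2), which is escaped so it cannot open a tag.
    (my $safe_html = $dirty_html)
        =~ s{<([^<>]*)>|([<>])}{defined $1 ? rebuild_tag($1) : $escape{$2}}ge;
    return $safe_html;
}

print cleanhtml(q{<b onclick="x()">hi</b> <a href="javascript:alert(1)">x</a> 1 < 2}), "\n";
```

For the sample input the script prints `<B>hi</B> <A>x</A> 1 &lt; 2`. The text inside a removed tag is kept, so `<script>alert(1)</script>` comes out as the harmless plain text `alert(1)`.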

Re: Small HTML parser
by lhoward (Vicar) on Aug 08, 2000 at 07:38 UTC
Re: Small HTML parser
by perlmonkey (Hermit) on Aug 08, 2000 at 06:35 UTC
    There has got to be a better way to do it, but this is a slow hack that will work:
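    my %ok = map { uc($_) => 1 } @allowed_tags;   # allowed tag names, upper-cased
    # Keep a tag whose name is allowed (in any case); replace every other tag with ''.
    $dirty_html =~ s!(<\s*/?\s*(\w+).*?>)!$ok{uc $2} ? $1 : ''!esg;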
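    As written, the substitution keeps an allowed tag verbatim, attributes included, so `<b onmouseover="...">` or `<a href="javascript:...">` still gets through. A minimal variant, assuming attribute-free tags are acceptable, rebuilds each allowed tag from its name alone. The `strip_tags` wrapper and the `%ok` hash are hypothetical names; note that this also discards `href`, so links lose their targets.

```perl
use strict;
use warnings;

# Hypothetical wrapper: an allowed tag is rebuilt from its name alone,
# so every attribute (onmouseover, style, href, ...) is discarded.
my @allowed_tags = qw(A B BR P I CODE PRE);
my %ok = map { uc($_) => 1 } @allowed_tags;

sub strip_tags {
    my $html = shift;
    # $1 = optional '/', $2 = tag name; the rest of the tag (.*?) is thrown away
    $html =~ s!<\s*(/?)\s*(\w+).*?>!$ok{uc $2} ? '<' . $1 . uc($2) . '>' : ''!esg;
    return $html;
}

print strip_tags(q{<b onmouseover="steal()">bold</b><script>x()</script>}), "\n";
```

This prints `<B>bold</B>x()`: the event handler is gone and the disallowed `<script>` tags are removed, leaving only their text.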
Re: Small HTML parser
by DrManhattan (Chaplain) on Aug 08, 2000 at 18:39 UTC

