in reply to Dynamically cleaning up HTML fragments
Glad to see that you have noticed HTML::StripScripts::Parser. I'm the maintainer, but not the guy who did the great work of writing it originally.
It fulfils all of your listed requirements, and is certainly seeing active usage on our production sites.
This code should do what you need (untested):

    use HTML::StripScripts::Parser;

    my $s = HTML::StripScripts::Parser->new({
        Context   => 'Flow',

        # Only allow these tags
        BanAllBut => [qw(p a img h3 div em)],

        # Allow src and href
        AllowSrc  => 1,
        AllowHref => 1,

        Rules => {
            # Remove empty p tags
            p => sub { return length( $_[1]->{content} ) ? 1 : 0 },

            # a must have a local href
            a => {
                href => \&strip_abs_uri,
                tag  => sub { return defined( $_[1]->{attr}{href} ) ? 1 : 0 },
            },

            # img must have a local src
            img => {
                src => \&strip_abs_uri,
                tag => sub { return defined( $_[1]->{attr}{src} ) ? 1 : 0 },
            },

            # Allow id and class for all tags
            '*' => {
                id    => 1,
                class => 1,
            },
        },
    });

    # Attribute callback: return undef to drop the attribute,
    # or the value to keep
    sub strip_abs_uri {
        my ( $filter, $tag, $attr_name, $attr_val ) = @_;

        # Leave attributes other than href and src untouched
        return $attr_val unless $attr_name =~ /^(?:href|src)$/;

        # Drop absolute URIs, keep local ones
        return $attr_val =~ m{://} ? undef : $attr_val;
    }

    # $html holds the fragment you want to clean
    print $s->filter_html($html);
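If you want to sanity-check the URI rule without installing anything, here is a minimal standalone sketch. It assumes, as I read the HTML::StripScripts docs, that an attribute callback receives the filter, tag name, attribute name and attribute value, and returns undef to drop the attribute or the value to keep. The name `keep_local_uri` and the sample values are made up for illustration; the filter object is not used, so undef is passed in its place.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an attribute callback. Assumed contract:
# return undef to drop the attribute, or the value to keep.
sub keep_local_uri {
    my ( $filter, $tag, $attr_name, $attr_val ) = @_;

    # Only href and src are checked; everything else passes through
    return $attr_val unless $attr_name =~ /^(?:href|src)$/;

    # Anything containing "://" is treated as absolute and dropped
    return $attr_val =~ m{://} ? undef : $attr_val;
}

# Made-up sample attributes: [ tag, attribute name, attribute value ]
my @cases = (
    [ a   => href  => '/about.html' ],
    [ a   => href  => 'http://example.com/' ],
    [ img => src   => 'images/logo.png' ],
    [ p   => class => 'intro' ],
);

for my $case (@cases) {
    my ( $tag, $name, $val ) = @$case;
    my $out = keep_local_uri( undef, $tag, $name, $val );
    printf "%-3s %-5s %-22s => %s\n", $tag, $name, $val,
        defined $out ? "kept ($out)" : 'dropped';
}
```

Note that a test for `://` is deliberately crude: protocol-relative URLs such as `//example.com/x` and schemes like `mailto:` get through, so tighten the pattern if those matter to you.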