(dchetlin: HTML::Parser) Re: Big, bad, ugly regex problem

by dchetlin (Friar)
on Sep 28, 2000 at 03:00 UTC (#34298)

in reply to Big, bad, ugly regex problem

See how this does for you:

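#!/usr/bin/perl -w
# vim: filetype=perl

use strict;
use HTML::Parser;
use HTML::Entities;

# Allowed tag names, each anchored so it must match the whole tag name.
my @tags = map { '(?:\A' . $_ . '\z)' } qw(br p font h[1-6] a);
my $tag_RE;
{
    local $" = '|';    # join the alternatives with '|' when interpolated
    $tag_RE = qr/@tags/;
}

# Negated character class: escape everything except word characters and whitespace.
my $unsafe = '^\w\s';

my $p = HTML::Parser::->new(api_version => 3);
$p->handler(start   => \&tag_filter, "tagname, text");
$p->handler(end     => \&tag_filter, "tagname, text");
$p->handler(default => sub { print encode_entities(shift, $unsafe) }, "text");

# Print allowed tags unchanged; any other tag is dropped.
sub tag_filter {
    print $_[1] if ($_[0] =~ $tag_RE);
}

local $/;    # slurp the whole input at once
$p->parse(<>);
$p->eof;     # flush any text the parser is still buffering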

Update: Since Ovid seems to want pretty much every special character escaped, it makes much more sense to use a negated character class in $unsafe than to spell out all of those special characters as line noise and worry about missing one. It also avoids the typo in the original, where an unescaped `-' formed a range inside the character class and caused all capital letters to be escaped.
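
To make the point about the unescaped `-' concrete, here is a small sketch using plain regexes only (no HTML::Entities needed). The explicit lists in $listed_buggy and $listed_fixed are hypothetical stand-ins, not the set from the original node, and chars_in_class is a helper invented for this illustration. It just reports which characters of a sample string fall inside each class.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical explicit list (an assumption, not the original node's set).
# The unescaped '-' between '+' and '_' forms the range 0x2B..0x5F,
# which happens to include the digits and A-Z.
my $listed_buggy = '<>&"+-_';

# Same list with the '-' escaped, so it is a literal hyphen.
my $listed_fixed = '<>&"+\-_';

# The negated class from the script: anything that is not \w or \s.
my $negated = '^\w\s';

# Hypothetical helper: return the characters of $text that match [$set].
sub chars_in_class {
    my ($set, $text) = @_;
    return join '', $text =~ /([${set}])/g;
}

my $sample = 'Hi <b>-+';

print chars_in_class($listed_buggy, $sample), "\n";   # H<>-+  (capital H caught by the range)
print chars_in_class($listed_fixed, $sample), "\n";   # <>-+
print chars_in_class($negated,      $sample), "\n";   # <>-+
```

With the negated class there is no list to maintain, so there is nowhere for a stray `-' to turn into a range.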

