PerlMonks  

(dchetlin: HTML::Parser) Re: Big, bad, ugly regex problem

by dchetlin (Friar)
on Sep 28, 2000 at 03:00 UTC (#34298=note)


in reply to Big, bad, ugly regex problem

See how this does for you:

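#!/usr/bin/perl -w
# vim: filetype=perl
use strict;
use HTML::Parser;
use HTML::Entities;

# Tags allowed through unchanged; each alternative is anchored
# so that it has to match the whole tag name.
my @tags = map {('(?:\A' . $_ . '\z)')} qw(br p font h[1-6] a);
my $tag_RE;
{
    local $" = '|';
    $tag_RE = qr/@tags/;
}

# Escape everything that is neither a word character nor whitespace.
my $unsafe = '^\w\s';

my $p = HTML::Parser::->new(api_version => 3);
$p->handler(start => \&tag_filter, "tagname, text");
$p->handler(end => \&tag_filter, "tagname, text");
$p->handler(default => sub {print encode_entities(shift,$unsafe)}, "text");

# Print a tag only if its name is on the whitelist.
sub tag_filter {
    print $_[1] if ($_[0] =~ $tag_RE);
}

local $/;
$p->parse(<>);
$p->eof;    # flush anything the parser is still buffering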
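
To see what the anchored alternation actually lets through, here is a small standalone sketch that rebuilds the same pattern (using join rather than localizing $", which should be equivalent) and runs a few tag names through it. The helper name is_allowed_tag and the sample names are made up for illustration; they are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same whitelist as in the script; each entry is anchored so that
# "a" cannot match inside "abbr" and "p" cannot match inside "script".
my @tags   = map { '(?:\A' . $_ . '\z)' } qw(br p font h[1-6] a);
my $joined = join '|', @tags;
my $tag_RE = qr/$joined/;

# Hypothetical helper, not part of the original post.
sub is_allowed_tag {
    my ($name) = @_;
    return $name =~ $tag_RE ? 1 : 0;
}

# tag_filter prints matching tags and prints nothing for the rest,
# so a non-matching tag is dropped from the output.
for my $name (qw(p h3 h7 abbr script)) {
    printf "%-6s %s\n", $name, is_allowed_tag($name) ? 'kept' : 'dropped';
}
```

Note that tags failing the test are dropped rather than escaped, because tag_filter is the only handler registered for start and end events.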

Update: I realized that since Ovid seems to want pretty much every special character escaped, it makes much more sense to use a negated character class in $unsafe than to put up with the line noise of listing all of those special characters and worry about missing one. It also avoids the typo in the original, where an unescaped `-' formed a range inside the character class and caused all capital letters to be escaped.
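
A rough illustration of that range problem, using plain regexes so it runs without HTML::Entities. The explicit list below is invented for the example and is not Ovid's actual list; flagged_chars is likewise a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical explicit list, NOT the one from the original node:
# the unescaped "-" between "$" and "_" becomes the range 0x24..0x5F,
# which happens to cover A-Z.
my $explicit = '<>&"$-_';

# Negated class from the post: anything that is neither a word
# character nor whitespace.
my $negated = '^\w\s';

# Return the characters of $text that the class would pick for escaping.
sub flagged_chars {
    my ($class, $text) = @_;
    return join '', $text =~ /([$class])/g;
}

my $sample = 'Hello <b>World</b> & co';
print "explicit: ", flagged_chars($explicit, $sample), "\n";
print "negated:  ", flagged_chars($negated,  $sample), "\n";
```

With the made-up list, the capitals H and W are flagged along with the markup characters; the negated class flags only the punctuation.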

-dlc
