PerlMonks  

(dchetlin: HTML::Parser) Re: Big, bad, ugly regex problem

by dchetlin (Friar)
on Sep 28, 2000 at 03:00 UTC


in reply to Big, bad, ugly regex problem

See how this does for you:

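#!/usr/bin/perl -w
# vim: filetype=perl
use strict;
use HTML::Parser;
use HTML::Entities;

# Whitelisted tag names, each anchored so it must match the whole name.
my @tags = map {('(?:\A' . $_ . '\z)')} qw(br p font h[1-6] a);
my $tag_RE;
{
    local $" = '|';    # join the alternatives with | when @tags is interpolated
    $tag_RE = qr/@tags/;
}

# Negated class: escape everything that is not a word character or whitespace.
my $unsafe = '^\w\s';

my $p = HTML::Parser::->new(api_version => 3);
$p->handler(start => \&tag_filter, "tagname, text");
$p->handler(end => \&tag_filter, "tagname, text");
$p->handler(default => sub {print encode_entities(shift,$unsafe)}, "text");

# Print a tag verbatim only if its name is on the whitelist.
sub tag_filter {
    print $_[1] if ($_[0] =~ $tag_RE);
}

local $/;       # slurp the input in one read
$p->parse(<>);
$p->eof;        # flush any text the parser still has buffered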
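To see which tag names the whitelist pattern lets through, the regex construction can be tried on its own. This is only a sketch using core Perl: `is_allowed` is a hypothetical helper name, not part of the script, and the sample tag names are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same construction as in the script: each alternative is anchored
# with \A and \z, so "a" cannot match inside "abbr".
my @tags = map { '(?:\A' . $_ . '\z)' } qw(br p font h[1-6] a);
my $tag_RE = do { local $" = '|'; qr/@tags/ };

# Hypothetical helper: true if the tag name is on the whitelist.
sub is_allowed {
    my ($name) = @_;
    return $name =~ $tag_RE ? 1 : 0;
}

for my $name (qw(a h3 font script abbr h7)) {
    printf "%-7s %s\n", $name, is_allowed($name) ? 'printed' : 'not printed';
}
```

Without the anchors, a name such as `abbr` would match the `a` alternative, which is presumably why each alternative is wrapped in `\A ... \z`.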

Update: I realized that since Ovid seems to want pretty much every special character escaped, it makes much more sense to use a negated character class in $unsafe than to spell out all of those special characters as line noise and worry about missing one. It also avoids the typo in the original, where an unescaped `-' formed a range inside the character class and caused all capital letters to be escaped.
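The range problem mentioned in the update is easy to reproduce with core Perl alone. In the sketch below, `escape_with` is a hypothetical stand-in for encode_entities (it emits numeric entities only), and the explicit list is made up for illustration; it is not the list from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for encode_entities(): every character matched
# by the character class in $unsafe becomes a numeric entity.
sub escape_with {
    my ($text, $unsafe) = @_;
    $text =~ s/([$unsafe])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

# Made-up explicit list: the unescaped "-" between "+" and "_" is read
# as a range, which happens to cover the digits and A-Z.
my $explicit = '<>&"+-_';

# Negated class as used in the script: anything that is not a word
# character or whitespace.
my $negated = '^\w\s';

my $sample = 'Hello <b>World</b>';
print escape_with($sample, $explicit), "\n";   # capitals get escaped too
print escape_with($sample, $negated),  "\n";   # only the markup characters
```

With the negated class there is no list to maintain, so there is no place for a stray `-` to turn into a range.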

-dlc

