"looks like I'm on the right path, they don't have a way to manipulate plain text in an HTML file -- while still preserving the HTML structure..."
I don't know what you've been doing, but you most certainly can.
There is an example at (crazyinsomniac) Re: Is this the best way to use HTML::TreeBuilder to bold text in an HTML document?
Also, a regex is not completely out of the question. Something like this:
use strict;
use warnings;
my $name = 'PodMaster';
my $url = 'http://perlmonks.org/?node=PodMaster';
my $html = q~
<html>
<title> PodMaster </title>
<style>
PodMaster { }
</style>
<body>
<h1>PodMaster
</h1>
Hi there PodMaster blah blah blah <b>Pod</b><i>Master</i>
</body>
</html>
~;
print $/, untag_MOD( $html, $name, $url ), $/;
# Adapted from http://perlmonks.org/?node_id=161281, modified for our purposes
sub untag_MOD {
    local $_ = $_[0] || $_;
    # ALGORITHM:
    #   find < ,
    #     comment <!-- ... -->,
    #     or comment <? ... ?> ,
    #     or one of the start tags which require a corresponding
    #       end tag, plus everything up to that end tag
    #     or if \s or ="
    #       then skip to next "
    #       else [^>]
    #   >
    # Group 1 is the entire "tag" and group 6 is the text that follows it.
    # The numbers in the pattern comments are the actual capture group numbers.
    s{
        (                       # (1) the entire "tag"
            <                   # open tag
            (?:                 # open group (A)
                (!--)       |   # comment (2), or
                (\?)        |   # another comment (3), or
                (?i:            # open group (B) for /i
                    ( TITLE  |  # one of the start tags
                      SCRIPT |  # whose content
                      APPLET |  # must be skipped
                      OBJECT |  # entirely, up to
                      STYLE     # the corresponding
                    )           # end tag (4)
                )           |   # close group (B), or
                ([!/A-Za-z])    # one of these chars, remembered in (5)
            )                   # close group (A)
            (?(5)               # if the previous case was (5)
                (?:             # open group (C)
                    (?!         # and next is not: (D)
                        [\s=]   # \s or "="
                        ["`']   # with an opening quote
                    )           # close (D)
                    [^>]    |   # and not the closing ">", or
                    [\s=]       # \s or "=" with
                    `[^`]*` |   # something in quotes `, or
                    [\s=]       # \s or "=" with
                    '[^']*' |   # something in quotes ', or
                    [\s=]       # \s or "=" with
                    "[^"]*"     # something in quotes "
                )*              # repeat (C) 0 or more times
            |                   # else (the previous case was not (5))
                .*?             # minimum of any chars
            )                   # end if previous case was (5)
            (?(2)               # if comment (2)
                (?<=--)         # wait for "--"
            )                   # end if comment (2)
            (?(3)               # if another comment (3)
                (?<=\?)         # wait for "?"
            )                   # end if another comment (3)
            (?(4)               # if one of the container tags (4)
                </              # wait for the end
                (?i:\4)         # of this tag
                (?:\s[^>]*)?    # skip junk up to ">"
            )                   # end if (4)
            >                   # tag closed
        )
        ([^<]*)                 # (6) the text that follows the tag
    }
    '
        my $ret = $1;
        if ( length $6 ) {      # a plain truth test would drop a text of "0"
            my $text = $6;
            # link each whole-word occurrence of the name
            $text =~ s~\b(\Q$_[1]\E)\b~<a href="$_[2]">$1</a>~g;
            $ret .= $text;
        }
        $ret;
    'gsxe;
    return $_ ? $_ : "";
}
__END__
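The pattern above leans heavily on Perl's conditional construct, (?(N)yes|no), which picks a sub-pattern depending on whether capture group N took part in the match. A minimal sketch of just that construct, unrelated to HTML (the helper name balanced_word is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: accepts a word that is either bare or fully
# parenthesised, but never one with only one of the two parentheses.
sub balanced_word {
    my ($str) = @_;
    return $str =~ m{
        ^
        (\()?          # (1) optional opening parenthesis
        \w+            # the word itself
        (?(1) \) )     # only if (1) matched, require the closing one
        $
    }x ? 1 : 0;
}

for my $candidate ( 'word', '(word)', '(word', 'word)' ) {
    print balanced_word($candidate) ? "ok  " : "no  ", $candidate, "\n";
}
```

In untag_MOD the same idea decides how the rest of a tag is consumed: quoted attribute values for ordinary tags, "--" for comments, and the matching end tag for containers such as SCRIPT or STYLE.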
Note the caveats in "strip HTML tags".
Another potential caveat (I wouldn't consider it one) is that neither of these turns <b>Pod</b><i>Master</i> into a link.
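To see why the split name is missed, here is a deliberately simplified stand-in for the tag-aware substitution. It is only a sketch: link_text_chunks is a made-up name, and it assumes there is no ">" inside attribute values and no script or style content, which the full pattern does handle.

```perl
use strict;
use warnings;

# Hypothetical, simplified helper: split on anything that looks like a tag
# and link the name only inside the text chunks between tags.
sub link_text_chunks {
    my ( $html, $name, $url ) = @_;
    return join '', map {
        my $chunk = $_;
        # Tags pass through untouched; only text chunks are searched.
        $chunk =~ s{\b(\Q$name\E)\b}{<a href="$url">$1</a>}g
            unless $chunk =~ /^</;
        $chunk;
    } split /(<[^>]*>)/, $html;
}

print link_text_chunks(
    'Hi PodMaster, <b>Pod</b><i>Master</i>',
    'PodMaster',
    'http://perlmonks.org/?node=PodMaster',
), "\n";
```

Each text chunk is searched on its own, so "Pod" and "Master" are never seen as one word and stay unlinked, while the plain "PodMaster" gets its link.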
If you want to do that, you should use HTML::TreeBuilder.
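A rough sketch of how the tree approach can at least find such a split name, assuming HTML::TreeBuilder is installed. The helper spanning_tags is made up for illustration, and it only locates the enclosing element; turning the run of children into a link is left out.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the tag names of elements whose flattened
# text contains the name although no single child contains it, i.e. the
# name is spread over several children.
sub spanning_tags {
    my ( $html, $name ) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @tags = map { $_->tag } $tree->look_down(
        sub {
            my $el = shift;
            return 0 unless $el->as_text =~ /\Q$name\E/;
            # Reject the element if any single child already holds the name.
            return !grep { ( ref $_ ? $_->as_text : $_ ) =~ /\Q$name\E/ }
                $el->content_list;
        }
    );
    $tree->delete;
    return @tags;
}

print join( ', ',
    spanning_tags( '<p>Hi there <b>Pod</b><i>Master</i></p>', 'PodMaster' ) ),
    "\n";
```

Because as_text flattens an element's whole subtree, the parent of the b and i elements sees "PodMaster" as one string even though neither child does.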