
(need feedback) Re: HTML::LinkExtractor

by PodMaster (Abbot)
on Aug 24, 2002 at 14:56 UTC ( #192544=note )

in reply to HTML::LinkExtractor

I thought about adding the following:
=head2 SNIPPET

You've just gotten a link with C<_TEXT> but you don't want the HTML
crap that comes with the text. While C<HTML::LinkExtractor> won't get rid of it
for you, it's easier than easy with C<HTML::TokeParser::Simple>

    use HTML::TokeParser::Simple;

    my $Link = { '_TEXT' => '<a href=""> I am a LINK!!! </a>'};

    warn StripHTML( \$Link->{_TEXT} );
    warn StripHTML( \'<q>Turn on your love light BABY!</q>' );

    sub StripHTML {
        my $HtmlRef = shift;
        my $tp = new HTML::TokeParser::Simple( $HtmlRef );
        my $t = $tp->get_token(); # MUST BE A START TAG (@TAGS_IN_NEED)
                                  # otherwise it ain't come from LinkExtractor
        if($t->is_start_tag) {
            return $tp->get_trimmed_text( '/'.$t->return_tag );
        } else {
            die " IMPOSSIBLE!!!! ";
        }
    }

=head1 AUTHOR
But then it hit me, why not just provide this as a package method?

Or provide an option to do this automatically?

Use get_text instead of get_trimmed_text (maybe make this an option as well)?
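For what it's worth, here is a rough sketch of what the package-method idea with a trim option might look like. The package name My::LinkText, the method name strip_html and the trim option are all made up for illustration and are not part of HTML::LinkExtractor; the sketch also assumes a version of HTML::TokeParser::Simple that provides get_tag (the newer name for return_tag).

```perl
use strict;
use warnings;

package My::LinkText;    # hypothetical package, not part of HTML::LinkExtractor

use HTML::TokeParser::Simple;

# Class method: takes a reference to an HTML fragment that begins with a
# start tag (as a _TEXT value does) and returns the text up to the
# matching end tag.
#   trim => 1  uses get_trimmed_text (whitespace collapsed and trimmed)
#   trim => 0  uses get_text         (whitespace left as found)
sub strip_html {
    my ( $class, $html_ref, %opt ) = @_;

    my $tp = HTML::TokeParser::Simple->new($html_ref);
    my $t  = $tp->get_token;

    # The fragment must open with a start tag, otherwise we cannot know
    # which end tag to read up to.
    die "expected the fragment to begin with a start tag\n"
        unless $t && $t->is_start_tag;

    my $end_tag = '/' . $t->get_tag;

    return $opt{trim}
        ? $tp->get_trimmed_text($end_tag)
        : $tp->get_text($end_tag);
}

package main;

my $link = { _TEXT => '<a href=""> I am a LINK!!! </a>' };

print '[', My::LinkText->strip_html( \$link->{_TEXT}, trim => 1 ), "]\n";
print '[', My::LinkText->strip_html( \$link->{_TEXT}, trim => 0 ), "]\n";
```

Whether trimming should be the default is a judgment call; the sketch leaves it off unless asked for, so the caller sees the text as it appeared in the source.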

BTW ~ I'm gonna stick with HTML::TokeParser::Simple. Ovid doesn't need the publicity, but I like it. This'll be on CPAN before Monday.

Update: well, I made some changes and put it up on CPAN.

** The Third rule of perl club is a statement of fact: pod is sexy.
