
Re^6: anchor text match

by kumar801012 (Initiate)
on Dec 30, 2009 at 21:59 UTC ( #814999=note )

in reply to Re^5: anchor text match
in thread anchor text match

I also tried this:

    use WWW::Mechanize ();

    my $mech  = WWW::Mechanize->new();
    my $html  = $mech->get('');
    my @links = $mech->find_all_links( text_regex => qr/a/i );

    foreach (@links) {
        if ( $_->url() eq '' ) {
            print "\n";
            print "url \n";
            print $_->url();
            print "\n";
            print " text\n";
            print $_->text();
            print "\n";
        }
    }
    __END__

The output is:


text: Victoria's Secret

If the page has an anchor tag like the one below:

<a href="" target=_blank><img src= height=11 width=11 border=0 alt="Open this result in new window"> </a>

then the above Perl script gives:


text: Open this result in new window

But the desired result is:


text: IMAGE

Replies are listed 'Best First'.
Re^7: anchor text match
by JadeNB (Chaplain) on Dec 31, 2009 at 00:05 UTC

    I think that you may have expected a ready-made solution, which is why Re^2: anchor text match surprised you. The poster there was not (I think) trying to solve your problem, but rather to indicate to you how you could solve it. (That was the meaning of the “Two clues in one” text.)

    It's not surprising that the code you indicate doesn't do what you want—the for loop makes no effort to check whether the link being processed satisfies any special conditions, and so must treat every link equally.

    To fix this, you must have something of the following shape in your code:

        for my $link (@links) {
            if ( is_special($link) ) {
                do_special_thing($link);
            }
            else {
                do_ordinary_thing($link);
            }
        }
    * where it's up to you to determine how to write is_special and do_special_thing (you've already indicated what you want do_ordinary_thing to be). As an aid, you have the $link object to hand, and so can test its properties in as much detail as necessary.

    * I don't mean literally that your code has to contain these words; just that, without some sort of conditional, you'll never get the special treatment you want.
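
One way the shape above might be filled in, as a rough sketch rather than a ready-made answer. It assumes that a link counts as "special" when its text is identical to the alt text of some image on the page, which is only a heuristic: a plain text link that happens to share an image's alt text would be misclassified. The names `link_label` and `%image_alts` and the stand-in class `Local::Link` are made up for illustration, and the example.com URLs are placeholders. `Local::Link` mimics just the `url` and `text` accessors of a WWW::Mechanize link object so that the sketch runs without fetching anything.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a WWW::Mechanize link object, so the sketch
# runs offline. Only the two accessors used below are mimicked.
{
    package Local::Link;
    sub new  { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub url  { $_[0]{url} }
    sub text { $_[0]{text} }
}

# Assumption: a link is "special" when its text equals the alt text of
# some image on the page.
sub is_special {
    my ( $link, $image_alts ) = @_;
    my $text = $link->text;
    return defined $text && exists $image_alts->{$text};
}

# Returns the label to print for a link.
sub link_label {
    my ( $link, $image_alts ) = @_;
    return 'IMAGE' if is_special( $link, $image_alts );    # the special thing
    my $text = $link->text;
    return defined $text ? $text : '';                      # the ordinary thing
}

# With a live page, the lookup could perhaps be built from the page's images:
#   my %image_alts = map { $_->alt => 1 }
#                    grep { defined $_->alt } $mech->find_all_images;
my %image_alts = ( 'Open this result in new window' => 1 );

my @links = (
    Local::Link->new(
        url  => 'http://example.com/a',
        text => "Victoria's Secret",
    ),
    Local::Link->new(
        url  => 'http://example.com/b',
        text => 'Open this result in new window',
    ),
);

for my $link (@links) {
    print 'url: ',  $link->url, "\n";
    print 'text: ', link_label( $link, \%image_alts ), "\n";
}
```

Because `link_label` only calls `text` on whatever it is given, the same two subs should work unchanged on the objects returned by `find_all_links`. If the alt-text comparison proves too loose, parsing the anchors directly and checking for a nested img tag would be a stricter test.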
