PerlMonks  

how to extract links from a part of web page ?

by saidinesh (Initiate)
on Jun 18, 2014 at 14:34 UTC (#1090314=perlquestion)
saidinesh has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl
use strict;
use LWP::UserAgent;   # was "LWP::simple"; the code below uses LWP::UserAgent
use URI;

my $doc_url = "http://www.perlmonks.org";   # LWP needs an absolute URL with a scheme
my $document;
my $browser;
init_browser( );

# Get the page whose links we want to check:
my $response = $browser->get($doc_url);
#die "Couldn't get $doc_url: ", $response->status_line
#  unless $response->is_success;
$document = $response->content;
# $doc_url = $response->base; # In case we need to resolve relative URLs later

while ($document =~ m/href\s*=\s*"([^"\s]+)"/gi) {
    my $absolute_url = absolutize($1, $doc_url);
    check_url($absolute_url);
}

sub absolutize {
    my($url, $base) = @_;
    return URI->new_abs($url, $base)->canonical;
}

sub init_browser {
    $browser = LWP::UserAgent->new;
    # ...And any other initialization we might need to do...
    return $browser;
}

sub check_url {
    # A temporary placeholder...
    print "url's list $_[0]\n";
}

When I run this code, it prints every href in the HTML source of the page, but I only need the links from a middle part of the page (say, the middle module). How should I modify the code?

Re: how to extract links from a part of web page ?
by marto (Bishop) on Jun 18, 2014 at 14:47 UTC

    perlmonks.org has no "middle module", but you know this isn't right because the page you actually want to scrape is over at juniper.net (did you check their terms of use?). As previously discussed, please read and understand PerlMonks for the Absolute Beginner (and don't ignore the formatting advice displayed when posting How do I post a question effectively?).

    You were given lots of links previously for tools that make this job easy. The example above isn't based on that advice, but rather on old content from Perl & LWP.

      For the uninitiated, what's this previous posting you're referring to?

        I make no reference to a previous post, but rather to two long conversations in the chatterbox (see also Chatterbox FAQ) earlier today, where several people discussed the various issues with the OP.

        Update: fixed link
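
Returning to the original question, one way to limit the link extraction to a part of the page is to cut that part out of the document first and run the same href pattern over it only. The following is a minimal sketch under stated assumptions, not a definitive solution: the sample HTML, the marker comments `<!-- main:start -->` and `<!-- main:end -->`, and the helper name `links_between` are all hypothetical, and a real page would need markers chosen from its actual source. It works on an inline string so it runs without network access; in the script above, `$response->content` would take the place of the sample text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical page content standing in for $response->content.
# The marker comments and element ids are assumptions for illustration.
my $document = <<'HTML';
<div id="header"><a href="/login">Log in</a></div>
<!-- main:start -->
<div id="main">
  <a href="/docs/a.html">A</a>
  <a href="http://example.com/b">B</a>
</div>
<!-- main:end -->
<div id="footer"><a href="/about">About</a></div>
HTML

# Return the href values found between two literal marker strings.
# Returns an empty list if the markers are not found.
sub links_between {
    my ($html, $start, $end) = @_;

    # \Q...\E makes the markers match literally; /s lets "." span newlines.
    my ($section) = $html =~ /\Q$start\E(.*?)\Q$end\E/s
        or return;

    # Same href pattern as the original script, applied to the section only.
    my @links;
    while ($section =~ m/href\s*=\s*"([^"\s]+)"/gi) {
        push @links, $1;
    }
    return @links;
}

my @links = links_between($document, '<!-- main:start -->', '<!-- main:end -->');
print "url's list $_\n" for @links;
```

Each returned link can then be passed through `absolutize` and `check_url` as before. Matching HTML with regular expressions is fragile, so for anything beyond a quick experiment an HTML parser from CPAN, such as HTML::TreeBuilder, is likely to be the more robust choice.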
