http://www.perlmonks.org?node_id=1090314

saidinesh has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl
use strict;
use LWP::UserAgent;   # the code calls LWP::UserAgent->new, so load that module
use URI;

my $doc_url = "http://www.perlmonks.org";   # a scheme is needed for the request and for resolving relative URLs
my $document;
my $browser;
init_browser( );

# Get the page whose links we want to check:
my $response = $browser->get($doc_url);
#die "Couldn't get $doc_url: ", $response->status_line
#  unless $response->is_success;
$document = $response->content;
# $doc_url = $response->base;  # In case we need to resolve relative URLs later

while ($document =~ m/href\s*=\s*"([^"\s]+)"/gi) {
  my $absolute_url = absolutize($1, $doc_url);
  check_url($absolute_url);
}

sub absolutize {
  my($url, $base) = @_;
  return URI->new_abs($url, $base)->canonical;
}

sub init_browser {
  $browser = LWP::UserAgent->new;
  # ...And any other initialization we might need to do...
  return $browser;
}

sub check_url {
  # A temporary placeholder...
  print "url's list $_[0]\n";
}

When I run this code, it prints every href link in the HTML source of the page, but I only need the links from a middle part of the page (say, the middle module). How do I modify the code to do that?

Replies are listed 'Best First'.
Re: how to extract links from a part of web page ?
by marto (Cardinal) on Jun 18, 2014 at 14:47 UTC

    perlmonks.org has no "middle module", but you know this isn't right because the page you actually want to scrape is over at juniper.net (did you check their terms of use?). As previously discussed, please read and understand PerlMonks for the Absolute Beginner (and don't ignore the formatting advice displayed when posting How do I post a question effectively?).

    You were previously given lots of links to tools that make this job easy for you. The example above isn't based on that advice, but rather on old content from Perl & LWP.
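
The thread does not say which tools were suggested in the chatterbox. As one possible illustration, here is a minimal sketch that assumes HTML::TreeBuilder (from the HTML-Tree distribution on CPAN) and assumes the wanted part of the page sits inside an element with a known id. The id `content`, the sample HTML, the base URL, and the helper name `links_in_section` are all hypothetical; the real page would need to be inspected to find a suitable container.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;   # assumption: HTML-Tree is installed from CPAN
use URI;

# Return absolute URLs for every <a href> found inside the element whose
# id attribute equals $section_id. Links elsewhere on the page are ignored.
sub links_in_section {
    my ($html, $base, $section_id) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @links;
    # In scalar context, look_down returns the first matching element (or undef).
    if (my $section = $tree->look_down(id => $section_id)) {
        for my $anchor ($section->look_down(_tag => 'a', href => qr/\S/)) {
            push @links,
                URI->new_abs($anchor->attr('href'), $base)->canonical->as_string;
        }
    }
    $tree->delete;   # free the parse tree
    return @links;
}

# Hypothetical sample page: only the links inside id="content" are wanted.
my $html = <<'HTML';
<html><body>
<div id="header"><a href="/login">Log in</a></div>
<div id="content"><a href="/a.html">A</a> <a href="http://example.com/b">B</a></div>
<div id="footer"><a href="/about">About</a></div>
</body></html>
HTML

print "$_\n" for links_in_section($html, 'http://www.example.org/', 'content');
```

With a real page, the HTML string would come from `$response->decoded_content` and the base from `$response->base`, and the `id => ...` criterion could be swapped for whatever attribute actually marks the wanted section (for example `class => ...`).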

      For the uninitiated, what's this previous posting you're referring to?

        I make no reference to a previous post, but rather to two long conversations in the chatterbox (see also Chatterbox FAQ) earlier today, where several people discussed the various issues with the OP.

        Update: fixed link