PerlMonks  

"Web Scrapping" Using a List of Web Pages

by nic12385 (Initiate)
on Jun 26, 2014 at 20:38 UTC ( [id://1091399]=perlquestion )

nic12385 has asked for the wisdom of the Perl Monks concerning the following question:

Just learning Perl. I am trying to create a script with the LWP module that will grab the HTML/text from several web pages. I can get the content of a single web page when I type its address into the script, using this:
    use warnings;
    use LWP::UserAgent;
    my $UserAgent = new LWP::UserAgent;
    my $Request = new HTTP::Request ('get', 'www.website.com');
    my $Response = $UserAgent->request ($Request);
    open (FILE, ">/strawberry/perl/file.txt");
    print FILE $Response->{_content};
    close (FILE);
I also have code that reads a file into an array, so that each line of the file becomes one element of the array. Maybe my description is off, but this script works.
    use warnings;
    use strict;
    my $file = "/strawberry/perl/website.txt";
    open (FH, "< $file") or die "Can't open $file for read: $!";
    my @lines = <FH>;
    print @lines;
    close FH or die "Cannot close $file: $!";
Now I want to combine the two, so that the 'get' loops through every entry in the website.txt file read by the second script and records the information. Something like this:
    my $Request = new HTTP::Request ('get', '@lines')
or
    my $URL = get('@lines')
I have seen a lot of modules (and tutorials) that allow people to get info from web sites, but I haven't seen anything on accessing multiple web sites. I don't really care which module I use; I just want to be able to access a large list of web sites that are in a text file. Any thoughts? Thanks in advance.

Replies are listed 'Best First'.
Re: "Web Scrapping" Using a List of Web Pages
by Corion (Patriarch) on Jun 26, 2014 at 20:59 UTC

    You will want to learn about "loops" and other control structures.

    The basic idea is that for each URL in your list, you execute a block of code which fetches the page.
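
A minimal sketch of that idea, offered as one possible shape rather than a definitive answer. It assumes the list file holds one absolute URL per line (scheme included, e.g. http://), that the LWP::UserAgent module is installed, and that the two file names are passed on the command line. The helper names read_urls and fetch_all are made up for illustration. It also uses the response's decoded_content method instead of reaching into the object's internal _content field.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Read a list of URLs, one per line, skipping blank lines.
# (read_urls is a hypothetical helper name, not part of LWP.)
sub read_urls {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path for read: $!";
    my @urls;
    while ( my $line = <$fh> ) {
        chomp $line;                   # drop the trailing newline
        $line =~ s/^\s+|\s+$//g;       # trim stray whitespace
        push @urls, $line if length $line;
    }
    close $fh or die "Cannot close $path: $!";
    return @urls;
}

# Fetch every URL in the list file and append each page to the output file.
# (fetch_all is likewise a hypothetical name.)
sub fetch_all {
    my ( $list_file, $out_file ) = @_;
    my $ua = LWP::UserAgent->new( timeout => 30 );
    open my $out, '>', $out_file or die "Can't open $out_file for write: $!";
    for my $url ( read_urls($list_file) ) {
        my $response = $ua->get($url);
        if ( $response->is_success ) {
            print {$out} "==== $url ====\n", $response->decoded_content, "\n";
        }
        else {
            # Report the failure and carry on with the next URL.
            warn "Failed to fetch $url: ", $response->status_line, "\n";
        }
    }
    close $out or die "Cannot close $out_file: $!";
}

# Run only when both file names are given on the command line.
fetch_all(@ARGV) if @ARGV == 2;
```

Saved under a name of your choosing (say fetch_list.pl), it could be run as: perl fetch_list.pl website.txt file.txt. A failed fetch only produces a warning, so one bad entry does not stop the rest of the list.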

    Also, you will likely want to learn about strings and interpolation. '@lines' is just a string containing the at-sign and the word "lines". @lines (without the quotes) is the list of the elements contained in the array named lines, which is much more likely to be what you want.
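
To see the difference concretely, here is a small sketch. The two example.com URLs are placeholders standing in for the contents of the file.

```perl
use strict;
use warnings;

# Placeholder data standing in for the lines read from the file.
my @lines = ( 'http://example.com/a', 'http://example.com/b' );

my $single = '@lines';    # single quotes: no interpolation, literally "@lines"
my $double = "@lines";    # double quotes: elements joined with a space

print "single-quoted: $single\n";
print "double-quoted: $double\n";

# Without quotes, @lines is the list itself, so a loop sees one URL at a time.
for my $url (@lines) {
    print "one element: $url\n";
}
```

Even the double-quoted form is not what a fetch wants, since it glues all the URLs into one string; the loop hands over each element on its own.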

Re: "Web Scrapping" Using a List of Web Pages
by Discipulus (Canon) on Jun 27, 2014 at 07:33 UTC
    Hello nic12385, and welcome to the wonderful world of never-ending Perl learning.

    As pointed out by a monk wiser than us, you need some basic understanding of control structures.
    Even if the book is aged, I always suggest the cookbook approach: Perl Cookbook

    To go deeper and more modern, get your copy of the precious Modern Perl by chromatic.

    For web site inspection, take a look at other people's Perl programs: I wrote one, but I'm still learning: WebTimeLoad 0.23

    HtH
    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; maybe one day you will reinvent one of THE WHEELS.

Node Type: perlquestion [id://1091399]
Approved by Corion