PerlMonks  

Printing From Several Webpages

by MiriamH (Novice)
on Jul 06, 2012 at 18:18 UTC (#980347=perlquestion)
MiriamH has asked for the wisdom of the Perl Monks concerning the following question:

I have a list of 500 websites that I acquired using WWW::Mechanize. I have 'cleaning' code that can take any individual web page and strip out the markup, leaving just the page text. I need a way to make my script automate cleaning the web pages and then parsing the data. This is what I have so far, but it doesn't work.

#Download all the modules I used#
use LWP::Simple;
use HTML::TreeBuilder;
use HTML::FormatText;
use WWW::Mechanize;

#Download original webpage and acquire 500+ Links#
$url = "http://wx.toronto.ca/festevents.nsf/all?openform";
my $mechanize = WWW::Mechanize->new(autocheck => 1);
$mechanize->get($url);
my $title = $mechanize->title;
print "<b>$title</b><br />";
my @links = $mechanize->links;

## THIS IS WHERE MY PROBLEM STARTS: I dont know how to use foreach loops.
## I thought if I put the "$link" variable as the "get()" each time it would
## go through the loop it would "get" a different webpage. However it does
## not work even though no error shows##
foreach my $link (@links) {
    # Retrieve the link URL
    my $href = $link->url;

    $URL1 = get("$link");
    $Format = HTML::FormatText->new;
    $TreeBuilder = HTML::TreeBuilder->new;
    $TreeBuilder->parse($URL1);
    $Parsed = $Format->format($TreeBuilder);

    open(FILE, ">TorontoParties.txt");
    print FILE "$Parsed";
    close(FILE);
}

Re: Printing From Several Webpages
by toolic (Chancellor) on Jul 06, 2012 at 18:26 UTC
    open(FILE, ">TorontoParties.txt");
    print FILE "$Parsed";
    Every time through the foreach loop, you open and write to the same file, so each pass overwrites the previous page's output. Perhaps you want to create a file with a different name each time through the loop. Or maybe you want to append to the same file every time (see open).
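
    A minimal sketch of the append variant (assuming the rest of the loop stays as posted; the lexical filehandle and three-argument open are just safer habits, not requirements):

    # Append each cleaned page instead of overwriting the file on every pass.
    open(my $fh, '>>', 'TorontoParties.txt')
        or die "Cannot open TorontoParties.txt: $!";
    print $fh $Parsed;
    close($fh);

    For one file per page, a counter in the filename (e.g. "TorontoParties_$count.txt") would work just as well.
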
Re: Printing From Several Webpages
by Corion (Pope) on Jul 06, 2012 at 18:27 UTC

    See the replies you got to your problem in Building a Spidering Application.

    Maybe you want to reduce your problem by eliminating WWW::Mechanize from the picture?
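
    One way to take WWW::Mechanize out of the picture, as a sketch only (the URL comes from the original post; HTML::TreeBuilder is already loaded there, and look_down/attr are standard HTML::Element methods):

    use strict;
    use warnings;
    use LWP::Simple;
    use HTML::TreeBuilder;

    my $url  = "http://wx.toronto.ca/festevents.nsf/all?openform";
    my $html = get($url);
    die "Couldn't fetch $url" unless defined $html;

    # Parse the listing page and pull the href of every <a> tag.
    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my @hrefs = grep { defined } map { $_->attr('href') } $tree->look_down(_tag => 'a');
    print "$_\n" for @hrefs;

    Note the hrefs may be relative; they would still need to be resolved against $url before fetching.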

Re: Printing From Several Webpages
by ig (Vicar) on Jul 06, 2012 at 20:06 UTC

    Your description of your problem is a bit vague. You say you don't know how to use foreach loops, but I don't see anything wrong with how you used foreach in what you posted.

    However it does not work even though no error shows

    No error shows because you are not checking for and reporting errors. For example, the synopsis of LWP::Simple has this example:

    use LWP::Simple;
    $content = get("http://www.sn.no/");
    die "Couldn't get it!" unless defined $content;
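
    Folding that check into the loop from the question might look like this sketch (it keeps the modules and output file from the original post; url_abs is a WWW::Mechanize::Link method that returns the link as an absolute URL, and the append-mode open follows toolic's suggestion above):

    foreach my $link (@links) {
        my $href = $link->url_abs;    # absolute URL of this link

        my $content = get($href);
        unless (defined $content) {
            warn "Couldn't get $href - skipping\n";
            next;
        }

        # Strip the markup down to plain text.
        my $tree      = HTML::TreeBuilder->new_from_content($content);
        my $formatter = HTML::FormatText->new;
        my $parsed    = $formatter->format($tree);
        $tree->delete;    # free the parse tree

        # Append rather than overwrite, so all pages end up in one file.
        open(my $fh, '>>', 'TorontoParties.txt')
            or die "Cannot open TorontoParties.txt: $!";
        print $fh $parsed;
        close($fh);
    }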
