http://www.perlmonks.org?node_id=980347

MiriamH has asked for the wisdom of the Perl Monks concerning the following question:

I have a list of 500+ links that I acquired using WWW::Mechanize. I have 'cleaning' code that can take any individual web page and strip out all the markup, leaving just the page text. I need a way to make my script automate cleaning each of those pages and then parsing the data. This is what I have so far, but it doesn't work.

    #Download all the modules I used#
    use LWP::Simple;
    use HTML::TreeBuilder;
    use HTML::FormatText;
    use WWW::Mechanize;

    #Download original webpage and acquire 500+ links#
    $url = "http://wx.toronto.ca/festevents.nsf/all?openform";
    my $mechanize = WWW::Mechanize->new(autocheck => 1);
    $mechanize->get($url);
    my $title = $mechanize->title;
    print "<b>$title</b><br />";
    my @links = $mechanize->links;

    ## THIS IS WHERE MY PROBLEM STARTS: I don't know how to use foreach
    ## loops. I thought that if I put the "$link" variable in the get()
    ## each time through the loop, it would "get" a different webpage.
    ## However, it does not work, even though no error shows. ##
    foreach my $link (@links) {
        # Retrieve the link URL
        my $href = $link->url;
        $URL1 = get("$link");
        $Format = HTML::FormatText->new;
        $TreeBuilder = HTML::TreeBuilder->new;
        $TreeBuilder->parse($URL1);
        $Parsed = $Format->format($TreeBuilder);
        open(FILE, ">TorontoParties.txt");
        print FILE "$Parsed";
        close (FILE);
    }
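For reference, a minimal corrected sketch of the same script, under the same assumptions (same modules, same starting URL). The key fixes are: pass the link's URL string to get() rather than the WWW::Mechanize::Link object itself, skip links whose download fails, and open the output file in append mode ('>>') so each page's text does not overwrite the previous one:

    #!/usr/bin/perl
    use strict;
    use warnings;

    use LWP::Simple;
    use HTML::TreeBuilder;
    use HTML::FormatText;
    use WWW::Mechanize;

    my $url = "http://wx.toronto.ca/festevents.nsf/all?openform";
    my $mechanize = WWW::Mechanize->new( autocheck => 1 );
    $mechanize->get($url);
    print "<b>", $mechanize->title, "</b><br />";

    foreach my $link ( $mechanize->links ) {
        my $href = $link->url_abs;     # absolute URL, not the link object
        my $html = get($href);         # fetch this page's HTML
        next unless defined $html;     # skip pages that failed to download

        # Parse the HTML and render it as plain text
        my $tree = HTML::TreeBuilder->new_from_content($html);
        my $text = HTML::FormatText->new->format($tree);
        $tree->delete;                 # free the parse tree

        # '>>' appends, so every page's text accumulates in one file
        open my $fh, '>>', 'TorontoParties.txt'
            or die "Cannot open TorontoParties.txt: $!";
        print $fh $text;
        close $fh;
    }

Note that url_abs resolves relative links against the page's base URL; with the plain url method, relative hrefs would fail to download. Writing one file per link (keyed by a counter or the link text) would be an alternative if the pages need to stay separate.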