How to allow loop to continue to run after a problem opening a file

by rizzy (Sexton)
on Oct 20, 2010 at 03:02 UTC
rizzy has asked for the wisdom of the Perl Monks concerning the following question:

This is probably a very simple thing to do, but I can't seem to find an answer. Here's the problem: I am downloading and parsing hundreds of thousands of HTML files (from a list), and for whatever reason, every once in a while the Perl script is not able to access one of the files (even though it is there, and it usually "sees" it). When this happens, the code stops running.

What I would like to do is record the filename that couldn't load and continue on looping through the rest of the list. That way, I don't have to babysit the thing and can come back and try with those that didn't work later. Here's the basic structure of my code:

#!/usr/bin/perl -w
use strict;
use LWP::Simple;

open(OUTPUT, "> /output/results.txt") || die("Could not open output file $!");
open(INPUT,  "< /input/urllist.txt")  || die("Could not open input file $!");
$/ = undef;
my $urllist = <INPUT>;

while ($urllist =~ m{(http://.+\.html)}g) {
    my $url  = $1;
    my $html = '';
    $html = get("$url") or print "Couldn't fetch $url.";
    while ($html =~ m{(find whatever I want)}gi) {
        my $mysearch = $1;
        print OUTPUT "$url|$mysearch\n";
    }
}

close(OUTPUT);
close(INPUT);

Basically, I have a file stored locally that has a bunch of URLs. I open this and, for every URL, I try to access it (using the get command) and then search for various things and save the results. So it calls get for hundreds of thousands of URLs. Just because of the nature of the web, some of these will not work when it tries, even though they are there. When it calls get and fails to find the file, how do I tell it to either keep going (by maybe replacing $html with a whitespace or something) or to move on to the next matched $url from the URL list? Thanks in advance.
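A minimal sketch of that failure-logging idea (the paths, the failed_urls.txt file name, and the search pattern are placeholders, not from the original post): log each URL that get() fails on to a separate file, then move on with next so the main loop keeps running.

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

# Placeholder paths -- adjust to the real input/output locations.
open(my $in,     '<', '/input/urllist.txt')  or die "Could not open input file: $!";
open(my $out,    '>', '/output/results.txt') or die "Could not open output file: $!";
open(my $failed, '>', '/output/failed_urls.txt')
    or die "Could not open failure log: $!";

$/ = undef;                      # slurp the whole URL list at once
my $urllist = <$in>;

while ($urllist =~ m{(http://.+\.html)}g) {
    my $url  = $1;
    my $html = get($url);

    unless (defined $html) {     # get() returns undef on failure
        print {$failed} "$url\n";   # remember it for a later retry pass
        next;                       # keep looping over the remaining URLs
    }

    while ($html =~ m{(find whatever I want)}gi) {
        print {$out} "$url|$1\n";
    }
}

close $in;
close $out;
close $failed;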

Re: How to allow loop to continue to run after a problem opening a file
by halfcountplus (Hermit) on Oct 20, 2010 at 03:08 UTC

    Hmmm -- I think you mean you want to skip the code in the loop after the get fails and "get" the next file instead? I suppose you could use an inclusive "if", but the simplest way is probably with "next":

    while ($urllist =~ m{(http://.+\.html)}g) {
        my $url  = $1;
        my $html = get("$url");
        unless ($html) {
            print "Couldn't fetch $url.";
            next;
        }
        while ($html =~ m{(find whatever I want)}gi) {
            my $mysearch = $1;
            print OUTPUT "$url|$mysearch\n";
        }
    }

    "next" just means to move on to the next iteration of the loop, skipping any subsequent code.

      Yes, I worded it wrong in the title. I am pulling out every URL from a list and want to continue with the next one even if the current one is not available for download. I was unaware of the unless and next statements. Thanks!
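      Since the plan is to come back and retry the failures later, here is a minimal sketch of that second pass (it assumes the hypothetical failed_urls.txt log from the example above):

      #!/usr/bin/perl
      use strict;
      use warnings;
      use LWP::Simple;

      # Hypothetical retry pass over the URLs logged by the first run.
      open(my $failed, '<', '/output/failed_urls.txt')
          or die "Could not open failure log: $!";

      while (my $url = <$failed>) {
          chomp $url;
          my $html = get($url);
          unless (defined $html) {
              warn "Still can't fetch $url\n";
              next;
          }
          # ... same parsing/printing as the main loop ...
      }
      close $failed;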
