http://www.perlmonks.org?node_id=305173

svsingh has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to pull some statistics off a Yahoo page using LWP::Simple's get method. I noticed that my script would sometimes hang, seemingly at random, so I stripped down the program to see if I could identify the problem.

I think get is hanging somewhere, but I don't understand why. I read a few other posts here, but wasn't able to find any advice on how to work around the problem. I don't think it's a timeout issue, because the program isn't running the "or die..." code.

Here's the stripped down script:

use strict;
use LWP::Simple;

my $leagueURL = 'http://sports.yahoo.com/nhl/stats/byposition?pos=C,RW,LW,D';
my $data = get $leagueURL || die "Couldn't get $leagueURL";

while ( $data =~ m|<a href="/nhl/players/(\d+)">|gs ) {
    my $id    = $1;
    my @stats = ($id);
    print STDERR "Getting player $id\n";
    push( @stats, &getCareer($id) );
    push( @stats, &getSplits($id) );
    print STDOUT "@stats\n";
    sleep(15);
}

sub getCareer {
    my ($id) = @_;
    my $url  = "http://sports.yahoo.com/nhl/players/$id/career";
    my $data = get $url || die "Couldn't get $url";
    return ("gotCareer");
}

sub getSplits {
    my ($id) = @_;
    my @s    = ();
    my $url  = "http://sports.yahoo.com/nhl/players/$id/splits?year=career";
    my $data = get $url || die "Couldn't get $url";
    return ("gotSplits");
}

I've run the stripped code four times now. The first time, it went through the while loop 7 times. Then 4, 2, and 6. In the current run, it's been working on the same iteration for the past twenty minutes.

If the request were timing out, I would expect to see the "Couldn't get $url" message, but that's not happening. Am I doing something wrong? Thank you.
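
One wrinkle worth noting about those die checks: because get is called as a list operator without parentheses, get $url || die "..." parses as get($url || die "..."). The || applies to $url, which is always true, so the die can never fire; a failed fetch just leaves $data undef. Here is a minimal sketch of a single fetch with the precedence fixed and an explicit timeout set on LWP::Simple's underlying user agent (the 30-second value and the placeholder player URL are illustrative choices, not anything from the original script):

use strict;
use warnings;
use LWP::Simple qw(get $ua);

# LWP::Simple's user agent defaults to a 180-second timeout;
# shorten it so a stalled connection fails fast (30s is arbitrary).
$ua->timeout(30);

# Placeholder URL in the shape the script builds for each player.
my $url = 'http://sports.yahoo.com/nhl/players/123/career';

# Low-precedence 'or' (or parentheses around the whole call) ties
# the error check to get()'s return value instead of to $url.
my $data = get($url) or die "Couldn't get $url";

print "Fetched ", length($data), " bytes\n";

Even with the timeout set, get() signals failure by returning undef rather than by dying, so the or die is what actually surfaces the message.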

Update: Changed $leagueURL to refer to the real page instead of the local file.

Re: LWP::Simple Seems to Hang on get
by Abigail-II (Bishop) on Nov 06, 2003 at 22:36 UTC
    You mean it ran through the loop 7, 4, 2, and 6 times and then completed the program, without the content of file:///c:/temp/2003.htm changing? The number of times the while loop executes depends only on the content of file:///c:/temp/2003.htm, not on what Yahoo is returning.

    Abigail

      Sorry, I was unclear. The program has never finished on its own; it only stops when I break execution. The main page is local, and from it I extract a list of player IDs. The while loop takes each ID and calls each subroutine with it; each subroutine loads a separate page from Yahoo with the ID in the URL.

      I updated the code to use the real Yahoo index page instead of my local copy. The local file never changed while I was running the script and contains over 500 rows.

      I hope this clarifies things.

        Perhaps you are impatient? The page contains 680 players, and you sleep for 15 seconds for each player. So even if the fetching and scanning take no time at all, it's still going to take 2 hours and 50 minutes for your program to finish. I took the sleep out, and your program seems to be running fine.
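
        In other words: 680 players × 15 seconds of sleep = 10,200 seconds, which is 170 minutes, or 2 hours and 50 minutes, before any time spent actually fetching pages.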

        Abigail