PerlMonks  

perl threads exiting abnormally

by srchulo (Sexton)
on Nov 18, 2012 at 09:22 UTC (#1004387=perlquestion)
srchulo has asked for the wisdom of the Perl Monks concerning the following question:

I'm using Perl's threads module with a simple crawler I'm working on so I can download pages in parallel. Occasionally, I get error messages like these:

    Thread 7 terminated abnormally: read timeout at /usr/lib64/perl5/threads.pm line 101.
    Thread 15 terminated abnormally: Can't connect to burgundywinecompany.com:80 (connect: timeout) at /usr/lib64/perl5/threads.pm line 101.
    Thread 19 terminated abnormally: write failed: Connection reset by peer at /usr/lib64/perl5/threads.pm line 101.

When I run the script linearly without threads (or if I use fork() instead of threading), I do not encounter these errors. These errors look like they come from the LWP::UserAgent module that I am using, but they do not seem like they should cause the threads to exit abnormally; they seem like errors that would be held in my HTTP::Response object. Is there some extra precaution I have to take while using Perl's threads, or is there something I'm missing? Thanks!

UPDATE

I have tracked down the source of these abnormal terminations, and it does seem to be whenever I make a request using LWP::UserAgent. If I remove the method call to download the webpage, then the errors stop.

Here is a test script that produces the error:

    #!/usr/bin/perl
    use threads;
    use Thread::Queue;
    use LWP::UserAgent;

    my $THREADS = 10;                    # Number of threads (if you care about them)
    my $workq   = Thread::Queue->new();  # Work to do

    my @stufftodo = qw(
        http://www.collectorsarmoury.com/
        http://burgundywinecompany.com/
        http://beetreeminiatures.com/
    );

    $workq->enqueue(@stufftodo);                          # Queue up some work to do
    $workq->enqueue("EXIT") for (1 .. $THREADS);          # And tell them when
    threads->create("Handle_Work") for (1 .. $THREADS);   # Spawn our workers
    $_->join for threads->list;

    sub Handle_Work {
        while (my $todo = $workq->dequeue()) {
            last if $todo eq 'EXIT';     # All done
            print "$todo\n";
            my $ua   = LWP::UserAgent->new;
            my $RESP = $ua->get($todo);
        }
        threads->exit(0);
    }
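
One precaution that may help here, whatever the root cause turns out to be, is to trap exceptions inside the worker so that a die raised during a request ends only that request and not the whole thread. The following is a minimal sketch, not a diagnosis of the LWP behaviour: fetch_page is a hypothetical stand-in for the real $ua->get call (it dies for one input to imitate a timeout), the example.com URLs and the $doneq results queue are assumptions made for illustration, and an undef marker is used in place of the "EXIT" string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for the real download; it dies for one input
# to imitate an exception raised somewhere inside the HTTP stack.
sub fetch_page {
    my ($url) = @_;
    die "read timeout\n" if $url =~ /slow/;
    return "content of $url";
}

my $THREADS = 3;
my $workq   = Thread::Queue->new();    # work to do
my $doneq   = Thread::Queue->new();    # results reported back to the main thread

sub handle_work {
    while (defined(my $todo = $workq->dequeue())) {
        # Trap any die so one bad URL cannot end the whole thread.
        my $page = eval { fetch_page($todo) };
        if (defined $page) {
            $doneq->enqueue("ok $todo");
        }
        else {
            my $err = $@ || "unknown error\n";
            chomp $err;
            $doneq->enqueue("failed $todo: $err");
        }
    }
    return;
}

my @urls = qw(
    http://example.com/a
    http://example.com/slow
    http://example.com/b
);
$workq->enqueue(@urls);
$workq->enqueue(undef) for 1 .. $THREADS;    # one stop marker per worker

my @workers = map { threads->create(\&handle_work) } 1 .. $THREADS;
$_->join for @workers;

# All workers have finished, so every result is already in the queue.
my @results = sort { $a cmp $b } map { $doneq->dequeue_nb() } 1 .. @urls;
print "$_\n" for @results;
```

With the eval in place, a failed request shows up as a "failed ..." line in the results instead of a "terminated abnormally" message, and the worker goes on to the next URL.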

Re: perl threads exiting abnormally
by zentara (Archbishop) on Nov 18, 2012 at 11:06 UTC
    It's possible that LWP::UserAgent is not thread-safe; it's a complicated module, and you can google for reports about it. So why not use forks if they work? The only reason to use threads over fork is if you want to share real-time data between threads with shared variables. Sorry if that is not a satisfactory answer, but Perl threads can be very tricky to use.

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
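
The fork-based alternative suggested above could look roughly like the sketch below. It is only an illustration under stated assumptions: fetch_page is a hypothetical stand-in for the real LWP::UserAgent request, the example.com URLs are placeholders, and one child process is started per URL with no limit on concurrency. Because processes share no variables, each child reports success or failure through its exit status.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the real download.
sub fetch_page {
    my ($url) = @_;
    die "read timeout\n" if $url =~ /slow/;
    return "content of $url";
}

my @urls = qw(
    http://example.com/a
    http://example.com/slow
    http://example.com/b
);

my %url_for_pid;
for my $url (@urls) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: do one download and report the outcome via the exit status.
        my $ok = eval { fetch_page($url); 1 };
        exit($ok ? 0 : 1);
    }
    $url_for_pid{$pid} = $url;    # parent remembers which child has which URL
}

# Parent: reap every child and record its exit status per URL.
my %status;
for my $pid (keys %url_for_pid) {
    waitpid($pid, 0);
    $status{ $url_for_pid{$pid} } = $? >> 8;
}
print "$_ => exit $status{$_}\n" for sort keys %status;
```

If the parent needs the page content rather than just a status, a pipe or a temporary file per child would be required, which is the trade-off against shared variables mentioned above.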
Re: perl threads exiting abnormally
by Anonymous Monk on Nov 18, 2012 at 14:40 UTC

      Thanks for the tip. In the future I will do this.

Re: perl threads exiting abnormally
by Khen1950fx (Canon) on Nov 18, 2012 at 17:06 UTC
    I'd recommend that you start out working with one thread, then two, three, etc. until you get a feel for threads. I couldn't replicate your error, but then I didn't use threads the same way you did. Here's a simplified script to check the number of items enqueued:
        #!/usr/bin/perl -l
        BEGIN {
            $| = 1;
            $^W = 1;
        }
        use strict;
        use warnings;
        use threads;
        use Thread::Queue;
        use LWP::UserAgent;
        use constant THREADS => 3;

        my(@urls) = (
            'http://search.cpan.org',
            'http://www.perl.org',
            'http://www.cpan.org',
        );

        my $workq = Thread::Queue->new;
        my $thr = threads->create( sub {
            while(defined(my $url = $workq->dequeue)) {
                do {
                    my $ua = LWP::UserAgent->new;
                    foreach my $url (@urls) {
                        my $response = $ua->get($url);
                        if ($response->is_success) {
                            print $response->decoded_content;
                        }
                        else {
                            die $response->status_line;
                        }
                    }
                };
            }
        });

        $workq->enqueue($urls[0], $urls[1], $urls[2]);
        my $num_working = $workq->pending();
        print $num_working;
        $workq->end();
        $thr->detach();
        undef $thr;
      • Why did you put a 'do' block inside a while loop? The while loop is going to do the block inside it anyway.

      • $workq->end();
        I couldn't replicate your error.

        There is no end() method listed in the POD for Thread::Queue. Did you test this code? I got a 'Can't locate object method "end"...' warning for this line.

      • $workq->enqueue($urls[0], $urls[1], $urls[2]);

        Why not this? $workq->enqueue(@urls);

      • while(defined(my $url = $workq->dequeue)) {
            do {
                my $ua = LWP::UserAgent->new;
                foreach my $url (@urls) {
        You get $url from $workq then override it with each of the urls from @urls? Wouldn't this make each thread access every url?
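
The points raised in this list can be sketched as a reworked version of the worker. This is one possible reading of the critique, not the original poster's code: fetch_status is a hypothetical stand-in for the LWP::UserAgent request, an undef marker replaces end() so the sketch also runs on Thread::Queue versions that lack that method (newer releases do document end()), and the thread is joined rather than detached so its results can be collected.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for $ua->get($url)->status_line.
sub fetch_status {
    my ($url) = @_;
    return "200 OK ($url)";
}

my @urls = (
    'http://search.cpan.org',
    'http://www.perl.org',
    'http://www.cpan.org',
);

my $workq = Thread::Queue->new;

# Ask for list context explicitly so join() returns every element.
my $thr = threads->create({ 'context' => 'list' }, sub {
    my @seen;
    # Each dequeued URL is handled exactly once: no inner loop over
    # @urls and no extra do block.
    while (defined(my $url = $workq->dequeue)) {
        push @seen, fetch_status($url);
    }
    return @seen;
});

$workq->enqueue(@urls);     # enqueue the whole list in one call
$workq->enqueue(undef);     # stop marker: ends the worker's while loop

my @seen = $thr->join;
print scalar(@seen), " URLs handled\n";
print "$_\n" for @seen;
```

With a single worker the results come back in queue order; with several workers, one undef marker per worker would be needed and the order would no longer be guaranteed.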

Node Type: perlquestion [id://1004387]
Approved by Athanasius