Re: Proper way to thread this in PERL.

by Preceptor (Chaplain)
on Dec 25, 2013 at 23:34 UTC ( #1068378=note )


in reply to Proper way to thread this in PERL.

I would strongly suggest avoiding spawning new threads in a 'while' loop. That just causes you grief. Each time you 'create' a thread, it copies your program state, which becomes a particular problem when you're loading lots of modules. Instead, I would advocate a 'worker thread' approach, and use Thread::Queue to 'feed' them.

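    use threads;
    use Thread::Queue;
    use LWP::UserAgent;

    my $url_q = Thread::Queue->new();

    sub http_fetch_thread {
        # One user agent per worker, reused for every URL this thread handles.
        my $ua = LWP::UserAgent->new();
        $ua->timeout(10);

        # dequeue() blocks until an item arrives, and returns undef once the
        # queue has been ended and is empty.
        while ( defined( my $item = $url_q->dequeue() ) ) {
            chomp $item;    # lines are enqueued straight from the file
            my $url = "http://www." . $item . ".com";
            <... fetch stuff ... >
        }
    }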

Spawn a fixed number of these threads (it looks like you're trying to keep to 5?) and then just feed the contents of your file into '$url_q'.

    for ( 1 .. $thrCount ) {
        threads->create( \&http_fetch_thread );
    }

    open( my $contents, "<", $contents_file_name ) or die $!;
    $url_q->enqueue(<$contents>);
    $url_q->end;    # no more work coming, so idle workers leave their loops
    close($contents);

    # wait for completion
    foreach my $thr ( threads->list() ) {
        $thr->join();    # not capturing return code, we started in a void context.
    }

    print "At program completion, total count was ", $totalCount, "\n";

So rather than starting a new thread, and instantiating a new 'useragent' object, for every line of your file, you create only as many as the thread count you defined. They then run through the list as fast as they can, and probably won't chew up your memory anything like as badly. Because you're not constantly creating and destroying useragents and threads, you'll probably find it runs a lot faster too.
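To make the pattern above easier to try out, here is a minimal self-contained sketch of a fixed worker pool that needs no network access. The names 'process_item', '$work_q' and '$result_q' and the sample input are my own placeholders, not anything from your script, and 'process_item' merely builds the URL where the real fetch would go. It also assumes a Thread::Queue new enough to have 'end' (3.01 or later).

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $worker_count = 5;                      # assumed pool size
my $work_q       = Thread::Queue->new();   # items waiting to be processed
my $result_q     = Thread::Queue->new();   # results handed back to the main thread

# Hypothetical stand-in for the real HTTP fetch: it only builds the URL.
sub process_item {
    my ($item) = @_;
    return "http://www.$item.com";
}

sub worker {
    # dequeue() blocks until an item arrives, and returns undef once the
    # queue has been ended and drained, which is what stops the loop.
    while ( defined( my $item = $work_q->dequeue() ) ) {
        $result_q->enqueue( process_item($item) );
    }
    return;
}

# Start the pool once, up front.
my @workers = map { threads->create( \&worker ) } 1 .. $worker_count;

# Feed the queue, then signal that no more work is coming.
$work_q->enqueue(qw( example perl cpan ));
$work_q->end();

# Wait for every worker to finish before reading the results.
$_->join() for @workers;
$result_q->end();

my @results;
while ( defined( my $r = $result_q->dequeue() ) ) {
    push @results, $r;
}
@results = sort @results;    # workers finish in no particular order
print "$_\n" for @results;
```

Passing results back through a second queue is just one option; returning a value from each thread and collecting it with 'join' would work as well.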

If you want a running total of 'totalCount' you can either print it from within the thread, or instead of doing the 'foreach/join' loop, do a 'while' loop:

    while ( threads->list ) {
        foreach my $thread ( threads->list(threads::joinable) ) {
            $thread->join();
        }
        print $totalCount, "\n";
        sleep 5;
    }
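One thing the polling loop relies on is that the counter is actually visible across threads. A rough sketch of how that could look, assuming the counter is declared with threads::shared and updated under a lock (the names '$total_count' and 'counting_worker' are mine, and the numbers 1 to 20 stand in for real work items):

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $total_count :shared = 0;    # visible to every thread
my $q = Thread::Queue->new();

sub counting_worker {
    while ( defined( my $item = $q->dequeue() ) ) {
        # ... real work on $item would go here ...
        lock($total_count);     # released at the end of each loop iteration
        $total_count++;
    }
    return;
}

threads->create( \&counting_worker ) for 1 .. 3;
$q->enqueue( 1 .. 20 );
$q->end();

# Poll until every thread has been joined, reporting progress as we go.
while ( threads->list() ) {
    $_->join() for threads->list(threads::joinable);
    print "processed so far: $total_count\n";
    sleep 1 if threads->list(threads::running);
}
print "final: $total_count\n";
```

Without the ':shared' attribute each thread would increment its own private copy, and the main thread would never see the count change.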

Edit: More generally I'd suggest:

  • 3 argument 'opens' with lexical filehandles are nicer, especially when threading. (When is 'CONTENT' in scope?)
  • Detaching a thread can mean your program completing without the thread finishing. That can create anomalous results, so I'd suggest avoiding it generally.
  • Parsing '@ARGV' by hand is a good way to introduce bugs. Look at Getopt::Std for anything more than very trivial cases.
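
For the last point, a small sketch of what option handling with Getopt::Std could look like. The option letters ('-t' for thread count, '-f' for the input file), the default of 5 and the 'parse_args' wrapper are all assumptions of mine for illustration, not taken from your script:

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical options: -t <thread count>, -f <input file>.
sub parse_args {
    my @args = @_;
    local @ARGV = @args;      # getopts() works on @ARGV
    my %opt = ( t => 5 );     # assumed default thread count
    getopts( 't:f:', \%opt )
        or die "usage: $0 [-t threads] -f file\n";
    return %opt;
}

# In a real script you would call parse_args(@ARGV) instead.
my %opt = parse_args( '-t', 8, '-f', 'sites.txt' );
print "threads=$opt{t} file=$opt{f}\n";
```

The colon after each letter in 't:f:' tells getopts that the option takes a value.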


Re^2: Proper way to thread this in PERL.
by tekio (Novice) on Dec 26, 2013 at 01:46 UTC
    These are excellent replies! Thank you guys so much! :)
