in reply to Program Design Around Threads
Try this:
    #! perl -slw
    use strict;
    use threads;
    use threads::Q;
    use threads::shared;
    use LWP::Simple;

    ## Writes results to $fname in id order, waiting for each id to appear
    sub outputter {
        my( $fname, $href, $n ) = @_;
        open my $O, '>:utf8', $fname or die $!;
        for my $id ( 1 .. $n ) {
            sleep 1 until exists $href->{ $id };
            lock %$href;
            print $O "$id\t::", delete $href->{ $id };
        }
        close $O;
    }

    ## Pulls "id $; host" items from the queue and stores the fetched content
    sub getter {
        my $tid = threads->tid;
        my( $Q, $href ) = @_;
        while( $_ = $Q->dq ) {
            my( $id, $mac ) = split $;, $_;
            my $content = get( "http://$mac/" );
            lock %$href;
            $href->{ $id } = $content // "Nothing from $id:$mac\n";
        }
    }

    our $T //= 8;    ## set from the command line via -T=n (needs -s)

    my $iFile = $ARGV[0] or die "No input filename";
    my $machines = (split ' ', `wc -l $iFile` )[0];

    my %res :shared;
    my $Q = threads::Q->new( 128 );

    my $outputter = threads->create(
        \&outputter, '1021943.log', \%res, $machines
    ) or die $!;

    threads->create( \&getter, $Q, \%res )->detach for 1 .. $T;

    open I, '<', $iFile or die $!;
    my $n = 0;
    chomp(), $Q->nq( join $;, ++$n, $_ ) while <I>;
    close I;

    $Q->nq( (undef) x $T );    ## one undef per getter thread
    $outputter->join;
The command to run it is: 1021943 -T=16 url.fil. The output will be in a file called 1021943.log in the current directory. (For simplicity, I've assumed utf8 for the content; you'll need to check the response headers to handle other encodings.)
The basic mechanism is to use a single outputter thread and a shared hash to coordinate the output.
The multiple getter threads read urls, each prefixed with an id (its input file sequence number), from a size-limiting queue (you can download it from Re^5: dynamic number of threads based on CPU utilization) and get the content. When they have it, they lock the shared hash and add the content (or an error message) as the value, keyed by the id.
The outputter thread monitors this hash, waiting for the next id in sequence to appear. When it does, the outputter locks the hash, writes the content to the file and then deletes the entry.
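The in-order flush can be looked at in isolation. What follows is only a single-threaded sketch of that logic, not the code from the script: make_reorderer is a hypothetical helper, and instead of polling a shared hash with sleep it buffers early arrivals and emits every consecutive id as soon as it becomes available.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns a closure
# that accepts ( id, content ) pairs in any order and calls $emit on them
# in id order, starting from 1.
sub make_reorderer {
    my( $emit ) = @_;
    my %pending;     # results that arrived before their turn
    my $next = 1;    # the id we are waiting to output
    return sub {
        my( $id, $content ) = @_;
        $pending{ $id } = $content;
        # Flush every consecutive id that is now available
        while( exists $pending{ $next } ) {
            $emit->( $next, delete $pending{ $next } );
            $next++;
        }
    };
}

my @out;
my $accept = make_reorderer( sub { push @out, "$_[0]:$_[1]" } );
$accept->( @$_ ) for [ 2, 'b' ], [ 3, 'c' ], [ 1, 'a' ], [ 4, 'd' ];
print "@out\n";    # prints "1:a 2:b 3:c 4:d"
```

Only the results that arrive ahead of a slow fetch are held in memory; everything else is written and forgotten as soon as its turn comes.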
Once the main thread has started the outputter and getter threads, it reads the input file and feeds the urls to the queue. The self-limiting queue prevents memory runaway. Once the entire list has been fed to the queue, the main thread queues one undef per getter thread to terminate them, and then waits for (joins) the outputter thread before terminating.
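If you don't have threads::Q to hand, the feed-and-terminate pattern can be approximated with the core Thread::Queue module. This is a rough sketch under several assumptions: a threads-enabled perl, a Thread::Queue recent enough to provide limit and end, and a hypothetical run_pool helper that joins its workers rather than detaching them. It stands in for the bounded queue only and leaves out the outputter.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

# Hypothetical helper: apply $work to each item using $nthreads workers;
# returns a hash of results keyed by 1-based input sequence number.
sub run_pool {
    my( $nthreads, $work, @items ) = @_;

    my %res :shared;
    my $Q = Thread::Queue->new;
    $Q->limit = 4;    # enqueue() blocks while 4 or more items are pending

    my @workers = map {
        threads->create( sub {
            # dequeue() returns undef once the queue is ended and drained
            while( defined( my $job = $Q->dequeue ) ) {
                my( $id, $item ) = @$job;
                my $out = $work->( $item );
                lock %res;
                $res{ $id } = $out;
            }
        } );
    } 1 .. $nthreads;

    my $n = 0;
    $Q->enqueue( [ ++$n, $_ ] ) for @items;    # blocks when the queue is full
    $Q->end;                                   # no more work: workers exit
    $_->join for @workers;
    return %res;
}

my %r = run_pool( 3, sub { uc $_[0] }, qw( a b c d e f g h ) );
print join( ' ', map { "$_=$r{ $_ }" } sort { $a <=> $b } keys %r ), "\n";
```

Here end plays the role of the per-thread undef: each worker sees dequeue return undef once the queue is empty and finishes its loop.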
I've also printed a crude header before each lot of content to verify the ordering.