#! perl -slw
use strict;
use threads;
use threads::Q;
use threads::shared;
use LWP::Simple;
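sub outputter {
    my( $fname, $href, $n ) = @_;
    open my $O, '>:utf8', $fname or die $!;
    for my $id ( 1 .. $n ) {
        ## Poll until the next id in sequence has been stored by a getter
        sleep 1 until exists $href->{ $id };
        lock %$href;
        ## Crude header, then the content; the entry is removed once written
        print $O "$id\t::", delete $href->{ $id };
    }
    close $O;
}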
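sub getter {
    my $tid = threads->tid;
    my( $Q, $href ) = @_;
    ## An undef from the queue ends the loop and so the thread
    while( $_ = $Q->dq ) {
        my( $id, $mac ) = split $;, $_;
        my $content = get( "http://$mac/" );
        lock %$href;
        ## Store the content (or an error message) keyed by the sequence id
        $href->{ $id } = $content // "Nothing from $id:$mac\n";
    }
}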
our $T //= 8;
my $iFile = $ARGV[0] or die "No input filename";
my $machines = (split ' ', `wc -l $iFile` )[0];
my %res :shared;
my $Q = threads::Q->new( 128 );
my $outputter = threads->create(
    \&outputter, '1021943.log', \%res, $machines
) or die $!;
threads->create( \&getter, $Q, \%res )->detach for 1 .. $T;
open I, '<', $iFile or die $!;
my $n = 0;
chomp(), $Q->nq( join $;, ++$n, $_ ) while <I>;
close I;
$Q->nq( (undef) x $T );    ## one undef per getter thread; the parens make x repeat a list
$outputter->join;
The command to run it is: 1011943 -T=16 url.fil. The -T switch sets the number of getter threads (default 8). The output will be in a file called 1021943.log in the current directory. (For simplicity, I've assumed utf8 for the content; you'll need to check the response headers for the actual encoding.)
The basic mechanism is to use a single outputter thread and shared hash to coordinate the output.
The multiple getter threads read urls prefixed with an id (the input file sequence number) from a size-limiting queue (you can download it from Re^5: dynamic number of threads based on CPU utilization) and get the content. When they have it, they lock the shared hash and add the content (or an error message) as the value, keyed by the id.
The outputter thread monitors this hash, waiting for the next id in sequence to appear; when it does, it locks the hash, writes the content to the file and then deletes the entry.
Once the main thread has started the outputter and getter threads, it reads the input file and feeds the urls to the queue. The self-limiting queue prevents memory runaway. Once the entire list has been fed to the queue, it queues one undef per thread to terminate the getter threads and then waits for (joins) the outputter thread before terminating.
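If you want to play with the ordering mechanism without hitting the network, here is a minimal sketch of the same idea. It is not the script above: it assumes the core Thread::Queue module as a stand-in for threads::Q, fakes the fetch with a short random delay plus uc, and the names worker, collector and @words are made up for the illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;    # core module; a stand-in for threads::Q in this sketch

my %res :shared;      # id => result, filled in by the workers
my $Q = Thread::Queue->new;

## Workers finish in arbitrary order; each stores its result under the job id.
sub worker {
    while( defined( my $job = $Q->dequeue ) ) {
        my( $id, $word ) = split /\t/, $job, 2;
        select( undef, undef, undef, rand( 0.05 ) );    # fake a variable fetch time
        lock %res;
        $res{ $id } = uc $word;
    }
}

## The collector takes results strictly in id order, deleting each as it goes.
sub collector {
    my( $n ) = @_;
    my @ordered;
    for my $id ( 1 .. $n ) {
        my $line;
        until( defined $line ) {
            {
                lock %res;
                $line = "$id\t" . delete( $res{ $id } ) if exists $res{ $id };
            }
            ## Not there yet: nap briefly with the lock released
            select( undef, undef, undef, 0.01 ) unless defined $line;
        }
        push @ordered, $line;
    }
    return join "\n", @ordered;
}

my @words = qw( alpha beta gamma delta epsilon zeta );

my $collector = threads->create( \&collector, scalar @words );
my @workers   = map { threads->create( \&worker ) } 1 .. 3;

my $n = 0;
$Q->enqueue( join "\t", ++$n, $_ ) for @words;
$Q->enqueue( (undef) x @workers );    # one undef per worker ends its loop

my $report = $collector->join;
$_->join for @workers;
print $report, "\n";
```

However the workers interleave, the lines come out numbered 1 to 6 in input order, because the collector never looks past the id it is currently waiting for.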
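To show why a self-limiting queue keeps memory bounded, and how one undef per thread shuts the consumers down, here is a rough sketch built on threads::shared. It is not the threads::Q code (get that from the node mentioned above); the nq/dq subs here are simplified look-alikes, and $LIMIT, $peak and consumer are names invented for the example.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @buf  :shared;         # the queue itself
my $peak :shared = 0;     # largest size the queue ever reached
my $LIMIT = 4;            # assumed capacity for this sketch
my $T     = 3;            # number of consumer threads

## Enqueue: the producer blocks whenever the queue is full.
sub nq {
    lock @buf;
    for my $item ( @_ ) {
        cond_wait( @buf ) while @buf >= $LIMIT;
        push @buf, $item;
        $peak = @buf if @buf > $peak;
        cond_broadcast( @buf );    # wake any consumers waiting for data
    }
}

## Dequeue: a consumer blocks whenever the queue is empty.
sub dq {
    lock @buf;
    cond_wait( @buf ) until @buf;
    my $item = shift @buf;
    cond_broadcast( @buf );        # wake the producer if it was waiting for room
    return $item;
}

## Each consumer sums what it receives until it dequeues an undef.
sub consumer {
    my $sum = 0;
    while( defined( my $item = dq() ) ) {
        $sum += $item;
    }
    return $sum;
}

my @consumers = map {
    threads->create( { context => 'scalar' }, \&consumer )
} 1 .. $T;

nq( 1 .. 100 );          # far more items than the queue can hold at once
nq( (undef) x $T );      # one undef per consumer terminates them all

my $total = 0;
$total += $_->join for @consumers;
print "total=$total peak=$peak limit=$LIMIT\n";
```

The producer pushes 100 items, but the queue never holds more than $LIMIT of them at once, which is the property that stops a fast reader from slurping a huge url list into memory ahead of slow getters.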
I've also printed a crude header before each lot of content to verify the ordering.