Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

Try this:

#! perl -slw use strict; use threads; use threads::Q; use threads::shared; use LWP::Simple; sub outputter { my( $fname, $href, $n ) = @_; open my $O, '>:utf8', $fname or die $!; for my $id ( 1 .. $n ) { sleep 1 until exists $href->{ $id }; lock %$href; print $O "$id\t::", delete $href->{ $id }; } close $O; } sub getter { my $tid = threads->tid; my( $Q, $href ) = @_; while( $_ = $Q->dq ) { my( $id, $mac ) = split $;, $_; my $content = get( "http://$mac/" ); lock %$href; $href->{ $id } = $content // "Nothing from $id:$mac\n"; } } our $T //= 8; my $iFile = $ARGV[0] or die "No input filename"; my $machines = (split ' ', `wc -l $iFile` )[0]; my %res :shared; my $Q = threads::Q->new( 128 ); my $outputter = threads->create( \&outputter, '1021943.log', \%res, $machines ) or die $!; threads->create( \&getter, $Q, \%res )->detach for 1 .. $T; open I, '<', $iFile or die $!; my $n = 0; chomp(), $Q->nq( join $;, ++$n, $_ ) while <I>; close I; $Q->nq( undef x $T ); $outputter->join;

The command to run it is:1011943 -T=16 url.fil. The output will be in a file called:1021943.log in the current directory. (For simplicity, I've assumed utf8 for the content, you'll need to check headers and stuff.)

The basic mechanism is to use a single outputter thread and shared hash to coordinate the output.

The multiple getter threads read urls prefix with an id (input file sequence number) from a size-limiting queue (you can download it from Re^5: dynamic number of threads based on CPU utilization) and get the content. When they have it, they lock the shared hash and add the content (or an error messgae) as the value, keyed by the id.

The outputter thread monitors this hash waiting for the appearance of the next id in sequence, and when it appears, they lock the hash; write it to the file and then delete it.

Once the main thread has started the outputter and getter threads, it reads the input file and feeds the urls to the queue. The self limit queue prevent memory runaway. Once the entire list has been fed to the, it queues one undef per thread to terminate the getter threads and then waits for (joins) the outputter thread before terminating.

I've also printed a crude header before each lot of content to verify the ordering.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re: Program Design Around Threads by BrowserUk
in thread Program Design Around Threads by aeaton1843

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others exploiting the Monastery: (11)
    As of 2014-12-18 21:45 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      Is guessing a good strategy for surviving in the IT business?





      Results (66 votes), past polls