Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Re: Program Design Around Threads

by BrowserUk (Patriarch)
on Mar 06, 2013 at 06:18 UTC ( [id://1021957]=note: print w/replies, xml ) Need Help??


in reply to Program Design Around Threads

Try this:

#! perl -slw use strict; use threads; use threads::Q; use threads::shared; use LWP::Simple; sub outputter { my( $fname, $href, $n ) = @_; open my $O, '>:utf8', $fname or die $!; for my $id ( 1 .. $n ) { sleep 1 until exists $href->{ $id }; lock %$href; print $O "$id\t::", delete $href->{ $id }; } close $O; } sub getter { my $tid = threads->tid; my( $Q, $href ) = @_; while( $_ = $Q->dq ) { my( $id, $mac ) = split $;, $_; my $content = get( "http://$mac/" ); lock %$href; $href->{ $id } = $content // "Nothing from $id:$mac\n"; } } our $T //= 8; my $iFile = $ARGV[0] or die "No input filename"; my $machines = (split ' ', `wc -l $iFile` )[0]; my %res :shared; my $Q = threads::Q->new( 128 ); my $outputter = threads->create( \&outputter, '1021943.log', \%res, $machines ) or die $!; threads->create( \&getter, $Q, \%res )->detach for 1 .. $T; open I, '<', $iFile or die $!; my $n = 0; chomp(), $Q->nq( join $;, ++$n, $_ ) while <I>; close I; $Q->nq( undef x $T ); $outputter->join;

The command to run it is:1011943 -T=16 url.fil. The output will be in a file called:1021943.log in the current directory. (For simplicity, I've assumed utf8 for the content, you'll need to check headers and stuff.)

The basic mechanism is to use a single outputter thread and shared hash to coordinate the output.

The multiple getter threads read urls prefix with an id (input file sequence number) from a size-limiting queue (you can download it from Re^5: dynamic number of threads based on CPU utilization) and get the content. When they have it, they lock the shared hash and add the content (or an error messgae) as the value, keyed by the id.

The outputter thread monitors this hash waiting for the appearance of the next id in sequence, and when it appears, they lock the hash; write it to the file and then delete it.

Once the main thread has started the outputter and getter threads, it reads the input file and feeds the urls to the queue. The self limit queue prevent memory runaway. Once the entire list has been fed to the, it queues one undef per thread to terminate the getter threads and then waits for (joins) the outputter thread before terminating.

I've also printed a crude header before each lot of content to verify the ordering.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: Program Design Around Threads
by choroba (Cardinal) on Mar 06, 2013 at 11:59 UTC
    Are you planning to publish threads::Q on CPAN?
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      I've been planning on doing so for a long time; but whenever I think about doing so, the same problems crop up.

      What to call it?

      I refuse to add anything to the Thread::* namespace as it is so overpopulated with broken and useless modules.

      If I uploaded it to the threads::* namespace I'd almost certainly get nasty emails for using an all lower case element in the name, just like I did when I uploaded my very first module as used.pm.

      And then there is the problem of dealing with all the cpan tester failure reports that'll come because it doesn't work on non-threaded builds. (Just like the 75% failure rate that gets shown for Win32::Fmode because it doesn't compile on Linux boxes!).

      And then I'd have to figure out the names and locations of all those other files that a cpan distribution must have despite that perl doesn't use them and nobody ever reads them. (And what meaningless drivel to put in them; except for those that have to be empty.)

      At this point I usually think to myself: "I'll just point people at that node; it's one text file; stick it in the right place and it just works!"; and move on to something else.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Call it Thrum or Funicle. To avoid FAIL messages, just die unless ($Config{usethreads}) in the tests. Use a boilerplate module from CPAN to create all the needed files.
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^2: Program Design Around Threads
by 7stud (Deacon) on Mar 06, 2013 at 06:38 UTC

    When they have it, they lock the shared hash and add the content (or an error messgae) as the value, keyed by the id.

    The outputter thread monitors this hash waiting for the appearance of the next id in sequence, and when it appears, they lock the hash; write it to the file and then delete it.

    What is the point of the locks? The id's are unique, right? As far as I can tell, it doesn't matter if every thread--including the outputter--all modify the hash at the same time.

      What is the point of the locks? The id's are unique, right? As far as I can tell, it doesn't matter if every thread--including the outputter--all modify the hash at the same time.

      Primary reason is that getters add key/value pairs to the hash; and the outputter removes key/value pairs from the hash. These are substantive modifications to the hash structure.

      It may be that Perl's internal locking is sufficient to ensure that the key cannot be seen before the value is also available, but if that is the case, the additional locking will incur very little overhead as it will be released at almost exactly the same time as the internal lock.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
      network sites:
Re^2: Program Design Around Threads
by aeaton1843 (Acolyte) on Mar 06, 2013 at 18:31 UTC

    Curiosity got the better of me. This makes a lot of sense. It could be easily modified to do exactly what I want. I appreciate it.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1021957]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (4)
As of 2024-04-16 15:39 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found