PerlMonks

Program Design Around Threads

by aeaton1843 (Acolyte)
on Mar 06, 2013 at 04:37 UTC
aeaton1843 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Everyone

I ran across a problem and am pondering the best way to implement a solution. I have some working code, but am not sure my approach is a good one. I will include enough code here for people to comment on. If any of you have good reference material on program design around threads in Perl, please post it. I have done a fair amount of work with threads, but I can't say I have cared much about synchronizing the output they produce.

The problem is orchestrating a bunch of threads that need some synchronization, without a lot of code. I need to get command output from several thousand network devices, all of which sit behind a proxy. The basic program takes in a file of machines and a file of commands, runs the commands, and drops the responses into files.

The specific problem I am trying to solve is the printed order of the command output in the output file: I would like the output to appear in the same order as the commands in the command file. After a lot of tinkering and debugging of different ideas, I settled on creating a shared array that holds the order of thread creation. After the threads are populated with the information to print, I let them print in the order of their creation. It isn't very elegant, and I am guessing there is a better way; it just hasn't hit me yet.

So my question is: what is a good way to handle file creation with multiple threads writing files at the same time? I have seen some information out there about sharing filehandles between threads and the like, but nothing about a good design for something like this. Frankly, another design would be to have the threads print to separate files, and the synchronization problem goes away. That's not really what I wanted, though.

use threads;
use threads::shared;
use Thread::Semaphore;    # was missing: needed for Thread::Semaphore->new

my $semaphore = Thread::Semaphore->new;
my $live_threads :shared = 0;
my $thrwait_to_finish :shared = 5;
my $maximum_active_threads :shared = 10;
my @tid_array :shared;

my @commands = (
    "show ip interface brief | exclude unassigned",
    "show vlan",
    "show vrf",
);
my @machines = ('1.1.1.1', '2.2.2.2', '3.3.3.3');

foreach my $cmd (@commands) {
    foreach my $machine_name (@machines) {
        my $url = "http://proxy:port/api/device/$machine_name/execute?cmd=$cmd";
        if ($live_threads >= $maximum_active_threads) {
            my $new_thread_num = wait_for_finish();
            $semaphore->down();
            $live_threads = $new_thread_num;
            $semaphore->up();
        }
        else {
            my $thread = threads->create('get_request', $cmd, $url, $machine_name);
            my $tid = $thread->tid();
            $semaphore->down();
            push @tid_array, $tid;
            $live_threads++;
            $semaphore->up();
        }
    }
}

sub get_request {
    sleep 1;
    my ($cmd, $url, $machine_name) = @_;
    my $thread_id = threads->tid();
    my $count = 0;
    ...
    $web_response = $response->content;
    ...
    while ($thread_id != $tid_array[0]) {
        print "Thread_sleeping: $thread_id\n";
        sleep 1;
    }
    ...
    $semaphore->down();
    shift @tid_array;
    $semaphore->up();
    print $fh $web_response;    # was $webresponse: typo for $web_response
}

Re: Program Design Around Threads
by BrowserUk (Pope) on Mar 06, 2013 at 04:55 UTC

    Can you confirm I understood you?

    1. You have an input file that lists 1000s of machines to be queried for some information.

      How many thousands?

    2. You want to conduct multiple queries concurrently.

      What type of query mechanism? (http,ftp,ssh?)

      How many concurrent queries? (And can the proxy handle the throughput?)

      How long do the responses take (on average?)

    3. The results from each query are multiple lines.

      How many lines per machine?

    4. You want all the results in a single output file.
    5. You want the multiple lines all grouped together; and in the same order as the input file.

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I wanted to answer your questions. I will have to run through the proposed solution tonight.

      1. It depends but 2-3k, maybe less if I can narrow to a specific data center.

      2a. I am being careful not to thread-bomb the proxy. I have a thread creation limiter. Right now it is sitting at 10 threads.

      2b. Worst case, I have to pull an entire configuration off an Arista. That process can take 30 seconds on a busy Arista device. Much less for most everything else.

      3. I think the worst case is "show interface": roughly 20 lines per interface times 384 interfaces per machine. The 20ish is variable based on what is configured on the interface (traffic shaping, for example).

      4. Sorry I wasn't clear enough here. I want one file per queried machine with all of the command outputs in it.

      5. The outputs of each command should be put into the output file in the order of the command list in the command file. I.e., if show running-configuration is first in the command file, then its output should appear first in the output file. Not that it matters, but right now the file is named after the machine.

        Sorry I wasn't clear enough here. I want one file per queried machine with all of the command outputs in it.

        Then I do not understand your stated problem (from the OP): "The problem I am trying to solve is the printed order of the commands in the output file. ", or why you think you need all those darn semaphores in your code?

        If you programmed a single-threaded solution to this, it might look something like:

        for my $machine ( @machines ) {
            open my $out, '>', "$machine.dat" or die $!;
            for my $cmd ( @commands ) {
                my $content = get "$machine/$cmd";
                print $out $content;
            }
            close $out;
        }

        The outputs from the commands end up in the file in the same order as the commands are run, because that is the order you print them in.

        To turn that into a threaded solution, just make the body of the outer loop the thread:

        for my $machine ( @machines ) {
            async {
                open my $out, '>', "$machine.dat" or die $!;
                for my $cmd ( @commands ) {
                    my $content = get "$machine/$cmd";
                    print $out $content;
                }
                close $out;
            }->detach;
        }

        And (essentially*) that's it! Each thread is using a different file, so no conflicts or ordering problems arise. No need for locking or semaphores or synchronisation.

        *As shown, the above would start a new thread for every one of the 1000s of machines and run them concurrently which would blow your memory to hell and thrash your disc to death. But fixing that is very simple:

        my $running :shared = 0;    ## This tracks the number of concurrent threads

        for my $machine ( @machines ) {
            async {
                { lock $running; ++$running; }    ## incr on start
                open my $out, '>', "$machine.dat" or die $!;
                for my $cmd ( @commands ) {
                    my $content = get "$machine/$cmd";
                    print $out $content;
                }
                close $out;
                { lock $running; --$running; }    ## decr on finish
            }->detach;
            sleep 1 until $running < 10;    ## sleep a bit if more than 10 are running
        }
        sleep 1 while $running;    ## Make sure the main thread waits for the last few threads to finish

        (That would be simpler still if the API allowed sleep 1 while threads->list( threads::detach ) > 10; but it doesn't.)
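        The API gap mentioned above can be demonstrated directly: once a thread is detached it no longer appears in threads->list, so the list cannot be used to throttle detached workers. A minimal sketch (the sleep durations are arbitrary, just long enough to make the timing unambiguous):

        ```perl
        #!/usr/bin/perl
        # Detached threads vanish from threads->list, so counting "running"
        # threads reports zero even while detached workers are still active.
        use strict;
        use warnings;
        use threads;

        threads->create( sub { sleep 2 } )->detach for 1 .. 3;

        sleep 1;    # the three detached threads are still sleeping here...
        my $visible = threads->list( threads::running );
        print "visible running threads: $visible\n";    # prints 0

        sleep 2;    # let the detached threads finish before the program exits
        ```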

        It would also be more efficient with your machine's resources (cpu & memory) to use a thread pool (NOT Thread::Pool!!!) solution; but as you're IO-bound, and limiting that for the sake of your proxy, you are unlikely to trouble the resources of even the least well-specified machine with the above code.
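        For reference, the thread-pool variant of the per-machine-file approach can be sketched with the core Thread::Queue module. The machine list and the fetch body below are placeholders (substitute the real proxy request for the stand-in line); one undef per worker is queued as a stop signal, the same idiom used elsewhere in this thread:

        ```perl
        #!/usr/bin/perl
        # A fixed pool of workers drains a queue of machine names; each worker
        # writes one file per machine, so no output synchronisation is needed.
        use strict;
        use warnings;
        use threads;
        use Thread::Queue;

        my @machines = map "10.0.0.$_", 1 .. 20;    # hypothetical machine list
        my $POOL     = 4;                           # pool size; tune for your proxy

        # Preload the work items, followed by one undef per worker as a stop signal.
        my $Q = Thread::Queue->new( @machines, (undef) x $POOL );

        my @pool = map {
            threads->create( sub {
                while ( defined( my $machine = $Q->dequeue ) ) {
                    # my $content = get( "http://proxy:port/api/device/$machine/..." );
                    my $content = "output for $machine\n";    # stand-in for the real fetch
                    open my $out, '>', "$machine.dat" or die $!;
                    print $out $content;
                    close $out;
                }
            } );
        } 1 .. $POOL;

        $_->join for @pool;    # workers exit after dequeuing their undef
        ```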


Re: Program Design Around Threads
by BrowserUk (Pope) on Mar 06, 2013 at 06:18 UTC

    Try this:

    #! perl -slw
    use strict;
    use threads;
    use threads::Q;
    use threads::shared;
    use LWP::Simple;

    sub outputter {
        my( $fname, $href, $n ) = @_;
        open my $O, '>:utf8', $fname or die $!;
        for my $id ( 1 .. $n ) {
            sleep 1 until exists $href->{ $id };
            lock %$href;
            print $O "$id\t::", delete $href->{ $id };
        }
        close $O;
    }

    sub getter {
        my $tid = threads->tid;
        my( $Q, $href ) = @_;
        while( $_ = $Q->dq ) {
            my( $id, $mac ) = split $;, $_;
            my $content = get( "http://$mac/" );
            lock %$href;
            $href->{ $id } = $content // "Nothing from $id:$mac\n";
        }
    }

    our $T //= 8;
    my $iFile = $ARGV[0] or die "No input filename";
    my $machines = (split ' ', `wc -l $iFile` )[0];

    my %res :shared;
    my $Q = threads::Q->new( 128 );

    my $outputter = threads->create(
        \&outputter, '1021943.log', \%res, $machines
    ) or die $!;

    threads->create( \&getter, $Q, \%res )->detach for 1 .. $T;

    open I, '<', $iFile or die $!;
    my $n = 0;
    chomp(), $Q->nq( join $;, ++$n, $_ ) while <I>;
    close I;

    $Q->nq( ( undef ) x $T );    # parens needed for list repetition
    $outputter->join;

    The command to run it is: 1021943 -T=16 url.fil. The output will be in a file called 1021943.log in the current directory. (For simplicity, I've assumed utf8 for the content; you'll need to check headers and stuff.)

    The basic mechanism is to use a single outputter thread and shared hash to coordinate the output.

    The multiple getter threads read urls prefixed with an id (input file sequence number) from a size-limiting queue (you can download it from Re^5: dynamic number of threads based on CPU utilization) and get the content. When they have it, they lock the shared hash and add the content (or an error message) as the value, keyed by the id.

    The outputter thread monitors this hash, waiting for the next id in sequence to appear; when it does, it locks the hash, writes the value to the file, and then deletes it.

    Once the main thread has started the outputter and getter threads, it reads the input file and feeds the urls to the queue. The self-limiting queue prevents memory runaway. Once the entire list has been fed to the queue, it queues one undef per thread to terminate the getter threads, and then waits for (joins) the outputter thread before terminating.

    I've also printed a crude header before each lot of content to verify the ordering.
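    The resequencing mechanism described above can be demonstrated without any network access. In this self-contained sketch (a random sub-second sleep stands in for a fetch of varying latency, and the core Thread::Queue replaces threads::Q), getters finish in arbitrary order, yet the outputter loop recovers the input order from the id-keyed shared hash:

    ```perl
    #!/usr/bin/perl
    # Getters complete out of order; a single consumer drains the shared
    # hash in strict id order, restoring the original sequence.
    use strict;
    use warnings;
    use threads;
    use threads::shared;
    use Thread::Queue;

    my $N = 10;
    my %res :shared;
    my $Q = Thread::Queue->new( 1 .. $N, (undef) x 3 );    # undef = stop signal

    my @getters = map {
        threads->create( sub {
            while ( defined( my $id = $Q->dequeue ) ) {
                select undef, undef, undef, rand 0.1;    # simulate variable latency
                lock %res;
                $res{ $id } = "result $id";
            }
        } );
    } 1 .. 3;

    my @order;
    for my $id ( 1 .. $N ) {    # the "outputter": wait for each id in turn
        select undef, undef, undef, 0.01 until exists $res{ $id };
        lock %res;
        push @order, delete $res{ $id };
    }
    $_->join for @getters;

    print "$_\n" for @order;    # prints result 1 .. result 10, in order
    ```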



      When they have it, they lock the shared hash and add the content (or an error message) as the value, keyed by the id.

      The outputter thread monitors this hash, waiting for the next id in sequence to appear; when it does, it locks the hash, writes the value to the file, and then deletes it.

      What is the point of the locks? The ids are unique, right? As far as I can tell, it doesn't matter if every thread--including the outputter--modifies the hash at the same time.

        What is the point of the locks? The ids are unique, right? As far as I can tell, it doesn't matter if every thread--including the outputter--modifies the hash at the same time.

        Primary reason is that getters add key/value pairs to the hash; and the outputter removes key/value pairs from the hash. These are substantive modifications to the hash structure.

        It may be that Perl's internal locking is sufficient to ensure that the key cannot be seen before the value is also available, but if that is the case, the additional locking will incur very little overhead as it will be released at almost exactly the same time as the internal lock.
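        The point about explicit locking can be made concrete: `lock` on a shared hash is held to the end of the enclosing block, so a multi-step modification (here, a check-and-delete) executes as a unit while other threads block. A tiny illustration (the key and payload are arbitrary):

        ```perl
        #!/usr/bin/perl
        # lock %h is advisory and block-scoped: both statements inside each
        # locked block complete before any other thread can take the lock.
        use strict;
        use warnings;
        use threads;
        use threads::shared;

        my %h :shared;
        {
            lock %h;              # take the hash lock; held to end of this block
            $h{1} = 'payload';    # key and value become visible together
        }                         # lock released here

        my $t = threads->create( sub {
            lock %h;                  # blocks until no other thread holds the lock
            my $v = delete $h{1};     # remove the pair under the same lock
            return defined $v ? $v : 'missing';
        } );

        my $got = $t->join;
        print "$got\n";    # prints "payload"
        ```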


      Are you planning to publish threads::Q on CPAN?
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        I've been planning on doing so for a long time; but whenever I think about doing so, the same problems crop up.

        What to call it?

        I refuse to add anything to the Thread::* namespace as it is so overpopulated with broken and useless modules.

        If I uploaded it to the threads::* namespace I'd almost certainly get nasty emails for using an all lower case element in the name, just like I did when I uploaded my very first module as used.pm.

        And then there is the problem of dealing with all the cpan tester failure reports that'll come because it doesn't work on non-threaded builds. (Just like the 75% failure rate that gets shown for Win32::Fmode because it doesn't compile on Linux boxes!).

        And then I'd have to figure out the names and locations of all those other files that a cpan distribution must have despite that perl doesn't use them and nobody ever reads them. (And what meaningless drivel to put in them; except for those that have to be empty.)

        At this point I usually think to myself: "I'll just point people at that node; it's one text file; stick it in the right place and it just works!"; and move on to something else.



      Curiosity got the better of me. This makes a lot of sense. It could be easily modified to do exactly what I want. I appreciate it.

Node Type: perlquestion [id://1021943]
Approved by BrowserUk