Re^2: thread/fork boss/worker framework recommendation

by learnedbyerror (Beadle)
on Dec 01, 2011 at 15:35 UTC ( #941096=note )


in reply to Re: thread/fork boss/worker framework recommendation
in thread thread/fork boss/worker framework recommendation

BrowserUK, thanks for your offer, but I can't share the code at this time. Even if I could, it is too large to share in mass via this site.

Your response got me thinking, or re-thinking, about what the issues are with my current code set. My concerns, at least today until I learn more :), are two: 1.) Scalability (primarily memory), 2.) Program Architecture (extendability and maintainability).

From the scalability standpoint, my current code is broken into 7 discrete, high-level job steps that are run sequentially. Within each job step, I use one or more pools of worker threads to parallelize the execution of portions of the job. My current threads-based approach stores everything needed for the run of each step in shared memory. I chose this because I thought it to be the fastest approach for each step; however, it consumes a lot of memory and includes the overhead of an additional perl interpreter for each thread. I do create the worker threads very early in each step, so I minimize the amount of memory consumed by each instance. Ultimately, I want to assemble these 7 job steps in an asynchronous framework that will let data flow through each step as it becomes available and have the merge points, which are currently the completion of a step, happen on a more granular basis, so that the application runs more like a peacefully flowing river than a tsunami.
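To make the "data flows through each step as it is available" idea concrete, here is a minimal sketch using only the core threads and Thread::Queue modules. The two stages and their names (a "fetch" step and a "parse" step) are hypothetical stand-ins, not the actual job steps, and a real pipeline would use a pool of workers per stage rather than a single thread.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# One queue between each pair of stages; names are illustrative only.
my $to_fetch = Thread::Queue->new;
my $to_parse = Thread::Queue->new;
my $done     = Thread::Queue->new;

# Stage 1: hand each item downstream as soon as it is ready.
my $fetcher = threads->create( sub {
    while ( defined( my $id = $to_fetch->dequeue ) ) {
        $to_parse->enqueue("record-$id");    # stand-in for the real work
    }
    $to_parse->enqueue(undef);               # propagate end-of-stream
} );

# Stage 2: starts working while stage 1 is still producing.
my $parser = threads->create( sub {
    while ( defined( my $rec = $to_parse->dequeue ) ) {
        $done->enqueue( uc $rec );           # stand-in for the real work
    }
    $done->enqueue(undef);
} );

# Feed the pipeline, then mark the end of input with undef.
$to_fetch->enqueue( 1 .. 5, undef );

# Collect results as they arrive; undef means the pipeline has drained.
my @out;
while ( defined( my $item = $done->dequeue ) ) {
    push @out, $item;
}
$_->join for $fetcher, $parser;

print "@out\n";    # RECORD-1 RECORD-2 RECORD-3 RECORD-4 RECORD-5
```

Each item moves to the next stage the moment it is finished, so the only "merge point" left is the end-of-stream marker rather than the completion of a whole step.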

From a program architecture standpoint, I am having to do a lot of low level thread and thread pool management in my code. As I progressed from writing the first job step to the 7th, I have developed a module that encapsulates this pattern; however, it isn't work that I am proud of or want to support. I have been through a number of thread pool management modules on CPAN. Most simply don't work, don't fully work, or don't work well with the current versions of perl. When I looked at migrating from threads to forks, I started running into the challenges of shared-memory IPC.

So, I spent some time back in CPAN and mucking around with some small functional tests. I think I have come up with an approach using two well known modules and one relatively new module that will allow me to address both of my concerns.

  • Approach: Use a custom-written, pseudo-event main loop to start asynchronous processes (i.e. forks) that query the web API.
  • IPC::Lite will be used to construct and manage the global variables needed to store application state, information to be shared between processes, and job step data queues.
  • Parallel::Forker will be used to construct and manage the level 1 job steps (i.e. manage steps up to their merge points).
  • Parallel::Fork::BossWorkerAsync will be used to manage multiple process pools for discrete job steps (querying the API in parallel, downloading files in parallel).

The use of forks and IPC::Lite should minimize my memory footprint and on the whole should be comparable in performance to threads. Parallel::Forker allows me to define a job step sequence that will let me create the merge points needed for process synchronization. Parallel::Fork::BossWorkerAsync will let me spin off pools of forks where I can parallelize as a pool.
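For readers who have not worked with forked pools, the following is a minimal sketch of the general boss/worker idea using only core Perl (fork, pipe, waitpid). It is not the API of Parallel::Fork::BossWorkerAsync or Parallel::Forker; `run_pool` is a hypothetical helper written for illustration. It assumes each result fits on one line of text, and it splits the jobs statically among the workers instead of feeding them from a live queue.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $work on every job using $n forked workers
# and return the results in job order.
sub run_pool {
    my ( $n, $work, @jobs ) = @_;
    my @readers;

    for my $w ( 0 .. $n - 1 ) {
        pipe( my $rd, my $wr ) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ( $pid == 0 ) {                   # child: worker number $w
            close $rd;
            close $_->{fh} for @readers;     # drop read ends inherited from the boss
            # Static partition: worker $w takes jobs w, w+n, w+2n, ...
            for ( my $i = $w ; $i < @jobs ; $i += $n ) {
                print {$wr} join( "\t", $i, $work->( $jobs[$i] ) ), "\n";
            }
            close $wr;                       # flushes the results to the boss
            POSIX::_exit(0);                 # leave without running END blocks
        }

        close $wr;                           # boss keeps only the read end
        push @readers, { pid => $pid, fh => $rd };
    }

    my @results;
    for my $r (@readers) {
        while ( defined( my $line = readline $r->{fh} ) ) {
            chomp $line;
            my ( $i, $value ) = split /\t/, $line, 2;
            $results[$i] = $value;
        }
        close $r->{fh};
        waitpid $r->{pid}, 0;                # reap the worker
        die "worker $r->{pid} failed\n" if $? != 0;
    }
    return @results;
}

my @squares = run_pool( 3, sub { $_[0] * $_[0] }, 1 .. 10 );
print "@squares\n";    # 1 4 9 16 25 36 49 64 81 100
```

Because each worker is a separate process, nothing is shared after the fork; results have to travel back over the pipe, which is the IPC cost that a module such as IPC::Lite or a boss/worker framework is meant to hide.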

I believe that this approach will let me get my code to the point where I can consider it production quality from a run and maintenance perspective. I remain interested in pursuing a full event-loop approach, as I think this will give me better extendability and further minimize the code footprint that I will have to maintain; however, either the documentation for something like AnyEvent is too fragmented for me to readily synthesize how to use it, or I'm just not smart enough to pick it up and run with it. In any event, I need to spend some time working with it to get my confidence up sufficiently to give it a try.

Again, thanks in advance!

lbe


Re^3: thread/fork boss/worker framework recommendation
by BrowserUk (Pope) on Dec 01, 2011 at 16:17 UTC
    it is too large to share in mass via this site.

    I do have email... but if you can't show me, I cannot help.

    From a program architecture standpoint, I am having to do a lot of low level thread and thread pool management in my code. As I progressed from writing the first job step to the 7th, I have developed a module that encapsulates this pattern; however, it isn't work that I am proud of or want to support. I have been through a number of thread pool management modules on CPAN. Most simply don't work, don't fully work, or don't work well with the current versions of perl.

    Re: The highlighted portion. This is a myth! (Or simply beginners' coding.)

    Thread pool management only becomes complex or laborious when people insist on trying to wrap it up in an API. It is the very process of that wrapping that creates complexity. There are so many ways to use thread pools, that writing a wrapper has to deal with so many possibilities that it just creates complexity everywhere. Which is why I don't use such modules.

    Written the right way, thread pools don't need to be 'managed', they manage themselves. Here is a complete working example in 30 lines, some of which are just for show. I've typed it so many times now, I can do it from memory, and get it right first time, every time:

    #! perl -slw
    use strict;
    use Time::HiRes qw[ time sleep ];   # sleep imported so fractional work items really sleep
    use threads;
    use Thread::Queue;

    sub worker {
        my $tid = threads->tid;
        print "Thread: $tid started";
        my $Q = shift;
        # dequeue blocks until an item arrives; undef is the signal to quit
        while( defined( my $workitem = $Q->dequeue ) ) {
            print "Thread: $tid processing workitem $workitem";
            sleep $workitem;
        }
        print "Thread: $tid ended";
    }

    # -s on the #! line lets these be set from the command line, e.g. -WORKERS=4
    our $WORKERS //= 10;
    our $ITEMS   //= 1000;
    our $MAXWORK //= 2;

    my $Q = Thread::Queue->new;
    my @workers = map threads->create( \&worker, $Q ), 1 .. $WORKERS;

    for ( 1 .. $ITEMS ) {
        sleep 1 while $Q->pending > $WORKERS;   # throttle so the queue stays short
        $Q->enqueue( rand( $MAXWORK ) );
        print "main Q'd workitem $_";
    }

    print "Main: telling threads to die";
    $Q->enqueue( (undef) x $WORKERS );          # one undef per worker

    print "Main waiting for threads to die";
    $_->join for @workers;

    A run:

    I think I have come up with an approach

    Okay, it looks like you are decided and there's nothing left for me to try and help you with. Good luck.

    I'll finish by saying this though. I bet that if you'd give me the specs for your problem so that I could write the thread-pool version, it would be quicker and simpler to write, easier to maintain, and scale better than your event-driven approach.

    (That's a bold claim given I have no knowledge of what your code does. But hey, you gotta live a little :)


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      I wouldn't consider myself a beginner with threads and neither would I consider myself an expert, journeyman is probably about right. But, I do have about 35 years of experience with a multitude of different languages and have learned a few things that are true of every one of them. #1 - the less repetitive typing used the fewer the bugs, and #2 - I would rather stand on the shoulders of giants by using their knowledge than become a giant on my own.

      Though I have been coding perl since version 2, I don't use it every day and things get rusty at times. Furthermore, though Perl5 has been around for a while, there are a lot of new and exciting things going on in it. I don't want to constrain myself to just what I have learned/done in the past. I don't discount that what I know will work, I just try to take a look around when tackling something new to see if there is a better mouse trap available.

      In general, I like the current thread model in perl (post 5.8), and I use it quite regularly in small stuff that comes and goes. I hope that future improvements will reduce its overhead, reduce the complexity of threads::shared for complex data structures in references, and possibly even provide an OO interface.

      For whatever reasons, possibly intangible, I don't think threading is the right approach for this current project, hence my investigation of alternatives. As I refactor my work to date, I'll break up some of the larger chunks of code into a more modular approach. Afterwards, I'll be in a position to efficiently try several different multi-threading/processing approaches and see how they work. If I find anything interesting, I'll stick it out here.

      Thanks again for your opinions

      lbe

        I wouldn't consider myself a beginner with threads and neither would I consider myself an expert, journeyman is probably about right. But, I do have about 35 years of experience with a multitude of different languages and have learned a few things that are true of every one of them. #1 - the less repetitive typing used the fewer the bugs, and #2 - I would rather stand on the shoulders of giants by using their knowledge than become a giant on my own.

        That paragraph brings two thoughts to mind.

        1. The iThreads model used by Perl/threads.pm is quite different from every other threading model I've used going back the best part of 30 years.

          As a bolt-on after-the-fact addition to Perl, it has its limitations and peculiarities. It can be really quite effective for many types of concurrent algorithms, but does require that you are aware of its quirks in order to get the best from it.

          The approach it requires for many problems is often quite different from other threading models.

        2. The problem with most (actually all those I've tried) of the thread pooling modules on CPAN is that they haven't been written by giants.

          For the most part they've been written as a first attempt by people for whom iThreads is their first real experience of threading of any form, and based upon very dubious analogies.

          They are over-complex, badly tested and often written in isolation of any real application. Hence they may seem to run for some purely demonstration application, but fall in a heap when you try to use them for anything remotely practical.
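One iThreads quirk that tends to surprise people coming from other threading models is that variables are private to each thread unless explicitly shared. The short sketch below illustrates that behaviour with the core threads and threads::shared modules; the variable names are made up for the demonstration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $plain = 0;             # every thread gets its own private copy
my $shared :shared = 0;    # a single variable visible to all threads

my @threads;
push @threads, threads->create( sub {
    $plain++;              # changes only this thread's copy
    {
        lock($shared);     # serialise the update across threads
        $shared++;
    }
} ) for 1 .. 4;
$_->join for @threads;

print "plain=$plain shared=$shared\n";    # plain=0 shared=4
```

The copying of every non-shared variable into each new thread is also where much of the per-thread memory overhead comes from, which is why creating workers early, before large data structures are loaded, keeps each thread small.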



Re^3: thread/fork boss/worker framework recommendation
by Anonymous Monk on Feb 02, 2012 at 05:08 UTC

    Hey lbe,

    I'm the author of Parallel::Fork::BossWorkerAsync. I created it for my own use, and I use it daily. But mine is a limited, sole, use case. Your project sounds meaty enough to be revealing in terms of what the module does, what maybe it should do, etc. I'd be very grateful if you could shoot me an email to moosie@gmail.com at some point, describing your experience with the module.

    Cheers,

    -joe
