http://www.perlmonks.org?node_id=1059083

ThelmaJay has asked for the wisdom of the Perl Monks concerning the following question:

Dear Masters,

I'm relatively new to Perl.

I'm facing a multi-threading dilemma :).

To implement a multitask environment I used Forks (Parallel::ForkManager). But I think forking is not the solution I need.

I create 50 forks, each one responsible for parsing a stream.

But if one of the streams is bigger than the other 49, the other 49 forks, instead of starting to parse new streams, have to wait for the one that is taking more time to process.

How can I "unblock" the other threads? Is forking the solution?

My code:

my $pm = Parallel::ForkManager->new(50);
my $i = 0;
while (i < 50) {
    $pm->start and next;
    processStream($stream);
    $i++;
    $pm->finish;    # do the exit in the child process
}
$pm->wait_all_children();

If I remove wait_all_children(), what would happen?

Best Regards, thank you for your help.

Replies are listed 'Best First'.
Re: Fork vs pThreads
by RichardK (Parson) on Oct 21, 2013 at 11:52 UTC

    You're not really blocked; all the work is done and you're just waiting for the fork doing the most work.

    In your example you've got 50 units of work, while (i<50), and 50 forks, so the total time is limited by the fork that takes the longest. Try reducing the number of forks and see what happens.

Re: Fork vs pThreads
by BrowserUk (Patriarch) on Oct 21, 2013 at 10:31 UTC
    But if one of the streams is bigger than the other 49, the other 49 forks, instead of starting to parse new streams, have to wait for the one that is taking more time to process.

    That doesn't ring true. What evidence do you have for that conclusion?

    Also, how many cores do you have?


      I'm using Amazon's EC2 m1.xlarge (4 vCPUs). My evidence is wait_all_children(), plus some prints I added to see when each fork begins and ends. And also, the time it took from one block of 50 to the next block of 50 was the same as the time taken by the longest one.
        And also, the time it took from one block of 50 to the next block of 50 was the same as the time taken by the longest one.

        Of course. How could it be otherwise?

        If you draw 10 parallel lines of different lengths:

        ---------------
        ------
        ----------------
        ----
        -
        -----------
        --------
        ------
        --------------
        -----

        Is there any way to make the overall width less than the longest line?

        Same thing.

        As an aside, but entirely relevant: running 50 tasks concurrently on 4 CPUs will take longer than running those same 50 tasks only 4 at any given time.
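
        For anyone who wants to see that on their own box, here is a minimal timing sketch (the busy-loop work() is just a stand-in for real stream parsing, and the pool sizes are arbitrary):

            use strict;
            use warnings;
            use Parallel::ForkManager;
            use Time::HiRes qw(time);

            # CPU-bound stand-in for the real per-stream work
            sub work {
                my $x = 0;
                $x += sqrt($_) for 1 .. 3_000_000;
                return $x;
            }

            # run the same 50 tasks with different pool sizes, timing each run
            for my $max_procs (50, 16, 8, 4) {
                my $pm    = Parallel::ForkManager->new($max_procs);
                my $start = time();
                for my $task (1 .. 50) {
                    $pm->start and next;   # parent keeps looping
                    work();                # child does the work
                    $pm->finish;           # child exits
                }
                $pm->wait_all_children();
                printf "pool of %2d: %6.2f s\n", $max_procs, time() - $start;
            }

        On a 4-core machine you would expect the small pools to finish at least as fast as the pool of 50, despite doing identical work.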


Re: Fork vs pThreads
by Corion (Patriarch) on Oct 21, 2013 at 13:49 UTC

    How many units of work do you have overall? Are there only 50 units of work, or are there more than 50?

    Your current setup only processes the first 50 units of work and does not process any further items.

    If the code you show is not the code you are really running, you'll need to show some more relevant code.

      I have an unknown number of streams.

      Every 5 seconds I have a new one that is stored in a table. I then execute a select with a limit of 50, and the number of rows returned (nrRowsReturned) goes into the fork pool.

      my $pm = Parallel::ForkManager->new($nrRowsReturned);
      while (there are rows to fetch) {
          $pm->start and next;
          processStream($stream);
          $pm->finish;
      }
      $pm->wait_all_children();
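
      For concreteness, a runnable shape of that loop might look like this (the DSN, table and column names here are invented for illustration, and the pool is capped at a small fixed size rather than sized to the row count; see the replies below):

      use strict;
      use warnings;
      use DBI;
      use Parallel::ForkManager;

      # hypothetical connection and schema, purely for illustration
      my $dbh = DBI->connect('dbi:mysql:streams_db', 'user', 'pass',
                             { RaiseError => 1 });

      my $pm  = Parallel::ForkManager->new(4);   # small fixed pool, not one fork per row
      my $sth = $dbh->prepare('SELECT id, payload FROM streams LIMIT 50');
      $sth->execute;

      # note: forked children inherit $dbh but should not use it;
      # reconnect inside the child if it needs database access
      while (my $row = $sth->fetchrow_hashref) {
          $pm->start and next;              # parent fetches the next row
          processStream($row->{payload});   # child parses one stream
          $pm->finish;
      }
      $pm->wait_all_children();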

        Why are you running as many children in parallel as you have rows? For 1000 rows, you will launch 1000 children in parallel.

        The idea of Parallel::ForkManager is to use the optimal number of parallel children, which is usually roughly the number of CPUs (or cores) of your machine, and not the number of tasks to be processed.

        Other people have already recommended a smaller number of concurrent processes, like 4 or 8. Maybe you should try that suggestion.
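
        If you'd rather size the pool from the machine than hard-code 4 or 8, one rough way on Linux (an assumption here; any portable CPU-count module would do the same job) is to count the cores yourself:

        use strict;
        use warnings;
        use Parallel::ForkManager;

        # Linux-only core count from /proc/cpuinfo, with a fallback of 4
        my $cores = 4;
        if (open my $fh, '<', '/proc/cpuinfo') {
            $cores = grep { /^processor\s*:/ } <$fh>;
            close $fh;
        }
        $cores ||= 4;   # guard against an empty parse

        my $pm = Parallel::ForkManager->new($cores);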

Re: Fork vs pThreads
by sundialsvc4 (Abbot) on Oct 21, 2013 at 14:43 UTC

    What BrowserUK is saying about “only 4 at a time” is anything but “an aside.” It's the key to the whole thing.

    Consider what you see going on every day in a fast-food joint. There's a certain number of workers, and all of them are working on a queue of incoming food orders. If 1,000 orders suddenly come pouring in, then the queues will get very long, but the kitchen won't get overcrowded. The number of workers in the kitchen, and each of their assigned tasks, is set to maximize throughput, which means that all the workers are working as fast as they can and that they are not competing with one another for resources. The restaurant doesn't lose the ability to do its job ... it just takes (predictably!) longer. (And they can tell you, within a reasonably accurate time-window, just how long it will take.)

    The loss of throughput, furthermore, isn't linear: no matter what the ruling-constraint actually is, the loss becomes exponential. If you plot the average completion-time as the y-axis on a graph, where the “number of simultaneous processes” is x, the resulting graph has an elbow-shape: it gradually gets worse, then, !!wham!! it “hits the wall” and goes to hell and never comes back. If you plot “number of seconds required to complete 1,000 requests” as the y, the lesson becomes even clearer. You will finish the work-load faster (“you will complete the work, period ...”) by controlling the number of simultaneous workers, whether they be processes or threads.

    The number-one resource of contention is always: virtual memory. “It's the paging that gets ya,” and we have a special word for what happens: “thrashing.” But any ruling-constraint can cause congestive collapse, with similarly catastrophic results.

      Thank you so much for your explanation :) It means a lot to me. So, just to see if I understood:

      By launching 50, each core is going to have a queue of approx. 12 streams to process, with only 4 running simultaneously.

      Because the queue is long, and gets longer as I keep adding more in each while cycle, the available throughput, CPU and memory get smaller, causing a bottleneck.

      Did I get it? :)

      So the fact that one stream is bigger than the others does not have an impact?

        To abuse the fast food analogy where employees are threads, starting a new thread also involves going through HR paperwork before the new thread can do their task. (You really want the task to be more than making a single burger for customer #42 before retiring too)

        Your quad-core restaurant requires a bit of time for one employee to save all their tools away before someone else can change context and use one of the four stations.

        And once you run out of physical RAM/floor space for threads to stand in, then you've got to use a bus to swap employees in and out, which is horrifyingly slow.

        Actually ... no! :-)

        As you undoubtedly know, a “process” (or thread ...) has absolutely nothing to do with “a core.” It is (so much for your, ahem, too-thin attempt at humor at my expense ...) just “an employee.” This employee (no matter what core(s) (s)he happens to get dispatched upon) “finds work to do, and does it, and in so doing remains just as busy as (s)he possibly can be.” As do all the other employees in the grease-shack.

        Thanks to the existence of a queue, of a “to-do list,” our intrepid employee will never become overwhelmed, no matter how many tour-buses full of hungry folks show up in the drive-thru. And this is the aforementioned “key.” There are only so-many square feet of floor-space in the kitchen, therefore only so many burgers that can be cooked at a time. This will never change, no matter how many burgers are ordered. “The optimal burger-throughput,” for this particular restaurant, therefore, is always constant ... and the same thing is true of your computing facility.

        So, to serve (however many customers there may be) in the least amount of time, you should pay attention only to the conditions in the kitchen ... not the lobby. Do not over-commit the kitchen. Instead, parcel out the workload (whatever it is ...) in such a way as to maintain full utilization of the physical resources but nothing more. Yes, customers will have to wait, but they are accustomed to that, if they feel that the wait-time is predictable. (Furthermore, it is necessary(!) for them to wait, if they are to be served in the least amount of time.)

        Do not allow the employees to compete with each other. Do not over-commit the deep-fat fryer. Do not permit the order-completion time to become, “due to resource contention,” longer than it would be if the restaurant were completely empty. If it should come to that, keep the hungry-folks outside. Do not permit them to enter the doorway unless you know that you can serve them consistently.

Re: Fork vs pThreads
by Anonymous Monk on Oct 21, 2013 at 15:52 UTC
    I think your basic logic error is here:
    my $pm = Parallel::ForkManager->new(50);   # 50 forks
    my $i = 0;
    while (i < 50) {                           # start 50 forks
        $pm->start and next;
        processStream($stream);
        $i++;
        $pm->finish;                           # do the exit in the child process
    }
    $pm->wait_all_children();                  # and then do nothing
    I'd amend your code to this (adjusting the fork-amount to something sane such as 4):
    # max 4 processes simultaneously
    my $pm = Parallel::ForkManager->new(4);
    while ( .. there are rows to fetch .. ) {
        $pm->start and next;
        processStream($stream);
        $pm->finish;   # do the exit in the child process
    }
    $pm->wait_all_children();
    The number of rows to fetch could be 50 or whatever number you want. Parallel::ForkManager will take care of the queueing and will keep four forks alive as long as there are jobs in the queue.
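
    If you also need results back from the children (not something your snippet requires), recent versions of Parallel::ForkManager can hand a data structure from finish() to a callback in the parent; a minimal sketch, with the loop bounds and field names made up for illustration:

    use strict;
    use warnings;
    use Parallel::ForkManager;

    my $pm = Parallel::ForkManager->new(4);

    # runs in the parent every time a child exits
    $pm->run_on_finish(sub {
        my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
        print "child $pid ($ident) said: $data->{summary}\n" if $data;
    });

    for my $stream_id (1 .. 10) {        # stand-in for the fetched rows
        $pm->start($stream_id) and next; # the id doubles as the ident
        my %result = ( summary => "parsed stream $stream_id" );
        $pm->finish(0, \%result);        # exit code 0, plus a hashref for the parent
    }
    $pm->wait_all_children();
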
      Exactly :). You guys are the best!! :) I already see improvements :). Thank you all for your replies! :)