PerlMonks  

Re: Parallel-processing the code

by sundialsvc4 (Abbot)
on May 16, 2018 at 21:03 UTC (#1214692)


in reply to Parallel-processing the code

As appealing as such a notion might be at first blush, in my experience the results turn out to be disappointing. Basically, this sort of algorithm is I/O-bound: its execution time is limited by the speed of the disk drive and associated drivers and nothing else. The CPU is loafing. If you now inject multiple processes or threads into the mix, you can actually make things worse, because the disk drive is now faced with a much more unpredictable situation ... odds are that it will now be moving the read/write head back and forth much more frequently than if it were servicing requests from only one worker. A CPU that thinks in nanoseconds is now waiting for multiple milliseconds: one or more Ferraris, all stuck in traffic right next to a Yugo that might be moving along faster than they are.

The fact that you are now updating a single shared data structure is another pinch point. Although of course Perl can do this reliably, the workers are now obliged to wait, not only for the disk drive, but also for one another.

The best approach, very recently discussed here, is to leverage the operating system’s built-in file buffering mechanisms as aggressively as possible, so that data is read in from the disk “in great big gulps,” and to devise the entire algorithm to minimize the need to move the read/write mechanism to some other cylinder on the platter. “When I/O is necessary, make it count.”
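As a rough illustration of reading "in great big gulps", here is a minimal Python sketch (the thread itself is about Perl; Python is used only for illustration). The function name `count_lines_in_gulps` and the 8 MiB gulp size are assumptions made for this example, not anything taken from the original question, and the right chunk size depends on the storage and should be measured.

```python
import os
import tempfile


def count_lines_in_gulps(path, gulp_size=8 * 1024 * 1024):
    """Count newline bytes by reading the file sequentially in large chunks."""
    count = 0
    # buffering=0 returns a raw file object, so each read() is a single
    # large request to the operating system rather than many small ones.
    with open(path, "rb", buffering=0) as fh:
        while True:
            chunk = fh.read(gulp_size)
            if not chunk:  # empty bytes object means end of file
                break
            count += chunk.count(b"\n")
    return count


if __name__ == "__main__":
    # Demo on a throwaway file; real input would be the large data file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"line\n" * 100_000)
    try:
        print(count_lines_in_gulps(tmp.name))
    finally:
        os.unlink(tmp.name)
```

Reading strictly front to back with one large request per call keeps the access pattern sequential, which is the situation the operating system's read-ahead is designed to help with.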

Just my two cents ...

Replies are listed 'Best First'.
Re^2: Parallel-processing the code
by marioroy (Priest) on May 17, 2018 at 04:58 UTC

    Hi sundialsvc4,

    What may have been true years ago may be irrelevant today. Batching minimizes IPC overhead. MCE workers do not read input data simultaneously. Instead, MCE workers do parallel and serial processing automatically. Sometimes the manager process is not involved at all. That said, there is a possibility that parallel processing may help.
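The batching point can be sketched in a few lines. This is not MCE and does not reproduce its API; it is a hypothetical Python illustration in which threads stand in for worker processes, and `batched`, `process_batch`, and `run_batched` are names invented for the example. The idea shown is only that handing workers whole batches reduces the number of hand-offs from one per record to one per batch.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(items, size):
    """Yield successive lists of up to `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_batch(batch):
    # Hypothetical per-record work: here, just the length of each record.
    return [len(record) for record in batch]


def run_batched(records, batch_size=1000, workers=4):
    """Hand whole batches to workers; return (results, number_of_handoffs)."""
    batches = list(batched(records, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with records.
        per_batch = list(pool.map(process_batch, batches))
    results = [r for chunk in per_batch for r in chunk]
    return results, len(batches)
```

With 10,000 records and a batch size of 1,000, the workers receive 10 hand-offs instead of 10,000. With real processes, each hand-off also carries serialization and IPC cost, which is where batching pays off.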

    Kind regards, Mario

Re^2: Parallel-processing the code
by jeffenstein (Pilgrim) on May 17, 2018 at 06:20 UTC

    It's not usually so simple. Modern physical disks are intelligent devices, and giving them more parallel requests can allow them to optimize the head movement and lead to more global throughput at the expense of response time for an individual thread. Storage arrays and RAID levels can make a huge difference here, even if it is a single filesystem.

    That said, rajaman didn't give any information on the hardware he is using, so anything we say about how to maximize his I/O throughput is just generalizations.

    PS: My issue isn't with whether or not parallelism will help with this particular problem, but rather the generalization that I/O bound processes can't generally benefit from parallelism. Storage manufacturers, OS developers and Systems Administrators put a lot of effort into making storage work better for different workloads, so you can sometimes be surprised by what your storage can do if you put in a little effort.

      Modern physical disks are intelligent devices, and giving them more parallel requests can allow them to optimize the head movement and lead to more global throughput at the expense of response time for an individual thread. Storage arrays and RAID levels can make a huge difference here, even if it is a single filesystem.

      SSDs don't have any heads that need to be mechanically moved around, and they don't have to wait for the data on the disk to appear at the read head, so the seek times reduce to (nearly) zero. Therefore, you gain even more from parallel requests to SSDs.

      Also, your application rarely talks directly to the disk. The operating system is also trying to optimize disk access, by caching and reordering requests.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^2: Parallel-processing the code
by Anonymous Monk on May 16, 2018 at 21:29 UTC
      I don't agree with many (most) of sundialsvc4's posts, but in this specific case, at least some of it makes sense, please don't throw the baby out with the bathwater. Especially the difference between IO-bound processes and CPU-bound processes does make sense, even though modern CPUs, with (among other things) their multiple cores and advanced multi-layered caching strategies, make it very difficult to predict performance. Only benchmarking can really sort out these things.

      But, yes, the difference between IO-bound processes and CPU-bound processes is still a good starting point to try to understand what is going on.
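Since only benchmarking can settle the question, here is a minimal Python sketch of such a measurement. It compares reading a set of files one after another with reading them through a small thread pool. All names (`read_serial`, `read_parallel`, `timed`) and the demo file sizes are assumptions for illustration; a meaningful result requires the real data set on the real storage, with caching effects taken into account.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def read_size(path):
    """Read a whole file and return the number of bytes read."""
    with open(path, "rb") as fh:
        return len(fh.read())


def read_serial(paths):
    """Read the files one after another; return total bytes."""
    return sum(read_size(p) for p in paths)


def read_parallel(paths, workers=4):
    """Read the files through a thread pool; return total bytes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(read_size, paths))


def timed(fn, *args):
    """Return (result, elapsed_seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Throwaway files; they will mostly be served from the page cache,
    # so these timings say little about a real disk.
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = []
        for i in range(8):
            path = os.path.join(tmpdir, f"part{i}.dat")
            with open(path, "wb") as fh:
                fh.write(os.urandom(1 << 20))
            paths.append(path)
        for name, fn in (("serial", read_serial), ("parallel", read_parallel)):
            total, secs = timed(fn, paths)
            print(f"{name}: {total} bytes in {secs:.4f}s")
```

Running each variant several times, and on data larger than available memory, gives a fairer picture than a single run.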

        please don't throw the baby out with the bathwater.

        Whilst I applaud your tolerance -- as a Windows user who was denigrated here for no more reason than that when I first came here -- you make the following mistakes:

        1. You take what he says in isolation.

          By failing to consider his past postings, you make the mistake of assuming there is some insight behind *this* post.

        2. You ignore the Law of Averages.

          If you say one of the same 6 things in response to 4518 posts, you are on average going to be somewhere in the ball park some small percentage of the time.

          But if you take the few percent of those 4518 times that he got something close to being correct as a sign that he knew what he was saying, you do this place -- and all those that expend their energies here to help others -- a disservice.

          And you give succour to the stickiest blanket sticker I've ever had the misfortune to encounter.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
        In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit

        Piling on. My response was going to be, "It's hardly ever bathwater. It's usually battery acid. Throwing out the baby is generally a mercy." Good thing I kept it to myself. :P

        I don't agree with many (most) of sundialsvc4's posts, but in this specific case, at least some of it makes sense, please don't throw the baby out with the bathwater.

        Just because it looks like a baby doesn't make it any less of a turd -- flush it

