
Re: Multi-threaded behavior, file handle dups and writing to a file

by sundialsvc4 (Abbot)
on Jan 06, 2014 at 21:24 UTC (#1069580)

in reply to Multi-threaded behavior, file handle dups and writing to a file

You can get a great deal of useful work done with xargs -Pn, as shown above, if your version of Linux/Unix supports it. Perhaps most usefully, you can quickly see whether parallelism will actually benefit you, without first having to “write a complicated [Perl ...] program” to find out.

The main determinant of the runtime of this particular task will be how fast the disk drives, channel subsystems and so forth can move the requisite amount of data past md5sum’s nose. The CPU time pales against the I/O time, and many filesystems handle parallelism internally, on behalf of all comers, very well on their own. Lustre might give you faster and/or more scalable throughput on this particular task ... or not.

Certainly you should experiment extensively with the xargs approach to find out whether your particular hardware configuration responds favorably, and where the “sweet-spot” number of processes is, if it is actually greater than 1. Then decide for yourself whether a more elaborate approach is justified. Likely it won’t be. In any case, a Perl script that is designed simply to be run this way via xargs is much easier to bang out than one that implements its own multithread controller, and it just might work as well or even better. If you possibly can, “Jest get ’er done.”
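To make the "find the sweet spot" advice concrete, here is a rough sketch of the kind of sweep I mean. The function name sweep_md5, the batch size of 32 files per md5sum invocation, and the one-second timing resolution are all my own assumptions, not anything from the original question. It also assumes your find and xargs support -print0, -0 and -P, and that md5sum is on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: time md5sum over a directory tree at several
# parallelism levels. Usage: sweep_md5 DIR PROCS [PROCS ...]
sweep_md5() {
    dir=$1
    shift
    for procs in "$@"; do
        start=$(date +%s)
        # NUL-separated names survive spaces and newlines in filenames.
        # -n 32 is an arbitrary batch size; -P sets the process count.
        find "$dir" -type f -print0 |
            xargs -0 -n 32 -P "$procs" md5sum > /dev/null
        end=$(date +%s)
        # Whole-second resolution only; use a large enough data set.
        echo "procs=$procs seconds=$((end - start))"
    done
}
```

Something like `sweep_md5 /path/to/data 1 2 4 8` then prints one timing line per process count. Bear in mind that the second and later passes may be served largely from the operating system's file cache, so the first number can look worse than it deserves; repeat the sweep, or vary the order, before drawing conclusions.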

Re^2: Multi-threaded behavior, file handle dups and writing to a file
by cganote (Initiate) on Jan 08, 2014 at 06:04 UTC

    I really appreciate your response. I've only used xargs before for tricky pipes and to get the -n 1 feature. I see what you mean about CPU time on this issue, now that I've run it a few times.

    If I do find a case where it pays off to run the script with multiple tasks, is there a good, detailed overview of how I/O is handled on duped filehandles or between processes? What I've read so far still doesn't explain the 'why it do dat' of my program, and I'd like to fill in the holes in my understanding for future cases.

    >If you possibly can, “Jest get ’er done.”
    I wish I'd asked sooner. Sometimes it's helpful to be reminded of the goal and not get caught up in the implementation details!
