PerlMonks  

Re: Multi-threaded behavior, file handle dups and writing to a file

by sundialsvc4 (Monsignor)
on Jan 06, 2014 at 21:24 UTC (#1069580)


in reply to Multi-threaded behavior, file handle dups and writing to a file

You can get a lot of useful work done with xargs -Pn, as shown above, if your version of Linux/Unix supports it. Perhaps most usefully, you can very quickly see whether parallelism will actually benefit you, without having to “write a complicated [Perl ...] program” in order to find out.

The runtime of this particular task will be determined almost entirely by how fast the disk drives, channel subsystems and so forth can move the requisite amount of data past md5sum’s nose. The CPU time pales against the I/O time, and many filesystems handle parallelism internally, on behalf of all comers, very well on their own. Lustre might give you faster and/or more scalable throughput on this particular task ... or not.

Certainly you should experiment extensively with the xargs approach to find out how your particular hardware configuration responds: where the “sweet-spot” number of processes is, whether it’s actually greater than 1, and so on. Then decide for yourself whether a more elaborate approach is justified. (Likely it won’t be. In any case, a Perl script that is designed simply to be run in this way via xargs is much easier to bang out than something that implements its own multithread controller, and it just might work as well or even better. If you possibly can, “Jest get ’er done.”)
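For anyone who wants to try that experiment, here is a rough sketch of what it might look like. It assumes GNU or BSD xargs (the -P and -0 options are not in POSIX) and that md5sum is on the PATH. The temporary directory, the sample file names and the process counts 1, 2 and 4 are all made up for illustration. The sample data is far too small to give meaningful timings; point find at your real data instead.

```shell
#!/bin/sh
# Sketch: run md5sum over the same files at several xargs -P levels.
# Assumes GNU or BSD xargs and md5sum; all names below are hypothetical.

workdir=$(mktemp -d)

# Create a handful of small sample files (1 MiB each).
# Real measurements need data much larger than the page cache.
for i in 1 2 3 4 5 6 7 8; do
    head -c 1048576 /dev/urandom > "$workdir/sample_$i.bin"
done

for procs in 1 2 4; do
    start=$(date +%s)
    # One md5sum per file, at most $procs running at once.
    # The result files end in .txt, so the '*.bin' filter skips them.
    find "$workdir" -type f -name '*.bin' -print0 |
        xargs -0 -n 1 -P "$procs" md5sum > "$workdir/sums_$procs.txt"
    end=$(date +%s)
    count=$(wc -l < "$workdir/sums_$procs.txt")
    echo "P=$procs: $((end - start))s elapsed, $count files hashed"
done

# Remove "$workdir" by hand when you are done looking at the results.
```

If the elapsed time stops improving (or gets worse) as -P goes up, the job is I/O-bound and a hand-rolled threading scheme is unlikely to do any better.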


Re^2: Multi-threaded behavior, file handle dups and writing to a file
by cganote (Initiate) on Jan 08, 2014 at 06:04 UTC

    I really appreciate your response. I've only used xargs before for tricky pipes and to get the -n 1 feature. I see what you mean about CPU time on this issue, now that I've run it a few times.

    In a case where I find it does pay off to run the script with multiple tasks, is there a good, detailed overview of how I/O is handled on duped filehandles or between processes? What I've read so far still doesn't explain the 'why it do dat' of my program, and I'd like to fill in the holes in my understanding for future cases.

    >If you possibly can, “Jest get ’er done.”
    I wish I'd asked sooner. Sometimes it's helpful to be reminded of the goal and not get caught up in the implementation details!
