PerlMonks  

Re: Multi-threaded behavior, file handle dups and writing to a file

by oiskuu (Pilgrim)
on Jan 06, 2014 at 19:28 UTC (#1069575)


in reply to Multi-threaded behavior, file handle dups and writing to a file

Could you satisfy our curiosity regarding the hardware being used? MD5 speed ought to be in the ballpark of 0.5 GB/s per thread on a modern core. What bandwidth does your disk subsystem sport?

Regarding the problem:

$ find . -name '*.md5' -print0 | xargs -0 -n1 -P4 md5sum -c
This should run up to four md5sum processes in parallel, one per checksum file (provided a separate checksum file exists for each image file).
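
A minimal sketch of the setup that one-liner relies on, namely one .md5 file per image. The scratch directory and the disk*.img names are hypothetical stand-ins, and GNU md5sum, find and xargs are assumed.

```shell
#!/bin/sh
# Sketch: build one checksum file per image, then verify them in parallel.
# The disk*.img names and the scratch directory are made up for illustration.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Stand-in "image" files.
for i in 1 2 3 4; do
    printf 'image data %s\n' "$i" > "disk$i.img"
done

# One .md5 file per image, so each md5sum -c call has a single file to check.
for f in *.img; do
    md5sum "$f" > "$f.md5"
done

# -n1: one checksum file per md5sum invocation; -P4: up to four at a time.
result=$(find . -name '*.md5' -print0 | xargs -0 -n1 -P4 md5sum -c)
printf '%s\n' "$result"
```

The order of the "OK" lines is not deterministic, since the four checks finish whenever they finish. The file names inside each .md5 file are relative, so the verification has to run from the directory where the checksums were created.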


Re^2: Multi-threaded behavior, file handle dups and writing to a file
by cganote (Initiate) on Jan 06, 2014 at 21:37 UTC

    I'm running on a compute node of a cluster; the file system is mounted from a separate machine. It looks like I'm getting approx. 0.27 GB/s wall-clock using:

    $ time md5sum filename
    real 0m34.537s
    user 0m31.455s
    sys 0m3.047s

    ...on a 9.3 GB file. I ran this a few times.

    Thanks for the tip on xargs -P. I'll see if I can work that in instead - though I am still curious about perl forks.

      the file system is mounted on a separate machine

      Update - NVM: didn't read original question well enough: Are you perhaps limited by the network-based IO present here?

      --MidLifeXis

      What throughput do you get if you do a simple: time cat filename >/dev/null on the same file?

      That's your device/network IO baseline. (Assuming your null device is reasonably efficient.)
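
One way that baseline could be turned into a number, as a rough sketch: time a plain read and divide the file size by the elapsed time. The helper name read_mbps and the scratch file are made up here, and GNU date (for %N) and GNU dd (for bs=1M) are assumed.

```shell
#!/bin/sh
# Sketch: time a plain sequential read and report throughput in MB/s.
# Assumes GNU date (%N gives nanoseconds); read_mbps is a made-up helper name.
set -eu

read_mbps() {
    file=$1
    bytes=$(wc -c < "$file")
    start=$(date +%s%N)
    cat "$file" > /dev/null
    end=$(date +%s%N)
    elapsed_ns=$(( end - start ))
    # bytes / elapsed seconds / 1e6 gives MB/s; guard against a zero interval.
    awk -v b="$bytes" -v ns="$elapsed_ns" 'BEGIN {
        if (ns <= 0) ns = 1
        printf "%.1f\n", b / (ns / 1e9) / 1e6
    }'
}

# Demo on a 64 MB scratch file; on real data, pass the image file instead.
tmp=$(mktemp)
dd if=/dev/zero of="$tmp" bs=1M count=64 2>/dev/null
mbps=$(read_mbps "$tmp")
echo "read throughput: $mbps MB/s"
rm -f "$tmp"
```

The scratch file here was just written, so it is read back from the page cache and the figure will look far better than any disk or network. Only the first read of a file that has not been touched recently says much about the storage path.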


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        For the first read of a file with cat, it takes 50 seconds to go from the mounted filesystem to the local /device with a 9 GB file; 25 seconds with a 4.6 GB file; 46 seconds with a 6 GB file. Subsequent cat calls (haha) on the same file take much less time, on the order of 1 - 3 seconds, probably because the file is already cached in memory (locally or on the filesystem side, I'm not sure).
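
To put the figures quoted in this thread side by side, here is a small sketch that converts each (size, seconds) pair to GB/s. The helper name gbps is made up, and the sizes are the rounded values from the posts, so the results are approximate.

```shell
#!/bin/sh
# Sketch: convert the (size, seconds) pairs reported in this thread to GB/s.
# gbps is a made-up helper; sizes are the rounded figures from the posts.
gbps() {
    awk -v gb="$1" -v sec="$2" 'BEGIN { printf "%.2f\n", gb / sec }'
}

echo "md5sum, 9.3 GB in 34.537 s: $(gbps 9.3 34.537) GB/s"
echo "cat,    9 GB in 50 s:       $(gbps 9 50) GB/s"
echo "cat,    4.6 GB in 25 s:     $(gbps 4.6 25) GB/s"
echo "cat,    6 GB in 46 s:       $(gbps 6 46) GB/s"
```

The cold cat reads come out slower than the md5sum run, which hints (but does not prove) that the md5sum timings were taken on data that was already cached; the high user time relative to real time in that run points the same way.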
