http://www.perlmonks.org?node_id=1069575


in reply to Multi-threaded behavior, file handle dups and writing to a file

Could you satisfy our curiosity regarding the hardware being used? Single-threaded MD5 speed ought to be in the ballpark of 0.5 GB/s on a modern core. What bandwidth does your disk subsystem sport?

Regarding the problem:

$ find . -name '*.md5' -print0 | xargs -0 -n1 -P4 md5sum -c
This should run four md5sum checks in parallel, as separate processes rather than threads (provided a separate checksum file exists for each image file).
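If the per-file checksum files do not exist yet, they could be generated in the same parallel fashion. The following is only a sketch: the helper name `make_md5_files` and the `*.img` pattern are assumptions for illustration, not something from the original post. Paths are stored relative to the current directory, so the check has to be run from the same place.

```shell
# Hypothetical helper: write "<file>.md5" next to every file matching a pattern.
# Relies on the -0 and -P options of GNU/BSD xargs and on md5sum from coreutils.
make_md5_files() {
    pattern=$1
    # Each md5sum runs as its own process, up to four at a time.
    find . -type f -name "$pattern" -print0 |
        xargs -0 -n1 -P4 sh -c 'md5sum "$1" > "$1.md5"' sh
}
```

Usage would be something like `make_md5_files '*.img'`, followed by the `md5sum -c` pipeline above.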

Re^2: Multi-threaded behavior, file handle dups and writing to a file
by cganote (Initiate) on Jan 06, 2014 at 21:37 UTC

    I'm running on a compute node of a cluster; the file system is mounted on a separate machine. It looks like I'm getting approx. 0.27 GB/s in real time using:

    $ time md5sum filename
    real 0m34.537s
    user 0m31.455s
    sys 0m3.047s

    ..on a 9.3G file. Did this a few times.

    Thanks for the tip on xargs -P. I'll see if I can work that in instead - though I am still curious about perl forks.
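The roughly 0.27 GB/s figure is simply the file size divided by the wall-clock ("real") time. A small sketch of that arithmetic, with a hypothetical helper name and the 9.3G size treated as 9.3 GB:

```shell
# Hypothetical helper: throughput = size / elapsed wall-clock seconds.
throughput_gb_per_s() {
    awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", size / secs }'
}

throughput_gb_per_s 9.3 34.537   # prints 0.27
```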

      What throughput do you get if you do a simple: time cat filename >/dev/null on the same file?

      That's your device/network IO baseline. (Assuming your null device is reasonably efficient.)
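The cat baseline could be turned into a throughput number roughly like this. It is a sketch only: the function name is made up, and `date +%s` has one-second resolution, so the result is coarse and only meaningful for multi-gigabyte files.

```shell
# Hypothetical helper: read a file once with cat and report whole MB/s.
read_throughput_mb_per_s() {
    file=$1
    start=$(date +%s)
    cat "$file" > /dev/null
    end=$(date +%s)
    # Size is taken after the timed read so it cannot warm the cache first.
    bytes=$(wc -c < "$file")
    elapsed=$((end - start))
    # Avoid dividing by zero for reads that finish within a second.
    [ "$elapsed" -gt 0 ] || elapsed=1
    echo $((bytes / elapsed / 1000000))
}
```

A second run on the same file will mostly measure the page cache rather than the network or disk, so only the first read of a file counts as the IO baseline.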


        For the first read of a file with cat, it takes 50 seconds to go from the mounted filesystem to the local /device with a 9GB file, 25 seconds with a 4.6GB file, and 46 seconds with a 6GB file. Subsequent cat calls (haha) on the same file take much less time, on the order of 1 - 3 seconds, probably because the file is already cached in memory (locally or on the filesystem side, I'm not sure).

      the file system is mounted on a separate machine

      Update - NVM: didn't read original question well enough: Are you perhaps limited by the network-based IO present here?

      --MidLifeXis