Re: Multi-threaded behavior, file handle dups and writing to a file

by oiskuu (Hermit)
on Jan 06, 2014 at 19:28 UTC ( #1069575=note )

in reply to Multi-threaded behavior, file handle dups and writing to a file

Could you satisfy our curiosity regarding the hardware being used? MD5 speed ought to be in the ballpark of .5 GB/sec for modern cores (single thread). What bandwidth does your disk subsystem sport?

Regarding the problem:

$ find . -name '*.md5' -print0 | xargs -0 -n1 -P4 md5sum -c
This should run up to four md5sum processes in parallel (xargs -P starts separate processes, not threads), provided a separate checksum file exists for each image file.
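
If the per-file checksum files do not exist yet, something along these lines could create them. This is only a sketch: the function name make_md5s and the '*.img' pattern are assumptions for illustration, so adjust them to your file names. Run it from the same directory you later run the check from, because the paths recorded in each .md5 file are relative to the current directory.

```shell
# make_md5s [DIR]: write FILE.md5 next to every *.img file under DIR.
# Assumptions: images end in .img (hypothetical pattern); GNU xargs and md5sum.
make_md5s() {
  find "${1:-.}" -name '*.img' -print0 |
    # -r: do nothing if find matched no files (GNU xargs)
    # -n1 -P4: one file per md5sum, at most four processes at a time
    xargs -0 -r -n1 -P4 sh -c 'md5sum "$1" > "$1.md5"' sh
}

make_md5s .
```

The 'md5sum -c' pipeline above can then be used unchanged to verify the files.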

Replies are listed 'Best First'.
Re^2: Multi-threaded behavior, file handle dups and writing to a file
by cganote (Initiate) on Jan 06, 2014 at 21:37 UTC

    I'm running on a compute node of a cluster; the file system is mounted on a separate machine. Going by wall-clock time, I'm getting approx. 0.27 GB/s using:

    $ time md5sum filename
    real 0m34.537s
    user 0m31.455s
    sys 0m3.047s

    ..on a 9.3G file. Did this a few times.
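
For reference, the arithmetic behind that figure can be checked with a small helper. This is a sketch only; the function name rate is made up for illustration, and the inputs are the numbers from the timing above.

```shell
# rate SIZE_GB REAL USER SYS: print throughput and the CPU share of wall time.
# (rate is a hypothetical helper name; times are in seconds.)
rate() {
  awk -v size="$1" -v real="$2" -v user="$3" -v sys="$4" 'BEGIN {
    printf "throughput: %.2f GB/s\n", size / real
    printf "cpu share:  %.1f%%\n", 100 * (user + sys) / real
  }'
}

rate 9.3 34.537 31.455 3.047
```

With these inputs the throughput comes out at about 0.27 GB/s, and user plus sys time accounts for roughly 99.9% of the wall-clock time. That would suggest md5sum spent almost no time waiting on I/O in this particular run, possibly because the file was already cached.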

    Thanks for the tip on xargs -P. I'll see if I can work that in instead - though I am still curious about perl forks.

      What throughput do you get if you do a simple: time cat filename >/dev/null on the same file?

      That's your device/network IO baseline. (Assuming your null device is reasonably efficient.)
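
A rough way to turn that timing into a throughput number, as a sketch: the function name throughput is an assumption for illustration, and 'date +%s.%N' relies on GNU date.

```shell
# throughput FILE: read FILE once into /dev/null and print the read rate in GB/s.
# (throughput is a hypothetical helper name; %N needs GNU date.)
throughput() {
  f=$1
  bytes=$(wc -c < "$f")          # file size in bytes
  start=$(date +%s.%N)
  cat "$f" > /dev/null
  end=$(date +%s.%N)
  awk -v b="$bytes" -v s="$start" -v e="$end" 'BEGIN {
    t = e - s
    if (t <= 0) t = 1e-9         # guard against a zero interval on tiny files
    printf "%.2f GB/s\n", b / t / 1e9
  }'
}
```

Usage would be 'throughput filename'. Only the first read of a file is a fair baseline; repeated reads may be served from cache and look much faster.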

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        For the first read of a file with cat (from the mounted filesystem to the local /dev/null), it takes 50 seconds for a 9GB file, 25 seconds for a 4.6GB file, and 46 seconds for a 6GB file. Subsequent cat calls (haha) on the same file take much less time, on the order of 1 - 3 seconds, probably because the file is already cached in memory (locally or on the filesystem side, I'm not sure).

      the file system is mounted on a separate machine

      Update - NVM: didn't read original question well enough: Are you perhaps limited by the network-based IO present here?

