PerlMonks  

Re: dynamic number of threads based on CPU utilization

by sundialsvc4 (Abbot)
on Sep 26, 2012 at 15:47 UTC (#995801)


in reply to dynamic number of threads based on CPU utilization

Remember, the work being done here is I/O-bound, not CPU-bound.   This means that the processor spends nearly all of its time getting the next I/O operation started, then everyone goes back to sleep until that I/O completes.   Therefore, CPU utilization would not be expected to be a useful bellwether of how much work is being done now, or of how much could potentially be done.   If anything is going to get saturated, it’s most likely to be I/O capacity.   (Unless you are blowing memory with some too-big-for-its-britches hash and thus dropping into thrashing-hell; dunno.)
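
One way to check a claim like this, rather than argue it, is to compare the CPU time a piece of work consumes against the wall-clock time it takes. The following is only a sketch using Perl's built-in `times` and the core Time::HiRes module; `cpu_share` and the two stand-in workloads are made up for illustration and are not taken from the OP's program.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Run a code ref and return wall-clock seconds, CPU seconds (user + system
# for this process), and the ratio of CPU time to wall-clock time.
sub cpu_share {
    my ($work) = @_;
    my ($u0, $s0) = times;
    my $t0 = time;
    $work->();
    my $wall = time - $t0;
    my ($u1, $s1) = times;
    my $cpu = ($u1 - $u0) + ($s1 - $s0);
    return ($wall, $cpu, $wall > 0 ? $cpu / $wall : 0);
}

# CPU-heavy stand-in: arithmetic in a tight loop.
my @busy = cpu_share(sub { my $x = 0; $x += sqrt($_) for 1 .. 3_000_000; });

# Waiting stand-in: sleeping simulates blocking on a slow device.
my @idle = cpu_share(sub { sleep 0.5 });

printf "busy: wall=%.2fs cpu=%.2fs share=%.0f%%\n",
    $busy[0], $busy[1], 100 * $busy[2];
printf "idle: wall=%.2fs cpu=%.2fs share=%.0f%%\n",
    $idle[0], $idle[1], 100 * $idle[2];
```

A share near 100% suggests the work is limited by the processor; a share near zero suggests it is mostly waiting on something else. Note that `times` reports for the whole process, so in a threaded program the share can exceed 100%.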

Perhaps you could consider writing this program so that it is simply given a directory name as an input parameter, munches through that directory and its subdirectories doing its thing, then writes the completed output to (say) a shared SQLite database file.   Now the job can be done, in a way appropriate to each machine and to the changing workload, simply by launching multiple copies of the program simultaneously from the command line with different parameters.   This would achieve the same goal ... of exploiting parallelism ... but with a considerable reduction of internal complexity, while handing more influence back to the user.   Command-line parameters such as “consider only newer files” might be useful options.
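
A rough sketch of what such a one-directory worker might look like, using only the core File::Find and Getopt::Long modules. Everything named here is hypothetical: the `--newer-than` option, `collect_files`, and `process_file` (which merely reports a file size in place of the real per-file work). To stay within core modules it prints tab-separated lines instead of writing to SQLite; a real version could write through DBI with DBD::SQLite if that is installed.

```perl
use strict;
use warnings;
use File::Find;
use Getopt::Long;

# Return the plain files under $dir (recursively) whose modification time
# is at or after $since (epoch seconds). A $since of 0 accepts every file.
sub collect_files {
    my ($dir, $since) = @_;
    my @found;
    find({
        no_chdir => 1,                    # $_ holds the full path
        wanted   => sub {
            return unless -f $_;
            my $mtime = (stat _)[9];      # reuse the stat done by -f
            push @found, $_ if $mtime >= $since;
        },
    }, $dir);
    my @sorted = sort @found;
    return @sorted;
}

# Hypothetical per-file step: here it only returns the size in bytes.
sub process_file {
    my ($path) = @_;
    my @st = stat($path);
    return $st[7];
}

# Command-line use: [--newer-than EPOCH] DIRECTORY
if (@ARGV) {
    my $since = 0;
    GetOptions('newer-than=i' => \$since)
        or die "usage: $0 [--newer-than EPOCH] DIRECTORY\n";
    my $dir = shift @ARGV;
    die "usage: $0 [--newer-than EPOCH] DIRECTORY\n" unless defined $dir;
    printf "%s\t%d\n", $_, process_file($_) for collect_files($dir, $since);
}
```

Saved under some name of your choosing, the script could then be started once per directory from the shell, with as many copies running at once as the machine comfortably handles.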


Re^2: dynamic number of threads based on CPU utilization
by BrowserUk (Pope) on Sep 26, 2012 at 15:59 UTC
    Remember, the work being done here is I/O-bound, not CPU bound.

    Why are you doing this?

    For 90%+ of the runtime of the OP's program, IT IS CPU BOUND NOT IO BOUND.

    So do just stop regurgitating your useless, pointless, irrelevant, and gratuitously incorrect home-spun wisdoms.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      My apologies... I thought that since the procXml sub worked just fine, it would not be relevant to the discussion or potential solution. Within the procXml sub, I simply slurp the file into a hash, then operate on the hash.

      I was under the impression that because I was operating on the file contents in memory (i.e. the hash), it was a mostly CPU-bound process (minus slurping the input file and printing to the output file).

        I thought that since the procXml sub worked just fine, it would not be relevant to the discussion or potential solution.

        You were mostly right. The only relevance it has is that nowhere in that code do I see any sign of locking (the keyword 'lock' does not appear), which means that multiple threads are writing to a shared hash and there is nothing to prevent them from corrupting data through collisions.

        You may 'get away with it', but I wouldn't want to be responsible for when things go wrong.
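
The locking concern above can be illustrated with a small sketch. It assumes a perl built with ithreads and uses the `threads` and `threads::shared` modules; the hash `%count`, the `worker` sub, and the key names are invented for the example and are not the OP's code.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Shared hash standing in for the one that the worker threads update.
my %count :shared;

# Each worker increments the same few keys many times. Holding the lock
# makes each read-modify-write on the hash exclusive to one thread.
sub worker {
    my ($rounds) = @_;
    for my $i (1 .. $rounds) {
        my $key = 'k' . ($i % 4);
        lock(%count);        # released when this loop iteration ends
        $count{$key}++;
    }
    return;
}

my @workers = map { threads->create(\&worker, 5_000) } 1 .. 4;
$_->join for @workers;

my $total = 0;
$total += $count{$_} for keys %count;
print "total increments: $total\n";    # 4 threads x 5_000 rounds each
```

Without the `lock` call, two threads can read the same old value and both write back old value plus one, so increments can be lost. The lock is advisory: it only protects the hash if every thread that touches it takes the lock first.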


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        RIP Neil Armstrong

      When a CPU executes literally billions of operations per second, as CPUs do these days, and it is drawing its inputs from a large number of files, then ... it is I/O-bound, because I/O is what it is waiting on.   Nanoseconds vs. milliseconds.   The completion time of this program, over the course of let us say one minute, will chiefly be regulated by its ability to perform input/output, not by the speed of the processor(s).   If you were to place the program onto a CPU that ran twice as fast, all other things being equal, such a program would not complete in half the time.   If it truly were CPU-bound, then it would not “slow down” and drop out of CPU utilization at the same time, as it is reported to be doing.

      As you say in the (upvoted) earlier comment, this is a poorly thought-out program from the start.   I would further guess that the hash might well have become enormous by that time, and that quite possibly the program has descended into “thrashing hell.”   Something, and it can only be I/O, is utterly preventing the CPU from getting any work done during the second phase.   Thrashing is about the only culprit that exists to explain that.

        Look you idiot. You are talking crap. SO DO SHUT THE F*** UP!


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        RIP Neil Armstrong

        What kind of I/O? Mechanical disk I/O? SSD I/O? Ethernet I/O? DRAM I/O? L1 cache I/O?
