PerlMonks  

Re^2: Wait for individual sub processes

by BrowserUk (Patriarch)
on Apr 25, 2015 at 12:39 UTC ( [id://1124674]=note )


in reply to Re: Wait for individual sub processes
in thread Wait for individual sub processes [SOLVED]

Splitting the file into parts before running is not necessary. Chunking is integrated into MCE.

But does it solve the OP's problem of uneven processor use due to disparate processing requirements of records?

If so, how?

Does it reassemble the output in the correct order without allowing the slow processing of some records to block the processing or output of subsequent records?

If so, how?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Re^3: Wait for individual sub processes
by marioroy (Prior) on Apr 25, 2015 at 13:23 UTC

    Yes and yes. I added merge_to_iter to the MCE example.

    MCE follows a bank-teller queuing model when processing input. A slow chunk will not delay or block subsequent chunks. Each chunk comes with a $chunk_id value which is beneficial for preserving output order. Out of order items from gathering are held temporarily until ordered items arrive.
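The hold-until-ordered behaviour described above can be illustrated with a minimal core-Perl sketch. This is not taken from the MCE example or from MCE's internals; `make_ordered_gatherer` is a hypothetical helper, and the sketch assumes chunk ids start at 1 and are contiguous.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that accepts ($chunk_id, $item)
# in any order and calls $emit->($item) in ascending chunk id order.
sub make_ordered_gatherer {
    my ($emit) = @_;
    my %pending;        # out-of-order items, keyed by chunk id
    my $next_id = 1;    # assumption: chunk ids start at 1

    return sub {
        my ($chunk_id, $item) = @_;
        $pending{$chunk_id} = $item;

        # Release every item that is now contiguous with the output so far.
        while (exists $pending{$next_id}) {
            $emit->(delete $pending{$next_id});
            $next_id++;
        }
        return scalar keys %pending;    # number of items still held back
    };
}

my @out;
my $gather = make_ordered_gatherer(sub { push @out, $_[0] });

# Simulate chunk 1 being slow: chunks 2 and 3 finish first and are held.
$gather->(2, "b");
$gather->(3, "c");
$gather->(1, "a");

print join(",", @out), "\n";
```

Note that a slow chunk delays only the emission of later results, not their processing; the held items are what accumulate while it runs.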

    MCE processes immediately. Thus, the pre-processing step to split the file into parts is not necessary.

    I applied a correction to the example: $slurp_ref is a scalar reference, so it must be dereferenced when printing: print $fh $$slurp_ref;
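To show why the dereference matters, here is a small standalone sketch. The variable names other than `$slurp_ref` are made up for illustration, and an in-memory filehandle stands in for the real output file.

```perl
use strict;
use warnings;

my $content   = "line 1\nline 2\n";
my $slurp_ref = \$content;    # a reference to the scalar, not a copy of it

# Write to an in-memory filehandle so the sketch needs no files on disk.
my $buffer = '';
open my $fh, '>', \$buffer or die "open failed: $!";
print $fh $$slurp_ref;        # dereference: writes the content itself
close $fh;

# Stringifying the reference instead yields something like SCALAR(0x...),
# which is what would end up in the file without the extra '$'.
my $wrong = "$slurp_ref";
```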

    The merge_to_iter routine (the iterator itself) is executed by the manager process while MCE is running.

      Out of order items from gathering are held temporarily until ordered items arrive.

      So, if one of the early records takes an exceptionally long time to process, all the outputs from records processed after it will accumulate in memory until that record finally finishes, risking memory exhaustion?

      If so, is there any mechanism, automated or manual, for detecting that memory accumulation and suspending chunk dispatch until the exceptionally slow record is processed and the output released?


        The concern about memory utilization is valid. However, the file content is not gathered in the MCE example; only the chunk id and file path are. Each part's output stays on disk in the output directory until all earlier chunks have arrived, at which point it is merged and unlinked.

        ...
        $mce->gather($chunk_id, "$part.out");
        ...
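A rough core-Perl sketch of that idea follows: only (chunk id, path) pairs wait in memory, while the data itself waits on disk. It does not use MCE; `gather_part`, the file names, and the temporary directory are all hypothetical stand-ins, and chunk ids are assumed to start at 1.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);
my $merged = "$dir/merged.out";

# Simulate workers: each writes its chunk's output to its own part file.
my %part_for;    # chunk id => part file path
for my $chunk_id (1 .. 3) {
    my $path = "$dir/part$chunk_id.out";
    open my $fh, '>', $path or die "open $path: $!";
    print $fh "output of chunk $chunk_id\n";
    close $fh;
    $part_for{$chunk_id} = $path;
}

open my $out, '>', $merged or die "open $merged: $!";
my %held;           # small (id => path) pairs are all that wait in memory
my $next_id = 1;    # assumption: chunk ids start at 1

# Hypothetical gather callback, as the manager process might run it.
sub gather_part {
    my ($chunk_id, $path) = @_;
    $held{$chunk_id} = $path;

    # Merge and unlink every part file that is now next in sequence.
    while (defined(my $ready = delete $held{$next_id})) {
        open my $in, '<', $ready or die "open $ready: $!";
        print {$out} $_ while <$in>;
        close $in;
        unlink $ready or die "unlink $ready: $!";
        $next_id++;
    }
}

# Parts arrive out of order; chunk 3 waits on disk until 1 and 2 are merged.
gather_part(3, $part_for{3});
gather_part(1, $part_for{1});
gather_part(2, $part_for{2});
close $out;
```

The trade-off is that a very slow early chunk now causes part files, rather than memory, to pile up.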

        The upcoming MCE 1.7 release adds an await method to MCE::Queue. In a new MCE::Cookbook.pod, I will demonstrate gathering $chunk_id and "$part.out" into a queue and having workers block temporarily. The idea is to keep no more than (200 + max_workers) files in the output directory.

        ...
        $q->enqueue( [ $chunk_id, "$part.out" ] );
        $q->await( 200 );   # blocks until the queue has 200 or fewer items
        ...
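The throttling idea can be simulated in a single process without MCE. The sketch below is only an illustration of the bookkeeping, not of MCE::Queue itself: `enqueue_and_await` and `drain_one` are hypothetical stand-ins, the limit of 3 stands in for 200, and where a real worker would block, the simulation simply lets the "manager" drain an item.

```perl
use strict;
use warnings;

my $limit    = 3;    # stand-in for the 200 in the post
my @queue;           # pending [ $chunk_id, $path ] pairs
my $max_seen = 0;    # high-water mark, to check that the bound holds
my @merged;          # chunk ids in the order the manager consumed them

# Stand-in for the manager taking one item off the queue.
sub drain_one { push @merged, shift(@queue)->[0] }

# Stand-in for enqueue followed by await($limit): after adding an item,
# do not return until the backlog is back down to $limit or fewer.
sub enqueue_and_await {
    my ($item) = @_;
    push @queue, $item;
    $max_seen = @queue if @queue > $max_seen;
    drain_one() while @queue > $limit;    # a real worker would block here
}

# One simulated worker producing ten chunks.
enqueue_and_await([ $_, "part$_.out" ]) for 1 .. 10;
drain_one() while @queue;

print "high-water mark: $max_seen\n";
```

With a single simulated worker the backlog never exceeds the limit plus one, which mirrors the (200 + max_workers) bound mentioned above: each worker can add one item before it has to wait.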

        I will make sure an example is included before releasing 1.7.
