PerlMonks  

Re^18: randomising file order returned by File::Find

by jeffa (Chancellor)
on Mar 02, 2011 at 04:20 UTC (#890896)


in reply to Re^17: randomising file order returned by File::Find
in thread randomising file order returned by File::Find

"I'm reject the suggestion for mis-application of that tool."

I merely pointed out that "... [script] builds a big list in memory and then partitions the matching files into 100+ lists (1 per cluster instance) and writes the[m] to separate files" is what Hadoop gives you for free. Whether the OP's problem is CPU bound or IO bound is determined by exactly how the OP is processing said images, which has yet to be revealed. Rather than make assumptions, like you have done, i merely saw a chance for an idea.

And concurring with tilly's suggestion is not really suggesting it. tilly suggested it. You didn't even mention Hadoop in your link.

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)


Re^19: randomising file order returned by File::Find
by BrowserUk (Pope) on Mar 02, 2011 at 04:29 UTC
    is what Hadoop gives you for free.

    It isn't free if you don't already have the cluster set up to use it.

    And once you've gone through the cluster set-up process, just in order to deliver a couple of hundred k of filenames to the clients, they still have to get access to each of the huge image files, which they cannot do in situ; the files would have to be shipped to the local HDFS filesystem. And then Hadoop has nothing whatsoever to offer in the processing of each file.

    And if you can think of some legitimate reason for going through all of that in order to distribute a few thousand filenames...

    Let's face it. Your suggestion is a crock.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      "Let's face it. Your suggestion is a crock."

      Again, you assume. You assumed i suggested Hadoop. I merely pointed out that "... [script] builds a big list in memory and then partitions the matching files into 100+ lists (1 per cluster instance) and writes the[m] to separate files" is what Hadoop (not EC2) gives you for "free," as in you don't have to roll that wheel.

      Do you have anything to say about taking credit for tilly's suggestion just now? When you said, and i quote, "I even suggested it here a couple of weeks ago."

      Update: I suppose not.

      jeffa

      L-LL-L--L-LL-L--L-LL-L--
      -R--R-RR-R--R-RR-R--R-RR
      B--B--B--B--B--B--B--B--
      H---H---H---H---H---H---
      (the triplet paradiddle with high-hat)
      
