I've not rejected Hadoop. I even suggested it here a couple of weeks ago.
I'm simply not recommending it for this job because it is inappropriate. To summarise: you cannot force-fit variable-sized (and typically huge) 3D image files into fixed-size aggregated packets; nor can you process images line-by-line from STDIN, as Hadoop streaming requires.
I'm not rejecting the tool; I'm rejecting the suggestion to misapply that tool. The OP can read and make up his own mind on the matter.
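To make the line-by-line point concrete, here is a rough sketch in plain Python (nothing Hadoop-specific). The names `read_records` and `fake_image` are made up for illustration, and the bytes are an arbitrary stand-in for image data, not a real file format. It only shows what a reader that splits its input on newlines does to binary content:

```python
import io

def read_records(stream):
    """Yield one record per newline-terminated line, the way a
    line-oriented reader on STDIN would see the input."""
    for line in stream:
        yield line.rstrip(b"\n")

# Hypothetical stand-in for image data: arbitrary bytes that
# happen to contain the newline byte 0x0A in several places.
fake_image = bytes([0x89, 0x50, 0x0A, 0x00, 0xFF, 0x0A, 0x0A, 0x42])

# The single "image" arrives as several unrelated fragments.
records = list(read_records(io.BytesIO(fake_image)))
for number, record in enumerate(records):
    print(number, record)
```

The fragments carry no relationship to rows, slices or any other structure in the image; where they break depends only on where a 0x0A byte happens to fall.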
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"I'm rejecting the suggestion to misapply that tool."
I merely pointed out that "... [script] builds a big list in memory and then partitions the matching files into 100+ lists (1 per cluster instance) and writes them to separate files" is what Hadoop gives you for free. Whether the OP's problem is CPU-bound or IO-bound depends on exactly how the OP is processing said images, which has yet to be revealed. Rather than make assumptions, as you have done, I merely saw a chance to offer an idea.
And concurring with tilly's suggestion is not really suggesting it. tilly suggested it. You didn't even mention Hadoop in your link.
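For reference, the partitioning step quoted above is small enough to sketch. This is a minimal Python version under my own assumptions, not the script being discussed: the function names, the `filelist_NNN.txt` naming and the sample filenames are all hypothetical, and it deals the names out round-robin, which may not be how the original splits them.

```python
import os

def partition(items, n_parts):
    """Deal items round-robin into n_parts lists."""
    parts = [[] for _ in range(n_parts)]
    for index, item in enumerate(items):
        parts[index % n_parts].append(item)
    return parts

def write_partitions(parts, out_dir):
    """Write each list to its own file, one name per line.
    Returns the paths written (file naming is hypothetical)."""
    paths = []
    for idx, part in enumerate(parts):
        path = os.path.join(out_dir, f"filelist_{idx:03d}.txt")
        with open(path, "w") as fh:
            fh.writelines(name + "\n" for name in part)
        paths.append(path)
    return paths

# Hypothetical input: 1000 made-up image names split across
# 100 lists, one list per cluster instance.
filenames = [f"scan_{i:05d}.img" for i in range(1000)]
parts = partition(filenames, 100)
```

Calling `write_partitions(parts, some_dir)` then produces one list file per instance in an existing directory of your choosing.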
"... is what Hadoop gives you for free."
It isn't free if you don't already have the cluster set up to use it.
And once you've gone through the cluster set-up process, just to deliver a couple of hundred k of filenames to the clients, those clients still have to get access to each of the huge image files. They cannot do that in situ; the files would have to be shipped to the local HDFS filesystem. And even then, Hadoop has nothing whatsoever to offer in the processing of those files.
And if you can think of some legitimate reason for going through all of that in order to distribute a few thousand filenames...
Let's face it. Your suggestion is a crock.