"I'm reject the suggestion for mis-application of that tool."
I merely pointed out that "... [script] builds a big list in memory and then partitions the matching files into 100+ lists (1 per cluster instance) and writes the to separate files" is what Hadoop gives you for free. Whether the OP's problem is CPU-bound or IO-bound depends on exactly how the OP is processing said images, which has yet to be revealed. Rather than make assumptions, as you have done, I merely saw a chance to offer an idea.
And concurring with tilly's suggestion is not really suggesting it. tilly suggested it. You didn't even mention Hadoop in your link.
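
For anyone following along, the partitioning step quoted above amounts to roughly the following. This is only a sketch in Python for illustration, not the OP's actual script: the names `partition` and `write_partitions`, the `part-NNN.txt` file naming, and the round-robin split are all my assumptions.

```python
import os
import tempfile


def partition(items, n):
    """Split items into n lists round-robin (one list per cluster instance)."""
    return [items[i::n] for i in range(n)]


def write_partitions(parts, out_dir):
    """Write each partition to its own file, one file name per line."""
    paths = []
    for idx, part in enumerate(parts):
        # Hypothetical naming scheme: part-000.txt, part-001.txt, ...
        path = os.path.join(out_dir, f"part-{idx:03d}.txt")
        with open(path, "w") as fh:
            fh.writelines(name + "\n" for name in part)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Made-up file names standing in for the matching image files.
    files = [f"img_{i}.jpg" for i in range(10)]
    with tempfile.TemporaryDirectory() as out_dir:
        for path in write_partitions(partition(files, 3), out_dir):
            print(os.path.basename(path))
```

That bookkeeping (splitting the input and handing each worker its share) is the part a framework like Hadoop does for you. Whether the rest of the job is a good fit still depends on how the images are being processed.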