in reply to Efficiently selecting a random, weighted element
This is just a quick thought, but it would be fast. How about estimating an average bytes-per-word value for the files? Once you have that, just stat each file for its size, then compute $n = ( $filesize / $ave_bytes_per_word ), which is a rough estimate of the number of words in that file.
I'm not really a human, but I play one on earth. Cogito ergo sum a bum
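Then push the name of the file $n times onto a selection array. When you are done filling the selection array, just pick a random element from it; each file then gets chosen with a probability roughly proportional to its estimated word count.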
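Here is a rough, untested-on-your-data sketch of the idea. The sub name build_selection and the value of 6 bytes per word are just assumptions for illustration; you would measure the average from a sample of your own files.

```perl
use strict;
use warnings;

# Assumed average bytes per word; estimate it once from a sample of your files.
my $ave_bytes_per_word = 6;

# Build the selection array: each file name appears about once per
# estimated word, so larger files are proportionally more likely to be picked.
sub build_selection {
    my ($ave, @files) = @_;
    my @selection;
    for my $file (@files) {
        my $filesize = -s $file;      # size in bytes (stat under the hood)
        next unless $filesize;        # skip missing or empty files
        my $n = int( $filesize / $ave );
        $n = 1 if $n < 1;             # keep very small files selectable
        push @selection, ($file) x $n;
    }
    return @selection;
}

# File names come from the command line in this sketch.
my @selection = build_selection( $ave_bytes_per_word, @ARGV );
print $selection[ rand @selection ], "\n" if @selection;
```

Note that the array holds one entry per estimated word, so it can get large if the files are big. Dividing every $n by a common factor would shrink it at the cost of some precision.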