Like almost any algorithm, it all depends on the exact nature of the problem space.
One refinement, if you really want to consider splicing out high-weight elements, is to build your array in sorted order so that all your highest-weight elements are at the end of the array. That decreases the amount of recalculation necessary if you choose to drop them, since only the entries after the removed element need updating.
In the extreme case, if the highest-weight word is chosen, you just pop the last element of the array and decrease the word count and you're done. If the second-highest-weight word is chosen, you splice out that word and have only one index to recalculate. In general, removing the k-th highest-weight word leaves k-1 entries to recalculate.
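To make the bookkeeping concrete, here is a rough sketch of the idea in Python rather than Perl (list `pop(i)` stands in for `pop`/`splice`). It assumes the array holds running (cumulative) weight totals and that weights are positive integers; neither is spelled out above, and the names `WeightedPool`, `pick_index`, and `remove` are made up for illustration. Treat it as one possible reading, not tested production code.

```python
import bisect
import random
from itertools import accumulate


class WeightedPool:
    """Hypothetical helper: words kept in ascending weight order."""

    def __init__(self, weights):
        # weights: dict mapping word -> positive integer weight (assumption)
        pairs = sorted(weights.items(), key=lambda kv: kv[1])
        self.words = [word for word, _ in pairs]
        self.weights = [weight for _, weight in pairs]
        # cumulative[i] is the sum of weights[0..i]
        self.cumulative = list(accumulate(self.weights))

    def total(self):
        return self.cumulative[-1] if self.cumulative else 0

    def pick_index(self, rng=random):
        # Draw r in [0, total) and find the first running total above it.
        # Raises ValueError if the pool is empty.
        r = rng.randrange(self.total())
        return bisect.bisect_right(self.cumulative, r)

    def remove(self, i):
        # Splice out element i, then fix only the running totals after it.
        word = self.words.pop(i)
        weight = self.weights.pop(i)
        del self.cumulative[i]
        for j in range(i, len(self.cumulative)):
            self.cumulative[j] -= weight
        # Second value: how many running totals had to be recalculated.
        return word, len(self.cumulative) - i
```

Removing the last (heaviest) element touches no other entries, and removing the second-heaviest touches exactly one, which matches the cases described above. The splice itself still shifts array elements; the saving is only in the recalculation of totals.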
Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
In reply to Re^3: Efficiently selecting a random, weighted element