Re: Randomize lines with limited memory
by jdporter (Canon)
on Nov 01, 2003 at 22:14 UTC
Divide and conquer. Chop the big file into as many pieces as it takes to make each one a manageable size, then shuffle each piece normally (i.e. with the Fisher-Yates shuffle) and concatenate the results.

The tricky part is the initial chopping up. You can't simply take the first 10k lines, then the second 10k lines, etc.; that wouldn't give you adequate randomization (obviously). Instead, I would read each line, choose an output file at random, and append the line to it. The files won't come out exactly the same size (except perhaps rarely), but that doesn't matter.

You could similarly randomize the final joining step as well, but I'm not sure that would actually buy you anything. You could try it.
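A minimal sketch of the two-pass approach described above, in Python for illustration (the function name `external_shuffle`, the bucket count, and the use of temp files are my own choices, not from the original post):

```python
import os
import random
import tempfile

def external_shuffle(in_path, out_path, num_buckets=16, seed=None):
    """Shuffle the lines of a large file using bounded memory.

    Pass 1 scatters each line into a randomly chosen bucket file,
    pass 2 shuffles each bucket in memory (random.shuffle is a
    Fisher-Yates shuffle) and concatenates the buckets.
    """
    rng = random.Random(seed)
    buckets = [tempfile.NamedTemporaryFile("w+", delete=False)
               for _ in range(num_buckets)]
    try:
        # Pass 1: each line goes to a uniformly random bucket, so the
        # buckets end up roughly (not exactly) the same size.
        with open(in_path) as src:
            for line in src:
                rng.choice(buckets).write(line)
        # Pass 2: shuffle each bucket in memory and append it to the output.
        with open(out_path, "w") as dst:
            for b in buckets:
                b.seek(0)
                lines = b.readlines()
                rng.shuffle(lines)
                dst.writelines(lines)
    finally:
        for b in buckets:
            b.close()
            os.unlink(b.name)
```

Peak memory is bounded by the largest bucket, so `num_buckets` just needs to be large enough that one bucket's worth of lines fits in RAM.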