A series of random numbers and others
by lightoverhead (Monk)
on Oct 09, 2008 at 00:17 UTC
lightoverhead has asked for the
wisdom of the Perl Monks concerning the following question:
I am trying to randomly select 20 million lines from a file of 40 million lines.
I have two questions.
First, how can I select the 20 million lines from the file without bias? I tried to do it in Perl, but Perl does not seem well suited to this kind of thing, so I used R to generate 20 million indices, wrote them to an index file (rand_sorted.txt), and used that file to print the selected lines.
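For reference, the unbiased selection can probably be done in Perl alone, in one pass and without an index file. The following is only a minimal sketch of selection sampling (each line is kept with probability "lines still needed / lines still unread"), assuming the total line count is known up front. The name sample_lines and its arguments are hypothetical, not something from my script.

```perl
use strict;
use warnings;

# Selection sampling: write exactly $want of the first $total lines of $in
# to $out, with every possible subset of that size equally likely.
# sample_lines is a hypothetical helper name used only for illustration.
sub sample_lines {
    my ($in, $out, $total, $want) = @_;
    my $remaining = $total;    # lines not yet examined
    my $needed    = $want;     # lines still to be picked
    while ($needed > 0 and defined(my $line = <$in>)) {
        # Keep this line with probability $needed / $remaining.
        if (rand($remaining) < $needed) {
            print {$out} $line;
            $needed--;
        }
        $remaining--;
    }
    return $want - $needed;    # number of lines actually written
}
```

With filehandles opened on the 40 million line file and on an output file, the call would look like sample_lines($in, $out, 40_000_000, 20_000_000). The output keeps the original line order, and if the file turns out to be shorter than $total, fewer than $want lines are written.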
Second, my code is as below:
This works fine and is fast enough for me, but I hate having these two loops nested inside each other. It looks stupid, or maybe it is stupid. Can anyone give me an idea of how to do it better? Thanks.
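One way the nesting might be avoided is to walk the index file and the data file in step, using a single loop over the data. This is only a sketch under assumptions: the index file holds one 1-based line number per line, sorted ascending, with no duplicates. The name print_selected is hypothetical.

```perl
use strict;
use warnings;

# Print the lines of $data whose 1-based line numbers are listed in $idx.
# Assumes $idx is sorted ascending and has no duplicate entries.
# print_selected is a hypothetical helper name used only for illustration.
sub print_selected {
    my ($idx, $data, $out) = @_;
    my $next = <$idx>;                 # next wanted line number
    return 0 unless defined $next;     # empty index: nothing to print
    chomp $next;
    my ($line_no, $printed) = (0, 0);
    while (defined(my $line = <$data>)) {
        $line_no++;
        next unless $line_no == $next; # not a selected line, skip it
        print {$out} $line;
        $printed++;
        $next = <$idx>;                # advance to the next wanted number
        last unless defined $next;     # index exhausted, stop reading data
        chomp $next;
    }
    return $printed;                   # number of lines written
}
```

Each file is read once from start to end, so the work is proportional to the length of the data file, and reading stops as soon as the last index has been matched.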