PerlMonks
Re: Random sampling a variable length file. by ambrus (Abbot)
on Dec 26, 2009 at 14:32 UTC ( [id://814424]=note )
Yes, it is indeed called a "random sample". You've got the right keyword, so you would only have had to search to find this excellent past thread: improving the efficiency of a script.
(Short answer, because my reply there isn't clear: if you need a sample of k records, take an array holding k records and initialize it with the first k records of your file; then read the rest of the file sequentially, and for each record, if its (

Update: sorry, the above procedure is wrong: you've got to roll a die whose number of sides is the one-based index of the record in the file. To make this clearer, here's some code. Records are one per line, and the first command line argument is the number of samples you need. I assumed throughout this node that you want samples without repetition and that the order of the samples doesn't matter. (Before you ask, yes, I do know about $. and even use it sometimes.)
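A minimal sketch of the corrected procedure, in case it helps to see it spelled out. This is not the code originally posted in this node: the sub name `sample_lines` and the in-memory demo data are hypothetical, and it assumes the usual one-pass scheme where a roll of the n-sided die that lands on one of the first k sides replaces that slot of the array.

```perl
use strict;
use warnings;

# Return $k records chosen uniformly at random, without repetition,
# from the lines of filehandle $fh, reading it once sequentially.
# (Hypothetical helper, written for illustration.)
sub sample_lines {
    my ($k, $fh) = @_;
    my @sample;
    my $n = 0;                          # one-based index of the current record
    while (defined(my $line = <$fh>)) {
        $n++;
        if ($n <= $k) {
            push @sample, $line;        # the first k records fill the array
        }
        else {
            my $j = int(rand($n));      # roll a die with n sides: 0 .. n-1
            # the roll lands below k with probability k/n; then slot $j is replaced
            $sample[$j] = $line if $j < $k;
        }
    }
    return @sample;
}

# Demo on an in-memory file of six records (made-up data).
my $data = join '', map { "record $_\n" } 1 .. 6;
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
print sample_lines(3, $fh);
close $fh;
```

To sample real input, pass `\*STDIN` or a filehandle opened on your file instead of the in-memory one. If the file has fewer than k records, you simply get all of them back.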
Update: It's easy to make an error in these kinds of things, so you have to test them. The test below shows that you get all 20 possible choices of 3 out of 6 with approximately equal frequency, so we can hope it's a truly uniform random choice.
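One way such a check could look, again as a sketch rather than the test originally posted here. The names `sample_list` and `tally_subsets` and the trial count are assumptions made for illustration. It repeats the same one-pass sampling on the list 1..6 many times and counts how often each 3-element subset comes out.

```perl
use strict;
use warnings;

# Same one-pass sampling idea, applied to a list instead of a file.
# (Hypothetical helper, written for illustration.)
sub sample_list {
    my ($k, @records) = @_;
    my @sample;
    my $n = 0;                          # one-based index of the current record
    for my $rec (@records) {
        $n++;
        if ($n <= $k) {
            push @sample, $rec;         # fill the array with the first k records
        }
        else {
            my $j = int(rand($n));      # die with n sides: 0 .. n-1
            $sample[$j] = $rec if $j < $k;
        }
    }
    return @sample;
}

# Count how often each 3-subset of 1..6 is chosen over $trials runs.
# Sorting the sample makes the key independent of the order of samples.
sub tally_subsets {
    my ($trials) = @_;
    my %count;
    for (1 .. $trials) {
        my $key = join ',', sort { $a <=> $b } sample_list(3, 1 .. 6);
        $count{$key}++;
    }
    return %count;
}

my $trials = 60_000;                    # 20 subsets, so about 3000 each if uniform
my %count  = tally_subsets($trials);
printf "%-6s %d\n", $_, $count{$_} for sort keys %count;
```

If the sampling is uniform, you should see 20 lines with counts all close to 3000. A subset that is missing or clearly off would point to a bug such as an off-by-one in the number of sides of the die.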