PerlMonks
Re: Random sampling a variable record-length file.
by wfsp (Abbot) on Dec 27, 2009 at 13:13 UTC ( [id://814508]=note )
Perhaps a batch of records would be a valid sample of already random records? Wouldn't the first hundred be as valid a sample as any chosen by any other method?
Rather than count the first hundred you could do as they allegedly do for the Labour vote in parts of South Wales - weigh them. In this case, read 4KB worth. If you wanted more than one batch you could take a batch from the middle and the end too. You could do your stats on each batch, compare them, and if the results agree closely enough you're done. You could change the size and number of batches to suit the time available and the accuracy required (start expensive and reduce as confidence is established). Likely not the answer you're looking for, but my background in this sort of thing revolved around buckets of rivets rather than CSV files. :-)
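For what it's worth, here is a rough sketch of the "weigh them" idea, not a tested solution to the original problem. It assumes newline-terminated records and a file noticeably bigger than one batch. The names `read_batch`, `sample_batches` and `mean_length` are made up for illustration, and mean record length is only a stand-in for whatever stats you actually care about.

```perl
use strict;
use warnings;

# Hypothetical helper: read roughly $batch_bytes worth of whole records,
# starting near byte offset $pos.
sub read_batch {
    my ($fh, $pos, $batch_bytes) = @_;
    seek($fh, $pos, 0) or die "seek failed: $!";
    # Anywhere but the start we have probably landed mid-record, so throw
    # away the partial line and begin on a record boundary.
    if ($pos > 0) { my $partial = <$fh>; }
    my @records;
    my $read = 0;
    while ($read < $batch_bytes) {
        my $line = <$fh>;
        last unless defined $line;    # ran off the end of the file
        $read += length $line;        # "weigh" the record in bytes
        chomp $line;
        push @records, $line;
    }
    return \@records;
}

# Take one batch each from the start, the middle and the end of the file.
sub sample_batches {
    my ($path, $batch_bytes) = @_;
    $batch_bytes ||= 4096;            # 4KB, as suggested above
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $size = -s $fh;
    my $last = $size > $batch_bytes ? $size - $batch_bytes : 0;
    my @batches = map { read_batch($fh, $_, $batch_bytes) }
                  (0, int($last / 2), $last);
    close $fh;
    return \@batches;
}

# Placeholder statistic: mean record length (in bytes) of one batch.
sub mean_length {
    my ($records) = @_;
    return 0 unless @$records;
    my $total = 0;
    $total += length for @$records;
    return $total / @$records;
}
```

Something like `printf "%.1f\n", mean_length($_) for @{ sample_batches('data.csv') };` (with `data.csv` standing in for your file) prints one figure per batch. If the three figures are close, the first batch was probably good enough. If the file is smaller than one batch, all three batches come out the same, so the comparison tells you nothing.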