My intuition wants to say that if there is no correlation between the lengths of adjacent records, then it doesn't matter that you are selecting records that follow long records preferentially, because following long records doesn't correlate with anything. Put another way, if all of your records have an equal chance of following a long record (or more generally, any other particular record), then the sampling method is as valid as any other.
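To make the "follows a long record" effect concrete, here is a minimal sketch in Python rather than Perl. The function name `selection_probabilities` is hypothetical. It assumes the scheme under discussion works like this: seek to a uniformly random byte offset, discard the partial record you landed in, and take the next full record. It also assumes that an offset landing in the last record wraps around to the first record. It computes each record's exact chance of being picked.

```python
from fractions import Fraction


def selection_probabilities(lengths):
    """Exact chance each record is picked by the assumed scheme:
    uniform random byte offset -> discard the partial record -> take the next one.

    Assumption: an offset inside the last record wraps around to record 0.
    """
    total = sum(lengths)
    n = len(lengths)
    probs = [Fraction(0)] * n
    for i, length in enumerate(lengths):
        # Every one of the `length` offsets inside record i selects record i + 1,
        # so record i + 1's chance is proportional to record i's length.
        probs[(i + 1) % n] += Fraction(length, total)
    return probs


lengths = [10, 100, 10, 10]
for i, p in enumerate(selection_probabilities(lengths)):
    print(i, lengths[i], p)
```

Record 2 is only 10 bytes long, but it is picked with probability 10/13 because it sits right after the 100-byte record. Each record's chance depends only on its predecessor's length, never on its own. So whether the skew matters comes down to the question above: does anything about a record correlate with the length of the record before it?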
Re^8: Random sampling a variable length file.
by BrowserUk (Pope) on Dec 27, 2009 at 11:41 UTC