http://www.perlmonks.org?node_id=814503


in reply to Re^7: Random sampling a variable length file.
in thread Random sampling a variable record-length file.

My intuition wants to say that if there is no correlation between the lengths of adjacent records, then it doesn't matter that you are selecting records that follow long records preferentially, because following long records doesn't correlate with anything. Put another way, if all of your records have an equal chance of following a long record (or more generally, any other particular record), then the sampling method is as valid as any other.

Thank you! That's what my intuition is telling me too. I was hoping one of the math guys around these parts (a set of which you may or may not be a member; I have no way of knowing :) would be able to put some semi-formal buttressing behind that intuition.
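For what it's worth, here is one semi-formal way to state it, as a small Python sketch rather than Perl. It assumes the scheme under discussion is this: seek to a uniformly random byte offset, discard the remainder of the record that offset lands in, and take the next complete record, wrapping from the last record to the first. Under that assumption, record j is returned exactly when the offset falls inside record j-1, so its probability is the predecessor's length divided by the file size. The function names are hypothetical.

```python
from fractions import Fraction


def selection_probabilities(lengths):
    """Exact chance that each record is returned by the (assumed) scheme
    'seek to a random byte, skip the rest of that record, take the next one'.

    Record j is returned whenever the offset lands in record j-1, so
    P(j) = lengths[j-1] / total. Record 0 inherits the last record's length
    (wrap-around); Python's lengths[-1] handles that case.
    Lengths are assumed to include each record's trailing newline.
    """
    total = sum(lengths)
    return [Fraction(lengths[j - 1], total) for j in range(len(lengths))]


def expected_sample_mean(lengths, values):
    """Expected value of values[j] over one draw: sum of P(j) * values[j]."""
    return sum(p * v for p, v in zip(selection_probabilities(lengths), values))


lengths = [1, 2, 3, 4]
probs = selection_probabilities(lengths)
# -> 2/5, 1/10, 1/5, 3/10 : each record carries its predecessor's weight
```

Because the weight attached to record j is the length of record j-1 and not its own, when record order is unrelated to length, the expected value of any property of record j comes out close to the plain population mean for a large file. That is the intuition quoted above, restated; it says nothing about files where neighbouring records are related.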

But in the absence of that, the fact that at least one other person shares the intuition, and has set out the logic for it in their own words, and that no strong counter-argument has been raised, gives me enough confidence to make it worthwhile pursuing to the next level, i.e. coding up something crude and devising a test scenario to substantiate it.
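A crude version of the sampler might look like the following, again in Python for illustration (in Perl it amounts to a seek followed by two readline calls on the filehandle). It assumes newline-terminated records, a file that ends with a newline, and binary mode so that offsets are byte offsets; record_after and sample_record are hypothetical names.

```python
import io
import random


def record_after(fh, offset):
    """Return the record following the one that contains byte `offset`,
    wrapping to the first record if `offset` falls inside the last one."""
    fh.seek(offset)
    fh.readline()              # discard the rest of the record holding `offset`
    line = fh.readline()       # the complete record that follows it
    if not line:               # offset was inside the last record: wrap around
        fh.seek(0)
        line = fh.readline()
    return line


def sample_record(fh, size, rng):
    """One draw: pick a uniformly random byte offset in [0, size)."""
    return record_after(fh, rng.randrange(size))


# Usage on an in-memory "file" of three records (lengths 2, 3 and 4 bytes).
data = b"a\nbb\nccc\n"
fh = io.BytesIO(data)
rng = random.Random(1)
sample = [sample_record(fh, len(data), rng) for _ in range(5)]
```

Splitting out record_after makes the offset-to-record mapping easy to check exhaustively on a tiny file, independently of the random number generator.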

Any thoughts on a test scenario that might avoid the mistake of inherently confirming what I'm looking for?
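One way to keep the test from simply confirming the hoped-for result is to give it a negative control: a file on which the method is known to be biased, and on which the test must therefore fail. The sketch below, a Python simulation with hypothetical function names, compares the mean length of sampled records against the true mean twice: once with lengths in random order, and once with the records sorted by length, so that each record resembles its predecessor. If the harness does not report a clear bias for the sorted file, its verdict on the random-order file is not evidence of anything.

```python
import bisect
import random
from itertools import accumulate


def sample_indices(lengths, n, rng):
    """Simulate the random-offset scheme n times: pick a byte offset uniformly
    in [0, total), find the record containing it, and return the index of the
    record after it (wrapping from the last record to the first)."""
    ends = list(accumulate(lengths))      # ends[i] = offset just past record i
    total = ends[-1]
    picks = []
    for _ in range(n):
        offset = rng.randrange(total)
        containing = bisect.bisect_right(ends, offset)  # record spanning offset
        picks.append((containing + 1) % len(lengths))
    return picks


def relative_bias(lengths, n, rng):
    """(mean length of sampled records - true mean length) / true mean length."""
    picks = sample_indices(lengths, n, rng)
    sampled_mean = sum(lengths[i] for i in picks) / n
    population_mean = sum(lengths) / len(lengths)
    return (sampled_mean - population_mean) / population_mean


rng = random.Random(42)
lengths = [rng.randint(1, 100) for _ in range(10_000)]

# Scenario 1: adjacent lengths independent (random order). Expect bias near zero.
independent_bias = relative_bias(lengths, 20_000, rng)

# Scenario 2 (negative control): file sorted by length, so each record is about
# as long as its predecessor. The harness must show a clear positive bias here.
sorted_bias = relative_bias(sorted(lengths), 20_000, rng)

print(f"independent: {independent_bias:+.3f}   sorted control: {sorted_bias:+.3f}")
```

The acceptance threshold (a few percent of relative bias, say) is a judgement call; comparing the full sampled and true length distributions with a chi-squared or Kolmogorov-Smirnov test would be a stricter variant of the same idea.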


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.