Re^2: Building a new file by filtering a randomized old file on two fields

To get around potential memory issues (due to 4-5 Gb files), you can use Tie::File. This will not load the entire file into memory.

That code will be horribly slow.

This single line:

my $last_index = $#locations;

will cause the entire file to be read, through a tiny buffer, just to count its records.

And this line:

    my ($chr, $pos) = (split ' ', $locations[$indexes[$rand_index]])[0, 1];

will require the disk heads to shuffle back and forth all over the disk to locate the randomly chosen records.

Many parts of the file will be read and re-read many times. Performance will be abysmal.

This algorithm will fairly pick random lines from a file using a single pass over that file.
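
For illustration, here is a minimal sketch of one well-known single-pass technique, reservoir sampling. I am assuming that is the kind of algorithm meant; the sub name sample_lines and its parameters are made up for this example. It reads the handle sequentially and never holds more than $k lines in memory.

```perl
use strict;
use warnings;

# Reservoir sampling: return $k lines chosen uniformly at random from $fh,
# reading the handle exactly once. (Hypothetical helper, for illustration.)
sub sample_lines {
    my ($fh, $k) = @_;
    my @sample;
    my $seen = 0;
    while (my $line = <$fh>) {
        $seen++;
        if (@sample < $k) {
            push @sample, $line;               # fill the reservoir first
        }
        else {
            my $j = int(rand($seen));          # uniform in 0 .. $seen-1
            $sample[$j] = $line if $j < $k;    # keep new line with probability $k/$seen
        }
    }
    return @sample;                            # fewer than $k if the file is short
}
```

Called as sample_lines($fh, 1000) on a handle opened for reading, this should give 1000 lines, each of which can then be split on whitespace to get the two fields. The lines come back in reservoir order, not shuffled, so shuffle the result if the order matters.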

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.