http://www.perlmonks.org?node_id=1084481


in reply to Re: Building a new file by filtering a randomized old file on two fields
in thread Building a new file by filtering a randomized old file on two fields

To get around potential memory issues (due to 4-5 Gb files), you can use Tie::File. This will not load the entire file into memory.

That code will be horribly slow.

This single line:

my $last_index = $#locations;

will cause the entire file to be read through a tiny buffer, because Tie::File has to find every line ending before it can report the last index.

And this line:

my ($chr, $pos) = (split ' ', $locations[$indexes[$rand_index]])[0, 1];

will require the disk heads to shuffle back and forth all over the disk to locate the randomly chosen records.

Many parts of the file will be read and re-read many times. Performance will be abysmal.

This algorithm will fairly pick random lines from a file using a single pass over that file.
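
The algorithm referred to is not reproduced in this post. As a rough illustration only, one well-known single-pass technique is reservoir sampling; whether that is the exact method meant here is an assumption. The sub name sample_lines, the sample size, and the in-memory demo data are all made up for this sketch:

```perl
use strict;
use warnings;

# Reservoir sampling: keep $k lines chosen uniformly at random from a
# filehandle of unknown length, reading every line exactly once.
# (Hypothetical helper, not taken from the original thread.)
sub sample_lines {
    my ($fh, $k) = @_;
    my @reservoir;
    my $seen = 0;
    while (my $line = <$fh>) {
        $seen++;
        if (@reservoir < $k) {
            push @reservoir, $line;          # fill the reservoir first
        }
        else {
            my $j = int(rand($seen));        # uniform in 0 .. $seen-1
            $reservoir[$j] = $line if $j < $k;
        }
    }
    return @reservoir;
}

# Demo on made-up data held in memory; a real run would open the big file.
my $data = join '', map { "chr$_ " . ($_ * 100) . " x\n" } 1 .. 20;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @picked = sample_lines($fh, 5);
close $fh;

for my $line (@picked) {
    my ($chr, $pos) = (split ' ', $line)[0, 1];
    print "$chr $pos\n";
}
```

Only the $k sampled lines are held in memory at any time, and the file is read sequentially from start to end, so there is no seeking back and forth. Filtering on the two fields can then be applied to the sampled lines, or inside the loop before a line is considered for the reservoir.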

