in reply to Re: Building a new file by filtering a randomized old file on two fields
in thread Building a new file by filtering a randomized old file on two fields
To get around potential memory issues (due to 4-5 Gb files), you can use Tie::File. This will not load the entire file into memory.
That code will be horribly slow.
This single line:
my $last_index = $#locations;
will cause the entire file to be read through a tiny buffer, because Tie::File has to find every line ending before it can tell you how many records there are.
And this line:
my ($chr, $pos) = (split ' ', $locations[$indexes[$rand_index]])[0, 1];
will require the disk heads to shuffle back and forth all over the disk to locate the randomly chosen records.
Many parts of the file will be read and re-read many times. Performance will be abysmal.
This algorithm will fairly pick random lines from the file using a single pass over it.
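A minimal sketch of one way to get a fair pick in a single pass, assuming reservoir sampling (Algorithm R) is an acceptable fit. The sub name pick_random_lines, its parameters, and the in-memory sample data are illustrative only; they are not taken from the code discussed in this thread.

```perl
use strict;
use warnings;

# Reservoir sampling (Algorithm R): keep $k lines chosen uniformly at
# random from a filehandle, reading it exactly once from start to end.
# Memory use is proportional to $k, not to the size of the file.
# NOTE: pick_random_lines() is a hypothetical helper for illustration.
sub pick_random_lines {
    my ($fh, $k) = @_;
    my @picked;
    my $seen = 0;
    while (my $line = <$fh>) {
        $seen++;
        if (@picked < $k) {
            push @picked, $line;            # fill the reservoir first
        }
        else {
            my $j = int(rand($seen));       # uniform integer in 0 .. $seen-1
            $picked[$j] = $line if $j < $k; # replace with probability $k/$seen
        }
    }
    return @picked;
}

# Demo on an in-memory file so the sketch runs without any real data.
my $sample_data = join '', map { "chr$_ " . ($_ * 100) . "\n" } 1 .. 20;
open my $sample_fh, '<', \$sample_data or die "Cannot open in-memory file: $!";
print for pick_random_lines($sample_fh, 5);
close $sample_fh;
```

For a real 4-5 Gb file you would open the file itself instead of the in-memory string. The lines come back in reservoir order rather than a random order, so shuffle them afterwards if the order matters.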
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.