http://www.perlmonks.org?node_id=1031022


in reply to Reading HUGE file multiple times

Index the file in one pass; then use the index to seek the id/data directly:

    #! perl -slw
    use strict;

    my %idx;

    ## Pass 1: index the byte offset of each record's id line
    until ( eof() ) {
        my $pos = tell( ARGV );
        chomp( my $id = <> );         ## chomp so the key matches a bare id
        $idx{ $id } = $pos;
        scalar <>;                    ## skip the data line
    }

    ## Random access via the index
    for ( 1 .. 1000 ) {
        my $id = getNextId( ... );
        seek ARGV, $idx{ $id }, 0;    ## seek needs a whence arg; 0 == SEEK_SET
        scalar <>;                    ## discard the id line (or verify it)
        print scalar <>;              ## access the data line
    }

Untested code for flavour only.
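The same index-then-seek technique, sketched as a self-contained script you can actually run: it writes a throwaway two-lines-per-record file (id line, then data line), indexes it in one pass, then seeks straight to records by id. The file layout, ids, and data values here are made-up illustrations, and a plain lexical filehandle stands in for the magic ARGV handle:

    use strict;
    use warnings;
    use File::Temp qw( tempfile );

    ## Build a small sample file in the assumed format:
    ## an id line followed by a data line, repeated.
    my ( $out, $path ) = tempfile( UNLINK => 1 );
    print $out "id$_\ndata-for-id$_\n" for 1 .. 5;
    close $out;

    ## Pass 1: index the byte offset of each id line.
    my %idx;
    open my $in, '<', $path or die "open: $!";
    until ( eof $in ) {
        my $pos = tell $in;
        chomp( my $id = <$in> );
        $idx{$id} = $pos;
        scalar <$in>;                 ## skip the data line
    }

    ## Random access: seek straight to any record by id.
    my @got;
    for my $id ( 'id4', 'id2' ) {
        seek $in, $idx{$id}, 0;       ## 0 == SEEK_SET
        scalar <$in>;                 ## discard the id line
        push @got, scalar <$in>;      ## grab the data line
    }
    close $in;
    print @got;                       ## data-for-id4, then data-for-id2

The one-pass index costs a full read of the file, but every lookup after that is a single seek plus two line reads, regardless of file size.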


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: Reading HUGE file multiple times
by Anonymous Monk on Apr 28, 2013 at 10:35 UTC
    thanks, will try it right away

      On my system, the code above indexed a 6.4 million record, 5GB file in 57 seconds.

      1367141700 1367141757 6348909   ## start epoch, end epoch, record count

      Once indexed, accessing the records randomly runs at 1 second per thousand.
