PerlMonks  

Re: Reading HUGE file multiple times

by BrowserUk (Pope)
on Apr 28, 2013 at 01:06 UTC (#1031022)


in reply to Reading HUGE file multiple times

Index the file in one pass; then use the index to seek the id/data directly:

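    #! perl -slw
    use strict;

    my %idx;

    ## Index the file: map each id line to the byte offset where it starts.
    ## (Keys keep their trailing newline; nothing chomps them here.)
    $idx{ <> } = tell( ARGV ), scalar <> until eof();

    for ( 1 .. 1000 ) {
        my $id = getNextId( ... );
        seek ARGV, $idx{ $id }, 0;   # seek takes a third (whence) argument; 0 = from start of file
        scalar <>;                   # discard id line (or verify)
        print scalar <>;             ## access data;
    }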

Untested code for flavour only.
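
For anyone who wants something to run as-is, here is a minimal sketch of the same index-then-seek idea, spelled out with an explicit filehandle. It rests on assumptions the post above does not state: every record is exactly two lines (an id line followed by a data line), ids are unique, and the keys are chomped. The names build_index and fetch_record and the sample data are hypothetical and used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an index mapping each (chomped) id line to the byte offset at
# which that id line starts. Assumes two-line records: id, then data.
sub build_index {
    my ($fh) = @_;
    my %idx;
    seek $fh, 0, 0 or die "seek failed: $!";
    while (1) {
        my $pos = tell $fh;                   # offset of the upcoming id line
        defined( my $id = <$fh> ) or last;    # stop at end of file
        chomp $id;
        $idx{$id} = $pos;
        defined( scalar <$fh> ) or last;      # skip over the data line
    }
    return \%idx;
}

# Return the data line for one id by seeking straight to its record.
# Returns undef if the id is not in the index.
sub fetch_record {
    my ( $fh, $idx, $id ) = @_;
    my $pos = $idx->{$id};
    return unless defined $pos;
    seek $fh, $pos, 0 or die "seek failed: $!";
    my $id_line = <$fh>;                      # discard (or verify) the id line
    my $data    = <$fh>;
    chomp $data if defined $data;
    return $data;
}

# Usage with a small in-memory "file" (hypothetical sample data).
my $sample = "id1\ndata one\nid2\ndata two\nid3\ndata three\n";
open my $fh, '<', \$sample or die "open failed: $!";
my $idx = build_index($fh);
print fetch_record( $fh, $idx, $_ ), "\n" for qw( id3 id1 );
close $fh;
```

For a real file, open it by name instead of the in-memory string; the index then costs one sequential pass, and each lookup costs one seek plus two line reads.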


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^2: Reading HUGE file multiple times
by Anonymous Monk on Apr 28, 2013 at 10:35 UTC
    thanks, will try it right away

      On my system, the code above indexed a 6.4 million record, 5GB file in 57 seconds.

      1367141700 1367141757 6348909

      Once indexed, random access to the records runs at about 1 second per thousand lookups.


