PerlMonks  

Re: Reading HUGE file multiple times

by BrowserUk (Pope)
on Apr 28, 2013 at 01:06 UTC (#1031022)


in reply to Reading HUGE file multiple times

Index the file in one pass; then use the index to seek the id/data directly:

    #! perl -slw
    use strict;

    my %idx;

    ## Index the file
    $idx{ <> } = tell( ARGV ), scalar <> until eof();

    for ( 1 .. 1000 ) {
        my $id = getNextId( ... );
        seek ARGV, $idx{ $id }, 0;   # seek needs a WHENCE argument; 0 = from start of file
        scalar <>;                   # discard id line (or verify)
        print scalar <>;             ## access data
    }

Untested code for flavour only.
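
For anyone who wants something they can run as-is, here is a minimal sketch of the same index-then-seek idea using a lexical filehandle instead of the ARGV magic. It assumes each record is exactly two lines (an id line followed by a data line), which is what the snippet above implies. The names build_index and fetch_record and the tiny in-memory demo file are made up for illustration; they are not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an index: id (chomped) => byte offset of the start of its id line.
# Assumption: records are two lines each, an id line then a data line.
sub build_index {
    my ($fh) = @_;
    my %idx;
    until ( eof $fh ) {
        my $pos = tell $fh;      # offset before reading the id line
        my $id  = <$fh>;
        chomp $id;
        $idx{$id} = $pos;
        scalar <$fh>;            # skip the data line
    }
    return \%idx;
}

# Fetch the data line for one id by seeking straight to its record.
# Returns undef if the id is not in the index.
sub fetch_record {
    my ( $fh, $idx, $id ) = @_;
    my $pos = $idx->{$id};
    return undef unless defined $pos;
    seek $fh, $pos, 0 or die "seek failed: $!";    # 0 = from start of file
    my $id_line = <$fh>;         # re-read the id line and verify it
    chomp $id_line;
    die "index mismatch for '$id'" unless $id_line eq $id;
    my $data = <$fh>;
    chomp $data if defined $data;
    return $data;
}

# Demo on a small in-memory "file" (hypothetical data).
my $text = "id1\nAAAA\nid2\nCCCC\nid3\nGGGG\n";
open my $fh, '<', \$text or die "open: $!";
my $idx = build_index($fh);
print fetch_record( $fh, $idx, $_ ), "\n" for qw( id3 id1 );
```

The index holds only one id and one offset per record, so it stays small relative to the data, and each lookup costs a single seek plus two line reads. Ids are chomped here so that lookups do not depend on a trailing newline.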


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Reading HUGE file multiple times
by Anonymous Monk on Apr 28, 2013 at 10:35 UTC
    thanks, will try it right away

      On my system, the code above indexed a 6.4 million record, 5GB file in 57 seconds.

      1367141700 1367141757 6348909

      Once indexed, random access to the records runs at 1 second per thousand lookups.


