
Re^7: Reading HUGE file multiple times

by BrowserUk (Pope)
on Apr 28, 2013 at 14:42 UTC (node #1031079)

in reply to Re^6: Reading HUGE file multiple times
in thread Reading HUGE file multiple times

Problem is there is an error with tell and it's all -1.

D'oh! I made the same mistake you did; I forgot to change ARGV for $Library. The line should read:

    $Library_Index{<$Library>} = tell($Library), scalar <$Library> until eof($Library);
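
Here is a minimal, self-contained sketch of the corrected idiom at work, and of the -1 symptom. It assumes the same layout as the thread's data (an id line followed by one data-record line), but it uses a tiny hypothetical temp file instead of the real one. The loop spells out the one-liner's steps in an explicit order rather than relying on the order in which Perl evaluates the two sides of the assignment: it takes `tell` before reading the id line, so each stored offset points at the start of that id line. The last part shows that `tell` on a handle that is not open (here, a deliberately closed one) returns -1; warnings are silenced only for that call.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw[ tempfile ];

# Hypothetical sample file: alternating id lines and data-record lines,
# a tiny stand-in for the real multi-GB file.
my( $tmp, $path ) = tempfile( UNLINK => 1 );
binmode $tmp;
print {$tmp} "ID$_\n", 'x' x ( 10 * $_ ), "\n" for 1 .. 5;
close $tmp or die "close $path: $!";

open my $Library, '<:raw', $path or die "open $path: $!";

# Expanded form of the one-liner: note the offset of each id line,
# read the id (trailing newline kept), then skip its data record.
my %Library_Index;
until( eof $Library ) {
    my $offset = tell $Library;
    my $id     = readline $Library;
    readline $Library;
    $Library_Index{ $id } = $offset;
}
close $Library;

# tell() on a handle that is not open returns -1, the symptom in the thread.
open my $closed, '<', $path or die "open $path: $!";
close $closed;
my $bad = do { no warnings; tell $closed };

for my $id ( sort { $Library_Index{ $a } <=> $Library_Index{ $b } } keys %Library_Index ) {
    ( my $label = $id ) =~ s/\n\z//;
    print "$label => $Library_Index{ $id }\n";
}
print "tell on a closed handle: $bad\n";
```

Each offset is the running byte count of the preceding id and record lines. If every offset comes back as -1, the handle passed to `tell` is not the one being read.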

I tested the write-the-index-to-disc code with a file containing 17,000 id/record pairs, with data records of roughly 300,000 bytes each (5.2GB in total).

This creates the index and writes it to disc:

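    #! perl -slw
    use strict;
    use Storable qw[ store ];

    print time;
    my %idx;
    # For each id line: record the current byte offset (the start of the
    # id line), read the id as the hash key, then skip its data record.
    $idx{ <> } = tell( STDIN ), scalar <> until eof STDIN;
    store \%idx, '1031021.idx' or die $!;
    print time;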

The whole process takes a little over 3 minutes:

    C:\test> <1031021.dat
    1367160156
    1367160362

    C:\test>dir 1031021*
    28/04/2013  15:30               193
    28/04/2013  15:04     5,272,940,608 1031021.dat
    28/04/2013  15:46           316,385 1031021.idx
    28/04/2013  15:29               374

And this code loads that index from disk (under a second) and then uses it to read 1000 random records (26 seconds):

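    #! perl -slw
    use strict;
    use Storable qw[ retrieve ];

    print time;
    my $idx = retrieve '1031021.idx' or die $!;
    print time;

    open DAT, '+<', '1031021.dat' or die $!;
    for( 1 .. 1000 ) {
        # Take the next id/offset pair from the index (hash order)...
        my( $id, $offset ) = each %$idx;
        seek DAT, $offset, 0;
        # ...check that the id line stored there matches the key...
        my $vid = <DAT>;
        die 'mismatch' unless $id eq $vid;
        # ...and read the data record that follows it.
        my $data = <DAT>;
    }
    close DAT;
    print time;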


    C:\test>
    1367160624
    1367160624
    1367160651
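
The build-once, look-up-many pattern can also be sketched end to end on a small scale. This is a minimal sketch under stated assumptions: the scratch file names (`sample.dat`, `sample.idx`) and the helper `fetch_record` are hypothetical, and instead of walking the index with `each` as above, it looks records up by id, which is presumably how the index gets used when the big file has to be consulted repeatedly. It uses only `store` and `retrieve` from the core Storable module. Keys keep the trailing newline of the id line, as in the code above, so lookups append "\n".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw[ tempdir ];
use Storable qw[ store retrieve ];

# Hypothetical file names in a scratch directory (the post uses
# 1031021.dat and 1031021.idx).
my $dir      = tempdir( CLEANUP => 1 );
my $dat_path = File::Spec->catfile( $dir, 'sample.dat' );
my $idx_path = File::Spec->catfile( $dir, 'sample.idx' );

# Small stand-in data file: each id line is followed by its data record.
open my $out, '>:raw', $dat_path or die "open $dat_path: $!";
print {$out} "ID$_\n", "record $_ " x $_, "\n" for 1 .. 100;
close $out or die "close $dat_path: $!";

# Phase 1 (run once): build the offset index and save it with Storable.
{
    open my $in, '<:raw', $dat_path or die "open $dat_path: $!";
    my %idx;
    until( eof $in ) {
        my $offset = tell $in;             # start of the id line
        my $id     = readline $in;
        readline $in;                      # skip the data record
        $idx{ $id } = $offset;
    }
    close $in;
    store \%idx, $idx_path or die "store $idx_path: $!";
}

# Phase 2 (every later run): load the index and fetch records by id.
my $idx = retrieve $idx_path or die "retrieve $idx_path: $!";
open my $dat, '<:raw', $dat_path or die "open $dat_path: $!";

# Hypothetical helper: returns the data record for an id, or undef.
sub fetch_record {
    my( $fh, $index, $id ) = @_;
    my $offset = $index->{ "$id\n" };      # keys keep the id line's newline
    return unless defined $offset;
    seek $fh, $offset, 0 or die "seek: $!";
    my $seen = readline $fh;
    die "index mismatch for $id" unless $seen eq "$id\n";
    my $record = readline $fh;
    chomp $record;
    return $record;
}

my $r7   = fetch_record( $dat, $idx, 'ID7' );
my $r42  = fetch_record( $dat, $idx, 'ID42' );
my $none = fetch_record( $dat, $idx, 'ID999' );
close $dat;

print "ID7 => $r7\n";
```

Re-checking the id line at the stored offset, as the original loop does with `die 'mismatch'`, is a cheap guard against using an index that was built for a different copy of the data file.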


Re^8: Reading HUGE file multiple times
by Anonymous Monk on Apr 28, 2013 at 20:03 UTC
    Perfect! Works like a charm and is blazing fast compared to the initial read method. Thanks so much for your help.
