PerlMonks  

Re^7: Reading HUGE file multiple times

by BrowserUk (Pope)
on Apr 28, 2013 at 14:42 UTC ( #1031079 )


in reply to Re^6: Reading HUGE file multiple times
in thread Reading HUGE file multiple times

Problem is there is an error with tell and it's all -1.

D'oh! I made the same mistake you did; I forgot to change ARGV to $Library. The line should read:

$Library_Index{ <$Library> } = tell( $Library ), scalar <$Library> until eof( $Library );
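Unrolled, that statement-modifier one-liner is equivalent to the loop below. Note that Perl evaluates the right-hand tell() before the hash key reads the id line, so each stored offset points at the id line itself, not the record after it. This sketch runs against an in-memory filehandle so it is self-contained; the sample ids and records are invented:

```perl
use strict;
use warnings;

# Self-contained stand-in for the library file (ids and records invented).
my $text = ">a\nAAA\n>b\nBBB\n";
open my $Library, '<', \$text or die $!;

my %Library_Index;
until ( eof $Library ) {
    my $offset = tell $Library;      # RHS of the assignment runs first,
    my $id     = <$Library>;         # so the offset is that of the id line
    $Library_Index{$id} = $offset;   # (the newline stays in the hash key)
    scalar <$Library>;               # skip past the data record
}

for my $id ( sort keys %Library_Index ) {
    ( my $label = $id ) =~ s/\n\z//;
    print "$label => $Library_Index{$id}\n";
}
```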

I tested the write-the-index-to-disc code with a file containing 17,000 id/record pairs with 300,000 data records (5.2GB).

This creates the index and writes it to disc:

#! perl -slw
use strict;
use Storable qw[ store ];

print time;

my %idx;
$idx{ <> } = tell( STDIN ), scalar <> until eof STDIN;

store \%idx, '1031021.idx' or die $!;

print time;

The whole process takes a little over 3 minutes:

C:\test>1031021-i.pl <1031021.dat
1367160156
1367160362

C:\test>dir 1031021*
28/04/2013  15:30               193 1031021-i.pl
28/04/2013  15:04     5,272,940,608 1031021.dat
28/04/2013  15:46           316,385 1031021.idx
28/04/2013  15:29               374 1031021.pl

And this code loads that index from disc (<1 second) and then reads 1000 random records (26 seconds) using it:

#! perl -slw
use strict;
use Storable qw[ retrieve ];

print time;

my $idx = retrieve '1031021.idx' or die $!;

print time;

open DAT, '+<', '1031021.dat' or die $!;

for ( 1 .. 1000 ) {
    my( $id, $offset ) = each %$idx;
    seek DAT, $offset, 0;
    my $vid = <DAT>;
    die 'mismatch' unless $id eq $vid;
    my $data = <DAT>;
}

close DAT;
print time;

Run:

C:\test>1031021.pl
1367160624
1367160624
1367160651
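For completeness, the same index also supports direct lookup of a single record by id, rather than the random walk via each in the timing test above. A minimal, self-contained sketch of that; the file name sample.dat and the ">idN"/data line layout are invented for the demo:

```perl
use strict;
use warnings;

# Build a tiny id/record file, index it with the post's tell() idiom,
# then seek straight to one record by id. 'sample.dat' and the
# ">idN"/data layout are assumptions for this demo.
my $datafile = 'sample.dat';
open my $out, '>', $datafile or die $!;
print $out ">id$_\ndata-for-$_\n" for 1 .. 5;
close $out;

my %idx;
open my $in, '<', $datafile or die $!;
$idx{ <$in> } = tell( $in ), scalar <$in> until eof $in;
close $in;

# The stored offset is that of the id line itself, so read it back to
# verify the hit, then read the data record that follows it.
open my $dat, '<', $datafile or die $!;
seek $dat, $idx{ ">id3\n" }, 0;
my $vid = <$dat>;
die 'mismatch' unless $vid eq ">id3\n";
chomp( my $data = <$dat> );
close $dat;
print "$data\n";
```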

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^8: Reading HUGE file multiple times
by Anonymous Monk on Apr 28, 2013 at 20:03 UTC
    Perfect! Works like a charm and is blazing fast comparing to initial read method. Thanks so much for your help.
