Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re: Large file, multi dimensional hash - out of memory

by kennethk (Abbot)
on May 15, 2013 at 14:37 UTC ( #1033697=note: print w/ replies, xml ) Need Help??


in reply to Large file, multi dimensional hash - out of memory

In my experience, the next step after a massive hash fails is to go to a database. DBD::SQLite is fast and easy; see Databases made easy for an intro, and http://www.w3schools.com/sql/ for quick reference.

Update: Demo code:

use strict; use warnings; use DBI; require DBD::SQLite; unlink 'track.db'; my $db = DBI->connect( 'dbi:SQLite:dbname=track.db', '', '', { RaiseError => 1, AutoCommit => 1, }, ); $db->do(<<EOSQL); CREATE TABLE tracking ( fname varchar(255), fext varchar(255) ) EOSQL my $count_query = $db->prepare(<<EOSQL); SELECT COUNT(*) FROM tracking WHERE fname=? AND fext=? EOSQL my $insert_query = $db->prepare(<<EOSQL); INSERT INTO tracking (fname, fext) VALUES(?,?) EOSQL open(my $fh, "<", "input.txt") or die "cannot open < input.txt: $!"; while (my $line = <$fh>) { chomp $line; my ($fname, $fext) = split(' ',$line); $count_query->execute($fname, $fext); my ($count) = $count_query->fetchrow_array; $count_query->finish; if (!$count) { print "$fname $fext\n"; $insert_query->execute($fname, $fext); } } $db->disconnect;

Yeah, it's longer, but you are going to have to do work on disk if you can't hold it in memory. Feel like implementing your own Merge sort instead?


#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.


Comment on Re: Large file, multi dimensional hash - out of memory
Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1033697]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (21)
As of 2015-07-30 20:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (273 votes), past polls