PerlMonks  

Re: Large file, multi dimensional hash - out of memory

by kennethk (Monsignor)
on May 15, 2013 at 14:37 UTC (#1033697)


in reply to Large file, multi dimensional hash - out of memory

In my experience, the next step after a massive hash fails is to go to a database. DBD::SQLite is fast and easy; see Databases made easy for an intro, and http://www.w3schools.com/sql/ for quick reference.

Update: Demo code:

use strict;
use warnings;
use DBI;
require DBD::SQLite;

# Start from an empty database file on every run.
unlink 'track.db';
my $db = DBI->connect(
    'dbi:SQLite:dbname=track.db', '', '',
    { RaiseError => 1, AutoCommit => 1, },
);

$db->do(<<EOSQL);
CREATE TABLE tracking (
    fname varchar(255),
    fext varchar(255)
)
EOSQL

# Counts how many times a (fname, fext) pair has already been stored.
my $count_query = $db->prepare(<<EOSQL);
SELECT COUNT(*) FROM tracking
WHERE fname=? AND fext=?
EOSQL

my $insert_query = $db->prepare(<<EOSQL);
INSERT INTO tracking (fname, fext)
VALUES(?,?)
EOSQL

open(my $fh, "<", "input.txt") or die "cannot open < input.txt: $!";

while (my $line = <$fh>) {
    chomp $line;
    my ($fname, $fext) = split(' ', $line);

    $count_query->execute($fname, $fext);
    my ($count) = $count_query->fetchrow_array;
    $count_query->finish;

    # Print and record the pair only the first time it is seen.
    if (!$count) {
        print "$fname $fext\n";
        $insert_query->execute($fname, $fext);
    }
}

$db->disconnect;

Yeah, it's longer, but if you can't hold it in memory, you are going to have to do the work on disk. Feel like implementing your own merge sort instead?
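
If the demo turns out to be slow on a really large file, one variation that may help is to let SQLite do the duplicate check itself. The sketch below is not part of the original demo: the helper name print_unique_pairs is made up, and it assumes you are happy to add a UNIQUE constraint (which gives SQLite an index to search) and to wrap the whole load in a single transaction. It relies on SQLite's INSERT OR IGNORE and on DBI's execute returning the number of rows affected.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical helper (not from the original demo): reads "fname fext"
# lines from $in and prints each pair to $out the first time it is seen.
sub print_unique_pairs {
    my ($db, $in, $out) = @_;

    # The UNIQUE constraint makes SQLite keep an index on (fname, fext),
    # so the duplicate check does not have to scan the whole table.
    $db->do(<<'EOSQL');
CREATE TABLE IF NOT EXISTS tracking (
    fname varchar(255),
    fext  varchar(255),
    UNIQUE (fname, fext)
)
EOSQL

    # OR IGNORE silently skips rows that would violate the constraint.
    my $insert = $db->prepare(<<'EOSQL');
INSERT OR IGNORE INTO tracking (fname, fext) VALUES (?, ?)
EOSQL

    # One transaction for the whole file (requires AutoCommit => 1).
    $db->begin_work;
    while (my $line = <$in>) {
        chomp $line;
        my ($fname, $fext) = split ' ', $line;
        next unless defined $fext;    # assumption: skip malformed lines

        # execute returns the rows affected; "0E0" (numerically zero)
        # means the pair was already present and the insert was ignored.
        my $rows = $insert->execute($fname, $fext);
        print {$out} "$fname $fext\n" if $rows > 0;
    }
    $db->commit;
    return;
}
```

To use it, connect as in the demo (with AutoCommit => 1), open input.txt, and call print_unique_pairs($db, $fh, \*STDOUT). Whether it is actually faster on your data is something to measure rather than take on faith.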


#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

