Re: Large file, multi dimensional hash - out of memory

by kennethk (Abbot)
on May 15, 2013 at 14:37 UTC (#1033697)

in reply to Large file, multi dimensional hash - out of memory

In my experience, the next step after a massive hash fails is to go to a database. DBD::SQLite is fast and easy; see Databases made easy for an intro, and for quick reference.

Update: Demo code:

use strict;
use warnings;
use DBI;
require DBD::SQLite;

# Start from a clean database file on every run.
unlink 'track.db';

my $db = DBI->connect(
    'dbi:SQLite:dbname=track.db',
    '',
    '',
    {
        RaiseError => 1,
        AutoCommit => 1,
    },
);

# One row per (fname, fext) pair that has already been seen.
$db->do(<<EOSQL);
CREATE TABLE tracking (
    fname varchar(255),
    fext  varchar(255)
)
EOSQL

my $count_query = $db->prepare(<<EOSQL);
SELECT COUNT(*)
FROM tracking
WHERE fname=? AND fext=?
EOSQL

my $insert_query = $db->prepare(<<EOSQL);
INSERT INTO tracking (fname, fext)
VALUES(?,?)
EOSQL

open(my $fh, "<", "input.txt")
    or die "cannot open < input.txt: $!";

while (my $line = <$fh>) {
    chomp $line;
    my ($fname, $fext) = split(' ', $line);

    # Has this pair been recorded already?
    $count_query->execute($fname, $fext);
    my ($count) = $count_query->fetchrow_array;
    $count_query->finish;

    # First sighting: print it and remember it.
    if (!$count) {
        print "$fname $fext\n";
        $insert_query->execute($fname, $fext);
    }
}

close $fh;
$db->disconnect;

Yeah, it's longer, but you are going to have to do work on disk if you can't hold it in memory. Feel like implementing your own Merge sort instead?
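One caveat for really large inputs: as written, the tracking table has no index, so each lookup has to scan the table, and with AutoCommit on every INSERT is its own transaction. A minimal sketch of a variant that adds an index on (fname, fext) and wraps the loop in a single transaction is below. The helper name print_unique_pairs, the index name tracking_idx, and the filehandle-based interface are my own assumptions for illustration, not part of the demo above; treat it as a starting point rather than a tuned solution.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical helper (not from the demo above): prints each distinct
# "fname fext" pair the first time it appears, using SQLite to remember
# which pairs have been seen.
sub print_unique_pairs {
    my ($db_file, $in, $out) = @_;

    my $db = DBI->connect(
        "dbi:SQLite:dbname=$db_file", '', '',
        { RaiseError => 1, AutoCommit => 1 },
    );

    $db->do('CREATE TABLE IF NOT EXISTS tracking (fname varchar(255), fext varchar(255))');

    # Assumed index name; lets the lookup use the index instead of a full scan.
    $db->do('CREATE INDEX IF NOT EXISTS tracking_idx ON tracking (fname, fext)');

    my $seen   = $db->prepare('SELECT 1 FROM tracking WHERE fname=? AND fext=? LIMIT 1');
    my $insert = $db->prepare('INSERT INTO tracking (fname, fext) VALUES (?,?)');

    # One transaction for the whole file instead of one commit per INSERT.
    $db->begin_work;
    while (my $line = <$in>) {
        chomp $line;
        my ($fname, $fext) = split ' ', $line;
        next unless defined $fext;    # skip lines without two fields

        $seen->execute($fname, $fext);
        my ($found) = $seen->fetchrow_array;
        $seen->finish;
        next if $found;

        print {$out} "$fname $fext\n";
        $insert->execute($fname, $fext);
    }
    $db->commit;
    $db->disconnect;
    return;
}
```

To use it the same way as the demo, open input.txt for reading and call print_unique_pairs('track.db', $fh, \*STDOUT). Because the table is created with IF NOT EXISTS, unlink track.db first if you want a fresh run.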

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
