tying a hash from a big dictionary

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: tying a hash from a big dictionary by BrowserUk (Patriarch) on Oct 31, 2011 at 13:39 UTC
"I use this code to read my dictionary:"

You are using far more memory than necessary (double, maybe even triple the requirement) because of the way you are returning the data from your subroutine. It may not be enough to relieve your out-of-memory situation, but try this before you seek other, more complex and inevitably slower, solutions:

`sub read_dict {
    my $file = shift;
    my %dict;
    open( my $fh, "<:encoding(utf8)", $file ) or die "Cannot open $file: $!";
    while( <$fh> ) {
        chomp;    ## no need to chomp twice
        my ($p1, $p2) = split /\t/;
        push( @{ $dict{ $p1 } }, $p2 );
    }
    close $fh;
    return \%dict;    ## main space saving change; return a ref to the hash
}
...
my $dict = read_dict( $dict_name );
...
for my $next_phrase ( @{ $dict->{ $key } } ){
    ...
}`

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
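The original poster's subroutine is not shown in this thread, so the following is only a sketch of the difference being described, assuming the original ended with `return %dict;`. The sub names and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real dictionary: a small hash of arrays.
sub sample_dict {
    return ( hello => [ 'bonjour', 'salut' ], cat => ['chat'] );
}

# Assumed shape of the original: returning the hash itself flattens it
# into a (key, value, key, value, ...) list.
sub read_dict_as_list {
    my %dict = sample_dict();
    return %dict;
}

# The suggested change: return one scalar that refers to the hash.
sub read_dict_as_ref {
    my %dict = sample_dict();
    return \%dict;
}

my @flat = read_dict_as_list();    # every key and value is returned as a list element
my %copy = read_dict_as_list();    # the caller then builds a second hash from that list
my $ref  = read_dict_as_ref();     # a single element; no second hash is built

printf "list return: %d elements\n", scalar @flat;                     # list return: 4 elements
printf "ref return: %s with %d keys\n", ref $ref, scalar keys %$ref;   # ref return: HASH with 2 keys
```

With a handful of keys the difference is invisible; with millions of keys the flattened list and the second hash both have to exist while the first hash is still alive, which is where the extra memory goes.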
Re^2: tying a hash from a big dictionary by Anonymous Monk on Oct 31, 2011 at 13:53 UTC
That was a nice one, thanks! I still have a memory problem, but this tip saved me a lot as well!
Re^3: tying a hash from a big dictionary by BrowserUk (Patriarch) on Oct 31, 2011 at 13:56 UTC
How many lines does your file have? How many of those are you succeeding in loading before you run out of memory?
Re^4: tying a hash from a big dictionary by Anonymous Monk on Oct 31, 2011 at 14:06 UTC
Re^5: tying a hash from a big dictionary by BrowserUk (Patriarch) on Oct 31, 2011 at 14:10 UTC
Some notes below your chosen depth have not been shown here
Re^3: tying a hash from a big dictionary by Anonymous Monk on Oct 31, 2011 at 14:55 UTC
On a 4 GB machine, it runs out of memory after 5 million dictionary lines.
Re: tying a hash from a big dictionary by johngg (Canon) on Oct 31, 2011 at 13:42 UTC
If your huge dictionary will not fit in memory, then perhaps you should look at a disk-based DBM such as Berkeley DB. Cheers, JohnGG
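As one hedged illustration of the disk-based approach, the sketch below ties a hash to a DBM file through the core AnyDBM_File wrapper, preferring DB_File (the Berkeley DB binding) and falling back to SDBM_File where Berkeley DB is not installed. The sub names `build_dbm` and `phrases_for`, the tab-separated input layout, and the newline-joined value format are assumptions for illustration, not anything taken from the original question.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);

# Prefer Berkeley DB; fall back to SDBM_File, which ships with every Perl.
BEGIN { @AnyDBM_File::ISA = qw(DB_File SDBM_File) }
use AnyDBM_File;

# Load a "key<TAB>phrase" file into an on-disk DBM and return a ref to the tied hash.
# DBM values are flat strings, so multiple phrases per key are joined with "\n".
sub build_dbm {
    my ($dict_file, $dbm_file) = @_;
    tie my %dbm, 'AnyDBM_File', $dbm_file, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $dbm_file: $!";

    # Read raw bytes: DBM files store bytes, so decoded text would need encoding first.
    open my $fh, '<', $dict_file or die "Cannot open $dict_file: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        my ($p1, $p2) = split /\t/, $line;
        next unless defined $p2;
        $dbm{$p1} = exists $dbm{$p1} ? "$dbm{$p1}\n$p2" : $p2;
    }
    close $fh;
    return \%dbm;
}

# Split the stored string back into the list of phrases for one key.
sub phrases_for {
    my ($dbm, $key) = @_;
    my $joined = $dbm->{$key};
    return unless defined $joined;
    return split /\n/, $joined;
}

# Demo on a tiny, made-up dictionary written to a temporary directory.
my $dir = tempdir( CLEANUP => 1 );
open my $out, '>', "$dir/dict.txt" or die "Cannot write sample file: $!";
print {$out} "hello\tbonjour\n", "hello\tsalut\n", "cat\tchat\n";
close $out;

my $dbm = build_dbm( "$dir/dict.txt", "$dir/dict_dbm" );
print join( ', ', phrases_for( $dbm, 'hello' ) ), "\n";    # bonjour, salut
```

Only the entries being looked up are pulled into memory, at the cost of a disk access per lookup. If the SDBM_File fallback is used, note that it limits each key plus its value to roughly 1 KB, so keys with many phrases need DB_File or a different storage layout.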
Re: tying a hash from a big dictionary by repellent (Priest) on Nov 01, 2011 at 09:23 UTC
You can try accessing the dictionary file directly using the Search::Dict core module, assuming your dictionary is sorted. It performs a binary search through the file. Here, I've wrapped its functionality into an OO-module for convenience:

`use Data::Dumper;
use Search::Dict::Object;

my $d = Search::Dict::Object->new(
    file        => "/tmp/dict.txt",
    keyval_xfrm => sub { split /\t/ },
    comp        => sub { $_[0] cmp $_[1] },    # should correspond to file sort order
);

print Dumper {
    aaa => $d->get('aaa'),
    foo => $d->get('foo'),
    bar => $d->get('bar'),
    baz => $d->get('baz'),
    zzz => $d->get('zzz'),
};

__END__
$VAR1 = {
          'bar' => '789',
          'baz' => '456',
          'aaa' => undef,
          'foo' => '123',
          'zzz' => undef
        };`

The dictionary file:

`$ cat /tmp/dict.txt
aho	234
bar	789
bat	567
baz	456
cut	678
foo	123
yyy	000`

The `Search::Dict::Object` package (2 kB) is collapsed behind a "Read more..." link in the original post and is not shown here.
Re^2: tying a hash from a big dictionary by BrowserUk (Patriarch) on Nov 01, 2011 at 09:39 UTC
"I've wrapped its functionality into an OO-module for convenience:"

"I've wrapped your bicycle in tissue paper and a nice bow." -- but it sure ain't for "convenience" :)
Re^3: tying a hash from a big dictionary by repellent (Priest) on Nov 02, 2011 at 01:24 UTC
I find it convenient to have a single transform sub that produces the key-value pair for the object to search+parse a hash-like dict file. Handling/closing of the filehandle is really just icing on the cake. Search::Dict sets the filehandle position to the first line greater than or equal to `$key`. This seems pretty raw to me (read: I should probably write a wrapper that takes care of the edge cases). The OO stick is not always the first thing I reach for, in case you're wondering.
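Since the `Search::Dict::Object` source is not included in this thread, here is a minimal sketch of what such a wrapper could look like using only the core Search::Dict `look` function. It is not repellent's module: the sub name `dict_lookup` is hypothetical, and it assumes a tab-separated file sorted in plain byte order (for example with `LC_ALL=C sort`).

```perl
use strict;
use warnings;
use Search::Dict;                  # core module; exports look()
use File::Temp qw(tempfile);

# Return every value stored under $key in a sorted "key<TAB>value" file.
# Hypothetical helper, not part of Search::Dict itself.
sub dict_lookup {
    my ($fh, $key) = @_;

    # look() binary-searches the file and leaves the read position at the
    # first line that sorts >= $key; it returns -1 on a read error.
    return if look( $fh, $key ) < 0;

    my @values;
    while ( defined( my $line = <$fh> ) ) {
        chomp $line;
        my ($k, $v) = split /\t/, $line, 2;
        # Edge case: the line we landed on may belong to a later key,
        # so stop unless the key matches exactly.
        last unless defined $k && $k eq $key;
        push @values, $v;
    }
    return @values;
}

# Demo with the sample dictionary from the post above, written to a temp file.
my ($out, $path) = tempfile( UNLINK => 1 );
print {$out} map { "$_\n" } "aho\t234", "bar\t789", "bat\t567", "baz\t456",
                            "cut\t678", "foo\t123", "yyy\t000";
close $out;

open my $fh, '<', $path or die "Cannot open $path: $!";
for my $key (qw(aaa foo bar zzz)) {
    my @found = dict_lookup( $fh, $key );
    printf "%s => %s\n", $key, @found ? join( ',', @found ) : 'undef';
}
close $fh;
# Prints:
#   aaa => undef
#   foo => 123
#   bar => 789
#   zzz => undef
```

Because consecutive lines with the same key are collected, this also covers dictionaries that list several phrases per key, as long as the file stays sorted.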
Re: tying a hash from a big dictionary by tokpela (Chaplain) on Nov 01, 2011 at 18:14 UTC
Have you looked at DBM::Deep?


	PerlMonks