Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Re: Efficient giant hashes

by dragonchild (Archbishop)
on Mar 10, 2005 at 13:28 UTC ( [id://438236]=note: print w/replies, xml ) Need Help??


in reply to Efficient giant hashes

The script becomes notably slower when reaching this size.

100_000 elements is going to be, roughly, 1-2K per element. That's 100M-200M of RAM. Unless each element is some huge data structure in its own right (like an array of hashes or somesuch), you're probably not doing a lot of swapping to disk or anything like that.

It sounds like your algorithm(s) aren't scaling with your data structures. For example, doing

foreach my $key (keys %hash) { ... }
is going to be considerably slower than the equivalent
while (my ($key, $value) = each %hash) { ... }
foreach has to build the list in memory, then iterate over it. each will only bring one value in at a time.

My bet is on your algorithms, not your data structures. Maybe if you posted a few snippets of how you use this massive hash, we might be able to help you out.

Being right, does not endow the right to be rude; politeness costs nothing.
Being unknowing, is not the same as being stupid.
Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Replies are listed 'Best First'.
Re^2: Efficient giant hashes
by halley (Prior) on Mar 10, 2005 at 14:54 UTC
    foreach my $key (keys %hash) { }
    Is that actually slower? Got benchmarks to back that up?

    And if it IS, then why should it be? There are specialized magic iterators when the parser recognizes an impending iteration with constants, like for (1 .. 1_000_000). Why does perl not implement an automatic iterator when the parser notices a simple sort-free for (keys %foo)? That's such a common idiom I would be amazed it wasn't getting special attention.

    --
    [ e d @ h a l l e y . c c ]

      Anonymonk beat me to the punch. But, the reason for why foreach (keys) and while (each) behave differently has nothing to do with keys being an iterator or not. (Well, it does, but not really.) It has to do with the difference in behavior between foreach and while. foreach is defined to operate on a list. If you give it a list, then you're good. If, however, you give it a function or keyword, then it has to call that function/keyword and construct a temporary list with the return value(s). Incidentally, this is why the following doesn't DWIM:
      my @x = 1 .. 5; sub get_x { @x } foreach my $v ( get_x() ) { $v *= 2; } print "@x\n";

      while, however, re-evaluates its condition every time. This is why the while-loop goes infinite, but the foreach-loop doesn't.

      while (my ($k, $v) = each %hash) { $hash{ $k . 'a' } = 1; } foreach my $k (keys %hash) { $hash{ $k . 'a' } = 1; }

      Being right, does not endow the right to be rude; politeness costs nothing.
      Being unknowing, is not the same as being stupid.
      Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
      Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      Got benchmarks to back that up?
      #!/usr/bin/perl use strict; use warnings; use Benchmark qw 'cmpthese'; our %hash = map {$_, $_} 1 .. 100_000; cmpthese (-10, { keys => '$a = 0; for (keys %hash) {$a += $_}', while => '$b = 0; while ($_ = each %hash) {$b += $_}', }); __END__ Rate keys while keys 5.23/s -- -26% while 7.05/s 35% --
      Now, that's for a simple hash. If the hash is tied to a huge DBM file, the results would be far more dramatic.
      Why does perl not implement an automatic iterator when the parser notices a simple sort-free for (keys %foo)? That's such a common idiom I would be amazed it wasn't getting special attention.
      Because it isn't common idiom, and changing it to an iterator changes the results. Perl has an iterator, and it's called each. You can't change keys to an iterator:
      for my $k1 (keys %hash) { for my $k2 (keys %hash) { } } for my $k (keys %hash) { %hash = (); .... }
      Changing either of the "simple sort-free for (keys %foo)" to an iterator will break the code.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://438236]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (5)
As of 2024-03-28 16:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found