Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Fast Way to Combine Two Hashes

by arunhorne (Pilgrim)
on Jun 18, 2002 at 22:04 UTC ( [id://175510]=perlquestion: print w/replies, xml ) Need Help??

arunhorne has asked for the wisdom of the Perl Monks concerning the following question:

I am writing an object to act as a set (as in the Computer Science description). It main requirement is to be as slim and fast as possible. I have used a hash as the backing data structure because this allows me to enforce the 'no duplicates' property of sets.

So to add an element to the hash I just map it to undef as such:

$set{$item} = undef;

One thing I want to do is combine two sets. Obviously this could be a real performance bottleneck and due to its intended use it is essential I combine the two backing hashes as fast as possible. Obviously I could use:

foreach $key (keys %set1) { $set2{$key} = undef; }

Is this the quickest way I can manage... or is there potentially a faster technique that anyone knows?

Thanks...

____________
Arun

Replies are listed 'Best First'.
Re: Fast Way to Combine Two Hashes
by caedes (Pilgrim) on Jun 18, 2002 at 22:10 UTC
    my %total = (%hash1, %hash2);
    I was wondering about this myself recently, and I'm amazed that this works. :-)
Re: Fast Way to Combine Two Hashes
by Zaxo (Archbishop) on Jun 18, 2002 at 22:11 UTC

    This:

    %set2 = ( %set2, %set1 );
    does it in one swell foop. Check perlfaq for other set operations.

    After Compline,
    Zaxo

(tye)Re: Fast Way to Combine Two Hashes
by tye (Sage) on Jun 18, 2002 at 22:23 UTC

    I suspect that the fastest way is to "add" %hash2 to %hash1 via:     @hash1{keys %hash2}= values %hash2; but this has come up several times before and I'm pretty sure someone has run benchmarks to meet all of your premature nano-optimization needs. (:

    Of course, this requires you to define your interface to allow the addition of one "set" to another without having to make a copy. I consider this a good thing. :)

            - tye (but my friends call me "Tye")
Re: Fast Way to Combine Two Hashes
by shotgunefx (Parson) on Jun 18, 2002 at 22:26 UTC
    Regarding %set2 = ( %set2, %set1 );, This unrolls the hashes into a temporary list which is fine unless your hashes are big. If they're really big, you may want to use
    while ( ($k,$v) = each %set2 ){ $set1{$k} = undef ; # or perhaps more useful, ++ } # or to merge two and leave them both intact. my %merged = () foreach $hashref ( \%set1, \%set2 ) { while (($k, $v) = each %$hashref) { $merged{$k} = $v; } }


    -Lee

    "To be civilized is to deny one's nature."
Re: Fast Way to Combine Two Hashes
by tadman (Prior) on Jun 18, 2002 at 22:14 UTC
    You might want to compare how these Benchmark:
    @foo{keys %bar} = values %bar; %foo = (%foo, %bar); foreach (keys %bar) { $foo{$_} = $bar{$_} }
    You can speculate on what might be the fastest, but Benchmark can be one way to validate it. No benchmark is perfect, though, but it can at least give you an idea.
      For this particular test I ran, hash assignment and slices seemed to be much faster than creating the union one item at a time:
      use strict; use warnings; use Benchmark 'cmpthese'; my (%set_a, %set_b); @set_a{1..100} = undef; @set_b{50..150} = undef; sub slices { my %set; @set{keys %set_a, keys %set_b} = undef; } sub items { my %set; $set{$_} = undef for keys %set_a, keys %set_b; } sub hasheq { my %set; %set = (%set_a, %set_b); } cmpthese(10000, { slices => \&slices, items => \&items, hasheq => \&hasheq, });
      gives these results on my machine:
      Benchmark: timing 10000 iterations of hasheq, items, slices... hasheq: 3 wallclock secs ( 2.97 usr + 0.00 sys = 2.97 CPU) @ 33 +67.00/s (n=10000) items: 4 wallclock secs ( 4.18 usr + 0.00 sys = 4.18 CPU) @ 23 +92.34/s (n=10000) slices: 3 wallclock secs ( 2.96 usr + 0.00 sys = 2.96 CPU) @ 33 +78.38/s (n=10000) Rate items hasheq slices items 2392/s -- -29% -29% hasheq 3367/s 41% -- -0% slices 3378/s 41% 0% --

      -- Mike

      --
      just,my${.02}

Re: Fast Way to Combine Two Hashes
by mirod (Canon) on Jun 18, 2002 at 22:13 UTC

    I don't know if it is faster but at least it is shorter:

    @set2{keys %set1}=undef;

    You can test it with:

    #!/usr/bin/perl -w use strict; my %set1=( k1 => 1, k2 => 2, k4 => 2); my %set2=( k1 => 1, k3 => 2, k4 => 2); @set2{keys %set1}=undef; while( my( $k, $v)= each %set2) { $v='undef' unless( defined $v); print "$k => $v\n"; }
      or alternatively (and I suspect the fastest way):
      undef @hash2{keys %hash1};

        Actually not. If you look at the original question, all values for the keys of %set2 should be set to undef. undef @set2{keys %set1}; leaves the original value in the hash. @set2{keys %set1}=undef; performs exactly as the original foreach $key (keys %set1) { $set2{$key} = undef; }.

        In fact @set2{keys %set1}= (undef) x scalar keys %set1; is more correct (thanks to Zaxo for providing it), but might be slower.

Re: Fast Way to Combine Two Hashes
by Rhandom (Curate) on Nov 11, 2004 at 18:16 UTC
    Long time passing - but here are some better benchmarks that are rather interesting...
    use Benchmark qw(timethese cmpthese countit timestr); my $hash1 = { one => 'One', two => 'One', three => 'One', four => 'One', five => 'One', }; my $hash2 = { one => 'One', four => 'One', five => 'One', }; my $hash3 = {}; for (0..1000) { $hash3->{"foo_$_"} = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +aa" } cmpthese (-2,{ forloop => sub { my $hash = {}; foreach my $foo ($hash1, $hash2, $hash3) { foreach my $key (keys %$foo) { $hash->{$key} = $foo->{$key} if ! exists $hash->{$key}; } } }, forloop2 => sub { my $hash = {}; foreach my $foo ($hash3, $hash2, $hash1) { foreach my $key (keys %$foo) { $hash->{$key} = $foo->{$key}; } } }, whileloop => sub { my $hash = {}; foreach my $foo ($hash1, $hash2, $hash3) { while (my($key, $val) = each %$foo) { $hash->{$key} = $val if ! exists $hash->{$key}; } } }, whileloop2 => sub { my $hash = {}; foreach my $foo ($hash3, $hash2, $hash1) { while (my($key, $val) = each %$foo) { $hash->{$key} = $val; } } }, slice => sub { my $hash = {}; foreach my $foo ($hash3, $hash2, $hash1) { @$hash{keys %$foo} = values %$foo; } }, merge => sub { my $hash = {%$hash3, %$hash2, %$hash1}; }, },'auto');
    Which produce the following (for our box anyway)...
    Benchmark: running forloop, forloop2, merge, slice, whileloop, whilelo +op2, each for at least 2 CPU seconds... forloop: 3 wallclock secs ( 2.22 usr + 0.00 sys = 2.22 CPU) @ 10 +8.11/s (n=240) forloop2: 2 wallclock secs ( 2.09 usr + 0.01 sys = 2.10 CPU) @ 12 +2.38/s (n=257) merge: 2 wallclock secs ( 2.10 usr + 0.00 sys = 2.10 CPU) @ 17 +6.19/s (n=370) slice: 2 wallclock secs ( 2.14 usr + 0.00 sys = 2.14 CPU) @ 14 +0.19/s (n=300) whileloop: 2 wallclock secs ( 2.16 usr + 0.00 sys = 2.16 CPU) @ 87 +.04/s (n=188) whileloop2: 2 wallclock secs ( 2.07 usr + 0.01 sys = 2.08 CPU) @ 11 +0.10/s (n=229) Rate whileloop forloop whileloop2 forloop2 slice + merge whileloop 87.0/s -- -19% -21% -29% -38% + -51% forloop 108/s 24% -- -2% -12% -23% + -39% whileloop2 110/s 26% 2% -- -10% -21% + -38% forloop2 122/s 41% 13% 11% -- -13% + -31% slice 140/s 61% 30% 27% 15% -- + -20% merge 176/s 102% 63% 60% 44% 26% + --
    As for memory - with a 0..1 on the hash3, we used 4 k less memory for the merge. On a 0..10000 on the hash3 we used 300k more (1.5M vs 1.2M) for the merge. Merge is definitely the fastest way - without "too much" memory overhead.

    my @a=qw(random brilliant braindead); print $a[rand(@a)];

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://175510]
Approved by Zaxo
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others imbibing at the Monastery: (4)
As of 2024-04-20 03:42 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found