Fast Way to Combine Two Hashes

arunhorne has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Fast Way to Combine Two Hashes by caedes (Pilgrim) on Jun 18, 2002 at 22:10 UTC
`my %total = (%hash1, %hash2);` [download] I was wondering about this myself recently, and I'm amazed that this works. :-)	[reply] [d/l]
Re: Fast Way to Combine Two Hashes by Zaxo (Archbishop) on Jun 18, 2002 at 22:11 UTC
This: `%set2 = ( %set2, %set1 );` [download] does it in one swell foop. Check perlfaq for other set operations. After Compline, Zaxo	[reply] [d/l]
(tye)Re: Fast Way to Combine Two Hashes by tye (Sage) on Jun 18, 2002 at 22:23 UTC
I suspect that the fastest way is to "add" %hash2 to %hash1 via: `@hash1{keys %hash2}= values %hash2;` but this has come up several times before and I'm pretty sure someone has run benchmarks to meet all of your premature nano-optimization needs. (: Of course, this requires you to define your interface to allow the addition of one "set" to another without having to make a copy. I consider this a good thing. :) - tye (but my friends call me "Tye")	[reply] [d/l]
Re: Fast Way to Combine Two Hashes by shotgunefx (Parson) on Jun 18, 2002 at 22:26 UTC
Regarding `%set2 = ( %set2, %set1 );`, This unrolls the hashes into a temporary list which is fine unless your hashes are big. If they're really big, you may want to use `while ( ($k,$v) = each %set2 ){ $set1{$k} = undef ; # or perhaps more useful, ++ } # or to merge two and leave them both intact. my %merged = () foreach $hashref ( \%set1, \%set2 ) { while (($k, $v) = each %$hashref) { $merged{$k} = $v; } }` [download] -Lee "To be civilized is to deny one's nature."	[reply] [d/l] [select]
Re: Fast Way to Combine Two Hashes by tadman (Prior) on Jun 18, 2002 at 22:14 UTC
You might want to compare how these Benchmark: `@foo{keys %bar} = values %bar; %foo = (%foo, %bar); foreach (keys %bar) { $foo{$_} = $bar{$_} }` [download] You can speculate on what might be the fastest, but Benchmark can be one way to validate it. No benchmark is perfect, though, but it can at least give you an idea.	[reply] [d/l]
Re: Re: Fast Way to Combine Two Hashes by thelenm (Vicar) on Jun 18, 2002 at 22:22 UTC
For this particular test I ran, hash assignment and slices seemed to be much faster than creating the union one item at a time: `use strict; use warnings; use Benchmark 'cmpthese'; my (%set_a, %set_b); @set_a{1..100} = undef; @set_b{50..150} = undef; sub slices { my %set; @set{keys %set_a, keys %set_b} = undef; } sub items { my %set; $set{$_} = undef for keys %set_a, keys %set_b; } sub hasheq { my %set; %set = (%set_a, %set_b); } cmpthese(10000, { slices => \&slices, items => \&items, hasheq => \&hasheq, });` [download] gives these results on my machine: `Benchmark: timing 10000 iterations of hasheq, items, slices... hasheq: 3 wallclock secs ( 2.97 usr + 0.00 sys = 2.97 CPU) @ 33 +67.00/s (n=10000) items: 4 wallclock secs ( 4.18 usr + 0.00 sys = 4.18 CPU) @ 23 +92.34/s (n=10000) slices: 3 wallclock secs ( 2.96 usr + 0.00 sys = 2.96 CPU) @ 33 +78.38/s (n=10000) Rate items hasheq slices items 2392/s -- -29% -29% hasheq 3367/s 41% -- -0% slices 3378/s 41% 0% --` [download] -- Mike `-- just,my${.02}`	[reply] [d/l] [select]
Re: Fast Way to Combine Two Hashes by mirod (Canon) on Jun 18, 2002 at 22:13 UTC
I don't know if it is faster but at least it is shorter: `@set2{keys %set1}=undef;` You can test it with: `#!/usr/bin/perl -w use strict; my %set1=( k1 => 1, k2 => 2, k4 => 2); my %set2=( k1 => 1, k3 => 2, k4 => 2); @set2{keys %set1}=undef; while( my( $k, $v)= each %set2) { $v='undef' unless( defined $v); print "$k => $v\n"; }` [download]	[reply] [d/l] [select]
Re: Re: Fast Way to Combine Two Hashes by runrig (Abbot) on Jun 19, 2002 at 00:01 UTC
or alternatively (and I suspect the fastest way): `undef @hash2{keys %hash1};` [download]	[reply] [d/l]
Re: Re: Re: Fast Way to Combine Two Hashes by mirod (Canon) on Jun 19, 2002 at 13:05 UTC
Actually not. If you look at the original question, all values for the keys of `%set2` should be set to `undef`. `undef @set2{keys %set1};` leaves the original value in the hash. `@set2{keys %set1}=undef;` performs exactly as the original `foreach $key (keys %set1) { $set2{$key} = undef; }`. In fact `@set2{keys %set1}= (undef) x scalar keys %set1;` is more correct (thanks to Zaxo for providing it), but might be slower.	[reply] [d/l] [select]
Re: Re: Re: Re: Fast Way to Combine Two Hashes by runrig (Abbot) on Jun 19, 2002 at 16:57 UTC
Re: Fast Way to Combine Two Hashes by Rhandom (Curate) on Nov 11, 2004 at 18:16 UTC
Long time passing - but here are some better benchmarks that are rather interesting... use Benchmark qw(timethese cmpthese countit timestr); my $hash1 = { one => 'One', two => 'One', three => 'One', four => 'One', five => 'One', }; my $hash2 = { one => 'One', four => 'One', five => 'One', }; my $hash3 = {}; for (0..1000) { $hash3->{"foo_$_"} = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +aa" } cmpthese (-2,{ forloop => sub { my $hash = {}; foreach my $foo ($hash1, $hash2, $hash3) { foreach my $key (keys %$foo) { $hash->{$key} = $foo->{$key} if ! exists $hash->{$key}; } } }, forloop2 => sub { my $hash = {}; foreach my $foo ($hash3, $hash2, $hash1) { foreach my $key (keys %$foo) { $hash->{$key} = $foo->{$key}; } } }, whileloop => sub { my $hash = {}; foreach my $foo ($hash1, $hash2, $hash3) { while (my($key, $val) = each %$foo) { $hash->{$key} = $val if ! exists $hash->{$key}; } } }, whileloop2 => sub { my $hash = {}; foreach my $foo ($hash3, $hash2, $hash1) { while (my($key, $val) = each %$foo) { $hash->{$key} = $val; } } }, slice => sub { my $hash = {}; foreach my $foo ($hash3, $hash2, $hash1) { @$hash{keys %$foo} = values %$foo; } }, merge => sub { my $hash = {%$hash3, %$hash2, %$hash1}; }, },'auto'); [download] Which produce the following (for our box anyway)... Benchmark: running forloop, forloop2, merge, slice, whileloop, whilelo +op2, each for at least 2 CPU seconds... forloop: 3 wallclock secs ( 2.22 usr + 0.00 sys = 2.22 CPU) @ 10 +8.11/s (n=240) forloop2: 2 wallclock secs ( 2.09 usr + 0.01 sys = 2.10 CPU) @ 12 +2.38/s (n=257) merge: 2 wallclock secs ( 2.10 usr + 0.00 sys = 2.10 CPU) @ 17 +6.19/s (n=370) slice: 2 wallclock secs ( 2.14 usr + 0.00 sys = 2.14 CPU) @ 14 +0.19/s (n=300) whileloop: 2 wallclock secs ( 2.16 usr + 0.00 sys = 2.16 CPU) @ 87 +.04/s (n=188) whileloop2: 2 wallclock secs ( 2.07 usr + 0.01 sys = 2.08 CPU) @ 11 +0.10/s (n=229) Rate whileloop forloop whileloop2 forloop2 slice + merge whileloop 87.0/s -- -19% -21% -29% -38% + -51% forloop 108/s 24% -- -2% -12% -23% + -39% whileloop2 110/s 26% 2% -- -10% -21% + -38% forloop2 122/s 41% 13% 11% -- -13% + -31% slice 140/s 61% 30% 27% 15% -- + -20% merge 176/s 102% 63% 60% 44% 26% + -- [download] As for memory - with a 0..1 on the hash3, we used 4 k less memory for the merge. On a 0..10000 on the hash3 we used 300k more (1.5M vs 1.2M) for the merge. Merge is definitely the fastest way - without "too much" memory overhead. my @a=qw(random brilliant braindead); print $a[rand(@a)];	[reply] [d/l] [select]


We don't bite newbies here... much
	PerlMonks