Problems? Is your data what you think it is? PerlMonks

### Re^4: Fast, Efficient Union and Intersection on arrays

by TGI (Parson)
 on Nov 24, 2008 at 19:35 UTC ( #725673=note: print w/replies, xml ) Need Help??

Good catch on the bum algorithm! I was able to fix the error by adding some parentheses. However, I still get a good advantage for the two grep method.

```sub two_greps {
my %hash;
@hash{@a} = (1) x @a;

my @union        =  (@a, grep { !\$hash{\$_} } @b);
my @intersection =  (    grep {  \$hash{\$_} } @b);
}
__DATA__
Range (1,3) vs (3,9)
Rate  one_cfor   one_for two_greps
one_cfor  55648/s        --      -18%      -23%
one_for   68027/s       22%        --       -5%
two_greps 71891/s       29%        6%        --
Range (1,5) vs (3,8)
Rate  one_cfor   one_for two_greps
one_cfor  50378/s        --      -13%      -15%
one_for   58140/s       15%        --       -2%
two_greps 59277/s       18%        2%        --

Update: the reason for the error is that = has a higher precedence than ,. Therefore the bad version was like doing: (my @union = @a), grep { !\$hash{\$_} } @b;. The grep was executed, but the results were discarded.

A more serious error with the greps is that we test the value in the hash, not its existence. So 0 would never be in the intersection of a set, and always in the union. Two greps still wins by a small amount after that is fixed, too.

```sub two_greps {
my %hash;
@hash{@a} = (1) x @a;

my @union        =  (@a, grep { ! exists \$hash{\$_} } @b);
my @intersection =  (    grep {   exists \$hash{\$_} } @b);

}

Range (1,3) vs (0,9)
Rate  one_cfor   one_for two_greps
one_cfor  50787/s        --      -20%      -25%
one_for   63371/s       25%        --       -7%
two_greps 68027/s       34%        7%        --
Range (1,5) vs (3,8)
Rate  one_cfor   one_for two_greps
one_cfor  50813/s        --      -12%      -18%
one_for   57670/s       13%        --       -7%
two_greps 62150/s       22%        8%        --

TGI says moo

Create A New User
Node Status?
node history
Node Type: note [id://725673]
help
Chatterbox?
 [choroba]: Woohoo! Fixed a test that hasn't run for 3 years. [marinersk]: Corion Yes, sometimes whitespace in column headers is acceptable, but I still consider it be less than desireable if that query might get revectored for an ETL-esque process... [marinersk]: choroba++ [choroba]: it's a long running test, so it's normally skipped unless an env var is set [choroba]: nobody has been bothered to set the variable in the last 3 years [marinersk]: sub newtest{my \$expected_result = &target('foo'); my \$actual_result = &target('foo'); if (\$actual_result eq \$expected_result) { &tdd_success(); } else { &tdd_fail(); } } # Test works after three years! [choroba]: or nobody bothered... [choroba]: The problem was bigger, as the test tried to call a method that didn't exist anymore [marinersk]: :: ducking :: [choroba]: because, someone renamed the method, but didn't notice it was used in the test, as the test was skipped

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (11)
As of 2017-05-25 15:08 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
My favorite model of computation is ...

Results (187 votes). Check out past polls.