Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re^3: Fast, Efficient Union and Intersection on arrays

by wanradt (Scribe)
on Nov 22, 2008 at 10:44 UTC ( #725318=note: print w/ replies, xml ) Need Help??


in reply to Re^2: Fast, Efficient Union and Intersection on arrays
in thread Fast, Efficient Union and Intersection on arrays

But there is flaw in grep-algorithm: it does not union correctly, if first array is shorter than second. If you consider this, then the for-algorithm will about 6-7% faster.

Nġnda, WK


Comment on Re^3: Fast, Efficient Union and Intersection on arrays
Re^4: Fast, Efficient Union and Intersection on arrays
by TGI (Vicar) on Nov 24, 2008 at 19:35 UTC

    Good catch on the bum algorithm! I was able to fix the error by adding some parentheses. However, I still get a good advantage for the two grep method.

    sub two_greps { my %hash; @hash{@a} = (1) x @a; my @union = (@a, grep { !$hash{$_} } @b); my @intersection = ( grep { $hash{$_} } @b); } __DATA__ Range (1,3) vs (3,9) Rate one_cfor one_for two_greps one_cfor 55648/s -- -18% -23% one_for 68027/s 22% -- -5% two_greps 71891/s 29% 6% -- Range (1,5) vs (3,8) Rate one_cfor one_for two_greps one_cfor 50378/s -- -13% -15% one_for 58140/s 15% -- -2% two_greps 59277/s 18% 2% --

    Update: the reason for the error is that = has a higher precedence than ,. Therefore the bad version was like doing: (my @union = @a), grep { !$hash{$_} } @b;. The grep was executed, but the results were discarded.

    A more serious error with the greps is that we test the value in the hash, not its existence. So 0 would never be in the intersection of a set, and always in the union. Two greps still wins by a small amount after that is fixed, too.

    sub two_greps { my %hash; @hash{@a} = (1) x @a; my @union = (@a, grep { ! exists $hash{$_} } @b); my @intersection = ( grep { exists $hash{$_} } @b); } Range (1,3) vs (0,9) Rate one_cfor one_for two_greps one_cfor 50787/s -- -20% -25% one_for 63371/s 25% -- -7% two_greps 68027/s 34% 7% -- Range (1,5) vs (3,8) Rate one_cfor one_for two_greps one_cfor 50813/s -- -12% -18% one_for 57670/s 13% -- -7% two_greps 62150/s 22% 8% --


    TGI says moo

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://725318]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (3)
As of 2014-09-21 23:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (176 votes), past polls