Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Getting the intersection of n collections.

by jbrugger (Parson)
on Oct 11, 2006 at 11:47 UTC ( #577590=perlquestion: print w/ replies, xml ) Need Help??
jbrugger has asked for the wisdom of the Perl Monks concerning the following question:

Hi all.
I've got the following question:
I've got n arrays, that all contain data.
I want to return a new array, containing only the data, that exist in all n arrays mentioned above.
(i hope i'm clear in what i want)
Right now, my following solution works, but i wonder if there is a more clever and fast way to do this:
# If my datasets contain # @1 = 1,2,3,4,5 # @2 = 2,3,6,7,8,8,8,9,10 # @3 = 5,6,2,3 # # the result should be: 2,3 # my %intersection; my @result; foreach my $dataArray(@AllArraysIHave) { my %lookuptable; foreach(@{$dataArray}) { if (!$lookuptable{$_}) { $lookuptable{$_} = 1; $intersection{$_}++; } } } my $amnt = scalar(@AllArraysIHave); foreach my $key (keys %intersection) { if ($intersection{$key} == $amnt) { push @result, $key; } } return \@result;
*update*
Thanks for all the fast replies
lima1, indeed, it has to work for duplicate items as well. (changed the example a little.)

"We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise." - Larry Wall.

Comment on Getting the intersection of n collections.
Download Code
Re: Getting the intersection of n collections.
by dtr (Scribe) on Oct 11, 2006 at 11:59 UTC

    Assuming that each number can only appear once in each list, then the following should work:-

    my %counts = (); foreach my $val (@d1, @d2, @d3) { $counts{$val}++; } return grep { $counts{$_} == 3 } sort keys %counts;
Re: Getting the intersection of n collections.
by japhy (Canon) on Oct 11, 2006 at 12:08 UTC
    I'd probably start by setting one array as the resulting intersection list, and comparing each array to that one.
    my @arrays = ([...], [...], [...]); my %tmp_int; # assume intersection is entirety of first array # and set each "frequency" to the number of arrays @tmp_int{@{ $arrays[0] }) = (@arrays+0) x @{ $arrays[0] }; # then check the other arrays against it for my $set (@arrays[1..$#arrays]) { my %seen; $tmp_int{$_}-- for grep { !$seen{$_} } @$set; } # the ACTUAL intersection is all the # keys in the hash whose value is 1 my @intersection = grep { $tmp_int{$_} == 1 } keys %tmp_int;

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: Getting the intersection of n collections.
by lima1 (Curate) on Oct 11, 2006 at 12:10 UTC
    If the array may NOT have the same value multiple times, then this (intersect is taken and slightly modified from the perl cookbook-one must be careful these days ;) ) should work:
    my @a = ( [ 1,2,3,4,5] , [2,3,6,7,8,9,10], [5,6,2,3], ); my @b = @{$a[0]}; sub intersect{ my (%union, %isect); foreach my $e (@_) { $union{$e}++ && $isect{$e}++ } return keys %isect } foreach my $a_ref (@a) { @b = intersect(@$a_ref, @b); }
    UPDATE: only works for unduplicated items!
Re: Getting the intersection of n collections.
by basje (Beadle) on Oct 11, 2006 at 12:30 UTC
    this also works with duplicates
    my $sets = [ [1,2,3,3,3,3,4,5,6,7,8,9,10], [1,1,1,2,6,7], [3,4,5,5,5,5,6,6,6,6,7,8,9], ]; my %hash = map { $_, 1 } @{$sets->[0]}; for(my $i=1; $i<scalar(@{$sets});$i++) { %hash = map { $_, 1 } grep {%hash->{$_}} @{$sets->[$i]}; } return (keys(%hash));
      I think you may have a typo in your solution. The grep {%hash->{$_}} should perhaps be grep {$hash{$_}}.

      Nice solution, though.

      Cheers,

      JohnGG

Re: Getting the intersection of n collections.
by jdporter (Canon) on Oct 11, 2006 at 14:17 UTC

    I point out that solutions which perform set unions pair-wise, iteratively on the set of lists do not scale well. Consider the worst-case scenario, where all of the lists are identical, and there's a million of them.

    A better approach is the counting one, like that suggested by dtr, but uniqifying each list by converting it to its corresponding set hash, as basje did.

    my %counts; for my $ar ( @arrays ) { my %set; @set{ @$ar } = (); # this is faster. $counts{$_}++ for keys %set; } my @union = grep { $counts{$_} == @arrays } keys %counts;
    We're building the house of the future together.
Re: Getting the intersection of n collections.
by Anonymous Monk on Oct 12, 2006 at 07:43 UTC
    Untested (should work with duplicates):
    my $mask = ""; my %seen; for(my $i=0; $i<@AllArraysIHave; ++$i) { vec($mask, $i, 1) = 1; foreach my $item (@{$AllArraysIHave[$i]}) { $seen{$item} = "" unless exists($seen{$item}); vec($seen{$item}, $i, 1) = 1; } } my @all_there = grep $seen{$_} eq $mask, keys(%seen);
Re: Getting the intersection of n collections.
by GrandFather (Cardinal) on Oct 12, 2006 at 07:59 UTC

    List::Compare is the tool generally used for this sort of thing:

    use strict; use warnings; use List::Compare; my @AllArraysIHave = ([1,2,3,4,5], [2,3,6,7,8,8,8,9,10], [5,6,2,3]); my $lc = List::Compare->new('-u', @AllArraysIHave); my @result = $lc->get_intersection; print "@result";

    Prints:

    3 2

    Update: speed is disapointing though (listCompareF use the functional interface):

    manual: 3 2 listCompareF: 2 3 listCompare: 3 2 Rate listCompare listCompareF manual listCompare 1128/s -- -70% -93% listCompareF 3822/s 239% -- -78% manual 17051/s 1411% 346% --

    DWIM is Perl's answer to Gödel
      That's because List::Compare calculates lots of things you aren't interested in. It's doing them from new.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://577590]
Approved by Hue-Bond
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (4)
As of 2014-12-27 00:21 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (176 votes), past polls