Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Getting the intersection of n collections.

by jbrugger (Parson)
on Oct 11, 2006 at 11:47 UTC ( #577590=perlquestion: print w/ replies, xml ) Need Help??
jbrugger has asked for the wisdom of the Perl Monks concerning the following question:

Hi all.
I've got the following question:
I've got n arrays, that all contain data.
I want to return a new array, containing only the data, that exist in all n arrays mentioned above.
(i hope i'm clear in what i want)
Right now, my following solution works, but i wonder if there is a more clever and fast way to do this:
# If my datasets contain # @1 = 1,2,3,4,5 # @2 = 2,3,6,7,8,8,8,9,10 # @3 = 5,6,2,3 # # the result should be: 2,3 # my %intersection; my @result; foreach my $dataArray(@AllArraysIHave) { my %lookuptable; foreach(@{$dataArray}) { if (!$lookuptable{$_}) { $lookuptable{$_} = 1; $intersection{$_}++; } } } my $amnt = scalar(@AllArraysIHave); foreach my $key (keys %intersection) { if ($intersection{$key} == $amnt) { push @result, $key; } } return \@result;
*update*
Thanks for all the fast replies
lima1, indeed, it has to work for duplicate items as well. (changed the example a little.)

"We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise." - Larry Wall.

Comment on Getting the intersection of n collections.
Download Code
Re: Getting the intersection of n collections.
by dtr (Scribe) on Oct 11, 2006 at 11:59 UTC

    Assuming that each number can only appear once in each list, then the following should work:-

    my %counts = (); foreach my $val (@d1, @d2, @d3) { $counts{$val}++; } return grep { $counts{$_} == 3 } sort keys %counts;
Re: Getting the intersection of n collections.
by japhy (Canon) on Oct 11, 2006 at 12:08 UTC
    I'd probably start by setting one array as the resulting intersection list, and comparing each array to that one.
    my @arrays = ([...], [...], [...]); my %tmp_int; # assume intersection is entirety of first array # and set each "frequency" to the number of arrays @tmp_int{@{ $arrays[0] }) = (@arrays+0) x @{ $arrays[0] }; # then check the other arrays against it for my $set (@arrays[1..$#arrays]) { my %seen; $tmp_int{$_}-- for grep { !$seen{$_} } @$set; } # the ACTUAL intersection is all the # keys in the hash whose value is 1 my @intersection = grep { $tmp_int{$_} == 1 } keys %tmp_int;

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: Getting the intersection of n collections.
by lima1 (Curate) on Oct 11, 2006 at 12:10 UTC
    If the array may NOT have the same value multiple times, then this (intersect is taken and slightly modified from the perl cookbook-one must be careful these days ;) ) should work:
    my @a = ( [ 1,2,3,4,5] , [2,3,6,7,8,9,10], [5,6,2,3], ); my @b = @{$a[0]}; sub intersect{ my (%union, %isect); foreach my $e (@_) { $union{$e}++ && $isect{$e}++ } return keys %isect } foreach my $a_ref (@a) { @b = intersect(@$a_ref, @b); }
    UPDATE: only works for unduplicated items!
Re: Getting the intersection of n collections.
by basje (Beadle) on Oct 11, 2006 at 12:30 UTC
    this also works with duplicates
    my $sets = [ [1,2,3,3,3,3,4,5,6,7,8,9,10], [1,1,1,2,6,7], [3,4,5,5,5,5,6,6,6,6,7,8,9], ]; my %hash = map { $_, 1 } @{$sets->[0]}; for(my $i=1; $i<scalar(@{$sets});$i++) { %hash = map { $_, 1 } grep {%hash->{$_}} @{$sets->[$i]}; } return (keys(%hash));
      I think you may have a typo in your solution. The grep {%hash->{$_}} should perhaps be grep {$hash{$_}}.

      Nice solution, though.

      Cheers,

      JohnGG

Re: Getting the intersection of n collections.
by jdporter (Canon) on Oct 11, 2006 at 14:17 UTC

    I point out that solutions which perform set unions pair-wise, iteratively on the set of lists do not scale well. Consider the worst-case scenario, where all of the lists are identical, and there's a million of them.

    A better approach is the counting one, like that suggested by dtr, but uniqifying each list by converting it to its corresponding set hash, as basje did.

    my %counts; for my $ar ( @arrays ) { my %set; @set{ @$ar } = (); # this is faster. $counts{$_}++ for keys %set; } my @union = grep { $counts{$_} == @arrays } keys %counts;
    We're building the house of the future together.
Re: Getting the intersection of n collections.
by Anonymous Monk on Oct 12, 2006 at 07:43 UTC
    Untested (should work with duplicates):
    my $mask = ""; my %seen; for(my $i=0; $i<@AllArraysIHave; ++$i) { vec($mask, $i, 1) = 1; foreach my $item (@{$AllArraysIHave[$i]}) { $seen{$item} = "" unless exists($seen{$item}); vec($seen{$item}, $i, 1) = 1; } } my @all_there = grep $seen{$_} eq $mask, keys(%seen);
Re: Getting the intersection of n collections.
by GrandFather (Sage) on Oct 12, 2006 at 07:59 UTC

    List::Compare is the tool generally used for this sort of thing:

    use strict; use warnings; use List::Compare; my @AllArraysIHave = ([1,2,3,4,5], [2,3,6,7,8,8,8,9,10], [5,6,2,3]); my $lc = List::Compare->new('-u', @AllArraysIHave); my @result = $lc->get_intersection; print "@result";

    Prints:

    3 2

    Update: speed is disapointing though (listCompareF use the functional interface):

    manual: 3 2 listCompareF: 2 3 listCompare: 3 2 Rate listCompare listCompareF manual listCompare 1128/s -- -70% -93% listCompareF 3822/s 239% -- -78% manual 17051/s 1411% 346% --

    DWIM is Perl's answer to Gödel
      That's because List::Compare calculates lots of things you aren't interested in. It's doing them from new.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://577590]
Approved by Hue-Bond
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (3)
As of 2015-07-06 00:12 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (68 votes), past polls