baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:
Dear all
I am trying to compute the number of intersecting array elements for multiple arrays (essentially elements belonging to separate parts of venn diagrams).
Initially and naively I thought this should be an easy thing to do and for 2 arrays it is:
1. Make a hash table for one array with keys being its elements
2. Loop through another array and count if an element exists in the previously created hash table
3. Subtract the number of shared elements from the total number of elements of each array.
But when the number of arrays increases the number of possible intersections between respective arrays increases (for A,B,C arrays we have A/B, A/C, B/C, A/B/C). So the problem I am having is how to scale the above algorithm.
Does anyone know a solution to this problem even if a limited number of sets can only be included (I think this is an NP problem thus any algorithm that can produce a solution for more than 7 arrays would be acceptable)
Thank you
Re: Set intersection problem
by hv (Prior) on May 23, 2023 at 12:45 UTC

I think a possible approach to this would look roughly like: create an individual hash lookup for each array, and an additional results hash containing all the elements; then for each element of the results hash, check its presence in each of the individual hashes and build up a representation (such as a bitvector) to reflect that. If you need to group elements by the type of intersection, you can store a result by the bitvector in yet another hash.
my $grouped = intersector([1..5], [3..7], [2, 6]);
print join(' ', @$_), "\n" for values %$grouped;
# output (in some order):
# 2
# 6
# 1
# 5 3 4
# 7
exit 0;
sub intersector {
# accept a list of arrayrefs
my @ar = @_;
my(%all, @single);
# initialize the results hash %all and all the individual hashes in
+@single
for my $i (0 .. $#ar) {
my $a = $ar[$i];
@all{@$a} = (undef) x @$a;
$single[$i] = { map +($_ => undef), @$a };
}
# now for each element, find its signature
for my $el (keys %all) {
my $sig = '';
for my $i (0 .. $#ar) {
vec($sig, $i, 1) = 1 if exists $single[$i]{$el};
}
# store the final signature in the results array if needed
$all{$el} = $sig;
# store the element by its signature if needed
push @{ $grouped{$sig} }, $el;
}
# now analyse further, or return a result
return \%grouped;
}
A space saving can be made by combining the two main loops, removing the need for @single; I'll leave that as an exercise for the reader.  [reply] [d/l] [select] 
Re: Set intersection problem
by Corion (Patriarch) on May 23, 2023 at 08:12 UTC

I think you can reduce the problem to the problem of overlapping "axis aligned bounding boxes" ("AABB"). There are many algorithms to improve "collision detection" for these AABBs as the problem occurs often in computer graphics. Depending on whether this is a onetime setup or whether the arrays change over time, it can pay off to use a more complicated structure (quadtree) over brute force.
 [reply] 
Re: Set intersection problem
by salva (Canon) on May 23, 2023 at 14:02 UTC

I am not sure I completely understand your problem, so maybe the following idea doesn't make sense, but, have you consider inverting the approach?
For every element make an array with the names of the sets containing it. Once that is done, you can reverse it again, converting the list of sets into a key and use it to group the elements:
# untested
my %sets = { A => [1,2,3], B => [3, 4], C => [1, 3, 4] }
my %in;
for my $set (keys %sets) {
for my $element (@{$sets{$set}}) {
push @{$in[$element] //= []}, $set;
}
}
my %parts;
for my $element (keys %in) {
my $key = join("/", sort @{$in{$element}});
push @{$parts{$key} //= []}, $element
}
Dump \%parts;
 [reply] [d/l] 
Re: Set intersection problem
by kcott (Archbishop) on May 23, 2023 at 15:36 UTC

G'day baxy77bax,
"any algorithm that can produce a solution for more than 7 arrays would be acceptable"
Major Update:
On reviewing my post, I noticed some of the intersections were missing: ABF, AD, and so on.
There was a major logic flaw in the 'for my $elem (keys %temp) { ... }' loop.
I've completely rewritten that loop, made a couple of minor changes elsewhere, and reposted the output.
The previous code and output has been stricken and is in the spoiler for any who wish to review the changes.
#!/usr/bin/env perl
use strict;
use warnings;
my %data = (
A => [qw{ant bee cat dog eel fly gnu hog}],
B => [qw{ant cat eel gnu}],
C => [qw{bee dog fly hog}],
D => [qw{cat fly ant}],
E => [qw{ant dog gnu}],
F => [qw{gnu bee}],
G => [qw{eel hog}],
H => [qw{zoo}],
I => [qw{}],
);
my %temp;
for my $set (keys %data) {
my @elems = @{$data{$set}};
@elems = ('') unless @elems;
for my $elem (@elems) {
push @{$temp{$elem}}, $set;
}
}
my (%venn, %seen);
for my $elem (sort keys %temp) {
my @sets = @{$temp{$elem}};
my $glob_expr = '{' . join(',', @sets) . '}';
for my $i (1 .. $#sets) {
$glob_expr .= '{,' . join(',', @sets[$i .. $#sets]) . '}';
}
for my $key (glob($glob_expr)) {
next if $key =~ /([^]+)(?=.*?\1)/;
$key =~ y///s;
$key =~ s/(?:^$)//g;
$key = join '', sort split //, $key;
push @{$venn{$key}}, $elem unless $seen{$key}{$elem}++;
}
}
# DEMO: Output
my $fmt = "%7s %s %s\n";
{
no warnings 'qw';
printf $fmt, qw{Sets # Elements};
printf $fmt, qw{  };
}
printf $fmt, $_, 0+@{$venn{$_}}, "@{$venn{$_}}" for sort keys %venn;
Output:
Sets # Elements
  
A 8 ant bee cat dog eel fly gnu hog
AB 4 ant cat eel gnu
ABD 2 ant cat
ABDE 1 ant
ABE 2 ant gnu
ABEF 1 gnu
ABF 1 gnu
ABG 1 eel
AC 4 bee dog fly hog
ACD 1 fly
ACE 1 dog
ACF 1 bee
ACG 1 hog
AD 3 ant cat fly
ADE 1 ant
AE 3 ant dog gnu
AEF 1 gnu
AF 2 bee gnu
AG 2 eel hog
B 4 ant cat eel gnu
BD 2 ant cat
BDE 1 ant
BE 2 ant gnu
BEF 1 gnu
BF 1 gnu
BG 1 eel
C 4 bee dog fly hog
CD 1 fly
CE 1 dog
CF 1 bee
CG 1 hog
D 3 ant cat fly
DE 1 ant
E 3 ant dog gnu
EF 1 gnu
F 2 bee gnu
G 2 eel hog
H 1 zoo
I 1
#!/usr/bin/env perl
use strict;
use warnings;
my %data = (
A => [qw{ant bee cat dog eel fly gnu hog}],
B => [qw{ant cat eel gnu}],
C => [qw{bee dog fly hog}],
D => [qw{cat fly ant}],
E => [qw{ant dog gnu}],
F => [qw{gnu bee}],
G => [qw{eel hog}],
H => [qw{zoo}],
I => [qw{}],
);
my (%venn, %temp);
for my $set (sort keys %data) {
my @elems = @{$data{$set}};
@elems = ('') unless @elems;
for my $elem (@elems) {
push @{$temp{$elem}}, $set;
}
}
for my $elem (keys %temp) {
my @sets = @{$temp{$elem}};
for my $i (0 .. $#sets) {
my @subsets = @sets[$i .. $#sets];
my $key = shift @subsets;
push @{$venn{$key}}, $elem;
for my $set (@subsets) {
$key .= "$set";
push @{$venn{$key}}, $elem;
}
}
}
my $fmt = "%7s %s %s\n";
{
no warnings 'qw';
printf $fmt, qw{Sets # Elements};
printf $fmt, qw{  };
}
for my $sets (sort keys %venn) {
printf $fmt, $sets, 0+@{$venn{$sets}},
join ' ', sort @{$venn{$sets}};
}
Output:
Sets # Elements
  
A 8 ant bee cat dog eel fly gnu hog
AB 4 ant cat eel gnu
ABD 2 ant cat
ABDE 1 ant
ABE 1 gnu
ABEF 1 gnu
ABG 1 eel
AC 4 bee dog fly hog
ACD 1 fly
ACE 1 dog
ACF 1 bee
ACG 1 hog
B 4 ant cat eel gnu
BD 2 ant cat
BDE 1 ant
BE 1 gnu
BEF 1 gnu
BG 1 eel
C 4 bee dog fly hog
CD 1 fly
CE 1 dog
CF 1 bee
CG 1 hog
D 3 ant cat fly
DE 1 ant
E 3 ant dog gnu
EF 1 gnu
F 2 bee gnu
G 2 eel hog
H 1 zoo
I 1
 [reply] [d/l] [select] 
Re: Set intersection problem
by kikuchiyo (Hermit) on May 23, 2023 at 10:03 UTC

 [reply] 

 [reply] 
Re: Set intersection problem
by tybalt89 (Monsignor) on May 23, 2023 at 15:00 UTC

I'm not sure I understand your problem, but my guess goes something like this:
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11152381
use warnings;
my %sets = ( A => [1,2,3], B => [3, 4], C => [1, 3, 4] );
use Data::Dump 'dd'; dd 'original sets', \%sets;
my %intersection;
for my $setname ( keys %sets )
{
$intersection{$_}{$setname}++ for @{ $sets{$setname} };
}
use Data::Dump 'dd'; dd 'intersection', \%intersection;
print "items in more than one set:\n";
for my $setname ( sort keys %intersection )
{
if( 1 < keys %{ $intersection{$setname} } )
{
my $where = join ' and ', map "'$_'",
sort keys %{ $intersection{$setname} };
print " '$setname' is in $where\n";
}
}
Outputs (complete with intermediate calculations):
(
"original sets",
{ A => [1, 2, 3], B => [3, 4], C => [1, 3, 4] },
)
(
"intersection",
{
1 => { A => 1, C => 1 },
2 => { A => 1 },
3 => { A => 1, B => 1, C => 1 },
4 => { B => 1, C => 1 },
},
)
items in more than one set:
'1' is in 'A' and 'C'
'3' is in 'A' and 'B' and 'C'
'4' is in 'B' and 'C'
 [reply] [d/l] [select] 
Re: Set intersection problem
by tybalt89 (Monsignor) on May 23, 2023 at 15:30 UTC

If you don't care about which sets an item belongs to, and there are no duplicates within any set:
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11152381
use warnings;
my %sets = ( A => [1,2,3], B => [3, 4], C => [1, 3, 4] );
my %seen;
my @items = sort grep $seen{$_}++ == 1, map @$_, values %sets;
print "Items in more than one set: @items\n";
Outputs:
Items in more than one set: 1 3 4
 [reply] [d/l] [select] 
Re: Set intersection problem
by Fletch (Bishop) on May 23, 2023 at 11:21 UTC

I have a possible solution in mind but so as to not muddy the waters with something irrelevant could you give a couple samples what your arrays look like and what you’re considering an intersection? E.g. if A were [3,4,5] and B [23,5,17,42] is the intersection you want [5] ? I think I follow but moar concrete data would help see if I’m on a track in the same neighborhood. . .
Edit: The post below by hv is along the lines I was thinking of actually so maybe not completely irrelevant . . .
The cake is a lie.
The cake is a lie.
The cake is a lie.
 [reply] [d/l] [select] 
Re: Set intersection problem
by karlgoethebier (Abbot) on May 23, 2023 at 18:44 UTC

 [reply] 
Re: Set intersection problem
by Anonymous Monk on May 23, 2023 at 09:20 UTC

It's 2**n at worst and inductive:
You get the solution S(i) for i sets by combining set s_i with all members of S(i1).
For i=2..n and S(1)={s_1}
This can be further optimized  no point in trying (A/B)/C if A/C is empty  but
2**7 is only 128.
To speed up intersection you can use hash slices in Perl or specialized CPAN modules.
 [reply] 

