comment on

Shouldn't you have MIN instead of MAX? You are, after all, interested in the things with the most in common.

Jumping on tall_man's idea to use Bit::Vector::Overload (and shamelessly stealing your data generator), here's a new solution. It's reasonably quick (about 15x faster than yours on my slow machine, though a chunk of the difference is printing time) to generate all the tuples and spit them out, nicely ordered by cardinality.

There is much less output, because only tuples that actually represent the intersection of some pair of elements are included. When such a tuple is found, then the rest of the elements are checked to see if they should be included with it, so that the list for the tuple is complete.

use strict;
use warnings;
use Bit::Vector::Overload;
use List::Util 'shuffle';

my $MIN = 3;

## Some keywords
my @keywords = qw[
    zero one two three four five six seven eight nine 
    ten eleven twelve thirteen fourteen 
    fifteen sixteen seventeen eighteen nineteen
];

## Generate some test data
my %items = map{
    $_ => [ @keywords[ ( shuffle( 0 .. $#keywords ) )[ 0 .. rand @keyw
+ords ] ] ]
} 'a' .. 'z';
print "The item list:\n";
print "$_ => @{ $items{ $_ } }\n" for sort keys %items;
print '=' x 30, "\n";

# First, build an index of the distinct values by building a hash
# (anonymous) and taking the keys
my @val_index = keys %{{map {($_ => undef)} map {@$_} values %items}};
# and a reverse lookup
my %rev_val_index = map {($val_index[$_] => $_)} 0..$#val_index;

# Now represent each entry as a bit vector
my %vectors;
while (my ($k, $v) = each %items) {
    $vectors{$k} = new Bit::Vector(scalar(@val_index));
    $vectors{$k}->Bit_On($_) for @rev_val_index{@$v};
}

# Compare elements pairwise and add each element
# to an AoHoH indexed by size of tuple and tuple member list
# if they have elements in common
my @intersections;
my @item_keys = keys %items;
for my $i (0..$#item_keys-1) {
    for my $j ($i+1..$#item_keys) {
    my $intersect = $vectors{$item_keys[$i]} & $vectors{$item_keys[$j]
+};
        my @common_elements = @val_index[$intersect->Index_List_Read];
        next if @common_elements < $MIN;
    my $name_list = join ' ', @common_elements;
    @{$intersections[scalar @common_elements]{$name_list}}{@item_keys[
+$i,$j]} = ();
    # Include any higher-order matches in lower-order matches
    for my $k (0..$#item_keys) {
        next if $k == $i or $k == $j;
        my $new_isect = $intersect & $vectors{$item_keys[$k]};
        if ($new_isect eq $intersect) {
        $intersections[scalar @common_elements]{$name_list}{$item_keys
+[$k]} = ();
        }
    }
    }
}

for ($MIN..$#intersections) {
    next unless keys %{$intersections[$_]};
    print "=== $_-tuples of common elements:\n";
    while (my ($k, $v) = each %{$intersections[$_]}) {
    print "$k: ", join(', ', sort keys %$v), "\n";
    }
}
[download]

Caution: Contents may have been coded under pressure.

In reply to Re^2: algorithm for 'best subsets' by Roy Johnson
in thread algorithm for 'best subsets' by halley

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


There's more than one way to do things
	PerlMonks