Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
Here is my entry, making use of some nice operations from Bit::Vector::Overload. It still has some rough spots in combining the final results, but it narrows things down very quickly. No combinatoric generators are used.

Update: Revised to collect the groups much better. I believe it will now do what Roy Johnson suggested. I didn't change any hashes to arrays, though.

Timing on my machine for a 676-item test case generated by the benchmark program halley did:

4.830u 0.000s 0:08.29 58.2% 0+0k 0+0io 420pf+0w

Update2: I also tried a 17,576 item example (with 'kaaa' ... 'kzzz' and 'iaaa' .. 'izzz'). It ran for one hour to find all groups from 2 up to 4 (the maximum available in this case). The timing is consistent with O(I^2 * log K), where I is the item count and K is the keyword count.

Update3: Inner loop optimization -- better ways to test for empty sets (is_empty) and count bits in sets (Norm). Went from one hour to 54 minutes on the biggest case.

#!/usr/bin/perl use strict; use warnings; use Bit::Vector::Overload; my %items = ( a => [ qw/one six/ ], b => [ qw/two three five/ ], c => [ qw/one two five/ ], d => [ qw/one seven five/ ], e => [ qw/one two five/ ], f => [ qw/one two four seven/ ], g => [ qw/one two five/ ], h => [ qw/one two three five/ ], ); my $icount = keys %items; # Form a mapping from items to bit positions. # Collect a list of bitmaps for combination work. my $ix = 0; my %ipos; my @ilst; for my $itm ( sort keys %items ) { $ipos{$itm} = $ix; my $set1 = new Bit::Vector($icount + 1); $set1->Bit_On($ix); push @ilst, $set1; ++$ix; } my %revipos = reverse(%ipos); # Form a mapping from keywords to bit positions. my $scount = 0; my %kpos; for my $itm ( sort keys %items ) { for my $keyw ( @{ $items{ $itm }} ) { $kpos{$keyw} = $scount++ if (!exists $kpos{$keyw}); } } # Also form a reverse index for later printing. my %revkpos = reverse(%kpos); # Form bit vectors with ones in the keyword positions, # one for every item. my %keyword_vecs; my @lst1; for my $itm ( sort keys %items ) { my $set0 = new Bit::Vector($scount + 1); $keyword_vecs{$itm} = $set0; for my $keyw ( @{ $items{ $itm }} ) { $set0->Bit_On($kpos{$keyw}); } # hold both sets - items and keywords together. push @lst1,[$ilst[$ipos{$itm}], $set0]; } # Must have at least matching pairs. my @lst2; my %same_merger; # want to merge combos with common intersections. my $i; my $j; my $imax = @lst1; for ($i = 0; $i < $imax; ++$i) { for ($j = $i+1; $j < $imax; ++$j) { my $kcombo = $lst1[$i]->[1] & $lst1[$j]->[1]; next if $kcombo->is_empty(); next if $kcombo->Norm() < 2; my $k = "$kcombo"; if (exists $same_merger{$k}) { $same_merger{$k}->[0] |= $lst1[$i]->[0]; $same_merger{$k}->[0] |= $lst1[$j]->[0]; } else { $same_merger{$k} = [ ($lst1[$i]->[0] | $lst1[$j]->[0]), $kcom +bo ]; } } } for (keys %same_merger) { my $kref = $same_merger{$_}; my $icombo = $kref->[0]; my $kcombo = $kref->[1]; my @inames = @revipos{ $icombo->Index_List_Read() }; my @knames = @revkpos{ $kcombo->Index_List_Read() }; # Result could be externally sorted, or sort the lst2 array and the +n print. print scalar(@inames)," : @inames combo is @knames","\n"; push @lst2,[$icombo, $kcombo]; }

In reply to Re: algorithm for 'best subsets' by tall_man
in thread algorithm for 'best subsets' by halley

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (5)
As of 2024-03-28 14:22 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found