comment on

jimt,
I am sad to tell you that despite providing an iterative version that need not be called more than necessary, it is terribly slow. I timed (not Benchmarked) 4 versions and unfortunately this wasn't even a contender:

java recursive 1 = 12 seconds
perl recursive 1 = 13 seconds
perl recursive 2 = 28 seconds
perl iterative 1 = 6_190 seconds (only 25% done, getting slower, and producing duplicates)

The code to generate the 3_477 line data file and the recursive java version can be found at How many words does it take?. The two recursive perl versions are below:

They both share the following code:

#!/usr/bin/perl
use strict;
use warnings;

my %seen;
for my $file (@ARGV) {
    open(my $fh, '<', $file) or die "Unable to open '$file' for readin
+g: $!";
    while (<$fh>) {
        my ($set) = $_ =~ /^(\w+)/;
        powerset($set);
    }
}
sub powerset {
    my $set = shift @_;
    return if $seen{$set}++;
    print "$set\n";
    powerset($_) for subsets($set);
}
[download]

The 13 second version has subsets() as

sub subsets {
    my $set = shift @_;
    return if length($set) == 1;
    my ($head, $char, $tail) = ($set, '', '');
    my @ret;
    while ($head) {
        $char = chop $head;
        push @ret, $head . $tail;
        $tail = $char . $tail;
    }
    return @ret;
}
[download]

The 28 second version has subsets() as

sub subsets {
    my $set = shift @_;
    return if length($set) == 1;
    my @list = split //, $set;
    my $pos = @list;
    my @ret;
    while ($pos--) {
        push @ret, join '', @list[grep $_ != $pos, 0 .. $#list];
    }
    return @ret;
}
[download]

I made minor modifications to your code to handle my dataset as well as produce comparable output:

# All references to $calls removed

# $limbic_sets = [ ... ]
# foreach my $limbic_set (@$limbic_sets) { ... }

# The above two lines became
open(my $fh, '<', 'phase1.data') or die $!;
while ( <$fh> ) {
    my ($limbic_set) = $_ =~ /^(\w+)/;
    $limbic_set = [ split //, $limbic_set ];
    # ...
}

# removed print "checks set @$limbic_set\n";

# my $format = "%2s" x scalar(@$padded_limbic_set) . " (%d)\n";
# printf($format, (map {defined $_ && $display->{$_} ? $_ : ' '} @$pad
+ded_limbic_set), $idx);

# The above 2 lines became
print join '', map {defined $_ && $display->{$_} ? $_ : ''} @$padded_l
+imbic_set;
print "\n";
[download]

Update 1: After your 3rd update, your code finished in a respectable 78 seconds. Unfortunately it is still producing about 40% duplicates. Additionally, it doesn't produce the correct output (missing missing 92_835 strings out of 508_062). For instance 'cdglnst' does not appear at all in your output.

Update 2: After your 4th update, your code narrowly makes 3rd place with 26 seconds and correct output! I included the entire perl script I am using above to ensure we are comparing apples to apples. Admittedly, yours does scale much better with both speed and memory. Unfortunately, it still isn't quite up to the task I needed. I will have to put this in my back pocket for later though.

Cheers - L~R

In reply to Re^4: Powerset short-circuit optimization by Limbic~Region
in thread Powerset short-circuit optimization by Limbic~Region

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


There's more than one way to do things
	PerlMonks