jimt,
I am sad to tell you that despite providing an iterative version that need not be called more than necessary, it is terribly slow. I timed (not
Benchmarked) 4 versions and unfortunately this wasn't even a contender:
- java recursive 1 = 12 seconds
- perl recursive 1 = 13 seconds
- perl recursive 2 = 28 seconds
- perl iterative 1 = 6_190 seconds (only 25% done, getting slower, and producing duplicates)
The code to generate the 3_477 line data file and the recursive java version can be found at How many words does it take?. The two recursive perl versions are below:
They both share the following code:
#!/usr/bin/perl
use strict;
use warnings;
my %seen;
for my $file (@ARGV) {
open(my $fh, '<', $file) or die "Unable to open '$file' for readin
+g: $!";
while (<$fh>) {
my ($set) = $_ =~ /^(\w+)/;
powerset($set);
}
}
sub powerset {
my $set = shift @_;
return if $seen{$set}++;
print "$set\n";
powerset($_) for subsets($set);
}
The 13 second version has subsets() as
sub subsets {
my $set = shift @_;
return if length($set) == 1;
my ($head, $char, $tail) = ($set, '', '');
my @ret;
while ($head) {
$char = chop $head;
push @ret, $head . $tail;
$tail = $char . $tail;
}
return @ret;
}
The 28 second version has subsets() as
sub subsets {
my $set = shift @_;
return if length($set) == 1;
my @list = split //, $set;
my $pos = @list;
my @ret;
while ($pos--) {
push @ret, join '', @list[grep $_ != $pos, 0 .. $#list];
}
return @ret;
}
I made minor modifications to your code to handle my dataset as well as produce comparable output:
# All references to $calls removed
# $limbic_sets = [ ... ]
# foreach my $limbic_set (@$limbic_sets) { ... }
# The above two lines became
open(my $fh, '<', 'phase1.data') or die $!;
while ( <$fh> ) {
my ($limbic_set) = $_ =~ /^(\w+)/;
$limbic_set = [ split //, $limbic_set ];
# ...
}
# removed print "checks set @$limbic_set\n";
# my $format = "%2s" x scalar(@$padded_limbic_set) . " (%d)\n";
# printf($format, (map {defined $_ && $display->{$_} ? $_ : ' '} @$pad
+ded_limbic_set), $idx);
# The above 2 lines became
print join '', map {defined $_ && $display->{$_} ? $_ : ''} @$padded_l
+imbic_set;
print "\n";
Update 1: After your 3rd update, your code finished in a respectable 78 seconds. Unfortunately it is still producing about 40% duplicates. Additionally, it doesn't produce the correct output (missing missing 92_835 strings out of 508_062). For instance 'cdglnst' does not appear at all in your output.
Update 2: After your 4th update, your code narrowly makes 3rd place with 26 seconds and correct output! I included the entire perl script I am using above to ensure we are comparing apples to apples. Admittedly, yours does scale much better with both speed and memory. Unfortunately, it still isn't quite up to the task I needed. I will have to put this in my back pocket for later though.
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.