Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
jimt,
I am sad to tell you that despite providing an iterative version that need not be called more than necessary, it is terribly slow. I timed (not Benchmarked) 4 versions and unfortunately this wasn't even a contender:
  • java recursive 1 = 12 seconds
  • perl recursive 1 = 13 seconds
  • perl recursive 2 = 28 seconds
  • perl iterative 1 = 6_190 seconds (only 25% done, getting slower, and producing duplicates)

The code to generate the 3_477 line data file and the recursive java version can be found at How many words does it take?. The two recursive perl versions are below:

They both share the following code:
#!/usr/bin/perl use strict; use warnings; my %seen; for my $file (@ARGV) { open(my $fh, '<', $file) or die "Unable to open '$file' for readin +g: $!"; while (<$fh>) { my ($set) = $_ =~ /^(\w+)/; powerset($set); } } sub powerset { my $set = shift @_; return if $seen{$set}++; print "$set\n"; powerset($_) for subsets($set); }
The 13 second version has subsets() as
sub subsets { my $set = shift @_; return if length($set) == 1; my ($head, $char, $tail) = ($set, '', ''); my @ret; while ($head) { $char = chop $head; push @ret, $head . $tail; $tail = $char . $tail; } return @ret; }
The 28 second version has subsets() as
sub subsets { my $set = shift @_; return if length($set) == 1; my @list = split //, $set; my $pos = @list; my @ret; while ($pos--) { push @ret, join '', @list[grep $_ != $pos, 0 .. $#list]; } return @ret; }

I made minor modifications to your code to handle my dataset as well as produce comparable output:

# All references to $calls removed # $limbic_sets = [ ... ] # foreach my $limbic_set (@$limbic_sets) { ... } # The above two lines became open(my $fh, '<', 'phase1.data') or die $!; while ( <$fh> ) { my ($limbic_set) = $_ =~ /^(\w+)/; $limbic_set = [ split //, $limbic_set ]; # ... } # removed print "checks set @$limbic_set\n"; # my $format = "%2s" x scalar(@$padded_limbic_set) . " (%d)\n"; # printf($format, (map {defined $_ && $display->{$_} ? $_ : ' '} @$pad +ded_limbic_set), $idx); # The above 2 lines became print join '', map {defined $_ && $display->{$_} ? $_ : ''} @$padded_l +imbic_set; print "\n";

Update 1: After your 3rd update, your code finished in a respectable 78 seconds. Unfortunately it is still producing about 40% duplicates. Additionally, it doesn't produce the correct output (missing missing 92_835 strings out of 508_062). For instance 'cdglnst' does not appear at all in your output.

Update 2: After your 4th update, your code narrowly makes 3rd place with 26 seconds and correct output! I included the entire perl script I am using above to ensure we are comparing apples to apples. Admittedly, yours does scale much better with both speed and memory. Unfortunately, it still isn't quite up to the task I needed. I will have to put this in my back pocket for later though.

Cheers - L~R


In reply to Re^4: Powerset short-circuit optimization by Limbic~Region
in thread Powerset short-circuit optimization by Limbic~Region

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (3)
As of 2024-04-26 01:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found