Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
I too couldn't leave good enough alone. After realizing an embarrasing oversight (pointed out by ambrus), I decided to try my hand at a perl + Inline::C version. It finishes the 3_477 records in 1 second flat with constant memory consumption (less than 65MB). Furthermore, here is how it stacks up against the 67_108_863 lines of output from 9_448_847 lines of input (see How many words does it take? for more details):
  • Pure perl ver 1 = 343 min
  • Pure perl ver 2 = 288 min
  • Pure perl ver 3 = 251 min
  • Java = 66 min
  • Perl + Inline::C = 3 min
#!/usr/bin/perl use strict; use warnings; use Inline C =>; for my $file (@ARGV) { open(my $fh, '<', $file) or die "Unable to open '$file' for readin +g: $!"; while (<$fh>) { my ($set) = $_ =~ /^(\w+)/; powerset($set, length($set)); } } __END__ __C__ #include <stdio.h> #include <stdlib.h> void powerset (char *set, int len) { static char seen[67108864]; static int init_flag = 0; static long val[128]; int i, j, k, new_len; long bit = 0; char *oldset, *newset; if (! init_flag) { memset(seen, '0', sizeof(seen)); val['a'] = 1; val['b'] = 2; val['c'] = 4; v +al['d'] = 8; val['e'] = 16; val['f'] = 32; val['g'] = 64; v +al['h'] = 128; val['i'] = 256; val['j'] = 512; val['k'] = 1024; v +al['l'] = 2048; val['m'] = 4096; val['n'] = 8192; val['o'] = 16384; v +al['p'] = 32768; val['q'] = 65536; val['r'] = 131072; val['s'] = 262144; v +al['t'] = 524288; val['u'] = 1048576; val['v'] = 2097152; val['w'] = 4194304; v +al['x'] = 8388608; val['y'] = 16777216; val['z'] = 33554432; init_flag = 1; oldset = malloc((len + 1) * sizeof(char)); for (i = 0; i < len; ++i) { oldset[i] = set[i]; } oldset[len] = '\0'; } else { oldset = set; } for (i = 0; i < len; ++i) { bit += val[ oldset[i] ]; } if (seen[bit] == '1') { return; } seen[bit] = '1'; printf("%s\n", oldset); if (len == 1) { return; } new_len = len - 1; newset = malloc((len + 1) * sizeof(char)); for (i = 0; i < len; ++i) { k = 0; for (j = 0; j < i; ++j) { newset[k++] = oldset[j]; } for (j = i + 1; j < len; ++j) { newset[k++] = oldset[j]; } newset[k] = '\0'; powerset(newset, new_len); } free(newset); }

Update: I had solicited feedback on my poor C in the CB and was told that, despite it being a translation of the Perl above, that documentation would go along way in getting the desired comments.

For each $set in a file, the code generates the powerset (minus the empty set). It is designed to skip any subset that has previously been produced. The code to generate the powerset is intentionally inefficient from the perspective of a single $set, but more than makes up for it when the "skip previously seen subset" is applied across all sets in the file.

The algorithm to generate the powerset (ABCD) is as follows:

  • Generate all subsets of size N-1 (ABC, ABC, ACD, BCD)
  • For each new subset, repeat process until size is 1
  • Stop recursing as soon as subset has previously been seen

Because over 67 million (2 ** 26) sets will be generated, keeping track of what has previously been seen can take up a lot of space. There is a pretty straight forward way to translate a set into an integer that fits into a contigous range of 1 .. 2 ** 26. For those interested, the algorithm is the sum of 1 << ord(x) - ord('a') where x is all lowercase chars in the subset. I have created a static lookup table val[] rather than generate the values each time. In perl, I use a bitstring with vec but in C have chosen to use a char array seen[] which takes up (2 ** 26) / 1024 * 1024 or 64MB.

Since some of these arrays need to retain their values between invocations they have been made static. It is basically then just a matter of determining if the passed in string has been seen and if not, allocate a new string of size N - 1. Generate each string of size N - 1 and call self recursively.

Cheers - L~R

In reply to Re^4: Powerset short-circuit optimization by Limbic~Region
in thread Powerset short-circuit optimization by Limbic~Region

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    [1nickt]: This is less than perfect ... but demanding perfection (from people or from life) is a sure way to unhappiness.
    [Discipulus]: and anyway we have CB where every (democratic) opinion can be expressed
    erix eat the rich!
    [1nickt]: I do think it is sad that roho has received 3 downvotes for his polite request, as did I when I objected to the profanity in stonecolddevin's sig. I upvoted both him and Karl for the discussion. Way too much downvoting for inappropriate reasons here!
    Discipulus learn that 'argue' has a little negative sense, he thought was a neutral sense, 'vox media'
    [1nickt]: argue == discuss && argue == be contentious
    [Discipulus]: you are right 1nickt i didnt voted nor downvoted; I just upvote perl content i like
    [1nickt]: In Spanish, to argue (like a fight) is discutir -- does not mean to discuss !
    [1nickt]: Sigh, this is why I gave up human-only languages and became a Perl linguist :-)
    [Discipulus]: i just, rarely, downvote unpolite posts and spam, and wrong advices

    How do I use this? | Other CB clients
    Other Users?
    Others romping around the Monastery: (6)
    As of 2017-06-22 12:21 GMT
    Find Nodes?
      Voting Booth?
      How many monitors do you use while coding?

      Results (519 votes). Check out past polls.