Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

Unless I've been wasting my time pursuing the wrong goal (which is quite possible), this isn't going to achieve the original aim. I'll try to explain, but sorry if I miss the mark.

The original post said:

Using another example:

Set1: A,B,C,D Set2: A,B,C,E set3: A,B,C,F,G

They share sets ABC, AB, AC, BC, A, B, and C so why generate them three times?

Which suggests the idea is to avoid regenerating those subsets that are a part of a second powerset, if you've already generated them for a previous powerset. To that end, you might use your code to generate the first powerset completely (metaphorically answering 'no' to the prompt), and then save a stringified version of each subset generated in a hash. $seen{ "@subset" } = 1

Then, when generating a second (or subsequent) powerset, each time you get back a subset, you look it up in the hash and use it's existance or otherwise to decide whether to skip the rest of that subset tree.

skip() if exists $seen{ "@subset" }

The problem is, for this to work, the code has to produce the same (number and order), subset sequence from any given starting point, but your code does not do this(*).

Running your code on B C D and answering alternately 'yes' and 'no' at the top level shows what should be skipped when answering yes to that subset:

C:\test>tilly-powersets B C D C:\test>tilly-powersets B C D BCD: skip? n BCD: skip? y BC: skip? n Done B: skip? n C: skip? n BD: skip? n D: skip? n CD: skip? n Done

Now, running it on the set A B C D, we should be able to skip the generation of the 6 subsets [BC], [B], [C], [BD], [D], [CD], but when we run the code and answer 'yes' each time we see a sequence we saw during the generation of the B C D powerset, we still end up generating the same number of outputs:

>tilly-powersets A B C D >tilly-powersets A B C D ABCD: skip? n ABCD: skip? n ABC: skip? n ABC: skip? n AB: skip? n AB: skip? n A: skip? n A: skip? n B: skip? n B: skip? y AC: skip? n AC: skip? n C: skip? n C: skip? y BC: skip? n BC: skip? y ABD: skip? n ABD: skip? n AD: skip? n AD: skip? n D: skip? n D: skip? y BD: skip? n BD: skip? y ACD: skip? n ACD: skip? n CD: skip? n CD: skip? y BCD: skip? n BCD: skip? y Done Done

This is because when generating a subtree that contains identical characters, but that is located at a different position in the base set, the order of generation is different, so the skipahead has a different effect. Occasionally, this will result in some saving, but often, as in the case above, none at all, and very rarely (from my analysis), will it achieve the full potential saving.

(*). I've been rapidly reaching the conclusion in my own attempts that there isn't any way to do this.

I've come up with 4 different ways of producing the powersets, each of which produce a different ordering, but none of them allow the potential saving to be achieved because the orderings are dependant upon the positions of the data in the set, not the values in those positions. Ie. * A B C * will never produced the same subtree as A B C * or * * A B C.

Basically, in every generator I've seen or come up with, the values of the data are entirely transparent to the algorithm, as they manipulate positions in the set only. Which means any comparison/decisions about skipping based upon the values of the data are bound to fail.

The only approach I've had any success with is to store not only the subset produced, but also their offset in the original base set (which is messy and complicated), and then only skip if you encounter both situations. The same data at the same offset. Whilst this would work, the reduction in the frequency with which the savings can be achieved, combined with the space requirements (and allocation costs) for storage, and the complexity of the comparisons, means that it's simply quicker to generate the sequence with a fast iterator.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^2: Powerset short-circuit optimization by BrowserUk
in thread Powerset short-circuit optimization by Limbic~Region

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others wandering the Monastery: (4)
    As of 2020-10-27 13:03 GMT
    Find Nodes?
      Voting Booth?
      My favourite web site is:

      Results (256 votes). Check out past polls.