Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re^2: Powerset short-circuit optimization

by BrowserUk (Pope)
on Oct 04, 2006 at 20:46 UTC ( #576406=note: print w/replies, xml ) Need Help??


in reply to Re: Powerset short-circuit optimization
in thread Powerset short-circuit optimization

Unless I've been wasting my time pursuing the wrong goal (which is quite possible), this isn't going to achieve the original aim. I'll try to explain, but sorry if I miss the mark.

The original post said:

Using another example:

Set1: A,B,C,D Set2: A,B,C,E set3: A,B,C,F,G

They share sets ABC, AB, AC, BC, A, B, and C so why generate them three times?

Which suggests the idea is to avoid regenerating those subsets that are a part of a second powerset, if you've already generated them for a previous powerset. To that end, you might use your code to generate the first powerset completely (metaphorically answering 'no' to the prompt), and then save a stringified version of each subset generated in a hash. $seen{ "@subset" } = 1

Then, when generating a second (or subsequent) powerset, each time you get back a subset, you look it up in the hash and use it's existance or otherwise to decide whether to skip the rest of that subset tree.

skip() if exists $seen{ "@subset" }

The problem is, for this to work, the code has to produce the same (number and order), subset sequence from any given starting point, but your code does not do this(*).

Running your code on B C D and answering alternately 'yes' and 'no' at the top level shows what should be skipped when answering yes to that subset:

C:\test>tilly-powersets B C D C:\test>tilly-powersets B C D BCD: skip? n BCD: skip? y BC: skip? n Done B: skip? n C: skip? n BD: skip? n D: skip? n CD: skip? n Done

Now, running it on the set A B C D, we should be able to skip the generation of the 6 subsets [BC], [B], [C], [BD], [D], [CD], but when we run the code and answer 'yes' each time we see a sequence we saw during the generation of the B C D powerset, we still end up generating the same number of outputs:

>tilly-powersets A B C D >tilly-powersets A B C D ABCD: skip? n ABCD: skip? n ABC: skip? n ABC: skip? n AB: skip? n AB: skip? n A: skip? n A: skip? n B: skip? n B: skip? y AC: skip? n AC: skip? n C: skip? n C: skip? y BC: skip? n BC: skip? y ABD: skip? n ABD: skip? n AD: skip? n AD: skip? n D: skip? n D: skip? y BD: skip? n BD: skip? y ACD: skip? n ACD: skip? n CD: skip? n CD: skip? y BCD: skip? n BCD: skip? y Done Done

This is because when generating a subtree that contains identical characters, but that is located at a different position in the base set, the order of generation is different, so the skipahead has a different effect. Occasionally, this will result in some saving, but often, as in the case above, none at all, and very rarely (from my analysis), will it achieve the full potential saving.


(*). I've been rapidly reaching the conclusion in my own attempts that there isn't any way to do this.

I've come up with 4 different ways of producing the powersets, each of which produce a different ordering, but none of them allow the potential saving to be achieved because the orderings are dependant upon the positions of the data in the set, not the values in those positions. Ie. * A B C * will never produced the same subtree as A B C * or * * A B C.

Basically, in every generator I've seen or come up with, the values of the data are entirely transparent to the algorithm, as they manipulate positions in the set only. Which means any comparison/decisions about skipping based upon the values of the data are bound to fail.

The only approach I've had any success with is to store not only the subset produced, but also their offset in the original base set (which is messy and complicated), and then only skip if you encounter both situations. The same data at the same offset. Whilst this would work, the reduction in the frequency with which the savings can be achieved, combined with the space requirements (and allocation costs) for storage, and the complexity of the comparisons, means that it's simply quicker to generate the sequence with a fast iterator.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^3: Powerset short-circuit optimization
by Limbic~Region (Chancellor) on Oct 04, 2006 at 22:51 UTC
    BrowserUk,
    You have nailed it. When I posted this, I originally thought that having each set to be processed in alphabetical sorting the sets by length in descending order would allow for this kind of optimization.

    In examining the problem this post is meant to help solve further, I have realized I will not be able to process the list in descending length even if it would make things work.

    Grrrrr!

    Cheers - L~R

Re^3: Powerset short-circuit optimization
by tilly (Archbishop) on Oct 04, 2006 at 22:08 UTC
    Let me say what my code does, and what it doesn't do.

    It generates the subsets in the exact order that the original post gave, and does so in a way in which you can easily skip all subsets of the current subset. Furthermore it does this with bounded memory requirements, and with somewhat reasonable performance. However it does not guarantee that it generates all longer sets before generating a short set.

    If you wish to guarantee that it generates all longer sets before generating a short set, then you'll have to explicitly remember all of the skips you were asked to make (with potentially large memory requirements to do that), and the skipping step is going to become very complex. Limbic-Region will have to confirm one way or another, but I think I made the performance tradeoff that he would want.

      tilly,
      I was hoping to guarantee all the longer sets were processed first by ordering them according to length. Unfortunately I have discovered that this will not be possible in the larger problem space this is to help solve.

      I am very happy to have your code though even if I can't use it this time. I am sure it will come in handy in the future or at a minimum - I will learn a lot from it.

      Cheers - L~R

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://576406]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (6)
As of 2020-10-27 11:56 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    My favourite web site is:












    Results (256 votes). Check out past polls.

    Notices?