more useful options PerlMonks

### comment on

 Need Help??

This is a friggin' awesome problem. Lots of fun working on it. :-) But I think a solution is at hand.

My previous solution assumed that there was some sort of external criteria that would determine whether to skip generation of the powerset. Extensive private conversation with Limbic~Region plus his re-statement of the problem made it clear that that's not the case. Existence is the condition AND we need to worry about multiple sets and god knows what else. This will take time to explain.

I will run through some theory an explanation, then post the modified code, then further explain that.

The issue that other people were having was placement of the characters in the set. So if you start off with set ABC, you'll know to ignore AB. However, you need to keep track of "AB" somewhere in memory and then do a string comparison or the like. Expensive.

My original approach simply used set enumeration to flag sets as being valid or dupes or whatnot. But that wont work here. With the example set ABC, you'd flag BC (idx 6 by how my function counts). BCD comes along, but its idx 6 is CD. So you'd improperly skip over CD. Further, its BC is idx 3, which you'd duplicate. Not good.

We're going to fix this by using placeholders in the set by means of dead bits. This is far cheaper than you would initially think.

For set A B C, you can enumerate the powerset in binary: 111 ABC 110 AB 101 A C 100 A 011 BC 010 B 001 C 000 For BCD, it would be thus: 111 BCD 110 BC 101 B C 100 B 011 CD 010 C 001 D 000 Instead, we want to offset the values in BCD and leave an empty slot f +or A. We'll end up with this: 0111 BCD 0110 BC 0101 B C 0100 B 0011 CD 0010 C 0001 D 0000

BC is 011 in our original set, and hey, in our newly modified set with the dead bit at position 0, BC is once again 011. So now we can safely exclude this set.

Note - this does not require the elements to be in sorted order, or even for you to know them in advance. We'll use an LZW style approach to keep assigning slots as we see them.

(for ABC) Have we seen A before? No? A = slot 0 Have we seen B before? No? B = slot 1 Have we seen C before? No? C = slot 2 (for BCD) Have we seen B before? YES - return 1 Have we seen C before? YES - return 2 Have we seen D before? No? D == slot 3

The order is irrelevent, but it does require that we build up an additional hash table mapping elements -> slots. Memory intensive, but relatively cheap - we only need to look into this index at the start of processing on a set.

But wait! You say, I glossed over a whole set of values up at the top! Let's return to them:

By using a dead bit for A, we actually have this: 1111 invalid 1110 invalid 1101 invalid 1100 invalid 1011 invalid 1010 invalid 1001 invalid 1000 invalid 0111 BCD 0110 BC 0101 B C 0100 B 0011 CD 0010 C 0001 D 0000

That's hugely inefficient. That's 8 checks we need to do just to get to a valid set. It spirals horribly out of control as you add more dead bits. Say, for example, you were using all letters of the alphabet. Your last set is just (z). By that point, z is in slot 25. So you'd have:

11111111111111111111111111 invalid 11111111111111111111111110 invalid 11111111111111111111111101 invalid . . . 00000000000000000000000010 invalid 00000000000000000000000001 valid (z)

Thats 67,108,864 iterations through invalid sets before you find your one valid one. Fortunately, an xor can cause us to solve this in constant time.

Examples are in order:

From up above, set (z) start at 11111111111111111111111111 deadbit string = 11111111111111111111111110 start & deadbit = 11111111111111111111111110 this is > 0, so xor. (start ^ deadbit) & start = 00000000000000000000000001 Bang! one and, a conditional, and an xor jumped you past 67 million so +me odd invalid sets. Another one, from above, set (ABCD) start at 1111 deadbit string = 1110 start & deadbit = 1110 this is > 0, so xor (start ^ deadbit) & start = 0111 (starts at BCD Here's a more complex example. Assume our first set was ABC, and our n +ext was ABD. (A = 0, B = 1, C = 2, D = 3). When we hit ABD, we flag C + as our deadbit and our deadbit string would be 0010. Let's try a complete runthrough: 1111 1111 & 0010 > 0 -> (1111 ^ 0010) & 1111 -> new index is 1101 1101 ABD (1101 & 0010 == 0) 1100 AB (1100 & 0010 == 0) 1011 (1011 & 0010 == 0) 1011 & 0010 > 0 -> (1011 ^ 0010) & 1011 -> new index is 1001 1001 A D (1001 & 0010 == 0) 1000 A (1000 & 0010 == 0) 0111 0111 & 0010 > 0 -> (0111 ^ 0010) & 0111 -> new index is 0101 0101 B D (0101 & 0010 == 0) 0100 B (0100 & 0010 == 0) 0011 0011 & 0010 > 0 -> (0011 ^ 0010) & 0010 -> new index is 0001 0001 D (0001 & 0010 == 0) 0000 () (0000 & 0010 == 0)

The code must be modified to incorporate the deadbits, as well as given the ability to return and re-use our previously skipped indexes. Modifications are as follows. There's a fair bit of additional set up an logic required vs. the previous version, but it can spit out all the numbers as desired from node 580625 with only 39 calls to the iterator and by storing only 32 integers in memory. Most of the additional work and overhead are bitwise operations.

Update: Fixed a minor logic glitch that could cause dupes with specially crafted sets. This code should now be complete.

Update 2: Changed to pack as a little endian long (V) instead of just a long (L) to make cross platform compatible.

Update 3: Okay, I'm man enough to admit that all of my slick binary logic actually introduced a horrrible inefficiency (looping to see if a set should be skipped). Fixed with a hash lookup instead. Duh.

Update4: I think I've squeezed as much raw power out of this as I'm going to get. The newest version here requires the set elements be known in advance and it doen't print out the null set. Otherwise? It does work. Comparing against Limbic~Region's code below, mine is indeed still slower. Dang.

Current state of the art. Expects the data file to be passed on the command line:

In reply to Re^3: Powerset short-circuit optimization by jimt
in thread Powerset short-circuit optimization by Limbic~Region

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

• Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
• Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
• Read Where should I post X? if you're not absolutely sure you're posting in the right place.
• Posts may use any of the Perl Monks Approved HTML tags:
a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
• You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
 For: Use: & & < < > > [ [ ] ]
• Link using PerlMonks shortcuts! What shortcuts can I use for linking?

Create A New User
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (5)
As of 2020-10-27 12:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
My favourite web site is:

Results (256 votes). Check out past polls.

Notices?