http://www.perlmonks.org?node_id=584748

I opted to post this as a meditation rather than a SoPW since I'm just relaying back what I've done. But if somebody wants to optimize, that'd rock.

I continued to tinker with Removing redundant powersets with minimal RAM an awful lot. If you're gonna read this node, go read that node (and the one it references) for background info). My original approach included storing a sorted version of all of the sets, from biggest set to smallest set. This would require 65 million lookup checks and ran in about 2.5 minutes. If I omitted the sort, it would drop to requiring only 45 million checks and running in about 1.5 minutes. Go figure. I think I got lucky with the data.

But my main question was, I have several sets of sets. Rapidly tell me if an arbitrary set is an element of each of those sets.. Alternatively, it can be stated as, Given a set of sets, and an arbitrary additional set, rapidly tell me if that arbitrary set is a subset of any set in my set of sets.

I have made progress towads a solution, but incredibly ended up optimizing even further for RAM, not speed. Whoops. I'm posting the progress here in case the techniques prove useful for anyone else. There's a potentially useful code generation technique about 3/4s of the way down. I can be a bit verbose.

Okay, now let's jump right into it. Let's say this is my data:

```
Set of sets:
ABCD
ABCE
AEF
DFG
ABFG

arbitrary set:
BG

Internally, this is all stored a integers, which I'll write out here i
+n binary:

ABCDEFG
1111000
1110100
1000110
0001011
1100011

arbitrary set:

0100001

I'm going to keep using the letters in the rest of the post, though, t

I want to quickly determine if BG is a subset of any of those sets I have. It turns out, it is, it's a subset of ABFG. But, I need to linearly run through all of the sets until I get to it, and it happens to be the one at the end. Not bad in this case, with only 5 sets, but if you had, say, 5,000 involved, it gets pricey. Especially if you need to loop through multiple times.

I was tempted to try sorting. Since they're all internally integers, this is fast and easy. Except for the fact that it's flawed. You don't want to sort on the size of the integer, you would want to sort on the number of elements in a set. So 01111 actually comes before 10000, since it has more elements. The goal was to put larger sets first, since they'd be more likely to contain a hit.

Unfortunately, I didn't get good results with it. Sure, some of the time, it helps out, but since the sets are arbitrary, you'll also end up sorting small sets to the end of the list. If those small sets contain the elements you need, you may have just increased your amount of work. Someone may hae a more clever technique.

I went with a different approach, trying to get constant time. Or, at a minimum, better performance.

The tricks I employed are to reduce the search space. In my previous post, I explained how I'd implemented a multi-level array to store the elements based upon what they contained, so you ended up with slots:

```
A: (ABCD, ABCE, AEF, ABFG)
B: (ABCD, ABCE, ABFG)
C: (ABCD, ABCE)
D: (ABCD, DFG)
E: (ABCE, AEF)
F: (AEF, DFG, ABFG)
G: (DFG, ABFG)

Then, with our arbitrary set, BG, we just look at the sets in the B sl
+ot and the G slot.

B: (ABCD, ABCE, ABFG)
G: (DFG, ABFG)

If there's a superset, it's gotta be in one of those two slots. The ne
+xt trick is that we only look at the smaller of the two lists, to try
+ and minimize our work as much as possible:

G: (DFG, ABFG)

So now we only need to check and see if BG is a subset of DFG or ABFG.
+ That's 2 checks instead of 5. Progress.

But I'm never happy. I wanted faster. I wanted a way to look at the sets and figure out if anything matched. I'll segue to binary again.

```
ABCDEFG
1111000
1110100
1000110
0001011
1100011

arbitrary set:

0100001

Now, I'll transpose the matrix, since I think it makes it easier to ex
+plain.

SSSSS
eeeee
ttttt
12345
+-----
A|11101
B|11001
C|11000
D|10010
E|01100
F|00111
G|00011

So instead of listing the sets horizontally, I list them vertically. E
+ach horizontal row is an element. Internally, each of those horizonta
+l numbers there is stored as a new integer. So instead of storing "11
+11000" for the set "ABCD", I store "11101" for the set of all sets ta
+ht contain "A".

This was the eureka moment. Each horizontal column contains the list of sets that each element is in. So if you take any two elements and bitwise & them together, you can see if there's an element that's in multiple sets. Again, I'll use the same trick as before to whittle down my search.

```
SSSSS
eeeee
ttttt
12345
+-----
A|11101
B|11001
C|11000
D|10010
E|01100
F|00111
G|00011

Arbitrary set is BG. So take the B row and the G row.

SSSSS
eeeee
ttttt
12345
+-----
B|11001
G|00011

Now you & those two integers together. 11001 & 00011 = 00001. This is a true value (not the null character). Therefore, we know that BG is a subset of some previous set. It doesn't matter which one, and for this problem, we don't care. Existence is enough.

You, the human, can read it and see that the value is 00001, so that means it's the 5th set that is a superset, which is ABFG. You've just dropped down to needing just a single check to determine if your arbitrary set is a subset of the 5 previous ones. Naturally, this is bounded by the size of your ints (normally 32 bits. 64 if you're lucky!). In that case, I build an array of ints and check each set in turn.

Now, that anding was a bitch. Conceptually, it's easy to see that I need to & together two ints in this case. But the next time I may need to & together three ints. And the next time 15 ints. And so on. I didn't want to go looping over the structure and & each value together in turn, so instead I wrote up this little gizmo to generate a function to & together arbitrary elements.

```
my %dispatch = ();

sub and_maker {

my \$num_args = shift;

return \$dispatch{\$num_args} if \$dispatch{\$num_args};

my \$built = join '&', map {'\$_[' . \$_ . ']'} (0..\$num_args - 1);
return \$dispatch{\$num_args} = eval "sub { \$built }";
}

Call as:

my \$func = and_maker(3);
my \$results1 = \$func->(1,2,3);                # 3 args
my \$results2 = \$func->(5,6,7);                # 3 args
my \$func2 = and_maker(5);
my \$\$results3 = \$func2->(12,13,14,15,16);    # 5 args