http://www.perlmonks.org?node_id=782424


in reply to Sets of subsets

Any suggestions of a better/different way to do a grouping of this nature are of course completely appreciated.
I probably completely misunderstand what you are going to do, but it looks like a naive simple algorithm is O(n^2·k), where n is the number of sets, and k the number of thingies in a set. But the number of subsets is already Θ(2^n). Going this way doesn't seem like an efficient solution to me.

Replies are listed 'Best First'.
Re^2: Sets of subsets
by jpearl (Scribe) on Jul 22, 2009 at 20:44 UTC

    I probably didn't phrase it very well. Really I'm just interested in the partitions of the main set of things. So if there were three "things", there would only be 4 such partitions (excluding the one that lumps everything into a single group):

    {t1}{t2}{t3} {t1,t2}{t3} {t1}{t2,t3} {t1,t3}{t2}
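    Those partitions can be enumerated recursively: each partition of the remaining things either gets the first thing as its own group, or has it added to one of the existing groups. A minimal sketch in Python (the function name and list-of-lists representation are my own, not from the thread):

```python
def partitions(items):
    """Yield every set partition of `items` as a list of groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # put `first` in a group by itself ...
        yield [[first]] + part
        # ... or add it to each existing group in turn
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

all_parts = list(partitions(["t1", "t2", "t3"]))
# drop the trivial partition with everything in one group, as in the post
nontrivial = [p for p in all_parts if len(p) > 1]
```

    For three things this yields 5 partitions in total, and 4 once the all-in-one-group partition is dropped, matching the list above.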

    Ordering doesn't matter here. From these partitions I can calculate the distance or similarity between the groups from the attribute table (which has already been calculated). So for the above example I might have a table like:

           t1  t2  t3
       t1   0   1   2
       t2   1   0   3
       t3   2   3   0

    Each entry represents the difference between a pair of things. When I have the subsets from above I can calculate the average difference between the subsets, using the difference values as the metric.
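    The averaging step might look like the following sketch, assuming the table is stored as a symmetric pairwise lookup (the dict layout and function names are assumptions, not from the thread):

```python
# Hypothetical distance table matching the example above
dist = {("t1", "t2"): 1, ("t1", "t3"): 2, ("t2", "t3"): 3}

def d(a, b):
    """Symmetric lookup; distance from a thing to itself is 0."""
    if a == b:
        return 0
    return dist.get((a, b), dist.get((b, a)))

def avg_between(g1, g2):
    """Average pairwise difference between two groups of a partition."""
    pairs = [(a, b) for a in g1 for b in g2]
    return sum(d(a, b) for a, b in pairs) / len(pairs)

# e.g. for the partition {t1,t2}{t3}: average of d(t1,t3) and d(t2,t3)
score = avg_between(["t1", "t2"], ["t3"])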

    It's important to note as well that the number of "things" themselves is relatively small. The previously calculated attributes number in the hundreds, but the things themselves number a few tens (max 24). So, I'm not too worried about the numbers exploding on me... yet.
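    For what it's worth, the number of partitions of an n-element set is the Bell number B(n), which can be computed with the Bell triangle; even at n = 24 it is already on the order of 10^17, so exhaustive enumeration is only practical for much smaller n. An illustrative sketch (not from the thread):

```python
def bell(n):
    """Bell number B(n), computed row by row via the Bell triangle."""
    row = [1]
    for _ in range(n):
        new = [row[-1]]          # each row starts with the previous row's last entry
        for x in row:
            new.append(new[-1] + x)
        row = new
    return row[0]                # the leftmost entry of row n is B(n)
```

    bell(3) gives 5, consistent with the four partitions listed above plus the all-in-one-group partition.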

      Just look at the group t1 is in. t1 can be a group by itself. Or grouped with any one of t2, t3, ..., tk. Or with any combination of them. Which means that if you have N "things", there are 2^(N-1) - 1 possible groups t1 can be in together with other "t's" (the -1 comes from the fact that you seem to exclude the grouping of all things together). And that's not even counting the different ways you can split up the things that aren't in the group t1 belongs to.
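A quick brute-force check of that count (illustrative Python, not from the thread): enumerate every subset that contains t1, excluding the all-things-together set, and compare against 2^(N-1) - 1.

```python
from itertools import combinations

def groups_with_t1(things):
    """Every subset containing things[0], except the full set of all things."""
    first, rest = things[0], things[1:]
    subs = []
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            group = (first,) + combo
            if len(group) < len(things):   # exclude the everything-together group
                subs.append(group)
    return subs

things = ["t1", "t2", "t3", "t4"]
count = len(groups_with_t1(things))        # 2^(4-1) - 1 = 7
```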
        Definitely a good point. However, I'm used to analyses sometimes taking upwards of days, so "blowing up" might mean something different to me. You are correct though: it is very computationally complex. This was really just a random thought I had; I was wondering what sort of information I could extract out of this data. I imagine I'll end up going with k-means clustering or something of that nature.