Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Sets of subsets

by jpearl (Scribe)
on Jul 22, 2009 at 19:51 UTC ( #782421=perlquestion: print w/ replies, xml ) Need Help??
jpearl has asked for the wisdom of the Perl Monks concerning the following question:

I have an interesting problem that i've been thinking about. Basically we have a series of different "things" (say 4 of them). I would like to group each thing together by a given set of attributes. The way this is calculated is something like:

a1 a2 a3 t1 y y n t2 y n y t3 n y y t4 y y n

From the table we can see similarities and differences between the things. For example the difference between t1 and t4 is 0, and the similarity is 3. My idea is to group the "t#"s together by maximizing the average difference between them, and minimize the similarity. To do that, however I need to be able to split the "things" into multiple different subsets. For example:

Subset(s)1 = {t1,t2}{t3}{t4} Subset(s)2 = {t1}{t2,t3}{t4} Subset(s)3 = {t1}{t2}{t3,t4}

etc.

Then basically iterate over all different possible combinations and find the maximum average distance and/or minimum similarity resulting from a comparison of the groups. My question is: is there a module available which will return sets of subsets like this? I'm finding a lot of modules that return permutations and the like, but not subsets. I can of course write my own, but reinventing the wheel and all that. Any suggestions of a better/different way to do a grouping of this nature are of course completely appreciated.

Thanks!

Comment on Sets of subsets
Select or Download Code
Re: Sets of subsets
by JavaFan (Canon) on Jul 22, 2009 at 20:17 UTC
    Any suggestions of a better/different way to do a grouping of this nature are of course completely appreciated.
    I probably completely misunderstand what you are going to do, but it looks like a naive simple algorithm is O(n2k), where n is the number of sets, and k the number of thingies in a set. But the number of subsets is already Θ(2n). Going this way doesn't seem like an efficient solution to me.

      I probably didn't phrase it very well. Really I'm just interested in the partitions of the main set of things. So if there were three "things", there would only be 4 sets of subsets:

      {t1}{t2}{t3} {t1,t2}{t3} {t1}{t2,t3} {t1,t3}{t2}

      Since ordering doesn't matter. From these partitions I can calculate the distance or similarity between the groups from the attribute table (which has already been calculated). So for the above example I might have a table like:

      t1 t2 t3 t1 0 1 2 t2 1 0 3 t3 2 3 0

      Representing the difference between each thing. When I have the subsets from above I can calculate what the average difference is between the subsets using the difference values as the metric.

      Important to note as well that the number of "things" themselves are relatively few. The attributes previously calculated are in the hundreds, but the things themselves number a few tens (max 24). So, I'm not too worried about the numbers exploding on me... yet.

        Just look at the group t1 is in. t1 can be a group in itself. Or with each of t2, t3, ..., tk. Or any combination of them. Which means that if you have N "things", there are 2N-1-1 ways to be t1 in a group with other "t's" (-1 comes from that you seem to exclude the subset of all things together). And that's not even counting the different ways you can split up the group of things that aren't in the subset t1 belongs to.
Re: Sets of subsets
by ig (Vicar) on Jul 22, 2009 at 20:48 UTC
      Ah yes, not thinking of searching for "partition" was not doing me any favors. Algorithm::Combinatorics had exactly the method I was looking for (partitions) :-P. Thanks!
Re: Sets of subsets
by ELISHEVA (Prior) on Jul 22, 2009 at 20:49 UTC
Re: Sets of subsets
by mzedeler (Pilgrim) on Jul 22, 2009 at 20:59 UTC

    It is easier to do as many class divisions as you have positions in each tuple (I guess tuple is a better word in stead of set, since you have the same number of elements in each of them):

    Class(a1): {{t1, t2, t3}, {t4}} Class(a2): {{t1, t3, t4}, {t2}} Class(a3): {{t1, t4}, {t2, t3}}

    While you divide into classes, you'll easilly be able to maintain a distance map like so:

    After processing Class(a1): \ t1 | t2 | t3 | t4 t1 0 | 0 | 0 | 1 t2 0 | 0 | 1 t3 0 | 1 t4 0 After processing Class(a2): \ t1 | t2 | t3 | t4 t1 0 | 1 | 0 | 1 t2 0 | 1 | 2 t3 0 | 1 t4 0 After processing Class(a3): \ t1 | t2 | t3 | t4 t1 0 | 2 | 1 | 1 t2 0 | 1 | 3 t3 0 | 2 t4 0

    This way, you can get complexity O(m * n) where m is the number of tuples and n is the size of the tuples.

Re: Sets of subsets
by blokhead (Monsignor) on Jul 23, 2009 at 16:41 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://782421]
Approved by ELISHEVA
Front-paged by jrsimmon
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (7)
As of 2014-12-20 13:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (95 votes), past polls