Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

Re^2: Help thinking about an alternate algorithm

by Limbic~Region (Chancellor)
on Jan 15, 2014 at 16:49 UTC ( [id://1070706]=note: print w/replies, xml ) Need Help??


in reply to Re: Help thinking about an alternate algorithm
in thread Help thinking about an alternate algorithm

LanX,
Not sure I get the problem

I really like the outfit analogy because I can't discuss the real data (work). Imagine you have a classroom of kids. Each kid is told that the next day they must all wear a hat, a shirt, a pair of pants, a belt and a pair of shoes. They may choose to wear a either a red, white or a blue hat. The choices for shirt are larger (red, white, blue, green, yellow and orange), etc.

The number of students will always be small in comparison to the number of distinct possible outfits. What you need to do is identify the two students who had the largest number of differences between them. In my example, the greatest distance is 5 (hat, shirt, pants, belt and shoes). It doesn't matter how many items are in a particular category - if they chose a different color hat then the distance is 1, if they also chose a different color belt the distance goes up to 2.

As I have said, I know that comparing all students against all other students is necessary in the worst case. I was hoping there would be a way to say I currently have found two students with at least 3 things different so I shouldn't consider students who have at least 2 things the same because they can't possibly exceed my current high water mark.

Cheers - L~R

Replies are listed 'Best First'.
Re^3: Help thinking about an alternate algorithm
by LanX (Saint) on Jan 15, 2014 at 17:27 UTC
    > Each kid is told that the next day they must all wear a hat, a shirt, a pair of pants, a belt and a pair of shoes.

    So each kid has a coordinate in a 5 dimensional space

     hat X shirt X pants X belt X shoes

    You want to know the maximum Hamming Distance between of two kids.

    > I was hoping there would be a way to say I currently have found two students with at least 3 things different so I shouldn't consider students who have at least 2 thing ...

    One naive approach is to sort is to partition all students according to each dimension such that

     union ( @hat{ qw/red white blue/ } ) == @all_students and so on.

    those partitions are the layers in a 5-dimensional cube.

    for one student A=(a_1,a_2,a_3,a_4,a_5) the maximum distance students are those in the intersection of  @Not(a_n) for the smallest n.

    > current high water mark.

    well n has to be greater than your current maximum.

    I'm not very keen to code this, just wanna give you pointers for keywords to find appropriate algorithms.

    Another thing which comes to mind are implication bases.

    Something like "all kids with red hat and yellow shirt wear green pants" is an implication.

    The minimal set of implications which can be used to derive all other possible implications is called an "implication base".

    If all other students are in a 3 distance to A. this means that any 3 features A doesn't have already imply a feature A has.

    There are many algorithms effectively calculating such implications for a given point set, I wouldn't be surprised if they already lead to the maximum hamming distance as a side effect.

    AFAIK "Implication bases" are used for DB optimization.

    I'd start looking there.¹

    HTH! =)

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    ¹)I wouldn't be surprised if modifying next closure-algorithm of Bernhard Ganter can already be used for this.

Re^3: Help thinking about an alternate algorithm
by BrowserUk (Patriarch) on Jan 15, 2014 at 16:58 UTC
    I was hoping there would be a way to say I currently have found two students with at least 3 things different so I shouldn't consider students who have at least 2 things the same because they can't possibly exceed my current high water mark.

    To know that there are students that have at least 2 things different, you have to have already started comparing them.

    But you won't know if they have more than 2 things different until you have compared all the categories.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1070706]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others imbibing at the Monastery: (6)
As of 2024-03-28 10:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found