BUU has asked for the wisdom of the Perl Monks concerning the following question:
Another pet project of mine has recently branched into trying to determine whether sentences are "equivalent", that is, asking about or referring to the same thing despite different grammatical structures and so forth. Obviously this is a Very Hard Task and I'm not really looking to solve it perfectly. That would be awesome, but it's rather unlikely.
Anyway, my general idea at the moment is to strip stop words, stem the remainder, and compare the resulting set of stems with the set from the other sentence. This will probably be fairly fast for small cases, and it's easy to think of and implement, but it has some flaws. For example, how do I store the resulting sets so I can easily reference them again? If I've got hundreds of thousands of these sets and I want to find the ones that match a new set I just created, how do I do that?
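To make the idea concrete, here is a rough sketch of that pipeline, written in Python for brevity rather than Perl. Everything named in it (STOP_WORDS, crude_stem, sentence_key, SetIndex) is hypothetical and made up for illustration. The stop list is tiny, and crude_stem is only a toy stand-in for a real stemmer such as a Porter implementation. The storage part is just one possible approach, not a settled answer: a hash keyed on the whole set of stems for exact matches, plus an inverted index from each stem to the sentences containing it, so that a new set is only compared (here by Jaccard overlap, an assumed choice) against stored sets that share at least one stem.

```python
import re
from collections import defaultdict

# Tiny illustrative stop list (assumption); a real one would be much longer.
STOP_WORDS = frozenset({
    "a", "an", "the", "is", "are", "do", "does",
    "i", "of", "to", "in", "how", "what", "my",
})


def crude_stem(word):
    """Toy stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "es", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def sentence_key(sentence):
    """Lowercase, drop stop words, stem the rest, return a hashable set."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return frozenset(crude_stem(w) for w in words if w not in STOP_WORDS)


class SetIndex:
    """Stores stem sets for exact lookup and for overlap-based lookup."""

    def __init__(self):
        self.exact = defaultdict(list)   # stem set -> ids with exactly that set
        self.by_stem = defaultdict(set)  # stem -> ids whose set contains it
        self.keys = {}                   # id -> stem set

    def add(self, sid, sentence):
        key = sentence_key(sentence)
        self.keys[sid] = key
        self.exact[key].append(sid)
        for stem in key:
            self.by_stem[stem].add(sid)

    def find_exact(self, sentence):
        """Ids of stored sentences whose stem set is identical."""
        return list(self.exact.get(sentence_key(sentence), []))

    def find_similar(self, sentence, threshold=0.5):
        """(score, id) pairs with Jaccard overlap >= threshold, best first."""
        key = sentence_key(sentence)
        # Only sets sharing at least one stem can have a nonzero overlap.
        candidates = set()
        for stem in key:
            candidates |= self.by_stem.get(stem, set())
        results = []
        for sid in candidates:
            other = self.keys[sid]
            score = len(key & other) / len(key | other)
            if score >= threshold:
                results.append((score, sid))
        return sorted(results, reverse=True)


index = SetIndex()
index.add(1, "How do I sort a hash?")
index.add(2, "Sorting hashes")
index.add(3, "Reading files in Perl")

exact_hits = index.find_exact("sorting a hash")
similar_hits = index.find_similar("sort the hash by value")
```

With these three stored sentences, the exact lookup should find both hash-sorting questions, since they reduce to the same set of stems. The inverted index keeps the similarity search from touching every stored set, although very common stems will still pull in a lot of candidates at the scale of hundreds of thousands of sets.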
So that's my idea. Anyone else have any useful ideas? Pointers to research? Clever algorithms?