PerlMonks  

Re: Re: Group Similar Items

by wufnik (Friar)
on May 28, 2003 at 07:30 UTC ( [id://261243]=note )


in reply to Re: Group Similar Items
in thread Group Similar Items

the distance approach is great if you are considering biological sequences, but i am not sure how well it will scale if you are working with text or phrases.

the key problem you will face is determining the right substitution/gap penalties for your distance metric. this matters less for single words, but it matters for phrases. if the items are words, determining similarity via phonemes sounds more natural.
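to make the phoneme idea concrete, here is a rough sketch using classic American Soundex as a stand-in for a real phoneme comparison. Soundex is my substitution, not something proposed earlier in the thread, and it is tuned for English surnames, so treat it as an illustration only. the names `CODES` and `soundex` are made up for this example, and it is in Python rather than Perl just to keep it short.

```python
# Soundex digit for each consonant; vowels and y/h/w have no code.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character American Soundex code for an ASCII word."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a later repeat is kept
    return (result + "000")[:4]  # pad with zeros, truncate to 4 characters


if __name__ == "__main__":
    for name in ("Robert", "Rupert", "Tymczak"):
        print(name, soundex(name))
```

words that share a code could be treated as candidates for the same group, with a finer comparison applied afterwards if needed.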

if you don't have an appropriate substitution/deletion penalty matrix, you could get quite dissimilar phrases clustered together.
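to show how much the penalties matter, here is a minimal sketch of an edit distance with configurable substitution and gap costs. it assumes one flat cost for every substitution and one for every insertion/deletion, which is much cruder than a real penalty matrix. the function name and the `sub_cost` / `gap_cost` parameters are hypothetical, chosen for this example.

```python
def weighted_edit_distance(a, b, sub_cost=1.0, gap_cost=1.0):
    """Edit distance between two sequences with flat substitution and gap costs.

    a and b can be strings (compared character by character) or lists of
    words (compared token by token).
    """
    # prev[j] is the cost of turning the empty prefix of a into b[:j]
    prev = [j * gap_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        cur = [i * gap_cost]  # cost of deleting the first i items of a
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + gap_cost,                              # delete x
                cur[j - 1] + gap_cost,                           # insert y
                prev[j - 1] + (0.0 if x == y else sub_cost),     # match or substitute
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # same pair of words, different penalties, different distance
    print(weighted_edit_distance("kitten", "sitting"))
    print(weighted_edit_distance("kitten", "sitting", sub_cost=2.0))
    # phrases compared word by word instead of character by character
    print(weighted_edit_distance("group similar items".split(),
                                 "group similar things".split()))
```

the same pair moves closer or further apart depending only on the costs, so any clustering threshold has to be tuned together with the penalties.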
