PerlMonks  

Re: What are the monks doing with Perl and Linguistics?

by Mur (Pilgrim)
on May 09, 2003 at 19:21 UTC (#256984)


in reply to What are the monks doing with Perl and Linguistics?

Well, I dunno if this qualifies, but we're using Lingua::* modules to analyze words when indexing web pages. Specifically, if a user searches for "advertising", we check the indexed words for a common stem and so find:

  • ... advert
  • ... advertise
  • ... advertised
  • ... advertiser
  • ... advertisers
  • ... advertises
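
A minimal sketch of that kind of stem matching, in Python rather than Perl so that it runs with no modules installed. Everything in it is an assumption made for illustration: `toy_stem`, `build_index`, `search`, and the suffix list are hypothetical names and a crude stand-in for a real stemmer, not the Lingua::* code the post describes. A real Porter-style stemmer may not group these words the same way (it can leave "advert" and "advertise" with different stems).

```python
from collections import defaultdict

# Hypothetical suffix list, picked only so the example words collapse
# to one stem. Longest suffixes come first so "isers" wins over "ise".
SUFFIXES = ("isers", "ising", "ised", "iser", "ises", "ise")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(words):
    """Map each stem to the set of indexed words that share it."""
    index = defaultdict(set)
    for word in words:
        index[toy_stem(word)].add(word.lower())
    return index


def search(index, query):
    """Return every indexed word whose stem matches the query's stem."""
    return sorted(index.get(toy_stem(query), set()))


# Made-up page vocabulary for the demonstration.
PAGE_WORDS = [
    "advert", "advertise", "advertised", "advertiser",
    "advertisers", "advertises", "banner",
]
INDEX = build_index(PAGE_WORDS)

if __name__ == "__main__":
    print(search(INDEX, "advertising"))
```

With this toy suffix list, a search for "advertising" returns all six words in the list above and leaves "banner" out, because the query and the indexed words are compared by stem rather than by exact spelling.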
--
Jeff Boes
Database Engineer
Nexcerpt, Inc.
vox 269.226.9550 ext 24
fax 269.349.9076
 http://www.nexcerpt.com
...Nexcerpt...Connecting People With Expertise


Re: Re: What are the monks doing with Perl and Linguistics?
by allolex (Curate) on May 09, 2003 at 19:50 UTC

    Very interesting stuff. I had a look at the "nexcerpts" on your site. Yes, the Lingua derivational morphology modules (looks like Stem, Infinitive, Inflect) have provided some good results. It made me think about how I might go about doing something similar.

    One thing that might make your searches better is some way to account for morphology that is not just stem + ending, like pronounce/pronunciation/pronouncement. Also, grouping (near-)synonyms like "brotherly" and "fraternal" may improve your results. Of course my examples are a bit textbookish, but I'm sure that you can refine things using your expert knowledge about what sort of information your clients might want to look up.

    --
    Allolex
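
The two suggestions in the reply above (irregular morphology and near-synonym grouping) could be sketched as lookup tables consulted before any stemming happens. This is only an illustration under stated assumptions: `IRREGULAR`, `SYNONYM_GROUPS`, `canonical`, and `expand` are hypothetical names, and the table entries are just the examples from the reply, not data from any real module.

```python
# Hypothetical exception table: forms that plain suffix stripping would
# miss are mapped by hand to a shared base word.
IRREGULAR = {
    "pronunciation": "pronounce",
    "pronouncement": "pronounce",
}

# Hypothetical near-synonym groups; each set is treated as one concept.
SYNONYM_GROUPS = [
    {"brotherly", "fraternal"},
]


def canonical(word):
    """Return the hand-mapped base form, or the word itself if unmapped."""
    word = word.lower()
    return IRREGULAR.get(word, word)


def expand(query):
    """Expand a query into its base form, irregular relatives, and synonyms."""
    key = canonical(query)
    terms = {key}
    # Add every irregular form that maps to the same base word.
    terms.update(form for form, base in IRREGULAR.items() if base == key)
    # Pull in the whole synonym group if any term belongs to it.
    for group in SYNONYM_GROUPS:
        if terms & group:
            terms |= group
    return sorted(terms)


if __name__ == "__main__":
    print(expand("pronunciation"))
    print(expand("fraternal"))
```

Each expanded term would then go through whatever stemmer the index uses, so the tables only need to hold the cases that stem + ending cannot reach.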
