PerlMonks  

Help Needed for Spellcheck

by srik4u (Novice)
on Apr 17, 2006 at 04:14 UTC (#543750=perlquestion)

srik4u has asked for the wisdom of the Perl Monks concerning the following question:

Hi, we need to check the spelling of a word that is actually a domain name. For example, we have to check the word onlinetradeing. Ordinary spell checkers return unrelated words such as on, obliterating, incinerating, intruding, etc. What we actually want is online trading. So we would like to split the word into a phrase and check the spelling of each part. Ordinary spell checkers only look words up in the dictionary; they do not split a word into phrases. Can anyone help? Regards, Srikanth.

Replies are listed 'Best First'.
Re: Help Needed for Spellcheck
by kvale (Monsignor) on Apr 17, 2006 at 04:36 UTC
    The compound word could also be split "on line trading" and in general, there is more than one way to do it. I would create a personal dictionary of atomic words that you want and test against those. You would do this by implementing the grammar
    <compound-word> := <word> | <word> <compound-word>
    <word>          := word1 | word2 | ... | wordn
    Regexes can do this for you, for a reasonably small number of atomic words:
    my $compound = "onlinetrading";
    my $words    = 'online|trading';
    my ($first, $second);
    if ($compound =~ /^($words)($words)$/) {
        $first  = $1;
        $second = $2;
    }
    print "$first, $second\n";
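The grammar allows any number of atomic words, not just two. A minimal sketch of that generalization, assuming a small hypothetical word list (`@atoms`) and a helper name (`is_compound`) that are not part of the original post, builds the alternation from the list and repeats it:

```perl
use strict;
use warnings;

# Hypothetical word list; longer words first so "online" is tried before "on".
my @atoms = sort { length($b) <=> length($a) } qw(on line online trading);
my $alt   = join '|', map { quotemeta($_) } @atoms;

# True when $compound consists of one or more atomic words and nothing else.
sub is_compound {
    my ($compound) = @_;
    return $compound =~ /^(?:$alt)+$/ ? 1 : 0;
}

print "$_: ", (is_compound($_) ? "ok" : "no match"), "\n"
    for qw(onlinetrading onlinetradeing);
```

This only answers whether the string decomposes into known words; it does not report the pieces, and a misspelled part such as "tradeing" simply fails to match.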
    Alternatively, check out Aspell, as it has some support for compound words.

    -Mark

Re: Help Needed for Spellcheck
by saintmike (Vicar) on Apr 17, 2006 at 07:10 UTC
    Think you need something smarter. How about Yahoo's spell check?
    $ typo onlinetradeing
    Corrected: online trading
    Note that you should register with their developer site (it's free) to get your own developer token and use their web API with Yahoo::Search from CPAN and the following script:
    #!/usr/bin/perl
    # typo - Ask Yahoo for spell corrections
    use strict;

    my $term = "@ARGV";
    die "usage: $0 word/phrase ..." unless length $term;

    use Yahoo::Search AppId => "your_yahoo_token";
    my($suggestion) = Yahoo::Search->Terms(Spell => $term);

    if(defined $suggestion) {
        print "Corrected: $suggestion\n";
    } else {
        print "No suggestions\n";
    }
    Here's an article with more detailed info.
Re: Help Needed for Spellcheck
by sanPerl (Friar) on Apr 17, 2006 at 07:11 UTC
    I think you need to develop a script that splits the word.
    For 'onlinetradeing', you could follow the approach below.
    Assume you want to split the word into at most two words; then you need only 14 candidates for this case (one per letter: the unsplit word plus 13 two-way splits). Your script can build this list, check every candidate against a dictionary, and present the valid suggestions. Your 'TO BE VALIDATED' list would be
    a) onlinetradeing
    b) o nlinetradeing
    c) on linetradeing
    d) onl inetradeing
    .
    .
    .
    n) onlinetradein g

    I think splitting the word should be a cakewalk with a regex.
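A minimal sketch of the two-way split described above, using `substr` rather than a regex. The toy dictionary `%dict` and the helper names `two_way_splits` and `valid_splits` are assumptions for illustration; a real script would load a proper word list:

```perl
use strict;
use warnings;

# Hypothetical toy dictionary; a real script would load a word list.
my %dict = map { $_ => 1 } qw(on online line trading trade);

# Return every way to cut $word into two non-empty pieces.
sub two_way_splits {
    my ($word) = @_;
    return map { [ substr($word, 0, $_), substr($word, $_) ] }
           1 .. length($word) - 1;
}

# Keep only the splits whose two pieces are both dictionary words.
sub valid_splits {
    my ($word) = @_;
    return grep { $dict{ $_->[0] } && $dict{ $_->[1] } } two_way_splits($word);
}

print join(' ', @$_), "\n" for valid_splits('onlinetrading');
```

With the correctly spelled input this prints "online trading". For the misspelled 'onlinetradeing' no split survives the dictionary check, so each piece would still have to go through a spell checker before being accepted.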
      In addition, you need a dictionary for prefixes and suffixes (and silencing those silent e's). Plus conjugating verbs, pluralizing singulars, and heck, you might as well frobnicate the zoftigs.

      (But if you find a dictionary like that, I'd be interested.)

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Re: Help Needed for Spellcheck
by sgifford (Prior) on Apr 17, 2006 at 15:30 UTC
Re: Help Needed for Spellcheck
by davidj (Priest) on Apr 18, 2006 at 05:48 UTC
    srik4u,
    the solution I was given in a recent post of mine, find all paths of length n in a graph, might offer a good place to start. Using the idea of a trie, you could recursively build the phrases as you go along. I am not good at writing recursive functions, but the idea is basically this: recursively iterate over the string and pop off substrings that match whole words. The phrase fails when you have a substring that is not a partial word. You have a valid phrase when you have reached the end of the string without failing on a substring. A function might look something like this:
    # whole_word() and not_valid() are assumed dictionary lookups, not shown here
    sub check_string {
        my ($word, @phrase) = @_;
        if ($word eq '') {                    # reached the end: valid phrase
            print "@phrase\n";
            return;
        }
        my $check = '';
        foreach my $chr (split //, $word) {
            $check .= $chr;
            return if not_valid($check);      # not even a partial word
            if (whole_word($check)) {
                check_string(substr($word, length $check), @phrase, $check);
            }
        }
    }
    As I say, I am not good at recursive functions, so the above is merely a starting place (getting the recursive element to work always befuddles me). More visually, this is what would happen with the string "mycarrot":

    m        valid partial string, so continue
    my       found a whole word, so push and recurse
             @phrase = ("my")
    c        valid partial string, so continue
    ca       valid partial string, so continue
    car      found a whole word, so push and recurse
             @phrase = ("my", "car")
    r        valid partial string, so continue
    ro       valid partial string, so continue
    rot      found a whole word, so push
             at end of string, so print valid phrase
             @phrase = ("my", "car", "rot")
    (back up to last iteration and continue)
             @phrase = ("my")
    carr     valid partial string, so continue
    carro    valid partial string, so continue
    carrot   found a whole word, so push
             at end of string, so print valid phrase ("my", "carrot")
    (back to last iteration and continue)
             @phrase = ()
    myc      invalid partial string, so quit
             @phrase = ()
    In the scary place I call my mind, this makes sense. I hope it makes sense to you. Maybe if someone else understands what I am trying to explain, they might be able to clarify it better than I.

    davidj
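The recursion walked through above can be sketched as a self-contained script. This is one possible reading of the idea, not a definitive implementation: the toy word list, the `%prefix` hash standing in for a trie, and the function name `segment` are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical toy dictionary; a real script would load a word list.
my @words = qw(my car rot carrot);
my %whole = map { $_ => 1 } @words;

# Every prefix of every word, standing in for a trie lookup.
my %prefix;
for my $w (@words) {
    $prefix{ substr($w, 0, $_) } = 1 for 1 .. length $w;
}

# Return every way to split $rest into dictionary words, as array refs.
sub segment {
    my ($rest, @phrase) = @_;
    return [@phrase] if $rest eq '';          # consumed everything: valid phrase
    my @found;
    for my $len (1 .. length $rest) {
        my $check = substr($rest, 0, $len);
        last unless $prefix{$check};          # not even a partial word: stop
        push @found, segment(substr($rest, $len), @phrase, $check)
            if $whole{$check};
    }
    return @found;
}

print "@$_\n" for segment('mycarrot');
```

For "mycarrot" this prints "my car rot" and then "my carrot", in the same order as the trace. Passing the phrase as an argument instead of sharing a global `@phrase` is what lets each branch back up cleanly.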

Node Type: perlquestion [id://543750]
Approved by kvale
Front-paged by neversaint