Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

problem spellchecker

by Jurafsky (Initiate)
on Nov 18, 2012 at 11:40 UTC ( #1004395=perlquestion: print w/ replies, xml ) Need Help??
Jurafsky has asked for the wisdom of the Perl Monks concerning the following question:

I've created a little spellchecker. This script works in this way:
After reading each line of the text, realizes some corrections thanks to a comparison between a dictionary and the text itself.
When it finds a word that doesn't exist in the dictionary, it corrects the words (giving one or more suggestions) and pushes it into an array.

Here there's my problem:


I would like to give to the user the possibility to choose the correct word among the words suggested. Something like this:
We found the word "wlak" in your text which isn't correct.
The suggested possibilities are:
  • 1. walk
  • 2. work

  • type the number associated to the word or 0 if you can't find the correct word.
    Then I would like to replace the correct word on the original text (creating a new .txt).

    How can I do this?

    use diagnostics; use warnings; my ($file_dictionary, $word, $line, $line1, $alph, $elt, $w, $transpos +ition, $letter1, $letter2); my (@word, @altered_word, @filedictionary, @filetext, @dictionary, @ad +dition, @replacement, @transposition, @removal); $file_dictionary = "lexique.txt"; $file_text = "texte.txt"; #I create an array for the dictionary open (L, "<", $file_dictionary); while (defined( $line1 = <L>)) { chomp($line1); @filedictionary = split (/\s/, $line1); push (@dictionary, @filedictionary); } #I create an array for the text open (T, "<", $file_text); while (defined( $line = <T>)) { chomp($line); @filetext = split (/(\s|\pP)/, $line); for ($i = 0; $i < @filetext; $i++) { if (!grep(/^$filetext[$i]$/, @dictionary)) { push (@word, $filetext[$i]); } } } #then I create an array for each word foreach $w(@word) { @altered_word = split (//, $w); #I create an array for the dictionary open (L, "<", $file_dictionary); while (defined( $line1 = <L>)) { chomp($line1); @filedictionary = split (/\s/, $line1); push (@dictionary, @filedictionary); } #first operation --> "palrer" will be "parler" for (my $i=0; $i < $#altered_word ; $i++) { @transposition = @altered_word; $letter1 = $transposition[$i]; $letter2 = $transposition[$i+1]; $transposition[$i] = $letter2; $transposition[$i+1] = $letter1; $transposition = join "", @transposition; if (grep(/^$transposition$/, @dictionary)) { print "post transposition : $transposition\n"; } } foreach $elt (0 .. $#altered_word) { #second operation --> parller will be parler @removal = @altered_word; splice(@removal, $elt, 1); $removal = join "", @removal; if (grep(/^$removal$/, @dictionary)) { print "post enlevement : $removal\n"; } #third operation --> parer will be parler foreach $alph('a' .. 'z') { @addition = @altered_word; splice(@addition, $elt, 0, $alph); $addition = join "", @addition; if (grep(/^$addition$/, @dictionary)) { print "post addition : $addition\n"; } #last operation : mancer will be manger @replacement = @altered_word; splice(@replacement, $elt, 1, $alph); $replacement = join "", @replacement; if (grep(/^$replacement$/, @dictionary)) { print "post replacement : $replacement\n"; } } } }
    https://www.dropbox.com/s/t9fc2dk5mqbsb20/texte.txt
    this is the text
    https://www.dropbox.com/s/717rczou0mkrp0s/lexique.txt"
    this is the French Dictionary

    Comment on problem spellchecker
    Download Code
    Re: problem spellchecker
    by Anonymous Monk on Nov 18, 2012 at 12:13 UTC

      I've created a little spellchecker

      Have you heard of aspell?

      How can I do this?

      Write function, functions that take arguments, functions that return values, functions that can be tested outside of jurafskyspell.pl, functions that don't operate on global variables

      my %dictionary; PopulateDict( \%dictionary, $file );

      my $answer = SuggestThese( @possibilities );

      use quotemeta when generating regexes

      use a hash (also known as a dictionary ), see Tutorials Data Types and Variables,the basic datatypes, three

      Hey, use Tie::DictFile - tie a hash to local dictionary file

      For prompting you can use Term::Interact

      You might even be able to use Text::Aspell

    Re: problem spellchecker
    by linuxkid (Sexton) on Nov 19, 2012 at 15:42 UTC

      Why? just use ispell mode

      --linuxkid


      imrunningoutofideas.co.cc

          Emacs

          --linuxkid


          imrunningoutofideas.co.cc
    Re: problem spellchecker
    by Anonymous Monk on Nov 30, 2012 at 10:30 UTC

      The problem is that is an exercice and I can't use subroutines or modules.
      Now it works but I've just one problem: the script reads several time each line of the Input text.. Why? How can I resolve this problem?

      use diagnostics; use warnings; my ($word, $file_dict, $txt, $line, $line2, $i); my ($first_letter, $second_letter, $letter, $alphabet); my ($user, $exchange, $transposition, $removal, $addition); my (@text, @words, @dictionary, @dict, @single_letters); my (@transposition, @removal, @addiction, @exchange, @correct); $file_dict = "dict.txt"; $txt = "txt.txt"; $new_txt = "output.txt"; #open the dictionary and save it in an array open (D, "<", $file_dict); while(defined($line = <D>)) { chomp($line); @dict = split(/\s/, $line); push (@dictionary, @dict); } close (D); #open the file for output open (T2, ">", $new_txt); #open and save the text open (T, "+<", $txt); while(defined($line2 = <T>)) { chomp($line2); @text = split (/ /, $line2); push (@words, @text); #foreach word of the text, I reset the array of correct words #then I verify if the word is in the dictionary #if it isn't there, I split the word in letters #then I apply the correction that will be saved in @correct array +. foreach $word(@words){ @correct = "exit"; if (!grep(/^$word$/, @dictionary)) { print "Word : '$word' isn't in the dictionary.\n"; @single_letters = split (//, $word); #transposition for (my $i = 0; $i < $#single_letters; $i++) { @transposition = @single_letters; $first_letter = $transposition[$i]; $second_letter = $transposition[$i+1]; $transposition[$i] = $second_letter; $transposition[$i+1] = $first_letter; $transposition = join "", @transposition; if (grep(/^$transposition$/, @dictionary)) { push (@correct, $transposition); } } #removal foreach $lettera ( 0 .. $#single_letters) { @removal = @single_letters; splice (@removal, $lettera, 1); $removal = join "", @removal; if (grep(/^$removal$/, @dictionary)) { push (@correct, $removal); } #addition foreach $alphabet ( 'a' .. 'z') { @addition = @single_letters; splice (@addition, $lettera, 0, $alphabet); + $addition = join "", @addition; if (grep(/^$addition$/, @dictionary)) { push (@correct, $addition); } #exchange @exchange = @single_letters; splice(@exchange, $lettera, 1, $alphabet); $exchange = join "", @exchange; if (grep(/^$exchange$/, @dictionary)) { push (@correct, $exchange); } } } #now I display the solutions and user can choose one o +f them print "These are the correction of word $word\n"; for (my $c = 0; $c < @correct; $c++) { print "$c. : $correct[$c]\n"; } print "Are you interested on one of this solution? Typ +e the number or type 'exit'.\n"; $user = <STDIN>; chomp ($user); if ("$user" eq 'exit') { print "Next word.\n"; } else { $line2 =~ s/$word/$correct[$user]/; } } } print T2 "$line2 \n"; } close(T); close(T2);
        Ok, got it. What do you think about this solution?
        #now I display the solutions and user can choose one o +f them print "These are the correction of word $word\n"; for (my $c = 0; $c < @correct; $c++) { print "$c. : $correct[$c]\n"; } print "Are you interested on one of this solution? Typ +e the number or type 'exit'.\n"; $user = <STDIN>; chomp ($user); if ("$user" eq 'exit') { print "Next word.\n"; } else { $line2 =~ s/$word/$correct[$user]/; $word = $error; @words = grep { $_ ne $error } @words; } } } print T2 "$line2 \n"; }

        The problem is that is an exercice and I can't use subroutines or modules.

        That sounds unbelievable -- sure modules, if part of the exercise is reinventing this particular wheel, but subroutines? the fundamental unit of reusable code? I'd ask for my money back.

          LoL however.. I got it in this way. I think the problem was the first foreach loop..

          else { $line2 =~ s/$word/$correct[$user]/; $word = $error; @words = grep { $_ ne $error } @words; }

          I made a substitution in line, then I copied $word in another scalar, $error, and then I created again the @words array without $error !

    Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Node Status?
    node history
    Node Type: perlquestion [id://1004395]
    Approved by 2teez
    help
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others lurking in the Monastery: (18)
    As of 2014-09-02 14:12 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      My favorite cookbook is:










      Results (24 votes), past polls