http://www.perlmonks.org?node_id=1004395

Jurafsky has asked for the wisdom of the Perl Monks concerning the following question:

I've created a little spellchecker. This script works in this way:
After reading each line of the text, realizes some corrections thanks to a comparison between a dictionary and the text itself.
When it finds a word that doesn't exist in the dictionary, it corrects the words (giving one or more suggestions) and pushes it into an array.

Here there's my problem:


I would like to give to the user the possibility to choose the correct word among the words suggested. Something like this:
We found the word "wlak" in your text which isn't correct.
The suggested possibilities are:
  • 1. walk
  • 2. work

  • type the number associated to the word or 0 if you can't find the correct word.
    Then I would like to replace the correct word on the original text (creating a new .txt).

    How can I do this?

    use diagnostics; use warnings; my ($file_dictionary, $word, $line, $line1, $alph, $elt, $w, $transpos +ition, $letter1, $letter2); my (@word, @altered_word, @filedictionary, @filetext, @dictionary, @ad +dition, @replacement, @transposition, @removal); $file_dictionary = "lexique.txt"; $file_text = "texte.txt"; #I create an array for the dictionary open (L, "<", $file_dictionary); while (defined( $line1 = <L>)) { chomp($line1); @filedictionary = split (/\s/, $line1); push (@dictionary, @filedictionary); } #I create an array for the text open (T, "<", $file_text); while (defined( $line = <T>)) { chomp($line); @filetext = split (/(\s|\pP)/, $line); for ($i = 0; $i < @filetext; $i++) { if (!grep(/^$filetext[$i]$/, @dictionary)) { push (@word, $filetext[$i]); } } } #then I create an array for each word foreach $w(@word) { @altered_word = split (//, $w); #I create an array for the dictionary open (L, "<", $file_dictionary); while (defined( $line1 = <L>)) { chomp($line1); @filedictionary = split (/\s/, $line1); push (@dictionary, @filedictionary); } #first operation --> "palrer" will be "parler" for (my $i=0; $i < $#altered_word ; $i++) { @transposition = @altered_word; $letter1 = $transposition[$i]; $letter2 = $transposition[$i+1]; $transposition[$i] = $letter2; $transposition[$i+1] = $letter1; $transposition = join "", @transposition; if (grep(/^$transposition$/, @dictionary)) { print "post transposition : $transposition\n"; } } foreach $elt (0 .. $#altered_word) { #second operation --> parller will be parler @removal = @altered_word; splice(@removal, $elt, 1); $removal = join "", @removal; if (grep(/^$removal$/, @dictionary)) { print "post enlevement : $removal\n"; } #third operation --> parer will be parler foreach $alph('a' .. 'z') { @addition = @altered_word; splice(@addition, $elt, 0, $alph); $addition = join "", @addition; if (grep(/^$addition$/, @dictionary)) { print "post addition : $addition\n"; } #last operation : mancer will be manger @replacement = @altered_word; splice(@replacement, $elt, 1, $alph); $replacement = join "", @replacement; if (grep(/^$replacement$/, @dictionary)) { print "post replacement : $replacement\n"; } } } }
    https://www.dropbox.com/s/t9fc2dk5mqbsb20/texte.txt
    this is the text
    https://www.dropbox.com/s/717rczou0mkrp0s/lexique.txt"
    this is the French Dictionary

    Replies are listed 'Best First'.
    Re: problem spellchecker
    by Anonymous Monk on Nov 18, 2012 at 12:13 UTC

      I've created a little spellchecker

      Have you heard of aspell?

      How can I do this?

      Write function, functions that take arguments, functions that return values, functions that can be tested outside of jurafskyspell.pl, functions that don't operate on global variables

      my %dictionary; PopulateDict( \%dictionary, $file );

      my $answer = SuggestThese( @possibilities );

      use quotemeta when generating regexes

      use a hash (also known as a dictionary ), see Tutorials Data Types and Variables,the basic datatypes, three

      Hey, use Tie::DictFile - tie a hash to local dictionary file

      For prompting you can use Term::Interact

      You might even be able to use Text::Aspell

    Re: problem spellchecker
    by linuxkid (Sexton) on Nov 19, 2012 at 15:42 UTC

      Why? just use ispell mode

      --linuxkid


      imrunningoutofideas.co.cc

          Emacs

          --linuxkid


          imrunningoutofideas.co.cc
    Re: problem spellchecker
    by Anonymous Monk on Nov 30, 2012 at 10:30 UTC

      The problem is that is an exercice and I can't use subroutines or modules.
      Now it works but I've just one problem: the script reads several time each line of the Input text.. Why? How can I resolve this problem?

      use diagnostics; use warnings; my ($word, $file_dict, $txt, $line, $line2, $i); my ($first_letter, $second_letter, $letter, $alphabet); my ($user, $exchange, $transposition, $removal, $addition); my (@text, @words, @dictionary, @dict, @single_letters); my (@transposition, @removal, @addiction, @exchange, @correct); $file_dict = "dict.txt"; $txt = "txt.txt"; $new_txt = "output.txt"; #open the dictionary and save it in an array open (D, "<", $file_dict); while(defined($line = <D>)) { chomp($line); @dict = split(/\s/, $line); push (@dictionary, @dict); } close (D); #open the file for output open (T2, ">", $new_txt); #open and save the text open (T, "+<", $txt); while(defined($line2 = <T>)) { chomp($line2); @text = split (/ /, $line2); push (@words, @text); #foreach word of the text, I reset the array of correct words #then I verify if the word is in the dictionary #if it isn't there, I split the word in letters #then I apply the correction that will be saved in @correct array +. foreach $word(@words){ @correct = "exit"; if (!grep(/^$word$/, @dictionary)) { print "Word : '$word' isn't in the dictionary.\n"; @single_letters = split (//, $word); #transposition for (my $i = 0; $i < $#single_letters; $i++) { @transposition = @single_letters; $first_letter = $transposition[$i]; $second_letter = $transposition[$i+1]; $transposition[$i] = $second_letter; $transposition[$i+1] = $first_letter; $transposition = join "", @transposition; if (grep(/^$transposition$/, @dictionary)) { push (@correct, $transposition); } } #removal foreach $lettera ( 0 .. $#single_letters) { @removal = @single_letters; splice (@removal, $lettera, 1); $removal = join "", @removal; if (grep(/^$removal$/, @dictionary)) { push (@correct, $removal); } #addition foreach $alphabet ( 'a' .. 'z') { @addition = @single_letters; splice (@addition, $lettera, 0, $alphabet); + $addition = join "", @addition; if (grep(/^$addition$/, @dictionary)) { push (@correct, $addition); } #exchange @exchange = @single_letters; splice(@exchange, $lettera, 1, $alphabet); $exchange = join "", @exchange; if (grep(/^$exchange$/, @dictionary)) { push (@correct, $exchange); } } } #now I display the solutions and user can choose one o +f them print "These are the correction of word $word\n"; for (my $c = 0; $c < @correct; $c++) { print "$c. : $correct[$c]\n"; } print "Are you interested on one of this solution? Typ +e the number or type 'exit'.\n"; $user = <STDIN>; chomp ($user); if ("$user" eq 'exit') { print "Next word.\n"; } else { $line2 =~ s/$word/$correct[$user]/; } } } print T2 "$line2 \n"; } close(T); close(T2);

        The problem is that is an exercice and I can't use subroutines or modules.

        That sounds unbelievable -- sure modules, if part of the exercise is reinventing this particular wheel, but subroutines? the fundamental unit of reusable code? I'd ask for my money back.

          LoL however.. I got it in this way. I think the problem was the first foreach loop..

          else { $line2 =~ s/$word/$correct[$user]/; $word = $error; @words = grep { $_ ne $error } @words; }

          I made a substitution in line, then I copied $word in another scalar, $error, and then I created again the @words array without $error !

        Ok, got it. What do you think about this solution?
        #now I display the solutions and user can choose one o +f them print "These are the correction of word $word\n"; for (my $c = 0; $c < @correct; $c++) { print "$c. : $correct[$c]\n"; } print "Are you interested on one of this solution? Typ +e the number or type 'exit'.\n"; $user = <STDIN>; chomp ($user); if ("$user" eq 'exit') { print "Next word.\n"; } else { $line2 =~ s/$word/$correct[$user]/; $word = $error; @words = grep { $_ ne $error } @words; } } } print T2 "$line2 \n"; }