http://www.perlmonks.org?node_id=743338

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I have a Persian text and a list of stop words. I would like to remove all the stop words, but the result is not satisfactory. Here is my code:
open (STOPWORDS, $ARGV[1]) || die "Error opening the stopwords file\n";
$count = 0;
while ($word = <STOPWORDS>) {
    chop($word);
    $stopword[$count] = lc($word);
    $count++;
}
close(STOPWORDS);

open (INFILE, $ARGV[0]) || die "Error opening the input file\n";
while ($line = <INFILE>) {
    chop($line);
    @entry = split(/ /, $line);
    $i = 0;
    while ($entry[$i]) {
        $found = 0;
        $j = 0;
        while (($j<=$count) && ($found==0)) {
            if (lc($entry[$i]) eq $stopword[$j]) {
                $found = 1;
            }
            $j++;
        }
        if ($found == 0) {
            print "$entry[$i]\n";
        }
        $i++;
    }
}
close(INFILE);
I can't post a sample of my stop word list since it doesn't display here. It has one word per line, and my input text is not tokenized; it is just raw Unicode text. Any idea how I can make it work? Thanks in advance.
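
For comparison, here is a minimal sketch of one way the same task could be written. It assumes both files are UTF-8 encoded (the post only says "raw Unicode text") and that a token is a run of letters, marks, digits, or the zero-width non-joiner. The names `load_stopwords` and `remove_stopwords` are made up for illustration. It differs from the code above in a few places that may matter: input is decoded as UTF-8 instead of compared as raw bytes, whitespace is trimmed from each stop word instead of using `chop` (which removes the last character even when it is not a newline), punctuation is not left attached to words as it is with `split(/ /, ...)`, and the stop list is a hash instead of an array scanned with `$j<=$count`.

```perl
use strict;
use warnings;
use open qw(:std :encoding(UTF-8));   # assumption: files and terminal use UTF-8

# Read one stop word per line into a hash for constant-time lookup.
sub load_stopwords {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "Cannot open stop word file '$path': $!\n";
    my %stop;
    while (my $word = <$fh>) {
        $word =~ s/^\x{FEFF}//;       # drop a leading byte-order mark, if present
        $word =~ s/^\s+|\s+$//g;      # trim whitespace, including "\r\n" endings
        $stop{ lc $word } = 1 if length $word;
    }
    close $fh;
    return \%stop;
}

# Split a line into tokens and return those that are not stop words.
sub remove_stopwords {
    my ($line, $stop) = @_;
    # Token = run of letters, combining marks, digits, or ZWNJ (U+200C).
    my @tokens = $line =~ /([\p{L}\p{M}\p{N}\x{200C}]+)/g;
    return grep { !$stop->{ lc $_ } } @tokens;
}

# Same argument order as the original script: text file first, stop list second.
if (@ARGV == 2) {
    my ($text_path, $stop_path) = @ARGV;
    my $stop = load_stopwords($stop_path);
    open my $in, '<', $text_path
        or die "Cannot open input file '$text_path': $!\n";
    while (my $line = <$in>) {
        print "$_\n" for remove_stopwords($line, $stop);
    }
    close $in;
}
```

If the files are in a different encoding (for example Windows-1256), the layer in the `use open` line would need to change accordingly. The stop list and the text must also use the same code points for the same letters (Arabic and Persian forms of kaf and yeh are a common mismatch); this sketch does not normalize them.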