It is uft8 as well ... It does not change the results but also it gives some errors as well regarding the wide character ...
even i change my code to this:
in reply to Re^3: arabic alphabet ... how to deal with?
in thread arabic alphabet ... how to deal with?
still does not work , it does not remove my stop words :(
open (STOPWORDS, $ARGV) || die "Error opening the stopwords file\n"
$count = 0;
while ($word = <STOPWORDS>)
$stopword[$count] = lc($word);
open (INFILE , $ARGV) || die "Error opening the input file\n";
while ($line = <INFILE>)
@entry = split(/ /, $line);
$i = 0;
$found = 0;
$j = 0;
while (($j<=$count) && ($found==0))
if (lc($entry[$i]) eq $stopword[$j])
$found = 1;
if ($found == 0)
print "$entry[$i]\n ";