Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re: arabic alphabet ... how to deal with?

by kennethk (Abbot)
on Feb 12, 2009 at 16:41 UTC ( #743386=note: print w/ replies, xml ) Need Help??


in reply to arabic alphabet ... how to deal with?

Read through perlunicode. All your I/O operations need to be performed in UTF-8. That means not only open (STOPWORDS, '<:encoding(UTF-8)', $ARGV[1]) as ForgotPasswordAgain suggests and open (INFILE, '<:encoding(UTF-8)', $ARGV[0]) as derby suggests, but also binmode STDOUT, ":encoding(utf8)" before you try to print. The fact that it works with "standard" text says it is almost guaranteed to be a Unicode problem.


Comment on Re: arabic alphabet ... how to deal with?
Select or Download Code
Re^2: arabic alphabet ... how to deal with?
by Anonymous Monk on Feb 12, 2009 at 16:53 UTC
    I tried this way as well before, this way no output ;)
    #!/usr/bin/perl open (STOPWORDS, '<:encoding(UTF-8)', $ARGV[1]) || die "Error opening +the stopwords file\n"; $count = 0; while ($word = <STOPWORDS>) { chop($word); $stopword[$count] = lc($word); $count++; } close(STOPWORDS); open (INFILE ,'<:encoding(UTF-8)', $ARGV[0]) || die "Error opening the + input file\n"; while ($line = <INFILE>) { chop($line); @entry = split(/ /, $line); $i = 0; while ($entry[$i]) { $found = 0; $j = 0; while (($j<=$count) && ($found==0)) { if (lc($entry[$i]) eq $stopword[$j]) { $found = 1; } $j++; } if ($found == 0) { print FH "$entry[$i]\n"; } $i++; } } close(INFILE);
      In this case, you have an orphaned file handle FH which is never associated with a file or channel.
        when I write in this way also :
        open (OUTFILE ,'>>:encoding(UTF-8)', $ARGV[2]) || die "Error opening t +he output file\n"; ... ... ... print OUTFILE "$entry[$i]\n"; ... ... ...
        still my words in the list of stop words would remain there ... :(

      Use Devel::Peek to get an ASCII-printable representation of the strings you're comparing, and then verify that what you think should match is in fact identical:

      use Devel::Peek; ... Dump lc($entry[$i]); Dump $stopword[$j]; if (lc($entry[$i]) eq $stopword[$j]) { ...

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://743386]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (14)
As of 2015-07-06 20:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (83 votes), past polls