
Re: arabic alphabet ... how to deal with?

by kennethk (Abbot)
on Feb 12, 2009 at 16:41 UTC (#743386)

in reply to arabic alphabet ... how to deal with?

Read through perlunicode. All of your I/O needs to be performed in UTF-8. That means not only open (STOPWORDS, '<:encoding(UTF-8)', $ARGV[1]) as ForgotPasswordAgain suggests and open (INFILE, '<:encoding(UTF-8)', $ARGV[0]) as derby suggests, but also binmode STDOUT, ":encoding(utf8)" before you try to print. The fact that the script works with "standard" text means this is almost certainly a Unicode problem.
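A minimal sketch of the advice above, assuming one word per line in a UTF-8 encoded file. The helper name read_words is hypothetical and not from the thread; it uses a lexical filehandle and chomp rather than the bareword handles and chop seen elsewhere in this discussion.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one word per line from a UTF-8 encoded file.
# The :encoding(UTF-8) layer decodes the bytes into Perl characters on read.
sub read_words {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "Error opening $path: $!\n";
    my @words;
    while (my $word = <$fh>) {
        chomp $word;              # strip the trailing newline, if any
        push @words, $word;
    }
    close($fh);
    return @words;
}

# Encode characters back to UTF-8 bytes on output; without this layer,
# printing non-ASCII characters triggers "Wide character" warnings.
binmode(STDOUT, ':encoding(UTF-8)');

# Only read a file when one is named on the command line.
if (@ARGV) {
    print "$_\n" for read_words($ARGV[0]);
}
```

With the input layer in place, length() on each word counts characters rather than bytes, which is a quick way to check that decoding actually happened.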

Re^2: arabic alphabet ... how to deal with?
by Anonymous Monk on Feb 12, 2009 at 16:53 UTC
    I tried it this way before as well; this way I get no output ;)
    #!/usr/bin/perl
    open (STOPWORDS, '<:encoding(UTF-8)', $ARGV[1]) || die "Error opening the stopwords file\n";
    $count = 0;
    while ($word = <STOPWORDS>) {
        chop($word);
        $stopword[$count] = lc($word);
        $count++;
    }
    close(STOPWORDS);
    open (INFILE ,'<:encoding(UTF-8)', $ARGV[0]) || die "Error opening the input file\n";
    while ($line = <INFILE>) {
        chop($line);
        @entry = split(/ /, $line);
        $i = 0;
        while ($entry[$i]) {
            $found = 0;
            $j = 0;
            while (($j<=$count) && ($found==0)) {
                if (lc($entry[$i]) eq $stopword[$j]) {
                    $found = 1;
                }
                $j++;
            }
            if ($found == 0) {
                print FH "$entry[$i]\n";
            }
            $i++;
        }
    }
    close(INFILE);
      In this case, you have an orphaned file handle FH which is never associated with a file or channel.
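      As a hedged illustration of how to catch this kind of mistake: with warnings enabled, printing to a handle that was never opened fails and emits a diagnostic. The sub name print_to_unopened is hypothetical, and the sketch captures the warning only so it can be inspected.

```perl
use strict;
use warnings;

# Hypothetical demo: print to the bareword handle FH, which is never opened,
# and collect any runtime warnings instead of letting them go to STDERR.
sub print_to_unopened {
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $ok = print FH "word\n";   # fails: FH is not attached to anything
    return ($ok, @warnings);
}

my ($ok, @warnings) = print_to_unopened();
print STDERR "print succeeded? ", ($ok ? "yes" : "no"), "\n";
print STDERR "warning: $_" for @warnings;
```

Because print returns false here, checking its return value (or simply running the script with use warnings) points at the unopened handle directly.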
        When I write it this way as well:
        open (OUTFILE ,'>>:encoding(UTF-8)', $ARGV[2]) || die "Error opening the output file\n";
        ... ... ...
        print OUTFILE "$entry[$i]\n";
        ... ... ...
        the words from my stop-word list still remain in the output ... :(

      Use Devel::Peek to get an ASCII-printable representation of the strings you're comparing, and then verify that what you think should match is in fact identical:

      use Devel::Peek;
      ...
      Dump lc($entry[$i]);
      Dump $stopword[$j];
      if (lc($entry[$i]) eq $stopword[$j]) {
      ...
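      A self-contained sketch of that check. Dump writes its report to STDERR; the helper codepoints is hypothetical and simply lists each character as a code point so two strings can be compared by eye. The sample strings, including the stray carriage return, are assumptions made for illustration and not a diagnosis of the script above.

```perl
use strict;
use warnings;
use Devel::Peek;

# Hypothetical helper: render a string as space-separated code points.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $str;
}

# Assumed sample data: the same two Arabic letters, one copy carrying an
# invisible trailing carriage return that makes eq fail.
my $entry    = "\x{0641}\x{064A}\r";
my $stopword = "\x{0641}\x{064A}";

Dump $entry;       # internal representation, printed to STDERR
Dump $stopword;

print STDERR codepoints($entry),    "\n";
print STDERR codepoints($stopword), "\n";
print STDERR "match? ", ($entry eq $stopword ? "yes" : "no"), "\n";
```

If the two listings differ in any position, the strings are not equal, however similar they look on screen.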
