PerlMonks  

Issue with reading a unicode file

by cstar (Initiate)
on Jan 07, 2013 at 05:04 UTC (#1011948=perlquestion)
cstar has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a Unicode file containing some Chinese text: the titles of some windows that I have to search for. When I read a window title from the file, Perl gets Unicode data that differs from what is in the input file. The two snippets below illustrate my problem.

Script 1:

    use Encode;
    use Win32::GuiTest qw(FindWindowLike);
    open(MYFILE, '<:encoding(UTF-8)',"saml.txt") || die "cannot open: $!";
    open(OUTFILE,'>:encoding(UTF-8)',"out.txt") || die "cannot open: $!";
    $WindowTitle=<MYFILE>; #reading the chinese window title from input file
    chomp($WindowTitle);
    binmode(STDOUT, ":utf8");
    print "$WindowTitle\n"; #===> Here perl prints out some chinese text to the command console, but different from what is given in input file
    my @hwnd=Win32::GuiTest::FindWindowLike(undef,$WindowTitle);
    if($hwnd[0]) { print "window found\n"; } else { print "window not found\n"; }
    print OUTFILE $WindowTitle; #==> Here perl prints out same chinese text as input to the outfile

Script 2:

    use Encode;
    use Win32::GuiTest qw(FindWindowLike);
    $WindowTitle="V VM"; #Hardcoded the window title in chinese
    binmode(STDOUT, ":utf8");
    print "$WindowTitle\n"; #===> Here perl prints out some chinese text to the command console, but different from what is given in input file
    my @hwnd=Win32::GuiTest::FindWindowLike(undef,$WindowTitle);
    if($hwnd[0]) { print "window found\n"; } else { print "window not found\n"; }

Script 1 reads the Chinese window title from the Unicode file and reports that the window is not present, although it actually is. Script 2 has the Chinese window title hardcoded and correctly reports that the window is present. What am I doing wrong when reading the Unicode file? Please help.

Re: Issue with reading a unicode file
by quester (Vicar) on Jan 07, 2013 at 06:36 UTC
    As a guess, since you seem to be on Windows, your input file is likely to begin with a Byte Order Mark (BOM), which Microsoft uses as a convention to distinguish the various flavors of UTF. A UTF-8 byte order mark is three bytes long: 0xEF, 0xBB, 0xBF. Once the input has been decoded in Perl, it appears as the code point "\N{U+FEFF}". You could try tr/\N{U+FEFF}//d to remove it.
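A minimal sketch of that suggestion, assuming the file really is UTF-8 with a leading BOM (nothing in this thread confirms that). The helper name strip_bom is hypothetical, and the demo decodes an in-memory byte string rather than reading saml.txt, so it runs without the asker's file.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: remove a single leading U+FEFF from a decoded string.
sub strip_bom {
    my ($text) = @_;
    $text =~ s/\A\x{FEFF}//;
    return $text;
}

# Simulate the first line of a BOM-prefixed UTF-8 file: EF BB BF, then "VM\n".
my $raw_line = "\xEF\xBB\xBF" . "VM\n";
my $decoded  = decode('UTF-8', $raw_line);   # the BOM survives as U+FEFF

my $title = strip_bom($decoded);
chomp $title;

printf "before: %d characters, after: %d characters\n",
    length($decoded), length($title);
```

In Script 1 the same substitution could be applied to $WindowTitle right after the line is read. Anchoring with \A removes only a BOM at the very start, whereas the tr form deletes every U+FEFF in the string; for a one-line title either should behave the same.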
Re: Issue with reading a unicode file
by Anonymous Monk on Jan 07, 2013 at 07:33 UTC
    And the bytes of this mysterious file are?
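One way to answer that, as a rough sketch: read the first few bytes of the file with no decoding layer and print them in hex. The helper names describe_bom and leading_bytes are hypothetical, and saml.txt is simply the file name from the question; the byte sequences checked are the standard UTF-8 and UTF-16 BOMs (UTF-32 is not handled here).

```perl
use strict;
use warnings;

# Hypothetical helper: guess the encoding from the leading bytes of a file.
sub describe_bom {
    my ($bytes) = @_;
    return 'UTF-8 BOM'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE BOM' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-16BE BOM' if $bytes =~ /\A\xFE\xFF/;
    return 'no BOM recognized';
}

# Hypothetical helper: return the first $count bytes of $path, undecoded.
sub leading_bytes {
    my ($path, $count) = @_;
    open(my $fh, '<:raw', $path) or die "cannot open $path: $!";
    read($fh, my $buf, $count);
    close($fh);
    return $buf // '';
}

my $path = shift(@ARGV) // 'saml.txt';   # file name taken from the question
if (-e $path) {
    my $head = leading_bytes($path, 16);
    # Print each byte as two hex digits, e.g. "EF BB BF ..."
    print join(' ', map { sprintf('%02X', ord($_)) } split(//, $head)), "\n";
    print describe_bom($head), "\n";
}
else {
    print "no such file: $path\n";
}
```

If the output started with FF FE rather than EF BB BF, the file would be UTF-16LE, and opening it with ':encoding(UTF-8)' as Script 1 does would be the wrong layer to begin with.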
Re: Issue with reading a unicode file
by choroba (Canon) on Jan 07, 2013 at 09:48 UTC
