PerlMonks  

Re^3: How can I get a Unicode @ARGV?

by nikosv (Hermit)
on Aug 31, 2012 at 17:48 UTC (#991055)


in reply to Re^2: How can I get a Unicode @ARGV?
in thread How can I get a Unicode @ARGV?

cp950 is not a Unicode code page; cp65001 (UTF-8) is. When you drag and drop a file onto the packaged executable, an API call occurs which probably works with ANSI. Even then, you start with a UTF-16 file path, and when you drop it the conversion uses the "Language for non-Unicode programs" setting. So if the file path is in Japanese and the system code page is cp950, what happens is UTF-16 -> cp950, which is Big5 (Chinese), not Japanese. The Unicode mapping is not correct, hence the question marks.
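
To see which characters fall through in that UTF-16 -> cp950 step, something like the following may help. It is only a sketch using the core Encode module: `unmappable_chars` is a made-up helper name, and the Hangul test character is just an example of something Big5/cp950 cannot represent. Without a CHECK argument, encode() quietly substitutes a replacement character instead of dying, which would match the question marks described above.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not from the thread): return the characters of
# $text that have no mapping in the given legacy code page.
sub unmappable_chars {
    my ($codepage, $text) = @_;
    my @missing;
    for my $ch (split //, $text) {
        my $copy = $ch;    # encode() may modify its argument when CHECK is set
        my $ok = eval { encode($codepage, $copy, Encode::FB_CROAK); 1 };
        push @missing, $ch unless $ok;
    }
    return @missing;
}

# U+D55C is a Hangul syllable, used here only as an example of a
# character that cp950 has no slot for.
my @missing = unmappable_chars('cp950', "abc\x{D55C}");
printf "U+%04X has no cp950 mapping\n", ord($_) for @missing;
```

Running the same check on the actual Japanese file name would show which of its characters cannot survive a trip through cp950.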


Re^4: How can I get a Unicode @ARGV?
by exilepanda (Monk) on Sep 01, 2012 at 03:31 UTC
    In cp950: dir/b shows the files with Unicode characters correctly. However, with dir/b > list.txt the content of list.txt turns into "????".

    In cp65001: dir/b shows the files with monster characters. However, dir/b > list.txt gives the correct list.

    What confuses me most is that seeing the string displayed correctly doesn't mean the data is right, and vice versa. I actually have no idea why Unicode characters show correctly with dir (or when I drop a file path into the cmd console) in cp950, but then can't be manipulated (via @ARGV) later.
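
    The gap between "the string looks right" and "the data is right" can be illustrated with a small sketch. The byte pair below is an arbitrary example, not taken from the actual file names: the bytes stay the same, but what they mean depends on the code page used to decode them.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The same two bytes, interpreted under two different code pages.
my $bytes = "\xA4\xA4";

for my $codepage ('cp950', 'cp1252') {
    my $text = decode($codepage, $bytes);
    printf "%-7s bytes=%s chars=%d code points=%s\n",
        $codepage,
        unpack('H*', $bytes),
        length($text),
        join(' ', map { sprintf 'U+%04X', ord } split //, $text);
}
```

    The hex dump is identical on both lines while the character count and code points differ, which is why looking at the raw bytes is usually more reliable than trusting what the console displays.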

      The 'dir' command in Windows cmd works in Unicode regardless of the code page; it's one of those Windows quirks. Go ahead, change the code page to e.g. Cyrillic and try it out; you'll get the same result.

      In cp950: dir/b It shows the files with Unicode char correctly. However, dir/b > list.txt the content inside list turns into "????"

      Is list.txt saved as ANSI (the default)? Save list.txt as Unicode and try again.

      In cp65001: dir/b It shows the files with monster char. However, dir/b > list.txt gives the correct list.

      I guess cp65001 sets the file I/O to Unicode, which is why you see list.txt with the correct list. What is a monster char? Maybe a font issue? What font are you using, Lucida Console?
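
      One way to get a Unicode list.txt without changing the code page is to start cmd.exe with its /u switch, for example cmd /u /c "dir /b > list.txt", which makes internal commands write UTF-16LE to a file or pipe. A minimal sketch for reading such a file from Perl follows; `read_utf16le_list` is a made-up helper name, and it assumes the file has no byte order mark.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read a file list written by
#   cmd /u /c "dir /b > list.txt"
# assuming UTF-16LE content without a byte order mark.
sub read_utf16le_list {
    my ($path) = @_;
    open(my $fh, '<:raw:encoding(UTF-16LE)', $path)
        or die "Cannot open $path: $!";
    my @names;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # strip the Windows line ending by hand
        push @names, $line if length $line;
    }
    close $fh;
    return @names;
}
```

      Called as my @names = read_utf16le_list("list.txt"); this returns decoded character strings, so they need to be encoded again before being printed to a byte-oriented handle.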

      Hello.

      Could you try the following?

      1. Paste the function below into your script.
      2. Call it as shown. This will output encoding information about $ARGV[0] to logfile.txt:

      troubled_string($ARGV[0], "logfile.txt");
      3. Open logfile.txt in your browser and change the browser's "Encoding" setting until you find the line where the string looks normal.
      sub troubled_string {
          my ($str, $logfile_path) = @_;
          use Encode qw(decode encode from_to encodings);

          open(my $fh, ">", $logfile_path) or die $!;

          # Report whether Perl treats the string as characters or bytes
          printf $fh "utf8 flag:%s\n",
              utf8::is_utf8($str) ? "utf8 flagged" : "not utf8 flagged";
          printf $fh "hexdump:[%s]\n",
              utf8::is_utf8($str) ? unpack('U0H*', $str) : unpack('H*', $str);
          printf $fh "length:[%s]\n", length($str);
          printf $fh "%s\n", "-" x 20;

          # Try every encoding Encode knows about, one line per encoding
          if ( utf8::is_utf8($str) ) {
              printf $fh "[encode trial]\n";
              printf $fh "%-25s:%s\n", $_, encode($_, $str)
                  for (Encode->encodings(":all"));
          }
          else {
              printf $fh "[decode -> encode trial]\n";
              printf $fh "%-25s:%s\n", $_, encode($_, decode($_, $str))
                  for (Encode->encodings(":all"));
          }
          close $fh;
      }
      What does it say?
