http://www.perlmonks.org?node_id=990927


in reply to How can I get a Unicode @ARGV?

The translation from Unicode is done using your "System Default Code page - Language for non-Unicode programs" setting.
So if you set it to Japanese, your script should get the correct characters.
But since this setting has an installation-wide effect, you can't have both Japanese and Simplified Chinese; you must choose one.

Replies are listed 'Best First'.
Re^2: How can I get a Unicode @ARGV?
by exilepanda (Friar) on Aug 31, 2012 at 13:28 UTC
    I'm not sure about this, as I don't see any suitable place to insert an intercept to investigate. However, if I open a cmd console, the default chcp is 950, and when I run my Perl code, Win32::Codepage still tells me I am working with cp950.

    However, if I leave a clean cmd window open and drop a file onto it, the Unicode file/dir name is shown correctly. Does that mean the code page has little to do with it?

    So my guess is: what if the strings have already been converted to ANSI before they are passed to my script?

      cp950 is not Unicode; cp65001 (UTF-8) is. When you drag and drop onto the packaged executable, an API call occurs which probably works with ANSI. The file path starts out as UTF-16, and the drop converts it using the "Language for non-Unicode programs" setting. So if the file path is in Japanese and the system code page is cp950, the conversion is UTF-16 -> cp950, which is Big5 (Chinese), not Japanese. The Japanese characters have no correct mapping there, hence the question marks.
        In cp950: dir/b shows the files with Unicode characters correctly. However, with dir/b > list.txt the content of list.txt turns into "????".

        In cp65001: dir/b shows the files as garbled ("monster") characters. However, dir/b > list.txt gives the correct list.

        What confuses me most is that seeing the string displayed correctly doesn't mean the data is right, and vice versa. And I actually have no idea why Unicode characters display correctly with dir (or when I drop a file path into the cmd console) in cp950, but then can't be handled later through @ARGV.