How can I get a Unicode @ARGV?

by exilepanda (Pilgrim)
on Aug 31, 2012 at 02:19 UTC ( #990899=perlquestion )
exilepanda has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have a script that is packed as an executable. The usage is to drop files or folders onto the executable, which then prompts for a "What to do" option.

Everything went fine until a new problem came up: @ARGV can't handle Unicode file/dir names. The path passed in becomes ?????, and the script is then unable to locate the path to act on.

Since the client side does not have Perl installed, I can't run this as perl -CA "", and if I try to use #!perl -CA instead, it says it's too late for the "-CA" option.

I've done some research, and some suggest using I18N::Langinfo. That leads to another problem: I can't install the module, whatever I try.

I tried ppm install; it doesn't find this module.
I tried cpan install; it fails.
cpan force install still fails.
cpan fforce install starts downloading and fetching, but ends up saying "program too big to fit in memory".
By the way, I tried on both Perl 5.10 and 5.14.

Anyway, my question is: how do I retrieve @ARGV correctly? I am working on a Big5 (Traditional Chinese) XP system, but my script needs to read GB (Simplified Chinese) and Japanese characters in paths. What can I do about it? Please help. Thanks in advance!

Replies are listed 'Best First'.
Re: How can I get a Unicode @ARGV?
by Anonymous Monk on Aug 31, 2012 at 03:04 UTC

    I have a script that packed as Executable

    Which packer?

    and the path passing in becomes ?????.

    cmd.exe does this part on its own; you have to chcp 65001 to stop it from mangling Unicode, or use PowerShell

    See also Win32::Unicode, Win32::Unicode::Native decodes @ARGV

      Which packer?
      I have Cava and PerlApp, but so far I have only tried PerlApp.

      becomes ?????.
      Yes, it turns into question marks.

      ...chcp 65001...
      I think it's too late for chcp too. My guess is that at the moment the file path is dropped onto the exe file, it has already turned into "?????" before perl receives the argv. But I have no way to check...

      I use Win32::Unicode::* on other occasions and it works fine, as long as I give it the right source string first. Here I am stuck at the very first step...

        Yes. it turns into question marks.... I think it's too late to chcp too.

        :) That wasn't a question, I was quoting the line I was responding to :)

        But I totally missed the drag/drop thing

        So you're seeing "??????" in the console? What is the Data::Dumper output?

        I imagine using Win32::Unicode::Native ought to work, but if it doesn't, these two might

        Win32::CommandLine - Retrieve and reparse the Win32 command line

        update: it probably won't, no mention of unicode/wchar or GetCommandLineW

        Win32::Process::CommandLine - Perl extension for getting win32 process command line parameters

        But if they don't, then I think recompiling perl ( runperl.c ) with wmain ought to work, though that might be tough to manage with perlapp/cava. It wouldn't surprise me if this step turns out not to be necessary

        But, you know :) you could always compile a foo.exe which uses wmain and calls your perlapp packed perl.exe with -CSD or whatever :)
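For the Data::Dumper question above, here is a minimal sketch of how the raw arguments could be inspected. `hex_bytes` is a hypothetical helper, not part of any module; `$Data::Dumper::Useqq` is a standard Data::Dumper setting that escapes non-ASCII bytes rather than writing them raw to the console. If the dump shows literal `?` characters (byte 3F), the name was presumably already lost before perl received it, and no decoding inside the script can bring it back.

```perl
use strict;
use warnings;
use Data::Dumper;

# Escape non-printable and high bytes instead of writing them raw to the console.
$Data::Dumper::Useqq = 1;

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { uc $_ } unpack '(H2)*', $bytes;
}

# Dump every argument exactly as perl received it.
for my $arg (@ARGV) {
    print Dumper($arg);
    print hex_bytes($arg), "\n";
}
```

Writing the output to a log file instead of the console may be more convenient for a drag-and-drop executable, since the console window can close immediately.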

Re: How can I get a Unicode @ARGV?
by remiah (Hermit) on Aug 31, 2012 at 05:18 UTC

    When a file is dropped, your program is launched with the path name of the dropped file, and that path ends up in @ARGV?

    Can you examine the encoding of @ARGV? For example, decode it as UTF-16LE, GB2312, CP932, or other possibilities. Show the results and choose the correct one.

    If it comes from the command line, CP932 will decode @ARGV properly on Japanese XP.

    But your case seems different...
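The suggestion to decode @ARGV with several candidate encodings could be sketched roughly as follows. `try_decodings` and `code_points` are hypothetical helper names, and the list of encodings is an assumption: cp950, cp936 and cp932 are the Encode names for the Windows Traditional Chinese, Simplified Chinese and Japanese code pages, with cp936 standing in for GB2312. The results are printed as code points so the console cannot garble them again.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: try each candidate encoding on a copy of the raw bytes.
# Returns a hash ref mapping encoding name to decoded text, or undef when the
# bytes are not valid in that encoding.
sub try_decodings {
    my ($bytes, @encodings) = @_;
    my %result;
    for my $enc (@encodings) {
        my $copy = $bytes;    # decode() with a CHECK argument may modify its input
        $result{$enc} = eval { decode($enc, $copy, FB_CROAK) };
    }
    return \%result;
}

# Hypothetical helper: render decoded text as a list of code points.
sub code_points {
    my ($text) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $text;
}

for my $arg (@ARGV) {
    my $tried = try_decodings($arg, qw(UTF-16LE cp950 cp936 cp932));
    for my $enc (sort keys %$tried) {
        my $text = $tried->{$enc};
        printf "%-8s %s\n", $enc, defined $text ? code_points($text) : '(invalid)';
    }
}
```

A decoding that succeeds is not necessarily the right one, since the same bytes are often valid in several of these encodings; the code points still have to be compared with the real file name.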

Re: How can I get a Unicode @ARGV?
by nikosv (Chaplain) on Aug 31, 2012 at 06:22 UTC
    The translation from Unicode will be done with your "System Default Code page - Language for non-Unicode programs".
    So if you set it to Japanese, it should get the correct characters.
    But since this setting has an installation-wide effect, you can't have both Japanese and Simplified Chinese; you must choose.
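If the arguments do arrive in the system code page as described, they could be turned into Perl character strings along these lines. This is only a sketch: `decode_args` is a hypothetical helper, `Win32::GetACP()` comes from the Win32 module, and the fallback value 950 is an assumption that merely lets the sketch run on a non-Windows machine. Encode ships tables for many Windows code pages (cp932, cp936, cp950 among them) but not necessarily for all of them.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw argument bytes from Windows code page $cp.
sub decode_args {
    my ($cp, @raw) = @_;
    return map { decode("cp$cp", $_) } @raw;
}

# Ask Windows for the ANSI code page; 950 is only an assumed fallback.
my $acp = 950;
if ($^O eq 'MSWin32') {
    require Win32;
    $acp = Win32::GetACP();
}

my @args = decode_args($acp, @ARGV);
```

This only helps for names that the code page can represent. A character that was already replaced by `?` stays a `?`.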
      I'm not sure about this, as I don't see any proper place to insert an intercept and investigate. However, if I open a cmd console, the default chcp is 950, and when I run my perl code with Win32::Codepage, it still tells me I am working with cp950.

      However, if I leave a clean cmd window open and drop a file there, the Unicode file/dir name shows correctly. Could that mean the code page has little to do with it?

      So I have a guess: what if the strings have already been converted to ANSI before they can be passed to my script?

        cp950 is not Unicode; cp65001 (UTF-8) is. When you drag and drop onto the packaged executable, an API call occurs which probably works with ANSI. The file name starts out as UTF-16, and when you drop the file it is converted using the "Language for non-Unicode programs" code page. So if the file path is in Japanese and the system code page is cp950, what happens is UTF-16 -> cp950, which is Big5 (Traditional Chinese), not Japanese. The characters have no mapping there, hence the question marks.
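The lossy UTF-16 -> cp950 step can be reproduced with Encode alone, without Windows. The sketch below uses a hypothetical helper `survives` that checks whether a string comes back unchanged after a round trip through a code page. The sample strings are illustrative choices: 中文 exists in Big5, while 汉 (U+6C49) is a simplified form that cp950 does not contain.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: true if $text is unchanged after a round trip
# through $encoding, i.e. every character has a mapping there.
sub survives {
    my ($text, $encoding) = @_;
    my $bytes = encode($encoding, $text);    # unmappable characters are substituted
    return decode($encoding, $bytes) eq $text;
}

my $traditional = "\x{4E2D}\x{6587}";    # 中文
my $simplified  = "\x{6C49}\x{5B57}";    # 汉字

for my $name ($traditional, $simplified) {
    my $points = join ' ', map { sprintf 'U+%04X', ord($_) } split //, $name;
    printf "%s via cp950: %s\n", $points, survives($name, 'cp950') ? 'kept' : 'lost';
}
```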
Re: How can I get a Unicode @ARGV?
by philiprbrenan (Monk) on Aug 31, 2012 at 11:51 UTC

    An alternative might be to place the Unicode text in a file and then drop that file onto your executable. It would then be straightforward to read the Unicode text via:

    open(my $F, "<:encoding(UTF-8)", $f) or die "Cannot open $f for unicode input";
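A slightly fuller sketch of that idea, assuming the dropped file is a UTF-8 text file with one path per line. `read_path_list` is a hypothetical helper name. The BOM handling is there because some Windows editors, Notepad among them, write a byte order mark at the start of UTF-8 files.

```perl
use strict;
use warnings;

# Hypothetical helper: read one path per line from a UTF-8 text file.
sub read_path_list {
    my ($list_file) = @_;
    open(my $fh, '<:encoding(UTF-8)', $list_file)
        or die "Cannot open $list_file: $!";
    my @paths;
    while (my $line = <$fh>) {
        $line =~ s/\A\x{FEFF}// if $. == 1;    # strip a leading byte order mark
        $line =~ s/\r?\n\z//;                  # strip LF or CRLF line endings
        push @paths, $line if length $line;    # skip empty lines
    }
    close $fh;
    return @paths;
}

# Each dropped file is treated as a list of paths to process.
my @paths = map { read_path_list($_) } @ARGV;
```

The list file itself needs a name the system code page can represent, and the decoded paths would still have to be opened through a wide-character-aware module such as Win32::Unicode rather than the built-in open.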
Re: How can I get a Unicode @ARGV?
by freonpsandoz (Acolyte) on Jun 10, 2017 at 22:19 UTC

    I am having this problem also. Where can I get the packages recommended as possible solutions, like Win32-Unicode and Win32-CommandLine? I am using ActiveState Perl v5.20.2 and ppm doesn't show these packages. Thanks.
