PerlMonks  

Re^4: How can I get a Unicode @ARGV?

by exilepanda (Monk)
on Sep 01, 2012 at 03:31 UTC (#991114)


in reply to Re^3: How can I get a Unicode @ARGV?
in thread How can I get a Unicode @ARGV?

In cp950: dir/b shows the file names with Unicode characters correctly. However, after dir/b > list.txt, the content of list.txt turns into "????".

In cp65001: dir/b shows the file names with monster characters. However, dir/b > list.txt gives the correct list.

What confuses me most is that seeing the string displayed correctly doesn't mean the data is right, and vice versa. And I actually have no idea why Unicode characters display correctly with dir (or when I drop a file path into the cmd console) in cp950, but then can't be handled through @ARGV later.


Re^5: How can I get a Unicode @ARGV?
by remiah (Hermit) on Sep 01, 2012 at 13:24 UTC
    Hello.

    Can you try this?

    1. Paste the function below into your script.
    2. Call it as shown; it writes encoding information about $ARGV[0] to logfile.txt:

        troubled_string($ARGV[0], "logfile.txt");

    3. Open logfile.txt in your browser and switch the browser's "Encoding" setting until you find the line where the string looks normal.

        sub troubled_string {
            my ($str, $logfile_path) = @_;
            use Encode qw(decode encode from_to encodings);

            open(my $fh, ">", $logfile_path) or die $!;
            printf $fh "utf8 flag:%s\n",
                utf8::is_utf8($str) ? "utf8 flagged" : "not utf8 flagged";
            # Hex dump of the UTF-8 bytes (flagged) or of the raw bytes (unflagged)
            printf $fh "hexdump:[%s]\n",
                utf8::is_utf8($str) ? unpack('U0H*', $str) : unpack('H*', $str);
            printf $fh "length:[%s]\n", length($str);
            printf $fh "%s\n", "-" x 20;
            if (utf8::is_utf8($str)) {
                # Character string: encode it into every known encoding
                printf $fh "[encode trial]\n";
                printf $fh "%-25s:%s\n", $_, encode($_, $str)
                    for (Encode->encodings(":all"));
            } else {
                # Byte string: decode and re-encode with every known encoding
                printf $fh "[decode -> encode trial]\n";
                printf $fh "%-25s:%s\n", $_, encode($_, decode($_, $str))
                    for (Encode->encodings(":all"));
            }
            close $fh;
        }

    What does it say?
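
If reading through the whole log by eye is tedious, the same idea can be narrowed to a strict check: try a handful of candidate encodings and keep only those under which the bytes decode without error. This is a minimal sketch, not part of the reply above; the helper name plausible_encodings and the candidate list are assumptions made for illustration. A clean decode only shows that the bytes are valid in that encoding, not that it is the encoding actually used.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the thread): return the names of the
# candidate encodings under which $bytes decodes without any error.
sub plausible_encodings {
    my ($bytes, @candidates) = @_;
    my @ok;
    for my $name (@candidates) {
        # FB_CROAK makes decode() die on malformed input;
        # LEAVE_SRC keeps decode() from modifying $bytes in place.
        my $decoded = eval {
            decode($name, $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        };
        push @ok, $name if defined $decoded;
    }
    return @ok;
}

# "\xE9" is "e with acute" in Latin-1, but on its own it is
# neither valid ASCII nor valid UTF-8.
my @fits = plausible_encodings("\xE9", qw(ascii UTF-8 iso-8859-1));
print join(", ", @fits), "\n";    # prints: iso-8859-1
```

For the case in this thread, the candidate list would presumably include cp950 and UTF-8, and the input would be $ARGV[0].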

Re^5: How can I get a Unicode @ARGV?
by nikosv (Hermit) on Sep 01, 2012 at 13:21 UTC

    The 'dir' command in the Windows cmd console works in Unicode regardless of the code page; it's one of those Windows quirks. Go ahead, change the code page to, e.g., Cyrillic and try it out; you'll get the same result.

    In cp950: dir/b shows the file names with Unicode characters correctly. However, after dir/b > list.txt, the content of list.txt turns into "????".

    Is list.txt saved as ANSI (the default)? Save list.txt as Unicode and try again.

    In cp65001: dir/b shows the file names with monster characters. However, dir/b > list.txt gives the correct list.

    I guess cp65001 sets the file I/O to Unicode, which is why you see list.txt with the correct list. What is a monster char? Maybe a font issue? What font are you using, Lucida Console?
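
The difference between the two redirected files can be reproduced in Perl alone. The sketch below assumes that redirecting dir output converts the names to the active code page; the file name is hypothetical and is not taken from the thread. Encoding a name into cp950 replaces any character the code page lacks with a substitution character (typically "?"), so the data is lost even if the console displayed it correctly, whereas UTF-8 (what cp65001 selects) keeps every character.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical file name: ASCII plus one Hangul syllable (U+D55C),
# a character that cp950 (Big5) has no mapping for.
my $name = "file_\x{D55C}.txt";

# Encoding to the legacy code page substitutes the unmappable character.
my $ansi = encode('cp950', $name);
# Encoding to UTF-8 keeps every character.
my $utf8 = encode('UTF-8', $name);

printf "cp950 bytes: %s\n", $ansi;
printf "cp950 round trip intact: %s\n",
    decode('cp950', $ansi) eq $name ? "yes" : "no";
printf "UTF-8 round trip intact: %s\n",
    decode('UTF-8', $utf8) eq $name ? "yes" : "no";
```

Once the substitution has happened, no later decoding can bring the original character back, which would match the "????" seen in list.txt under cp950.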
