
unicode DWIM?

by Sec (Monk)
on Feb 24, 2005 at 09:55 UTC
Sec has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to find my way around Unicode in Perl. I want my scripts to produce correct output whether they are called on latin1 or on utf8 terminals. I have already found the incredibly useful use open OUT => ':locale'. However, I'm having trouble with command-line arguments. Is there a more sensible way than this?
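    use charnames ':full';
    use open ':locale';
    use Encode;

    # Find the encoding that ':locale' put on STDOUT.
    for (PerlIO::get_layers(STDOUT)) {
        if (/encoding\((.*)\)/ || /(utf8)/) {
            $lc ||= $1;
        }
    }
    print "Locale is: $lc\n";

    # Decode the arguments, then show every character outside printable ASCII by name.
    $_ = "@ARGV";
    $_ = decode($lc, $_);
    $_ =~ s<([^\x{000a}\x{0020}-\x{007e}])>{
        '\N{' . charnames::viacode(ord $1) . '}'
    }ge;
    print $_, "\n";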
This seems overly ugly just to get Perl to do the right thing.

Re: unicode DWIM?
by dragonchild (Archbishop) on Feb 24, 2005 at 13:56 UTC
    Wouldn't another way be to treat @ARGV as IO::Scalar and binmode them to :utf8 (or whatever)? At that point, you can use built-in functions (such as lc) and they will DWIM with regard to the Unicode-ness of the strings.
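To illustrate the point about built-ins, here is a minimal sketch of how lc and length behave once a byte string has been decoded into characters. It uses Encode::decode rather than the IO::Scalar approach suggested above, and it assumes the input bytes are UTF-8; the sample string is made up for the illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "\xC3\x84" is the UTF-8 byte sequence for U+00C4
# (LATIN CAPITAL LETTER A WITH DIAERESIS). Sample data, not from the thread.
my $bytes = "\xC3\x84RGER";
my $chars = decode('UTF-8', $bytes);

# length counts bytes before decoding and characters afterwards.
printf "undecoded length: %d\n", length $bytes;
printf "decoded length:   %d\n", length $chars;

# On the decoded string, lc applies Unicode case mapping to the first character too.
my $lower = lc $chars;
printf "first character after lc: U+%04X\n", ord $lower;
```

The undecoded string is six bytes long, while the decoded one is five characters long and lowercases to a string starting with U+00E4.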

    Of course, that doesn't handle the locale stuff, but I've yet to encounter a situation where that was actually useful (unlike Redhat where it's specifically not useful ...)

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: unicode DWIM?
by penguinfuz (Pilgrim) on Feb 24, 2005 at 13:35 UTC
    I too have had an issue with input/output of German characters in my terminal and scripts - my solution has been to use Unicode::UTF8simple which works for my needs. (slurping in German text files, processing the text a bit and then writing the output to HTML or straight through to sendmail)

    I'm not sure that I understand your issue completely, but with the following code I can enter a word with German characters and the output is exactly what I expect. I hope this helps.

        use strict;
        use Unicode::UTF8simple;

        if (@ARGV) {
            my $uref       = Unicode::UTF8simple->new;
            my $utf8string = $uref->fromUTF8("iso-8859-1", @ARGV);
            my $string     = $uref->toUTF8("iso-8859-1", $utf8string);
            print "I give you: $string\n";
        }

Re: unicode DWIM?
by tphyahoo (Vicar) on Feb 24, 2005 at 10:42 UTC
    Could you give some examples of the commandline arguments you're having trouble with? I'm not that familiar with locale, but this is something that's given me trouble myself, and I'd like to learn more.

    Personally, I've had trouble with DOS command-line arguments that contain special German characters. My workaround was to save the files in "OEM" using EditPad. (Whatever "OEM" means.) I've wanted a locale-based solution for a while, but since I don't really understand locale... it's on the back burner.

    I'm not sure if this is the same sort of problem you're having.

      I am talking about non-US-ASCII characters, such as the German umlauts äöü. They are passed on the command line encoded in the current locale of the terminal (as one would expect), but I haven't been able to find a sane way to tell Perl that stdin, stdout and the command line are all in the locale that the user has set.
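One possible way to cover all three at once, sketched under some assumptions: ask the C library for the locale's codeset with I18N::Langinfo, put a matching :encoding layer on the standard handles, and decode @ARGV with the same encoding. The helper names locale_encoding and decode_args are hypothetical, the ISO-8859-1 fallback is an arbitrary choice for codesets Encode does not recognise, and the whole thing assumes a POSIX-like system where langinfo(CODESET) is available.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper: name of the encoding the user's locale asks for.
# Falls back to ISO-8859-1 (an assumption) if Encode does not know the codeset.
sub locale_encoding {
    my $enc = find_encoding(langinfo(CODESET));
    return $enc ? $enc->name : 'iso-8859-1';
}

# Hypothetical helper: return decoded copies of a list of byte strings.
sub decode_args {
    my ($encoding, @bytes) = @_;
    my $enc = find_encoding($encoding)
        or die "Unknown encoding: $encoding\n";
    return map { $enc->decode($_) } @bytes;
}

my $encoding = locale_encoding();

# Same encoding for input, output, diagnostics and the command line.
binmode($_, ":encoding($encoding)") for \*STDIN, \*STDOUT, \*STDERR;
@ARGV = decode_args($encoding, @ARGV);

print "Locale encoding: $encoding\n";
printf "%d: '%s' (%d)\n", $_ + 1, $ARGV[$_], length $ARGV[$_] for 0 .. $#ARGV;
```

Perl 5.8.1 and later also have the -C switch (and the PERL_UNICODE environment variable), for example perl -CSAL script.pl, but that only switches the handles and @ARGV to UTF-8 when the locale says so; it does not decode other encodings.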
        I would also be interested in how to get these stupid umlauts working cleanly under Windows.
Re: unicode DWIM?
by dakkar (Hermit) on Feb 24, 2005 at 13:29 UTC

    I'm not sure get_layers is the best way to get the name of the locale's encoding, but assuming it is, this works (Perl 5.8.2, Linux):

        use open ':locale';
        use Encode;
        use strict;

        my $locEnc;
        for (PerlIO::get_layers(STDOUT)) {
            if (/encoding\((.*)\)/ || /(utf8)/) {
                $locEnc = $1;
                last;
            }
        }
        print "Locale encoding: $locEnc\n";

        # Decode each argument in place.
        for (@ARGV) {
            $_ = Encode::decode($locEnc, $_);
        }

        my $i = 1;
        for (@ARGV) {
            print $i++, " '", $_, "'(", length($_), ")\n";
        }

    It decodes (in place) each command-line argument from the locale's encoding, then prints them alongside their length. This helps in understanding what kind of strings Perl thinks it is working with.
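Beyond printing the length, it can help to dump the individual code points. The following is a small sketch of that idea; the helper name codepoints is hypothetical, and the sample bytes stand in for an argument typed on a UTF-8 terminal.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: render a string as a list of U+XXXX code points.
sub codepoints {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $string;
}

# UTF-8 bytes for U+00FC (LATIN SMALL LETTER U WITH DIAERESIS).
my $raw     = "\xC3\xBC";
my $decoded = decode('UTF-8', $raw);

print 'raw:     ', codepoints($raw),     "\n";   # one entry per byte
print 'decoded: ', codepoints($decoded), "\n";   # one entry per character
```

The raw string shows up as U+00C3 U+00BC, the decoded one as the single code point U+00FC, which makes it easy to spot an argument that was never decoded (or was decoded twice).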

            dakkar - Mobilis in mobile

    Most of my code is tested...

    Perl is strongly typed, it just has very few types (Dan)
