PerlMonks  

unicode DWIM?

by Sec (Monk)
on Feb 24, 2005 at 09:55 UTC ( [id://433998]=perlquestion )

Sec has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to find my way around Unicode in Perl. I want my scripts to produce correct output whether they are called on latin1 or on utf8 terminals. I have already found the incredibly useful use open OUT => ':locale'. However, I'm having trouble with command-line arguments. Is there a more sensible way to do it than this:
    use charnames ':full';
    use open ':locale';
    use Encode;

    # Find the encoding that ':locale' put on STDOUT.
    for (PerlIO::get_layers(STDOUT)) {
        if (/encoding\((.*)\)/ || /(utf8)/) {
            $lc ||= $1;
        }
    }
    print "Locale is: $lc\n";

    # Decode the arguments, then spell out every character outside printable ASCII.
    $_ = "@ARGV";
    $_ = decode($lc, $_);
    $_ =~ s<([^\x{000a}\x{0020}-\x{007e}])>{ '\N{'.charnames::viacode(ord $1).'}' }ge;
    print $_, "\n";
This seems overly ugly, just to get perl to do the right thing.

Replies are listed 'Best First'.
Re: unicode DWIM?
by dragonchild (Archbishop) on Feb 24, 2005 at 13:56 UTC
    Wouldn't another way be to treat @ARGV as IO::Scalar and binmode them to :utf8 (or whatever)? At that point, you can use built-in functions (such as lc) and they will DWIM with regards to the Unicode-ness of the strings.

    Of course, that doesn't handle the locale stuff, but I've yet to encounter a situation where that was actually useful (unlike Redhat where it's specifically not useful ...)
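A minimal sketch of the "read it through a handle with a layer" idea, using a core in-memory filehandle rather than IO::Scalar (that swap is an assumption on my part, as is the helper name decode_via_layer). It only shows that once the bytes are decoded, length and lc work on characters rather than bytes; the encoding is hard-coded to UTF-8 here and is not taken from the locale.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read a byte string through an
# in-memory filehandle that carries a decoding layer.
sub decode_via_layer {
    my ($bytes, $encoding) = @_;
    open(my $fh, "<:encoding($encoding)", \$bytes)
        or die "Cannot open in-memory handle: $!";
    local $/;                      # slurp mode: read everything in one go
    my $chars = <$fh>;
    close $fh;
    return defined $chars ? $chars : '';
}

# UTF-8 bytes for A-umlaut and O-umlaut, as a UTF-8 terminal would pass them.
my $bytes = "\xC3\x84\xC3\x96";
my $chars = decode_via_layer($bytes, 'UTF-8');

binmode(STDOUT, ':encoding(UTF-8)');
print length($bytes), " bytes, ", length($chars), " characters\n";
print "lowercased: ", lc($chars), "\n";
```

For real command-line arguments you would call the helper on each element of @ARGV, with whatever encoding the terminal actually uses.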

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: unicode DWIM?
by penguinfuz (Pilgrim) on Feb 24, 2005 at 13:35 UTC
    I too have had an issue with input/output of German characters in my terminal and scripts - my solution has been to use Unicode::UTF8simple which works for my needs. (slurping in German text files, processing the text a bit and then writing the output to HTML or straight through to sendmail)

    I'm not sure that I understand your issue completely, but with the following code I can enter a word with German chars - äöüß - and the output is exactly what I expect. I hope this helps.

        use strict;
        if (@ARGV) {
            use Unicode::UTF8simple;
            my $uref = new Unicode::UTF8simple;
            my $utf8string = $uref->fromUTF8("iso-8859-1", @ARGV);
            my $string = $uref->toUTF8("iso-8859-1", $utf8string);
            print "I give you: $string\n";
        }
Re: unicode DWIM?
by tphyahoo (Vicar) on Feb 24, 2005 at 10:42 UTC
    Could you give some examples of the commandline arguments you're having trouble with? I'm not that familiar with locale, but this is something that's given me trouble myself, and I'd like to learn more.

    Personally, I've had trouble with DOS command-line arguments that contain special German characters. My workaround was to save the files in "OEM" using editpad. (Whatever "OEM" means.) I've wanted a locale-based solution for a while, but since I don't really understand locale... it's on the back burner.

    I'm not sure if this is the same sort of problem you're having.

      I am talking about non-US-ASCII characters, such as the German umlauts äöü. They are passed on the command line encoded in the current locale of the terminal (as one would expect), but I haven't been able to find a sane way to tell Perl that "stdin, stdout and commandline" are in the locale that the user has set.
        I'd also be interested to know how to get these stupid umlauts handled cleanly under Windows.
Re: unicode DWIM?
by dakkar (Hermit) on Feb 24, 2005 at 13:29 UTC

    I'm not sure get_layers is the best way to get the name of the locale's encoding, but assuming it is, this works (Perl 5.8.2, Linux):

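        use open ':locale';
        use Encode;
        use strict;

        # Take the first encoding layer that ':locale' put on STDOUT.
        my $locEnc;
        for (PerlIO::get_layers(STDOUT)) {
            if (/encoding\((.*)\)/ || /(utf8)/) {
                $locEnc = $1;
                last;
            }
        }
        print "Locale encoding: $locEnc\n";

        # Decode each argument in place.
        for (@ARGV) {
            $_ = Encode::decode($locEnc, $_);
        }

        # Print each argument with its length in characters.
        my $i = 1;
        for (@ARGV) {
            print $i++, " '", $_, "'(", length($_), ")\n";
        }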

    It decodes (in place) each command-line argument from the locale's encoding, then prints them alongside their length. This helps in understanding what kind of strings Perl thinks it is working with.
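On the doubt about get_layers: one alternative, sketched here as an assumption rather than a tested recommendation, is to ask the C library for the locale's codeset via the core I18N::Langinfo module instead of parsing layer names. The helper name locale_encoding and the iso-8859-1 fallback are made up for this illustration; the fallback is used when the codeset cannot be determined or Encode does not recognise it.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper (not from the thread): name of the locale's character
# encoding, or the caller-supplied fallback if it is unknown to Encode.
sub locale_encoding {
    my ($fallback) = @_;
    my $codeset = eval { langinfo(CODESET) };   # may be unavailable on some platforms
    return $codeset
        if defined $codeset && length $codeset && find_encoding($codeset);
    return $fallback;
}

my $enc  = locale_encoding('iso-8859-1');
my @args = map { decode($enc, $_) } @ARGV;      # decoded copies of the arguments

binmode(STDOUT, ":encoding($enc)");
print "Locale encoding: $enc\n";
my $i = 1;
for my $arg (@args) {
    print $i++, " '", $arg, "'(", length($arg), ")\n";
}
```

The output format matches the script above, so the two can be compared side by side on a latin1 and a utf8 terminal.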

    -- 
            dakkar - Mobilis in mobile
    

    Most of my code is tested...

    Perl is strongly typed, it just has very few types (Dan)
