
DWIM with non ASCII characters

by andreas1234567 (Vicar)
on May 07, 2010 at 06:49 UTC
andreas1234567 has asked for the wisdom of the Perl Monks concerning the following question:

This question relates to a bug in Parse::Dia::SQL but is general in nature.

The question is how to handle, e.g., my list of favorite Icelandic volcanoes[1]:

use strict;
use warnings;

print "Eyjafjallaj\x{f6}kull\n";
print "\x{de}\x{f3}r\x{f3}lfsfell\n";
print "\x{d6}r\x{e6}faj\x{f6}kull\n";
__END__
This will typically not print the non-ASCII characters as expected.

use open qw/:std :utf8/;
makes the non-ASCII characters display as I expect, but I guess I could also have used locale settings.
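For reference, here is a minimal sketch of the whole script with an output layer applied. It swaps the :utf8 layer from the question for :encoding(UTF-8), which goes through Encode and validates input strictly; the :std part attaches the layer to STDIN, STDOUT and STDERR. It assumes the terminal itself expects UTF-8. The \x{f6} in the last name is used because Öræfajökull is spelled with ö.

```perl
use strict;
use warnings;

# Attach a UTF-8 encoding layer to the standard handles (":std") and make
# it the default for handles opened in this lexical scope.
use open qw/:std :encoding(UTF-8)/;

my @volcanoes = (
    "Eyjafjallaj\x{f6}kull",      # Eyjafjallajökull
    "\x{de}\x{f3}r\x{f3}lfsfell", # Þórólfsfell
    "\x{d6}r\x{e6}faj\x{f6}kull", # Öræfajökull
);

# The layer encodes each character string to UTF-8 bytes on the way out.
print "$_\n" for @volcanoes;
```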

What do you think is the best strategy for handling non-ASCII characters?

[1] Chosen entirely for their non-ASCII-ness.

No matter how great and destructive your problems may seem now, remember, you've probably only seen the tip of them. [1]

Re: DWIM with non ASCII characters
by moritz (Cardinal) on May 07, 2010 at 07:17 UTC
      Decode everything that comes from the outside.
      Encode everything that leaves your program.
      use utf8;
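      The three rules can be sketched with explicit Encode calls as follows. This is an illustration under assumptions: the input bytes are simulated with a string literal and assumed to be UTF-8, and the source file is assumed to be saved as UTF-8 so that use utf8 is correct. Later replies recommend letting IO layers do the decoding and encoding instead.

```perl
use strict;
use warnings;
use utf8;                        # assumption: this file is saved as UTF-8
use Encode qw(decode encode);

# Stand-in for bytes read from a file, socket or database that is assumed
# to deliver UTF-8: "Eyjafjallajökull" as 17 raw bytes.
my $raw_input = "Eyjafjallaj\xc3\xb6kull";

# Decode on the way in: bytes -> characters.
my $name = decode('UTF-8', $raw_input);

# Inside the program everything is characters, so a decoded input string
# and a literal from a "use utf8" source can be combined safely.
my $line = "$name, Öræfajökull";

# Encode on the way out: characters -> bytes for whatever is listening.
my $output = encode('UTF-8', $line);
print $output, "\n";
```

      The decoded $name has 16 characters although the raw input has 17 bytes, because ö occupies two bytes in UTF-8.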

      Why use utf8;? As I understand the documentation, its purpose is to allow the source code to be written in UTF-8 (so you can write, e.g., my $ņ = 'foo'; where 'ņ' is not a single byte). It even says "Do not use this pragma for anything else than telling Perl that your script is written in UTF-8".

      I thought the preferred way to decode/encode the program's input/output was by using Encode.

       David Serrano
       (Please treat my English text just like Perl code, i.e. feel free to notify me of any syntax, grammar, style and/or spelling errors. Thank you!).

        Why use utf8;? As I understand the documentation, its purpose is to enable the source code to be in UTF-8

        Yes, that way you avoid concatenating decoded and non-decoded strings.

        Of course, it requires your script to actually be stored in UTF-8. But since the more general solution (use encoding $your_encoding) is severely broken (with respect to AUTOLOAD, thread safety and other issues), that's currently the only sane way to store non-ASCII Perl programs.

        As for the rest, I can only agree with what ikegami wrote: using IO layers is much more convenient than calling encode() and decode() on every IO operation. More importantly, since there are fewer places where you have to care about encoding, the chance of forgetting it somewhere (and getting Mojibake as a result) is much lower.
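        A minimal sketch of the IO-layer approach: the encoding is declared once, when the handle is opened, and every print and readline after that works with characters. An in-memory scalar stands in for a real file here (an illustrative choice); with a file you would pass a path instead of \$bytes.

```perl
use strict;
use warnings;

my @volcanoes = (
    "Eyjafjallaj\x{f6}kull",
    "\x{de}\x{f3}r\x{f3}lfsfell",
    "\x{d6}r\x{e6}faj\x{f6}kull",
);

# Write through an encoding layer: PerlIO encodes each string on output,
# so there is no explicit encode() call anywhere.
my $bytes = '';
open my $out, '>:encoding(UTF-8)', \$bytes or die "open for write: $!";
print {$out} "$_\n" for @volcanoes;
close $out or die "close: $!";

# Read back through the same layer: lines arrive already decoded.
open my $in, '<:encoding(UTF-8)', \$bytes or die "open for read: $!";
chomp(my @read_back = <$in>);
close $in;

printf "%d bytes written, %d names read back\n", length $bytes, scalar @read_back;
```

        Only the two open calls mention an encoding; forgetting an encode() before one of several print statements is no longer possible.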

        Perl 6 - links to (nearly) everything that is Perl 6.

        More importantly, use utf8; allows you to do

        my $foo = 'ņ';

        So far, I've stuck to ASCII in my sources, so use utf8; wouldn't do anything for me.
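        A small sketch of what use utf8 changes for such a literal, assuming the file is saved as UTF-8: the two bytes of 'ņ' in the source become a single character, so the literal is already decoded and compares equal to the escape "\x{146}".

```perl
use strict;
use warnings;
use utf8;   # assumption: this file is saved as UTF-8

my $foo = 'ņ';

# With "use utf8" the two UTF-8 bytes of 'ņ' in the source become one
# character, U+0146, identical to the escape "\x{146}".
printf "length=%d code point=U+%04X matches escape: %s\n",
    length $foo, ord $foo, ($foo eq "\x{146}" ? 'yes' : 'no');
```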

        I thought the preferred way to decode/encode the program's input/output was by using Encode.

        No way. Why encode and decode everything yourself when you can let PerlIO do it? At least, that's the way I see it.

Re: DWIM with non ASCII characters
by cdarke (Prior) on May 07, 2010 at 07:13 UTC
    It could be that the terminal session you are printing to does not support the character set. For example, cmd.exe is terrible at displaying anything non-American. I tried your code on Linux, and it displayed correctly when I used a Baltic character set such as ISO 8859-4.
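    To see why the terminal's character set matters, this sketch encodes one of the names into two charsets and dumps the first bytes. Without an output layer, perl writes each character below U+0100 as a single Latin-1 byte; a terminal expecting UTF-8 receives different bytes than it needs. The two charsets compared are an illustrative choice.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $name = "\x{de}\x{f3}r\x{f3}lfsfell";   # Þórólfsfell

# The same characters become different byte sequences per charset.
for my $charset ('UTF-8', 'ISO-8859-1') {
    my $bytes = encode($charset, $name);
    printf "%-10s %2d bytes, first four: %s\n", $charset, length $bytes,
        join ' ', map { sprintf '%02X', ord } split //, substr($bytes, 0, 4);
}
```

    UTF-8 needs two bytes for Þ and for each ó, while ISO-8859-1 uses one, so whichever charset the terminal assumes has to match what the program actually writes.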
