PerlMonks  

DWIM with non ASCII characters

by andreas1234567 (Vicar)
on May 07, 2010 at 06:49 UTC (#838867)
andreas1234567 has asked for the wisdom of the Perl Monks concerning the following question:

This question relates to a bug in Parse::Dia::SQL but is general in nature.

The question is how to e.g. handle my list of favorite Icelandic volcanoes[1]:

use strict;
use warnings;
print "Eyjafjallaj\x{f6}kull\n";
print "\x{de}\x{f3}r\x{f3}lfsfell\n";
print "\x{d6}r\x{e6}faj\x{f6}kull\n";
__END__
This will typically print

Eyjafjallaj�kull
��r�lfsfell
�r�faj�kull

Adding

use open qw/:std :utf8/;
makes the non-ASCII characters display as I expect, though I guess I could also have used locale settings.
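A minimal sketch of why the pragma changes what reaches the terminal, assuming the terminal expects UTF-8. The helper name bytes_written is hypothetical, an in-memory filehandle stands in for STDOUT, and :encoding(UTF-8) is used here as a stand-in for the :utf8 layer that the pragma applies.

```perl
use strict;
use warnings;

# Hypothetical helper: print $text to an in-memory handle opened with the
# given layer (may be empty) and return the raw bytes that were written.
sub bytes_written {
    my ($text, $layer) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer) or die "cannot open in-memory handle: $!";
    print {$fh} $text;
    close($fh) or die "close failed: $!";
    return $buffer;
}

my $name = "j\x{f6}kull";

# Without a layer the o-umlaut leaves the program as the single byte 0xF6,
# which is not valid UTF-8 on its own.
my $raw = bytes_written($name, '');

# With an encoding layer it leaves as the two-byte sequence 0xC3 0xB6.
my $utf8 = bytes_written($name, ':encoding(UTF-8)');

printf "without a layer: %d bytes, with :encoding(UTF-8): %d bytes\n",
    length($raw), length($utf8);
```

A lone 0xF6 byte is what a UTF-8 terminal would typically render as a replacement character.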

What do you think is the best strategy for handling non ASCII characters?

[1] Chosen entirely for their non-ascii-ness.

--
No matter how great and destructive your problems may seem now, remember, you've probably only seen the tip of them. [1]

Re: DWIM with non ASCII characters
by cdarke (Prior) on May 07, 2010 at 07:13 UTC
    It could be that the terminal session you are printing to does not support the character set. For example, cmd.exe is terrible at displaying anything non-American. I tried your code on Linux, and it displayed correctly when I used a Baltic character set such as ISO 8859-4.
Re: DWIM with non ASCII characters
by moritz (Cardinal) on May 07, 2010 at 07:17 UTC
      Decode everything that comes from the outside.
      Encode everything that leaves your program.
      use utf8;
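A rough sketch of the decode-in, encode-out rule, using the core Encode module. The helper name shout_utf8 is hypothetical and the input is assumed to be UTF-8 bytes; this is one way to read the advice, not necessarily what the poster had in mind.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: accept UTF-8 bytes from outside, work on characters,
# and hand UTF-8 bytes back to the outside world.
sub shout_utf8 {
    my ($input_bytes) = @_;
    my $text = decode('UTF-8', $input_bytes);   # bytes -> characters at the boundary
    $text = uc $text;                           # character-level work on decoded text
    return encode('UTF-8', $text);              # characters -> bytes on the way out
}

my $bytes_out = shout_utf8("j\xc3\xb6kull");    # input as it might arrive from a file

# Show the result as hex so the terminal encoding does not matter.
print join(' ', map { sprintf '%02X', ord } split //, $bytes_out), "\n";
```

Because uc runs on the decoded string, the two-byte o-umlaut is treated as one character and upper-cased along with the rest.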

      Why use utf8;? As I understand the documentation, its purpose is to enable the source code to be in UTF-8 (so you can do e.g. my $ = 'foo'; where '' is not a single byte). It even says "Do not use this pragma for anything else than telling Perl that your script is written in UTF-8".

      I thought the preferred way to decode/encode the program's input/output was by using Encode.

      --
       David Serrano
       (Please treat my english text just like Perl code, i.e. feel free to notify me of any syntax, grammar, style and/or spelling errors. Thank you!).

        More importantly, use utf8; allows you to do

        my $foo = '';

        So far, I've stuck to ASCII in my sources, so use utf8; wouldn't do anything for me.

        I thought the preferred way to decode/encode the program's input/output was by using Encode.

        No way. Why encode and decode everything yourself when you can let PerlIO do it? At least, that's the way I see it.

        Why use utf8;? As I understand the documentation, its purpose is to enable the source code to be in UTF-8

        Yes, that way you avoid concatenating decoded and non-decoded strings.
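To illustrate what goes wrong when decoded and non-decoded strings are concatenated, here is a small sketch. The helper name join_and_encode is hypothetical, and the byte strings are written with escapes so the example does not depend on how the file itself is stored.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: concatenate two strings and encode the result as
# UTF-8, as would happen when the text leaves the program.
sub join_and_encode {
    my ($left, $right) = @_;
    return encode('UTF-8', $left . $right);
}

my $raw     = "j\xc3\xb6kull";                    # 7 undecoded UTF-8 bytes
my $decoded = decode('UTF-8', "j\xc3\xb6kull");   # 6 characters

my $good  = join_and_encode($decoded, $decoded);  # both halves are characters
my $mixed = join_and_encode($decoded, $raw);      # raw bytes get encoded a second time

printf "both decoded: %d bytes\n", length $good;
printf "mixed:        %d bytes\n", length $mixed;
```

In the mixed case Perl treats each raw byte as a Latin-1 character, so the two bytes of the o-umlaut are each encoded again on output, which is the classic double-encoding form of mojibake.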

        Of course it requires your script to be actually stored in UTF-8. But since the more general solution (use encoding $your_encoding) is severely broken (with respect to AUTOLOAD, thread safety and other issues), that's currently the only sane way to store non-ASCII Perl programs.

        As for the rest, I can only agree with what ikegami wrote; using IO layers is much more convenient than using encode() and decode() on every IO operation. More importantly, since there are fewer spots where you have to care about encoding, the probability of forgetting it somewhere (and getting Mojibake in response) is much lower.
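A minimal sketch of the IO-layer approach, assuming UTF-8 on both sides. The helper name upcase_stream is hypothetical, and in-memory filehandles stand in for real files so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: read UTF-8 bytes through a decoding layer, work on
# characters, and write through an encoding layer. Encoding is named once
# per handle, with no encode()/decode() call per IO operation.
sub upcase_stream {
    my ($bytes_in) = @_;
    my $bytes_out = '';
    open(my $in,  '<:encoding(UTF-8)', \$bytes_in)  or die "cannot open input: $!";
    open(my $out, '>:encoding(UTF-8)', \$bytes_out) or die "cannot open output: $!";
    while (my $line = <$in>) {
        print {$out} uc $line;    # $line is already decoded characters
    }
    close($in)  or die "close failed: $!";
    close($out) or die "close failed: $!";
    return $bytes_out;
}

my $result = upcase_stream("j\xc3\xb6kull\n");

# Show the result as hex so the terminal encoding does not matter.
print join(' ', map { sprintf '%02X', ord } split //, $result), "\n";
```

With real files the same layer strings would go in the open() mode, and binmode() can add a layer to a handle that is already open, such as STDOUT.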

        Perl 6 - links to (nearly) everything that is Perl 6.
