PerlMonks  

Re^2: Whether 'use utf8;' is good style

by McA (Curate)
on Dec 18, 2012 at 16:56 UTC (#1009426)


in reply to Re: Whether 'use utf8;' is good style
in thread Whether 'use utf8;' is good style

Hi,

Thank you for your comment. After reading your answer I see that the context of my question was not precise enough.

Take the following example:

die ("Was für ein Müll!");

This is a German string saying "What rubbish!". There are umlauts in that string. When I store this source code file Latin-1 encoded there is one byte per German umlaut. The string is interpreted as a byte string. And this byte string gets interpreted correctly as Perl assumes Latin-1 encoding. But when you run in a UTF-8 environment you would see a square and not an 'ü' when the program dies. When you use ONLY ASCII characters it doesn't matter, and you're never aware of this subtle difference.
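To make the byte counts concrete, here is a minimal sketch using the core Encode module. The character is written as the escape \x{FC} so that the example does not depend on how this file itself is saved. The helper hex_bytes and the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Render each byte of a string as two hex digits (illustrative helper).
sub hex_bytes {
    return join ' ', map { sprintf '%02X', ord($_) } split //, $_[0];
}

# U+00FC is 'ü'; the escape keeps this sketch independent of file encoding.
my $char = "\x{FC}";

my $latin1 = encode('ISO-8859-1', $char);   # one byte
my $utf8   = encode('UTF-8',      $char);   # two bytes

print "Latin-1: ", hex_bytes($latin1), "\n";   # Latin-1: FC
print "UTF-8:   ", hex_bytes($utf8),   "\n";   # UTF-8:   C3 BC
```

A lone 0xFC byte is not a valid UTF-8 sequence, which is likely why a UTF-8 terminal draws a placeholder glyph instead of 'ü'.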

So with use utf8; and a correctly encoded source file I would force character semantics on this string, which would subtly change the semantics of the string thrown.
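The difference between byte and character semantics can be sketched without relying on the file encoding. In the example below, decode() from the core Encode module stands in for what use utf8; does to a literal; that substitution is an assumption made for illustration, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The bytes a UTF-8 encoded source file would hold for "Müll".
my $bytes = "M\xC3\xBCll";

# decode() stands in here for the effect of 'use utf8;' on a literal.
my $chars = decode('UTF-8', $bytes);

printf "byte string:      length %d\n", length $bytes;   # 5
printf "character string: length %d\n", length $chars;   # 4
```

Anything that counts or matches characters (length, substr, regexes) sees the two strings differently, which is the subtle change in semantics in question.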

And I want to know whether there are pitfalls when someone uses a module with that pragma while probably expecting the good old byte string world.

Best regards
McA


Re^3: Whether 'use utf8;' is good style
by chromatic (Archbishop) on Dec 18, 2012 at 17:03 UTC
    And I want to know whether there are pitfalls when someone uses a module with that pragma while probably expecting the good old byte string world.

    What would you expect to happen in that case? (I can't imagine things working correctly except by accident.)

      Hi chromatic,

      Does that mean you suggest avoiding use utf8; in public packages?

      Best regards
      McA

        Quite the contrary. If you have literal strings you intend for Perl to deal with in the UTF-8 encoding, you'd better use that pragma.

        If you're not explicit about Unicode and encodings, you're going to make a mess for other people to deal with. If you put UTF-8 in literals in your programs and don't use the utf8 pragma, you've acted irresponsibly.

Re^3: Whether 'use utf8;' is good style
by space_monk (Chaplain) on Dec 19, 2012 at 12:22 UTC

    Incidentally, if your original question is not precise enough, you can edit it or add more information. It's traditional to mark the addition with the word Update so readers can see what you've added.

    A Monk aims to give answers to those who have none, and to learn from those who know more.
Re^3: Whether 'use utf8;' is good style
by grantm (Parson) on Dec 20, 2012 at 03:59 UTC
    There are umlauts in that string. When I store this source code file Latin-1 encoded there is one byte per German umlaut. The string is interpreted as a byte string. And this byte string gets interpreted correctly as Perl assumes Latin-1 encoding.

    I don't think it's entirely accurate to say "Perl assumes Latin-1". It's probably better to say that Perl assumes binary or byte semantics: the bytes from your source code will be output unmodified. If the characters in the source code were in ISO-8859-5 Cyrillic, it would also "work" in the way you describe.
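A small sketch of why the byte itself carries no encoding: the same byte 0xFC names different characters depending on which charset the reader assumes. This uses decode() from the core Encode module; the %code_point hash is just an illustrative name.

```perl
use strict;
use warnings;
use Encode qw(decode);

# One byte, as it would sit in a single-byte encoded source file.
my $byte = "\xFC";

# Interpret the same byte under two different single-byte charsets.
my %code_point;
for my $enc ('ISO-8859-1', 'ISO-8859-5') {
    $code_point{$enc} = ord decode($enc, $byte);
    printf "%-10s -> U+%04X\n", $enc, $code_point{$enc};
}
```

Perl passes the byte through untouched in both cases; only the terminal or editor decides which glyph appears.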

    However, if you have non-ASCII characters in your source code, I would recommend saving the source file as UTF-8 and adding the use utf8; pragma.
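A minimal sketch of that recommendation, assuming the file containing it is itself saved as UTF-8. The binmode calls are an addition not mentioned in the thread: they encode the character string back to UTF-8 on output, on the assumption that the terminal expects UTF-8.

```perl
use strict;
use warnings;
use utf8;   # string literals in this file are UTF-8 encoded text

# Assumption: the terminal expects UTF-8, so encode characters on the way out.
# STDERR is included because die() writes its message there.
binmode(STDOUT, ':encoding(UTF-8)');
binmode(STDERR, ':encoding(UTF-8)');

my $message = "Was für ein Müll!";

# 17 characters when the file is saved as UTF-8 (it holds 19 bytes on disk).
printf "%d characters\n", length $message;
print "$message\n";
```

Without use utf8; the same literal would be a 19-element byte string, and the output layer would encode each of those bytes a second time.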
