PerlMonks  

Whether 'use utf8;' is good style

by McA (Deacon)
on Dec 18, 2012 at 14:53 UTC ( #1009387=perlquestion )
McA has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I just wanted to ask whether using use utf8; in a package that gets published on CPAN is good style or not.

Is there a way to query which packages are currently using that pragma?

Opinions requested.

Best regards
McA

Re: Whether 'use utf8;' is good style
by SuicideJunkie (Priest) on Dec 18, 2012 at 15:12 UTC

    Are you actually using UTF-8 in your source code? If so, you must have it. If not, you don't need it.

    I don't see why you'd want to include something that doesn't help. If some future maintainer finds they want to use utf8, they can easily add use utf8 to the top.

      Hi,

      thank you for your comment. After your answer I've seen that the context of my question was not precise enough.

      Take the following example:

      die ("Was für ein Müll!");

      This is a German string meaning "What rubbish!". There are umlauts in that string. When I store this source code file Latin-1 encoded there is one byte per German umlaut. The string is interpreted as a byte string. And this byte string gets interpreted correctly as Perl assumes Latin-1 encoding. But when you run in a UTF-8 environment, you would see a square instead of an 'ü' when the program dies. When you use only ASCII characters it doesn't matter, and you're never aware of this subtle difference.

      So with use utf8; and the correct source file encoding, I would force character semantics on this string, which would subtly change the semantics of the string that is thrown.
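A minimal sketch of the byte versus character difference described here, assuming the script itself is saved as UTF-8 (in a Latin-1 file the first literal would hold 4 bytes instead of 5). The variable names are illustrative only.

```perl
use strict;
use warnings;

# Without the pragma, the literal is taken byte by byte:
# in a UTF-8 file "ü" is stored as the two bytes 0xC3 0xBC.
my $bytes = do { no utf8; "Müll" };

# With the pragma, the parser decodes the literal:
# "ü" becomes the single character U+00FC.
my $chars = do { use utf8; "Müll" };

printf "without use utf8: length %d\n", length $bytes;
printf "with use utf8:    length %d\n", length $chars;
```

The pragma is lexically scoped, which is why both variants can live in one file. Anything that depends on length, substr, or regex character classes sees a different string in the two cases.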

      And I want to know whether there are pitfalls, when someone is using a module with that pragma probably expecting the good old byte string world.

      Best regards
      McA

        And I want to know whether there are pitfalls, when someone is using a module with that pragma probably expecting the good old byte string world.

        What would you expect to happen in that case? (I can't imagine things working correctly except by accident.)

        Incidentally, if your original question is not precise enough, you can edit it or add more information. It's traditional to highlight the fact that you've put more information in by adding the word Update to show what you've added.

        A Monk aims to give answers to those who have none, and to learn from those who know more.

        There are umlauts in that string. When I store this source code file Latin-1 encoded there is one byte per German umlaut. The string is interpreted as a byte string. And this byte string gets interpreted correctly as Perl assumes Latin-1 encoding.

        I don't think it's entirely accurate to say "Perl assumes Latin-1". It's probably better to say that Perl assumes binary or byte semantics: the bytes from your source code will be output unmodified. If the characters in the source code were in ISO-8859-5 Cyrillic, it would also "work" in the way you describe.
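The pass-through behaviour can be sketched without any non-ASCII source text by writing the Latin-1 byte as an escape. This is an illustration under that assumption, and hex_bytes is a hypothetical helper, not part of any module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a string's bytes as space-separated hex.
sub hex_bytes {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $s;
}

# "Müll" as it sits in a Latin-1 source file without "use utf8":
# the umlaut is the single byte 0xFC.
my $literal = "M\xFCll";

# Printed with no I/O layer, these bytes leave the program unchanged.
my $raw = hex_bytes($literal);

# Only when the string is treated as characters (here: encoded to UTF-8)
# is 0xFC read as the code point U+00FC and written as two bytes.
my $utf8 = hex_bytes(encode('UTF-8', $literal));

print "unmodified: $raw\n";
print "as UTF-8:   $utf8\n";
```

A UTF-8 terminal receiving the unmodified bytes sees a lone 0xFC, which is not valid UTF-8, hence the replacement square.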

        However, I would recommend that if you have non-ASCII characters in your source code, you save the source file as UTF-8 and add the use utf8; pragma.

Re: Whether 'use utf8;' is good style
by ikegami (Pope) on Dec 19, 2012 at 12:40 UTC

    It's not a style issue. use utf8; tells Perl the file is encoded using UTF-8. If the file is encoded using UTF-8, use use utf8;. If your source code is encoded using US-ASCII or ISO-8859-1, don't use use utf8;. It's that simple.

    (Whether you encode your file using UTF-8 or not would be a style issue, but I don't see any real difference.)
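As a sketch of how the pieces fit together, assuming the file is saved as UTF-8: use utf8; covers only how the source is read, so character strings still need an encoding layer on the way out. The in-memory buffer below is a stand-in for STDOUT or STDERR, chosen so the result can be inspected.

```perl
use strict;
use warnings;
use utf8;   # this file is assumed to be saved as UTF-8

# A character string: each umlaut counts as one character.
my $message = "Was für ein Müll!";

# Write through an encoding layer so the characters leave the
# program as UTF-8 bytes (each umlaut becomes two bytes).
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer or die "open failed: $!";
print {$out} $message;
close $out;

printf "characters: %d, bytes written: %d\n",
    length $message, length $buffer;
```

In a real script the same layer could be applied with binmode(STDERR, ':encoding(UTF-8)'), provided the terminal actually expects UTF-8.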
