http://www.perlmonks.org?node_id=1009387

McA has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I just wanted to ask whether the usage of use utf8; in a package which gets published on CPAN is good style or not.

Is there a way to query which packages are currently using that pragma?

Opinions requested.

Best regards
McA

Re: Whether 'use utf8;' is good style
by SuicideJunkie (Vicar) on Dec 18, 2012 at 15:12 UTC

    Are you actually using UTF-8 in your source code? If so, you must have it. If not, you don't need it.

    I don't see why you'd want to include something that doesn't help. If some future maintainer finds they want to use utf8, they can easily add use utf8 to the top.

      Hi,

      thank you for your comment. After your answer I've seen that the context of my question was not precise enough.

      Take the following example:

      die ("Was für ein Müll!");

      This is a German string meaning "What rubbish!". There are umlauts in that string. When I store this source code file Latin-1 encoded there is one byte per German umlaut. The string is interpreted as a byte string. And this byte string gets interpreted correctly as Perl assumes Latin-1 encoding. But when you run in a UTF-8 environment, you would see a square instead of an 'ü' when the program dies. When you use only ASCII characters it doesn't matter, and you are never aware of this subtle difference.

      So with use utf8; and a correctly encoded source file, I would force character semantics on this string, which would give the thrown string subtly different semantics.
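
      The difference can be sketched roughly as follows. The variable names are made up for illustration, and the strings are written with escapes so that the sketch does not depend on how the file itself is saved:

      ```perl
      use strict;
      use warnings;
      use Encode qw(decode);

      # "Müll" as it would sit on disk in each encoding (illustrative names).
      my $latin1_bytes = "M\xFCll";        # Latin-1: one byte for the umlaut
      my $utf8_bytes   = "M\xC3\xBCll";    # UTF-8: two bytes for the umlaut

      # Without "use utf8", a string literal is just these bytes.
      print length($latin1_bytes), "\n";   # 4
      print length($utf8_bytes), "\n";     # 5

      # With "use utf8" in a UTF-8 file, the literal would instead hold
      # characters, roughly as if it had been decoded like this:
      my $chars = decode('UTF-8', $utf8_bytes);
      print length($chars), "\n";          # 4
      ```

      So the same visible text can be four or five units long depending on whether it is held as bytes or as characters.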

      And I want to know whether there are pitfalls, when someone is using a module with that pragma probably expecting the good old byte string world.

      Best regards
      McA

        And I want to know whether there are pitfalls, when someone is using a module with that pragma probably expecting the good old byte string world.

        What would you expect to happen in that case? (I can't imagine things working correctly except by accident.)

        There are umlauts in that string. When I store this source code file Latin-1 encoded there is one byte per German umlaut. The string is interpreted as a byte string. And this byte string gets interpreted correctly as Perl assumes Latin-1 encoding.

        I don't think it's entirely accurate to say "Perl assumes Latin-1". It's probably better to say that Perl assumes binary or byte semantics: the bytes from your source code will be output unmodified. If the characters in the source code were in ISO-8859-5 Cyrillic, it would also "work" in the way you describe.
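
        A small sketch of that pass-through behaviour, assuming no I/O layers are in effect. The byte values are illustrative, and an in-memory filehandle stands in for the terminal so the result can be compared:

        ```perl
        use strict;
        use warnings;

        my $latin1   = "M\xFCll";         # bytes that read as "Müll" in Latin-1
        my $cyrillic = "\xBC\xD8\xE0";    # bytes that read as Cyrillic in ISO-8859-5

        for my $bytes ($latin1, $cyrillic) {
            my $out = '';
            # Write to an in-memory handle instead of STDOUT.
            open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
            print {$fh} $bytes;
            close $fh;
            # Perl has not reinterpreted anything; the bytes come out as they went in.
            print $out eq $bytes ? "unchanged\n" : "changed\n";
        }
        ```

        Whether those bytes then display as the intended letters depends only on the encoding the terminal or reader assumes, not on Perl.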

        However, I would recommend that if you have non-ASCII characters in your source code, you should save the source file as UTF-8 and add the use utf8; pragma.

        Incidentally, if your original question is not precise enough, you can edit it or add more information. It's traditional to highlight what you've added by marking it with the word Update.

        A Monk aims to give answers to those who have none, and to learn from those who know more.
Re: Whether 'use utf8;' is good style
by ikegami (Patriarch) on Dec 19, 2012 at 12:40 UTC

    It's not a style issue. use utf8; tells Perl the file is encoded using UTF-8. If the file is encoded using UTF-8, use use utf8;. If your source code is encoded using US-ASCII or iso-8859-1, don't use use utf8;. It's that simple.

    (Whether you encode your file using UTF-8 or not would be a style issue, but I don't see any real difference.)
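
    As a minimal sketch of the UTF-8 case, assuming the file really is saved as UTF-8 (the variable name and the choice of output layer are illustrative, not something the thread prescribes):

    ```perl
    use strict;
    use warnings;
    use utf8;    # this file is assumed to be saved as UTF-8

    my $msg = "Was für ein Müll!";

    # The literal now holds 17 characters rather than 19 bytes.
    print length($msg), "\n";

    # Character strings should be encoded again on the way out.
    binmode(STDOUT, ':encoding(UTF-8)');
    print "$msg\n";
    ```

    If the same file were saved as Latin-1 with use utf8; still in place, the umlaut bytes would not be valid UTF-8, which is why the pragma has to match the file's actual encoding.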