http://www.perlmonks.org?node_id=1043473

McA has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

There is a lot of material out there about utf8::is_utf8. My question: is it a valid, accepted, and reliable function for introspecting a Perl string? And is it valid to rely on the upgrading semantics when I concatenate a utf8-flagged string with an unflagged one?

Best regards
McA

Re: utf8::is_utf8 valid introspection?
by dave_the_m (Monsignor) on Jul 10, 2013 at 15:13 UTC
    Code that needs to use utf8::is_utf8 (apart from for debugging purposes) is, in general, likely to be buggy. Most of the time your code shouldn't need to care what internal format perl's currently using to store strings.

    Except, of course, for "the Unicode bug", where the state of the utf8 flag on a string affects things like regex matching for chars in the range 0x80..0xff. This has been reduced in more recent perls by the addition of things like the //a match modifier.

    Dave.
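
The flag-dependent regex behaviour described in the reply above can be illustrated with a small sketch. It assumes a perl of at least 5.14 (for the /a and /u modifiers), with neither `use feature 'unicode_strings'` nor `use locale` in effect; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Two copies of the same one-character string (U+00E9, e-acute).
my $native   = "\xe9";        # stored internally as a single byte
my $upgraded = $native;
utf8::upgrade($upgraded);     # same character, UTF-8 internal storage

# As far as string comparison is concerned, they are identical.
my $same = ($native eq $upgraded) ? 1 : 0;

# Under the default rules, \w consults the internal storage format.
my $native_default   = ($native   =~ /\w/) ? 1 : 0;   # 0: ASCII rules
my $upgraded_default = ($upgraded =~ /\w/) ? 1 : 0;   # 1: Unicode rules

# /a forces ASCII rules for \w, /u forces Unicode rules; with either
# modifier the answer no longer depends on the utf8 flag.
my $native_a   = ($native   =~ /\w/a) ? 1 : 0;
my $upgraded_a = ($upgraded =~ /\w/a) ? 1 : 0;
my $native_u   = ($native   =~ /\w/u) ? 1 : 0;
my $upgraded_u = ($upgraded =~ /\w/u) ? 1 : 0;

print "eq: $same\n";
print "default: native=$native_default upgraded=$upgraded_default\n";
print "/a:      native=$native_a upgraded=$upgraded_a\n";
print "/u:      native=$native_u upgraded=$upgraded_u\n";
```

Only the default-rules line should show a disagreement between the two copies; with an explicit /a or /u the result is the same for both.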

Re: utf8::is_utf8 valid introspection?
by ikegami (Patriarch) on Jul 11, 2013 at 00:23 UTC

    If you need to work around a bug, just use

    utf8::upgrade($var);

    or

    utf8::downgrade($var);

    to get the string in the expected storage format (regardless of the current storage format).

    The only use I can think of for utf8::is_utf8 is for debugging, but I use Devel::Peek's Dump when I want to peek at a scalar's internals.
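
As a rough sketch of what the reply above describes, the following shows utf8::upgrade and utf8::downgrade changing only the storage format, never the string's value, and also touches on the concatenation case from the original question. utf8::is_utf8 is used purely for inspection, and the variable names are illustrative only.

```perl
use strict;
use warnings;

my $str = "caf\xe9";                      # four characters, byte storage
my $flag_initial = utf8::is_utf8($str) ? 1 : 0;

utf8::upgrade($str);                      # switch to UTF-8 storage
my $flag_upgraded = utf8::is_utf8($str) ? 1 : 0;
my $len_upgraded  = length $str;          # still 4 characters

utf8::downgrade($str);                    # back to byte storage
my $flag_downgraded = utf8::is_utf8($str) ? 1 : 0;
my $unchanged       = ($str eq "caf\xe9") ? 1 : 0;

# A character above 0xff cannot be downgraded. With a true second
# argument, utf8::downgrade returns false instead of dying.
my $wide = "\x{263a}";
my $downgrade_ok = utf8::downgrade($wide, 1) ? 1 : 0;

# Concatenating an unflagged string with a flagged one upgrades the
# result; the characters themselves are preserved.
my $mixed      = "\xe9" . $wide;
my $flag_mixed = utf8::is_utf8($mixed) ? 1 : 0;
my $first_ord  = ord substr($mixed, 0, 1);   # 0xe9, as before

print "flags: initial=$flag_initial upgraded=$flag_upgraded ",
      "downgraded=$flag_downgraded mixed=$flag_mixed\n";
printf "length after upgrade: %d, first char of mixed: 0x%x\n",
       $len_upgraded, $first_ord;
```

The design point is that code calls upgrade or downgrade unconditionally to get a known storage format, rather than branching on utf8::is_utf8.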