http://www.perlmonks.org?node_id=857018

squentin has asked for the wisdom of the Perl Monks concerning the following question:

How do I get a utf8 time string using POSIX::strftime ?

When using a utf8 locale, for example fr_FR.utf8, the output of POSIX::strftime is encoded in utf8 but without the utf8 flag on, i.e. it returns a byte string and not a character string. So when utf8::upgrade is applied to it (which is what the gtk2 bindings do), or when it is printed to a file opened with ">:utf8", the non-ASCII characters become garbage.

And of course, when using a non-utf8 locale such as fr_FR, the return value of POSIX::strftime is encoded in a locale specific encoding.
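
The utf8-locale case can be reproduced without depending on any installed locale. This is only a sketch: the byte string below is an assumed stand-in for what strftime might return under fr_FR.utf8 (the UTF-8 bytes of "févr."), not captured output.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed stand-in for a strftime result under fr_FR.utf8:
# the UTF-8 *bytes* of "févr." (6 bytes, 5 intended characters).
my $bytes = "f\xC3\xA9vr.";

# utf8::upgrade only changes the internal representation. Each byte is
# still treated as one Latin-1 character, so the length stays at 6.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

# Writing through a ">:utf8" layer encodes those 6 characters again,
# which is where the garbage comes from.
my $double_encoded = encode('UTF-8', $upgraded);

# Decoding the bytes first gives the intended 5-character string.
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, upgraded: %d, decoded: %d, double-encoded: %d\n",
    length($bytes), length($upgraded), length($chars), length($double_encoded);
# prints "bytes: 6, upgraded: 6, decoded: 5, double-encoded: 8"
```

The two bytes of "é" each become their own character after the upgrade, and each is then written as two bytes.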

So what is the best way to get a proper utf8 string ? Do I really have to look at the locale value myself to know how to convert the string ?

Shouldn't that behavior be considered a bug? (It would probably be hard to fix without breaking some existing programs.) It should at least be mentioned in the documentation.

example code:

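use POSIX "strftime";
my $s = strftime("%c", localtime);

# raw bytes, no encoding layer
open my($fh), ">", "without_utf8";
print $fh $s;    # print, not printf: $s is data, not a format string

# same string through a utf8 layer
open my($fh2), ">:utf8", "with_utf8";
print $fh2 $s;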

(tested with perl v5.10.1 and v5.12.1)

Re: POSIX::strftime encoding
by Khen1950fx (Canon) on Aug 25, 2010 at 06:20 UTC
    "without_utf8" contains the correct utf8 date

    That is actually correct. With "fr_FR.utf8", the ".utf8" part of the locale name means that utf8 is the default encoding of the output. You don't need to ask for utf8 yourself because it is already applied; if you add a utf8 layer on top, which is the wrong move, you get garbage in return.

    Try this with your locale fr_FR. Calling binmode should work for you:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Time::Piece;

    open STDOUT, '>', 'time.log';
    my $t = localtime;
    print $t->strftime("%c"), "\n";
    my $mt = localtime;
    binmode STDOUT, ":utf8";
    print $mt->strftime("%c"), "\n";

      Yes, I understand what is happening, but that's not what I'm asking. Sorry if I wasn't clear; let me rephrase it.

      What I want is to use the return value of POSIX::strftime in gtk2. The bindings call utf8::upgrade on all the strings sent to gtk2 functions.

      The question is how to make sure the string isn't mangled by that. Do I:

      a) assume the locale is utf8, and use utf8::decode on the string to turn on its utf8 flag?

      b) use some unknown function that will use the locale to correctly decode the string from whatever encoding the locale is using, and turn it into a valid utf8 string?

      c) implement the unknown function from b) myself?

      And also 2 related questions:
      - is it a bug ?
      - shouldn't this be documented in the man page for POSIX::strftime ?
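
Option b) might look something like the minimal sketch below. It assumes that langinfo(CODESET) from I18N::Langinfo reports the encoding strftime actually used, which only holds when LC_TIME and LC_CTYPE name the same codeset. decode_locale_bytes and strftime_chars are hypothetical names made up for this illustration, not existing functions.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL strftime);
use I18N::Langinfo qw(langinfo CODESET);
use Encode qw(find_encoding);

# Adopt the locale from the environment (LANG, LC_* variables).
setlocale(LC_ALL, '');

# Hypothetical helper: turn locale-encoded bytes into a character string.
# $codeset defaults to the codeset reported for the current locale.
sub decode_locale_bytes {
    my ($bytes, $codeset) = @_;
    # Precaution: leave the string alone if it is already flagged as characters.
    return $bytes if utf8::is_utf8($bytes);
    $codeset = langinfo(CODESET) unless defined $codeset;
    return $bytes unless defined $codeset && length $codeset;
    my $enc = find_encoding($codeset)
        or return $bytes;    # codeset unknown to Encode: keep the bytes as is
    return $enc->decode($bytes);
}

# Hypothetical wrapper taking the same arguments as POSIX::strftime.
sub strftime_chars {
    return decode_locale_bytes(strftime(@_));
}

my $when = strftime_chars('%c', localtime);
```

The result can then be handed to code that calls utf8::upgrade, or printed through a ">:utf8" layer. Falling back to the raw bytes when Encode does not recognise the codeset is a design choice of this sketch; dying instead would be just as defensible.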

Re: POSIX::strftime encoding
by squentin (Sexton) on Sep 01, 2010 at 22:07 UTC

    For what it's worth, I did the following :

    use POSIX qw/setlocale LC_TIME strftime/;
    use Encode;

    # encoding part of the LC_TIME locale name, e.g. "utf8" in "fr_FR.utf8"
    my ($strftime_encoding) = setlocale(LC_TIME) =~ m#\.([^@]+)#;

    # try to return a utf8 value from strftime
    sub strftime2 {
        $strftime_encoding
            ? Encode::decode($strftime_encoding, &strftime)
            : &strftime;
    }

    It works if the encoding is specified in the locale name, which usually seems to be the case when using utf8.

    If the encoding is not in the locale name, I keep the returned string as is. That won't work with all locales, but at least it works with fr_FR.

    It's not perfect, but it's better than before, and simple.
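
To make the two cases concrete, here is a small sketch of what the regex from the snippet above captures for a few locale names. encoding_from_locale_name is a hypothetical helper, and the names other than fr_FR.utf8 and fr_FR are assumed examples rather than locales mentioned in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the regex used for $strftime_encoding.
sub encoding_from_locale_name {
    my ($locale) = @_;
    # Capture what follows the "." up to an optional "@modifier".
    my ($encoding) = $locale =~ m#\.([^@]+)#;
    return $encoding;    # undef when the name has no ".codeset" part
}

for my $name ('fr_FR.utf8', 'fr_FR.ISO-8859-15@euro', 'fr_FR', 'C') {
    my $enc = encoding_from_locale_name($name);
    printf "%-24s => %s\n", $name, defined $enc ? $enc : '(none)';
}
```

The first two names yield "utf8" and "ISO-8859-15"; the last two yield nothing, which is the case where strftime2 returns the string undecoded.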