http://www.perlmonks.org?node_id=11103384

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

How can I tell whether a string or filename is UTF-8 encoded or already decoded, and how can I avoid encoding or decoding a string twice?
I ask because Encode::decode works for one string but not for another.
For example, it works with the filename "test1℗ὓ.txt" but not with the filename "1669-SCC-HôpitauxdeSaint-Maurice-POC.PIF".

Thank you.

Re: How to know to know if string is utf8 encoded or decoded.
by haukex (Archbishop) on Jul 25, 2019 at 17:17 UTC
Re: How to know to know if string is utf8 encoded or decoded.
by choroba (Cardinal) on Jul 25, 2019 at 16:17 UTC
    What do you mean by "is working"? The following works when the script is saved as UTF-8 and run in a UTF-8 terminal:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };
    use utf8;
     
    use Encode;
     
    my @strings = ('test1℗ὓ.txt', '1669-SCC-HôpitauxdeSaint-Maurice-POC.PIF');
     
    for my $string (@strings) {
        say encode('UTF-8', $string);
    }
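
Perl has no reliable built-in way to tell whether a string has already been decoded, so one common workaround is a heuristic. The sketch below is only an illustration: to_text is a hypothetical helper, and it assumes that any undecoded input is UTF-8. A string with code points above 255 cannot be raw bytes, so it is returned as-is. Anything else is decoded strictly, and kept unchanged if it is not valid UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: return character text for either UTF-8 bytes
# or a string that has already been decoded.
sub to_text {
    my ($s) = @_;

    # Code points above 255 cannot be bytes, so this is already text.
    return $s if $s =~ /[^\x00-\xFF]/;

    # Strict decode; dies on malformed UTF-8, which eval turns into undef.
    my $text = eval { decode('UTF-8', $s, FB_CROAK | LEAVE_SRC) };

    # Not valid UTF-8: assume (heuristically) it was text all along.
    return defined $text ? $text : $s;
}

# UTF-8 bytes, already-decoded Latin-1 range text, already-decoded wide text.
for my $in ("H\xC3\xB4p", "H\xF4p", "\x{2117}") {
    printf "%d -> %d\n", length($in), length(to_text($in));
}
```

This is a guess, not a guarantee: text that happens to look like valid UTF-8 bytes would be decoded a second time. Decoding exactly once, at the point where the data enters the program, is the more robust design.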
    


      Thank you for the reply.
      Below is the line that seems to cause the issue:
      eval {$result = ConvertEncoding($string,"utf8",'MIME-Header')}
      # ConvertEncoding uses the code below to decode the string

      eval { $unicode = Encode::decode($from,$str); }; if ($@) { &ConvertEncodingError("($from -> utf8)\n$@"); return $str; }

        Wrapping the code in eval only makes sense if you ask decode to die if it can't decode:
        eval { $unicode = Encode::decode($from,$str,Encode::FB_CROAK); }; if ($@) { &ConvertEncodingError("($from -> utf8)\n$@"); return $str; }
        You should be aware that in case of a decoding error, $str will be overwritten.
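
The Encode documentation warns that the input scalar may be modified in place when a CHECK argument is given, and that LEAVE_SRC prevents this. A minimal sketch of combining the two flags follows. decode_or_undef is a hypothetical helper, not the poster's ConvertEncoding, and the sample byte strings are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: strict decode that returns undef on malformed
# input and leaves the bytes it was given untouched.
sub decode_or_undef {
    my ($from, $bytes) = @_;
    # FB_CROAK makes decode die on bad input; LEAVE_SRC keeps $bytes intact.
    my $text = eval { decode($from, $bytes, FB_CROAK | LEAVE_SRC) };
    return $text;
}

my $good = "H\xC3\xB4pitaux";   # UTF-8 bytes for "Hôpitaux"
my $bad  = "H\xF4pitaux";       # lone Latin-1 byte, malformed as UTF-8

my $ok     = decode_or_undef('UTF-8', $good);
my $failed = decode_or_undef('UTF-8', $bad);

print defined $ok     ? "good input decoded\n" : "good input rejected\n";
print defined $failed ? "bad input decoded\n"  : "bad input rejected\n";
```

Returning undef on failure lets the caller choose between falling back to the original bytes and reporting the error.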

        eval { $unicode = Encode::decode($from,$str); };   # here $from is 'MIME-Header'
        if ($@) { &ConvertEncodingError("($from -> utf8)\n$@"); return $str; }
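
For reference, the 'MIME-Header' encoding is meant for RFC 2047 encoded words of the form =?charset?Q?...?= as found in mail headers. A filename that is plain UTF-8 bytes contains no such encoded words, so it may need 'UTF-8' as the source encoding instead. That is an assumption about the poster's data, not something the thread confirms. The sketch below uses a made-up header value to show what MIME-Header decoding does.

```perl
use strict;
use warnings;
use Encode qw(decode);

# An RFC 2047 encoded word, as it might appear in a mail header
# (made-up example value, not taken from the thread).
my $header = '=?UTF-8?Q?H=C3=B4pitaux.PIF?=';

# 'MIME-Header' removes the =?charset?Q?...?= wrapping and returns characters.
my $name = decode('MIME-Header', $header);

printf "%d characters, second code point U+%04X\n",
    length($name), ord(substr($name, 1, 1));
```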