http://www.perlmonks.org?node_id=832819

swilting has asked for the wisdom of the Perl Monks concerning the following question:

What is the best way to encode all of the data in my input?

Like this:

#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(:all);    # the export tag needs a leading colon

my $ENC_ASCII = 'ASCII';

sub encode_input {
    my ($text) = @_;
    # When all other defaults are exhausted, use UTF-8
    my $result = eval { Encode::encode_utf8($text) };
    warn "encode_utf8 failed: $@" if $@;
    return $result if defined $result;
    # Something is seriously wrong if we get to here
    return encode($ENC_ASCII, $text, undef);
}
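For comparison, a minimal sketch of the more usual Encode pattern, assuming the raw input arrives as octets: data is decoded into characters on the way in and encoded only on the way out. The helper name decode_input and the ISO-8859-1 fallback are assumptions for illustration, not part of the original code.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn raw input octets into a Perl character string.
# Tries strict UTF-8 first; falls back to ISO-8859-1 (an assumption), which
# accepts every byte sequence and therefore never fails.
sub decode_input {
    my ($octets) = @_;
    my $copy = $octets;    # with a CHECK flag, decode() may consume its source
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return defined $text ? $text : decode('ISO-8859-1', $octets);
}

# Encoding belongs on output: characters -> UTF-8 octets.
my $chars = decode_input("caf\xC3\xA9");    # valid UTF-8 octets for "café"
my $bytes = encode('UTF-8', $chars);
printf "%d chars, %d bytes\n", length $chars, length $bytes;
```

Here "café" is 4 characters after decoding and 5 octets after encoding, because é takes two bytes in UTF-8.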

Or like this:

# IO layer: $handle now decodes all strings upon reading
open my $handle, '<:encoding(UTF-8)', $file or die "Cannot open $file: $!";

or

binmode $handle, ':encoding(UTF-8)';

Re: how to use Encode qw(all)
by almut (Canon) on Apr 05, 2010 at 15:24 UTC

    Depends on what exactly you mean by "input". If you're referring to data read from an external file, using the PerlIO layer (your latter two variants) is certainly the way to go, except that reading from a file ("<:...") decodes rather than encodes, whereas your question talks about encoding; to encode, you'd need to write to the file (">:encoding(UTF-8)").
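To illustrate the decode-on-read / encode-on-write distinction with PerlIO layers, a minimal round trip; the scratch file from File::Temp is an assumption standing in for a real file:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file stands in for the real input/output file (assumption).
my ($tmp, $file) = tempfile(UNLINK => 1);
close $tmp;

my $text = "caf\x{e9}";    # a Perl character string: 4 characters

# Writing through the layer ENCODES characters into UTF-8 octets.
open my $out, '>:encoding(UTF-8)', $file or die "Cannot write $file: $!";
print {$out} $text;
close $out or die "Cannot close $file: $!";

# The file itself holds octets: 5 bytes, since U+00E9 needs two.
print "bytes on disk: ", -s $file, "\n";

# Reading through the layer DECODES octets back into characters.
open my $in, '<:encoding(UTF-8)', $file or die "Cannot read $file: $!";
my $back = do { local $/; <$in> };
close $in;

print "characters read: ", length $back, "\n";
print "round trip ok\n" if $back eq $text;
```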

    See also the open pragma, the -C command-line switch, and utf8.
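A small sketch of the open pragma, which sets default layers for open() calls in its lexical scope (:std also applies them to STDIN, STDOUT and STDERR). Running perl with -CSD has a broadly similar effect for the standard handles and default I/O layers. The scratch file is again an assumption:

```perl
use strict;
use warnings;
use open qw(:std :encoding(UTF-8));    # default layers for open() in this scope
use File::Temp qw(tempfile);

my ($tmp, $file) = tempfile(UNLINK => 1);
close $tmp;

# No explicit layer in the mode: the pragma supplies :encoding(UTF-8).
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "\x{263A}";    # WHITE SMILING FACE, 3 bytes in UTF-8
close $out or die "Cannot close $file: $!";

open my $in, '<', $file or die "Cannot read $file: $!";
my $line = <$in>;
close $in;

printf "U+%04X, %d bytes on disk\n", ord $line, -s $file;
```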

      I'm working on the webmail for the @jaos software; the incoming data (input) are emails.

      I was thinking of this small piece of code:

      package CGI::as_utf8;

      BEGIN {
          use strict;
          use warnings;
          use CGI 3.47;    # earlier versions have a UTF-8 double-decoding bug

          {
              no warnings 'redefine';

              my $param_org = \&CGI::param;

              my $might_decode = sub {
                  my $p = shift;
                  # make sure upload() filehandles are not modified
                  return $p if !$p || ( ref $p && fileno($p) );
                  utf8::decode($p);    # returns false (no error raised) if $p is not valid UTF-8
                  $p
              };

              *CGI::param = sub {
                  # setting a param goes through the original interface
                  goto &$param_org if scalar @_ != 2;

                  my $q = $_[0];    # assume object calls always
                  my $p = $_[1];

                  return wantarray
                      ? map { $might_decode->($_) } $q->$param_org($p)
                      : $might_decode->( $q->$param_org($p) );
              };
          }
      }

      1;
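The $might_decode helper above relies on utf8::decode, which converts in place and signals invalid UTF-8 through its return value rather than by dying. A quick check of that behaviour, with made-up sample strings:

```perl
use strict;
use warnings;

# utf8::decode works in place and reports failure via its return value.
my $valid   = "caf\xC3\xA9";    # 5 octets: UTF-8 for "café"
my $invalid = "caf\xE9";        # 4 octets: Latin-1, not valid UTF-8

my $ok1 = utf8::decode($valid)   ? 1 : 0;
my $ok2 = utf8::decode($invalid) ? 1 : 0;

print "valid:   returned $ok1, now ",   length $valid,   " characters\n";
print "invalid: returned $ok2, still ", length $invalid, " octets\n";
```

On failure the string is left as the original octets, so code after the call should not assume it now holds characters.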

      But I have to be careful: the webmail of course also allows uploads. How should I handle those?

      Alternatively, I could set the encoding on the filehandle, as quoted above.
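One way to handle uploads, as a sketch only: read the upload handle in raw (binary) mode and decode only when the upload is known to be UTF-8 text, since applying :encoding(UTF-8) to, say, an image attachment would mangle it. The helper read_upload, the $is_text flag (which a real webmail would have to derive from the MIME type) and the in-memory handles standing in for CGI upload() handles are all assumptions:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: slurp an upload handle. Binary uploads stay octets;
# text uploads are decoded from UTF-8 (an assumed charset) into characters.
sub read_upload {
    my ($fh, $is_text) = @_;
    binmode $fh;    # raw octets, no layer translation
    my $octets = do { local $/; <$fh> };
    $octets = '' unless defined $octets;
    return $is_text ? decode('UTF-8', $octets) : $octets;
}

# In-memory handles stand in for CGI upload() handles here.
my $png_bytes = "\x89PNG\r\n\x1a\n";
open my $bin, '<', \$png_bytes or die "Cannot open in-memory handle: $!";
my $raw = read_upload($bin, 0);

my $utf8_bytes = "caf\xC3\xA9";
open my $txt, '<', \$utf8_bytes or die "Cannot open in-memory handle: $!";
my $chars = read_upload($txt, 1);

printf "binary: %d bytes, text: %d chars\n", length $raw, length $chars;
```

Keeping binary data as untouched octets and decoding text explicitly avoids relying on a single layer for every kind of upload.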