PerlMonks  

tr/// latin1 on utf8 string

by Marcello (Hermit)
on Mar 13, 2008 at 13:10 UTC (#673954=perlquestion)

Marcello has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I have a utf8 string in Perl's internal form, obtained from Encode::decode(), and I would like to use tr/// to translate a list of Latin1 characters (65 characters in total). Evidently tr/// does not convert the Latin1 characters to utf8, because in the test case below the Euro sign is not translated.
use strict;
use warnings;
use Encode;

my $utf8 = Encode::decode('utf-8', '[]');
$utf8 =~ tr![]!<>E!;
print $utf8;
How can I accomplish this?

I could look up the code points for all the characters in a foreach loop and replace them using a hash, but that's what we have tr/// for, right?
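For reference, tr/// does work on the characters of a decoded string, as long as its search list names the same code points. The characters in the test case above did not survive, so this minimal sketch assumes, purely for illustration, that U+00AB and U+00BB (the double angle quotation marks) map to < and >, and U+20AC (the euro sign) maps to E. The search list uses \x{...} escapes, so the result does not depend on the encoding the script is saved in.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: raw UTF-8 bytes for U+00AB, U+00BB and U+20AC
# (left/right double angle quotation marks and the euro sign).
my $bytes = "\xC2\xAB\xC2\xBB\xE2\x82\xAC";

# decode() yields Perl's internal character string: 3 characters, not 7 bytes.
my $text = decode('UTF-8', $bytes);

# Copy, then translate. The \x{...} escapes name code points, so this line
# behaves the same whatever encoding the script file itself is saved in.
(my $ascii = $text) =~ tr/\x{AB}\x{BB}\x{20AC}/<>E/;

# Encode back to bytes on output (a no-op here, since the result is ASCII).
print encode('UTF-8', $ascii), "\n";    # prints <>E
```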

Thanks in advance!

Update: I cannot do the translation before converting to utf8 because the string can be in different character sets.
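Because the input can arrive in several character sets, one way to keep a single tr/// is to decode first, using whatever encoding each input is known to be in, and translate afterwards. A sketch, assuming hypothetical inputs that hold the text "€5" as raw bytes in three encodings:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical inputs: the euro sign followed by "5", in three encodings.
my %inputs = (
    'UTF-8'       => "\xE2\x82\xAC5",
    'ISO-8859-15' => "\x{A4}5",    # euro sign is byte 0xA4 in Latin-9
    'cp1252'      => "\x{80}5",    # euro sign is byte 0x80 in Windows-1252
);

my %result;
for my $enc (sort keys %inputs) {
    # Decode with the encoding the bytes are known to be in...
    my $text = decode($enc, $inputs{$enc});
    # ...then one tr/// on characters covers every source charset.
    $text =~ tr/\x{20AC}/E/;
    $result{$enc} = $text;
    print "$enc: $text\n";
}
```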

Replies are listed 'Best First'.
Re: tr/// latin1 on utf8 string
by duelafn (Vicar) on Mar 13, 2008 at 13:50 UTC

    You need the use utf8; pragma to tell perl that the euro symbol in your script's source is UTF-8 (not bytes).

    (Modified to read the bytes from the command line; otherwise decode fails.)

    use strict;
    use warnings;
    use utf8;
    use Encode;

    my $utf8 = Encode::decode('utf-8', shift);
    $utf8 =~ tr![]!<>E!;
    print $utf8;
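Why use utf8 matters here: without it, a euro sign typed into a UTF-8 source file reaches perl as three bytes (0xE2 0x82 0xAC), so tr/// searches for three Latin-1 characters that never occur in the decoded string. This sketch reproduces both situations with escapes instead of literal characters, so it runs the same regardless of how the file is saved:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decoded input: 2 characters, U+20AC (EURO SIGN) followed by "5".
my $text = decode('UTF-8', "\xE2\x82\xAC5");

# Without "use utf8", a literal euro sign in a UTF-8 source file is read as
# these three bytes, i.e. the characters U+00E2, U+0082 and U+00AC.
# None of them occurs in $text, so nothing is translated.
(my $broken = $text) =~ tr/\xE2\x82\xAC/E/;

# With "use utf8", the same literal is the single code point U+20AC.
(my $fixed = $text) =~ tr/\x{20AC}/E/;

print "broken: ", ($broken eq $text ? "unchanged" : "changed"), "\n";  # broken: unchanged
print "fixed:  $fixed\n";                                               # fixed:  E5
```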

    Good Day,
        Dean

Re: tr/// latin1 on utf8 string
by moritz (Cardinal) on Mar 13, 2008 at 13:14 UTC
    The euro sign isn't a Latin-1 character; it's in ISO-8859-15 and in all the Unicode transformation formats (UTF-*).

    Maybe your source code is in a different encoding than you think?
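One quick way to check which character sets contain the euro sign is to ask Encode to encode it and fail loudly. This sketch uses Encode's FB_CROAK check flag (die on an unmappable character) together with LEAVE_SRC (leave the input string untouched):

```perl
use strict;
use warnings;
use Encode qw(encode);

my $euro  = "\x{20AC}";                              # EURO SIGN
my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;    # die on unmappable chars, keep $euro intact

# ISO-8859-1 (Latin-1) has no euro sign, so encode() dies and eval returns undef.
my $latin1 = eval { encode('iso-8859-1', $euro, $check) };
print defined $latin1 ? "iso-8859-1: ok\n" : "iso-8859-1: not representable\n";

# ISO-8859-15 (Latin-9) does have it, as the single byte 0xA4.
my $latin9 = encode('iso-8859-15', $euro, $check);
printf "iso-8859-15: 0x%02X\n", ord $latin9;         # iso-8859-15: 0xA4
```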

Re: tr/// latin1 on utf8 string
by ikegami (Pope) on Mar 13, 2008 at 15:54 UTC

    I cannot do the translation before converting to utf8 because the string can be in different character sets.

    You can't work with strings if you don't know how they were encoded. Without that info, all you have is a bunch of meaningless bytes.
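To illustrate the point, the same byte means different characters depending on which encoding you decode it with. In this small sketch, byte 0xA4 becomes the generic currency sign under ISO-8859-1 but the euro sign under ISO-8859-15:

```perl
use strict;
use warnings;
use Encode qw(decode);

my $byte = "\xA4";    # one byte; its meaning depends on the encoding

my $as_latin1 = decode('iso-8859-1',  $byte);    # U+00A4 CURRENCY SIGN
my $as_latin9 = decode('iso-8859-15', $byte);    # U+20AC EURO SIGN

printf "iso-8859-1:  U+%04X\n", ord $as_latin1;  # iso-8859-1:  U+00A4
printf "iso-8859-15: U+%04X\n", ord $as_latin9;  # iso-8859-15: U+20AC
```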

    Once you get your encoding problem straightened out, you might be interested in Text::Unidecode. I'm not sure how it handles the euro symbol specifically, but it's great at ASCIIfying data.

Node Type: perlquestion [id://673954]
Approved by moritz