PerlMonks  

UTF-8 to ISO-8859-1 conversion of euro symbol

by Anonymous Monk
on Mar 03, 2005 at 13:54 UTC ( [id://436207]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to convert the euro symbol from utf-8 to ISO-8859-1. So far I am having no luck. I am using Perl 5.8.5.

I have a template, which is in utf-8 and filled with utf-8 data (from another source). I'm encoding the result in ISO-8859-1 using Unicode::MapUTF8::from_utf8() (version 1.09). This is what I have tried, with the following conversion code:

    sub sms_encode {
        my $text = shift or return undef;
        my $new = from_utf8({-string => $text, -charset => 'ISO-8859-1'});
        return $new;
    }
  1. Typing a literal € in the template -> a literal € in the result
  2. Using the utf-8 typed symbol in the template -> gives whitespace
  3. Using alt-0128 on Windows, transferring the file to the server, reading the file into the template -> gives whitespace
  4. Using a literal € in the template -> a literal € in the result
At this point I wrote a small script and verified that Unicode::MapUTF8 was returning whitespace when given the input from attempt 2. I then changed my code to protect the symbol from Unicode::MapUTF8:
    sub sms_encode {
        my $text = shift or return undef;
        my $placeholder = 'THIS_WILL_BE_THE_EURO_SYMBOL';
        my $new =~ s/â¬/$placeholder/;   # Regex A
        $new = from_utf8({-string => $text, -charset => 'ISO-8859-1'});
        $new =~ s/$placeholder/€/;       # Regex B
        return $new;
    }
It's quite likely the symbols aren't displaying correctly; in Regex A I am using the symbol that was used in attempt 2 above (literally typed character in utf-8), and in Regex B I am using the symbol that was used in attempt 3 above (literally typed character in ISO-8859-1). I then tried:
  1. The version displayed in the code, without utf-8 pragma -> whitespace
  2. Replacing the symbol in Regex A with a literal ISO-8859-1 character, keeping Regex B the same -> whitespace
  3. The version displayed in the code, with utf-8 pragma enabled -> whitespace, plus a warning about malformed utf-8 (the character from Regex B)
  4. Using some arbitrary text in the template and in Regex A, with Regex B as in the code, no utf-8 pragma -> the correct symbol
So although I have found a way to get symbols in my template to survive the encoding, I'm not at all satisfied with the solution for several reasons, one of which is that it isn't robust enough to deal with possible euro symbols in the data that is fed into the template.

Can anyone offer suggestions on how I could convert from a typed euro symbol in utf-8 (as opposed to the entity &euro;) to a typed symbol in ISO-8859-1?

Replies are listed 'Best First'.
Re: UTF-8 to ISO-8859-1 conversion of euro symbol
by gellyfish (Monsignor) on Mar 03, 2005 at 14:09 UTC

      Actually, if you read the article more carefully, you'll see that Latin 9 is referenced as ISO 8859-15, not 9: The ISO Latin 9 (ISO 8859-15)...

      Update: added the text in italics; the original was a bit rude, sorry.

        Yes, absolutely correct. The mistake arose between brain and keyboard: I originally typed 8859-1 in both places, then changed the second one but, er, incorrectly. I blame the finance consultant trying to talk someone through configuring their VPN loudly on the other side of the office ;-)

        /J\

      Taking both comments into consideration, I have now switched my encoding to ISO-8859-15. I'm now seeing utf-8 character 189 (ISO-8859 character 164: ¤) with my original code (s/8859-1/8859-15/) and template...

      These encodings seem to propagate like rabbits. I'm glad someone knows the difference between them.

      Thank you for the clarification. I had just ISO-8859 written on my notes and assumed I meant -1. I have changed my encoding to ISO-8859-9.

      Unfortunately, this has the same result using the utf-8 typed euro character: I am given whitespace by from_utf8(). A stand-alone test case shows the same result from the function.

        As mirod pointed out, I was typing crap: it is (confusingly) ISO-8859-15 you should be using. Are you sure that whatever you are using to look at the output is actually using the correct character set to display the euro character from that encoding? There is also the Windows cp1252 encoding, which includes the character but with a different numeric code.

        /J\
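
One way to take the viewer out of the equation is to dump the encoded bytes in hex. The following is a minimal sketch, assuming the core Encode module (shipped with Perl 5.8) rather than Unicode::MapUTF8; the hash name `%euro_byte` is only illustrative. It reports which byte, if any, each charset mentioned in this thread uses for the euro sign. Note that byte 0xA4 (decimal 164) is the euro sign in ISO-8859-15 but displays as ¤ when the viewer assumes ISO-8859-1, which may be what the "character 164" report above is showing.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $euro = "\x{20AC}";    # EURO SIGN as one Perl character (code point 8364)

my %euro_byte;            # charset name => hex byte, or undef if unmappable
for my $charset (qw(iso-8859-1 iso-8859-9 iso-8859-15 cp1252)) {
    my $copy = $euro;     # encode() may modify its input when CHECK is set
    # FB_CROAK makes encode() die instead of silently substituting '?'
    my $bytes = eval { encode($charset, $copy, Encode::FB_CROAK()) };
    $euro_byte{$charset} = defined $bytes ? uc(unpack('H*', $bytes)) : undef;
    printf "%-12s %s\n", $charset,
        defined $euro_byte{$charset} ? "0x$euro_byte{$charset}" : 'no euro sign';
}
```

Checking bytes this way separates "the conversion is wrong" from "the display is using a different charset".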

UTF-8 to ? conversion of euro symbol
by Anonymous Monk on Mar 03, 2005 at 15:12 UTC
    Since I have now tried several different things in response to postings, on different levels in the thread, I have decided to make a summary of what has happened.

    I have gone back to my original code, but tried 'ISO-8859-15' instead. The resulting character is utf-8 character 189.

    I then tried encoding cp1252, which transformed utf-8 code point 8364 (the euro sign) into whitespace.

    It would appear that I may require a different encoding, one which is like ISO-8859-15 for characters like ø, but with a different euro number.

    In the meantime I've also been pursuing the option of using ISO-8859-15 and manually transforming the euro symbol with a regex, but unfortunately the utf-8 euro symbol is not picked up in the first regex, with or without utf8 pragma on. I have verified that the symbol in my template is the utf-8 euro symbol by viewing it in an xhtml page with charset utf-8.
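
A possible reason the regex never matches: the `utf8` pragma only tells Perl that the source file is UTF-8; it does not decode data read from a template. Also, in the sms_encode code posted earlier, Regex A is applied to the freshly declared `$new` rather than to `$text`, so it may never have seen the template text at all. The sketch below, assuming the core Encode module and a made-up sample string, shows two ways to match the euro sign without relying on how a literal is typed or displayed.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: UTF-8 bytes as they would arrive from a file.
# The euro sign U+20AC is the three bytes E2 82 AC in UTF-8.
my $template_bytes = "Price: 5 \xE2\x82\xAC";

# Option A: match the raw three-byte sequence on the undecoded byte string.
(my $bytes_swapped = $template_bytes) =~ s/\xE2\x82\xAC/EUR/g;

# Option B: decode to characters first, then match the single character.
my $chars = decode('UTF-8', $template_bytes);
(my $chars_swapped = $chars) =~ s/\x{20AC}/EUR/g;

print "$bytes_swapped\n$chars_swapped\n";
```

Writing the pattern with escapes keeps it independent of the editor's and the source file's encoding.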

Re: UTF-8 to ISO-8859-1 conversion of euro symbol
by Anonymous Monk on Mar 24, 2005 at 15:35 UTC
    A small update: I haven't found the solution, but I have found the problem: the recipient expects ISO-8859-1 encoding except for certain characters, such as the euro symbol, which are encoded in cp1252. With such an arbitrary, non-standard encoding, I think the problem is with the recipient.
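
For what it is worth, cp1252 agrees with ISO-8859-1 for ASCII and for bytes 0xA0 to 0xFF and differs only in the 0x80 to 0x9F range, where it places the euro sign at 0x80. Encoding the whole message as cp1252 might therefore produce what such a recipient expects. This is only a sketch under that assumption, using the core Encode module instead of Unicode::MapUTF8; `sms_encode_cp1252` is a hypothetical name, not code from this thread.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical variant of sms_encode(): UTF-8 bytes in, cp1252 bytes out.
sub sms_encode_cp1252 {
    my ($utf8_bytes) = @_;
    return undef unless defined $utf8_bytes && length $utf8_bytes;
    my $chars = decode('UTF-8', $utf8_bytes);   # bytes -> characters
    return encode('cp1252', $chars);            # characters -> cp1252 bytes
}

# "ø €" as UTF-8: ø is C3 B8, the euro sign is E2 82 AC.
my $out = sms_encode_cp1252("\xC3\xB8 \xE2\x82\xAC");
print unpack('H*', $out), "\n";
```

Here ø comes out as the same byte it has in ISO-8859-1 (0xF8), while the euro sign becomes 0x80.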
