I am trying to convert the euro symbol from UTF-8 to ISO-8859-1, so far without success. I am using Perl 5.8.5.
I have a template which is in UTF-8 and is filled with UTF-8 data from another source. I encode the result in ISO-8859-1 using Unicode::MapUTF8::from_utf8() (version 1.09).
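For reference, here is a minimal sketch of what happens to the euro sign (U+20AC) when it is encoded into a few single-byte charsets. It uses Perl's core Encode module rather than Unicode::MapUTF8 (a swapped-in technique, not the code in this question), and `to_charset` is a hypothetical helper name. It assumes the input is a UTF-8 byte string. ISO-8859-1 has no code point for the euro sign at all, so Encode falls back to its default substitution character `?`, while Windows-1252 (cp1252) maps it to byte 0x80 and ISO-8859-15 to byte 0xA4.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: convert a UTF-8 byte string to bytes in $charset.
# Characters the target charset cannot represent are replaced by Encode's
# default substitution character ('?' for these single-byte charsets).
sub to_charset {
    my ($utf8_bytes, $charset) = @_;
    return undef unless defined $utf8_bytes;
    my $chars = decode('UTF-8', $utf8_bytes);   # bytes -> Perl characters
    return encode($charset, $chars);            # characters -> target bytes
}

my $euro_utf8 = "\xE2\x82\xAC";    # the UTF-8 encoding of U+20AC

# Show the resulting byte(s) in hex for each target charset.
for my $cs ('iso-8859-1', 'cp1252', 'iso-8859-15') {
    printf "%-12s %s\n", $cs, unpack('H*', to_charset($euro_utf8, $cs));
}
```

Which of these byte values the receiving end expects is a separate question; the sketch only shows that "ISO-8859-1" and "a charset with a euro sign" are not the same thing.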
Here is the conversion code I am using, followed by what I have tried:
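    use Unicode::MapUTF8 qw(from_utf8);

    # Convert a UTF-8 byte string to ISO-8859-1.
    sub sms_encode {
        my $text = shift;
        return undef unless defined $text;
        my $new = from_utf8({-string => $text, -charset => 'ISO-8859-1'});
        return $new;
    }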
1. Typing the entity &euro; in the template -> a literal &euro; in the result
2. Typing the euro symbol directly into the UTF-8 template -> whitespace
3. Typing alt-0128 on Windows, transferring the file to the server, and reading it into the template -> whitespace
4. Using the entity &#128; in the template -> a literal &#128; in the result
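One way to tell these inputs apart is to dump the raw bytes that actually reach the conversion. A minimal sketch, assuming the template file is read as raw bytes and its path is passed on the command line (`hex_bytes` is a hypothetical helper): a euro sign typed in a UTF-8 editor should appear as `e2 82 ac`, while alt-0128 on a system using the Windows-1252 code page typically produces the single byte `80`, which is not a valid UTF-8 sequence on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

if (@ARGV) {
    # Read the file with the :raw layer so no decoding changes the bytes.
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $content = do { local $/; <$fh> };
    close $fh;
    print hex_bytes($content), "\n";
}
```

Running it against the template from attempt 2 and the file from attempt 3 should show whether the two euro signs really are different byte sequences.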
At this point I wrote a small script and verified that Unicode::MapUTF8 was returning whitespace when given the input from attempt 2. I then changed my code to protect the symbol from Unicode::MapUTF8:
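    use Unicode::MapUTF8 qw(from_utf8);

    sub sms_encode {
        my $text = shift;
        return undef unless defined $text;
        my $placeholder = 'THIS_WILL_BE_THE_EURO_SYMBOL';
        # Regex A: swap the UTF-8 euro sign (bytes E2 82 AC) for ASCII text
        $text =~ s/\xE2\x82\xAC/$placeholder/g;
        my $new = from_utf8({-string => $text, -charset => 'ISO-8859-1'});
        # Regex B: put back byte 0x80 (the euro sign typed as alt-0128)
        $new =~ s/\Q$placeholder\E/\x80/g;
        return $new;
    }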
The symbols may well not display correctly here. In Regex A I am using the symbol from attempt 2 above (the character typed in UTF-8), and in Regex B the symbol from attempt 3 above (the character typed in ISO-8859-1). I then tried:
- The code as shown, without the utf8 pragma -> whitespace
- Replacing the symbol in Regex A with a literal ISO-8859-1 character, keeping Regex B the same -> whitespace
- The code as shown, with the utf8 pragma enabled -> whitespace, plus a malformed UTF-8 warning caused by the character in Regex B
- Putting some arbitrary placeholder text in both the template and Regex A, with Regex B as shown and no utf8 pragma -> the correct symbol
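The pragma-dependent results above come down to three different things that can all look like "the euro sign" in a source file. A minimal sketch, using escape sequences so the bytes are unambiguous regardless of how this post displays: without `use utf8`, a euro sign saved in a UTF-8 source file is a three-byte string; decoding those bytes gives the single character U+20AC; and a lone 0x80 byte (the alt-0128 character) is not valid UTF-8, which is consistent with the malformed UTF-8 warning once `use utf8` is enabled.

```perl
use strict;
use warnings;

my $euro_utf8_bytes = "\xE2\x82\xAC";  # bytes a UTF-8 editor saves for a typed euro sign
my $euro_char       = "\x{20AC}";      # one Perl character, code point U+20AC
my $euro_cp1252     = "\x80";          # one byte: the euro sign in Windows-1252

printf "UTF-8 bytes: length %d\n", length $euro_utf8_bytes;  # prints 3
printf "character:   length %d\n", length $euro_char;        # prints 1
printf "cp1252 byte: length %d\n", length $euro_cp1252;      # prints 1

# Decoding the three UTF-8 bytes in place yields the single character U+20AC.
my $copy = $euro_utf8_bytes;
utf8::decode($copy);
print "decoded bytes equal U+20AC\n" if $copy eq $euro_char;

# A lone 0x80 byte is not well-formed UTF-8, so utf8::decode returns false.
my $lone = $euro_cp1252;
print "0x80 on its own is not valid UTF-8\n" unless utf8::decode($lone);
```

A regex written with the three-byte form therefore only matches input that has not been decoded to characters, and one written with `\x{20AC}` only matches input that has.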
So although I have found a way to get euro symbols in my template to survive the encoding, I'm not satisfied with the solution for several reasons, one of which is that it isn't robust enough to handle euro symbols in the data that is fed into the template.
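For the robustness concern, one option (again using the core Encode module instead of Unicode::MapUTF8, so a swapped-in technique rather than the code above) is to decode the entire filled-in template once, so euro signs from the template and from the data are handled in the same place. A minimal sketch, assuming the filled-in template is a UTF-8 byte string; `sms_encode_all` is a hypothetical name, and replacing the euro sign with the text `EUR` for ISO-8859-1 is an arbitrary illustrative choice, since that charset cannot represent it.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sketch: convert a whole filled-in template (UTF-8 bytes)
# to $charset, handling every euro sign after a single decode.
sub sms_encode_all {
    my ($utf8_bytes, $charset) = @_;
    return undef unless defined $utf8_bytes;
    $charset = 'iso-8859-1' unless defined $charset;

    my $chars = decode('UTF-8', $utf8_bytes);   # bytes -> characters

    # ISO-8859-1 has no euro sign, so spell it out instead of letting
    # Encode substitute '?'. Charsets such as cp1252 or iso-8859-15 can
    # encode U+20AC directly, so it is left alone for them.
    $chars =~ s/\x{20AC}/EUR/g if lc($charset) eq 'iso-8859-1';

    return encode($charset, $chars);            # characters -> target bytes
}

print sms_encode_all("Price: 5 \xE2\x82\xAC"), "\n";   # ISO-8859-1 output
```

If the receiving device actually expects Windows-1252 (where alt-0128 comes from), passing `'cp1252'` as the second argument keeps the symbol as byte 0x80 instead of replacing it.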
Can anyone suggest how I could convert a typed euro symbol in UTF-8 (as opposed to the &euro; entity) into a typed symbol in ISO-8859-1?