Skeeve has asked for the wisdom of the Perl Monks concerning the following question:
Dear fellow monks!
I'm a bit lost with utf-8 conversion. For a FictionBook 2 eReader conversion script, I need to have "translations" for some UTF-8 characters to the appropriate eReader characters.
For this I used a part of the table found at eReader.com and stored it as a UTF8 file:
¡ ¡ ¡ \a161 Inverted exclamation
¢ ¢ ¢ \a162 Cent sign
£ £ £ \a163 Pound sign
:
: skipped
:
œ œ œ \a156 Small combined oe
Ÿ Ÿ Ÿ \a159 Large Y with diaeresis
Next I wanted to prefix each line with the 4-digit hex Unicode code point of its first character, using a one-liner (split here for better readability):
perl -i.bak -pe '\
    binmode STDIN,":utf8"; \
    binmode STDOUT,":utf8"; \
    if (/^([^[:ascii:]])/) { \
        $_= sprintf("%04x",ord $1).$_ \
    }' pml.txt
Unfortunately I seem to be missing something. I get data like this:
00c2¡ ¡ ¡ \a161 Inverted exclamation
00c2¢ ¢ ¢ \a162 Cent sign
00c2£ £ £ \a163 Pound sign
Three times 00c2 can't be right.
Do you see my mistake?
Update: Experimenting and reading perldoc perlrun, especially about -C led me to this version, which seems to work quite well:
perl -i.bar -CDS -pe ' \
    if (/^([^[:ascii:]])/) { \
        $_= sprintf("%04x",ord $1).$_ \
    }' pml.txt
Update2: No... It still doesn't work
s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
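A note on where the repeated 00c2 may come from: U+00A1, U+00A2 and U+00A3 are all encoded in UTF-8 as two bytes starting with 0xC2, so the output above is consistent with ord looking at an undecoded byte rather than at a character. With file arguments, -p reads lines through ARGV rather than STDIN, which may be why the binmode STDIN call had no visible effect. The following is only a minimal sketch of the byte-versus-character difference, not a fix for the one-liner; the variable names are made up for illustration and the bytes are hard-coded instead of being read from pml.txt.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes that encode U+00A1 (inverted exclamation mark) in UTF-8.
# Hard-coded here for illustration instead of being read from pml.txt.
my $bytes = "\xC2\xA1";

# Undecoded, Perl treats this as two one-byte characters,
# so ord returns the value of the first byte.
my $raw = sprintf("%04x", ord $bytes);

# Once decoded, the string holds a single character, U+00A1.
my $chars   = decode('UTF-8', $bytes);
my $decoded = sprintf("%04x", ord $chars);

print "undecoded: $raw\n";      # prints "undecoded: 00c2"
print "decoded:   $decoded\n";  # prints "decoded:   00a1"
```

Whatever the exact cause in the one-liner, the line has to be decoded from UTF-8 before ord is applied to get 00a1 instead of 00c2.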
Re: get UTF-8 character codes
by theorbtwo (Prior) on Oct 31, 2005 at 16:21 UTC
by Skeeve (Parson) on Oct 31, 2005 at 19:09 UTC
by Errto (Vicar) on Nov 01, 2005 at 00:40 UTC