 
PerlMonks  

Re^2: How to convert grabled characters into their real value

by Your Mother (Canon)
on Nov 26, 2012 at 18:18 UTC (#1005733)


in reply to Re: How to convert grabled characters into their real value
in thread How to convert grabled characters into their real value

use utf8;
use strict;
use warnings;
use Text::Unidecode;

print unidecode("ต้มยำกุ้ง"), $/;

When necessary, there are still <pre/> tags for this stuff. If it's short, there's no real problem (well, no download link but…).


Re^3: How to convert grabled characters into their real value
by rcrews (Novice) on Nov 26, 2012 at 19:23 UTC

    The unstated underlying problem here is that someone started with the UTF-8 string "ต้มยำกุ้ง" but then decoded its bytes with the wrong encoding (probably Windows cp1258), producing garbled text that begins "à¸•à¹‰à¸¡".

    Note that in UTF-8, each of the nine characters in the Thai string takes three bytes. The Latin 1 decoding therefore contains 9 x 3 = 27 characters, and each triplet begins with à. When a UTF-8 string is incorrectly decoded as Latin 1, each original character will often show up as a sequence beginning with some accented form of the letter a or A.
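The arithmetic above can be checked with a short sketch. It assumes the wrong decoding was plain ISO-8859-1 (Latin 1) rather than cp1258, and it spells the Thai string as code point escapes so the script does not depend on the source file's own encoding. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The Thai string from the post, written as nine code point escapes.
my $thai = "\x{0E15}\x{0E49}\x{0E21}\x{0E22}\x{0E33}\x{0E01}\x{0E38}\x{0E49}\x{0E07}";

my $bytes = encode('UTF-8', $thai);         # three bytes per Thai character
my $wrong = decode('ISO-8859-1', $bytes);   # assumption: mis-decoded as Latin 1

# Take the first character of each three-character group.
my @leads = map { substr($wrong, $_ * 3, 1) } 0 .. length($thai) - 1;

binmode STDOUT, ':encoding(UTF-8)';
printf "%d code points, %d bytes, %d mis-decoded characters\n",
    length($thai), length($bytes), length($wrong);
print "lead characters: @leads\n";
```

Every Thai code point sits in the range U+0E00 to U+0E7F, so its UTF-8 form starts with the byte 0xE0, which Latin 1 displays as à.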

    I was not able to repair the string in place, but after writing it to a file I could use Perl's IO layers and the Encode module to repair the encoding.

    use strict;
    use Encode;
    $|++;
    my $t = 'thai.txt'; # contains => ต้มยำกุ้ง
    open my $fh, '<:raw', $t or die "Couldn't open $t: $!";
    my $content = do { local $/; <$fh> };
    close $fh;
    $content = decode('UTF-8', $content);
    binmode *STDOUT, ':encoding(UTF-8)';
    print "$content\n";

    However, rather than writing this program, you can use piconv, Perl's wonderful character encoder/decoder, without writing any code:

    piconv -t UTF-8 thai.txt > thai_fixed.txt
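For comparison, an in-memory repair may also be possible when the damage is a single round of "UTF-8 bytes decoded as Latin 1". The sketch below is not from the post: `repair_mojibake` is a hypothetical helper, and ISO-8859-1 is an assumption. If the text was actually mis-decoded as a Windows code page, that encoding's name would go in its place, and bytes with no mapping in that code page might not survive the round trip.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: undo one round of "UTF-8 bytes read as Latin 1".
sub repair_mojibake {
    my ($garbled) = @_;
    my $bytes = encode('ISO-8859-1', $garbled);   # recover the original bytes
    return decode('UTF-8', $bytes);               # decode them correctly this time
}

# Build a garbled sample from the Thai string, then repair it.
my $thai    = "\x{0E15}\x{0E49}\x{0E21}\x{0E22}\x{0E33}\x{0E01}\x{0E38}\x{0E49}\x{0E07}";
my $garbled = decode('ISO-8859-1', encode('UTF-8', $thai));
my $fixed   = repair_mojibake($garbled);

binmode STDOUT, ':encoding(UTF-8)';
print $fixed eq $thai ? "repaired: $fixed\n" : "repair failed\n";
```

Latin 1 maps all 256 byte values to characters, so the encode step is lossless here; that is why it was chosen for the sketch.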
