
Re^2: How to convert grabled characters into their real value

by Your Mother (Chancellor)
on Nov 26, 2012 at 18:18 UTC

in reply to Re: How to convert grabled characters into their real value
in thread How to convert grabled characters into their real value

use utf8;
use strict;
use warnings;
use Text::Unidecode;

print unidecode("ต้มยำกุ้ง"), $/;

When necessary, there are still <pre/> tags for this stuff. If it's short, there's no real problem (well, no download link but…).

Re^3: How to convert grabled characters into their real value
by rcrews (Novice) on Nov 26, 2012 at 19:23 UTC

    The unsaid base problem here is that someone started with the string "ต้มยำกุ้ง" but then incorrectly decoded its UTF-8 bytes (probably as Windows cp1258) to create a new, garbled string.

    Note that in UTF-8, each of the nine characters in the Thai string takes three bytes. Therefore the Latin 1 decoding contains 9 x 3 = 27 characters, and each triplet begins with à (byte 0xE0). A UTF-8 string that is incorrectly decoded as Latin 1 will often show each original character as beginning with some accented form of the letter a or A.
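    The arithmetic above can be checked with a short sketch. It assumes the script itself is saved as UTF-8 and that the mis-decoding was plain ISO-8859-1 (Latin 1); the variable names are only illustrative.

```perl
use strict;
use warnings;
use utf8;                                   # the source contains literal Thai text
use Encode qw(encode decode);

my $thai  = "ต้มยำกุ้ง";
my $bytes = encode('UTF-8', $thai);         # the UTF-8 byte string
my $wrong = decode('ISO-8859-1', $bytes);   # mis-decoding: one character per byte

# Split the mis-decoded text into groups of three characters and count
# how many of those groups start with U+00E0 (à).
my $groups_with_a_grave = grep { /\A\x{E0}/ } $wrong =~ /(.{3})/gs;

binmode STDOUT, ':encoding(UTF-8)';
printf "characters: %d, bytes: %d, mis-decoded characters: %d, groups starting with à: %d\n",
    length($thai), length($bytes), length($wrong), $groups_with_a_grave;
```

    Every Thai code point lies in the range U+0E00 to U+0E7F, which UTF-8 encodes with the lead byte 0xE0, and that is why all nine groups start with à.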

    I was not able to repair the string in place, but writing it to a file, I can use Perl's IO Layers and the Encode module to repair the encoding.

    use strict;
    use Encode;
    $|++;
    my $t = 'thai.txt'; # contains => ต้มยำกุ้ง
    open my $fh, '<:raw', $t or die "Couldn't open $t: $!";
    my $content = do { local $/; <$fh> };
    close $fh;
    $content = decode('UTF-8', $content);
    binmode *STDOUT, ':encoding(UTF-8)';
    print "$content\n";
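    For comparison, here is a minimal in-memory sketch of the same idea, with no file involved. It assumes the garbling was a pure ISO-8859-1 mis-decoding of UTF-8 bytes; if the real culprit was a Windows code page, that encoding name would have to be substituted and some bytes might not round-trip. The helper name repair_latin1_mojibake is hypothetical, not something from Encode.

```perl
use strict;
use warnings;
use utf8;                                   # the source contains literal Thai text
use Encode qw(encode decode);

# Hypothetical helper: undo "UTF-8 bytes read as Latin 1".
# Assumes every character in $garbled is below U+0100; croaks otherwise.
sub repair_latin1_mojibake {
    my ($garbled) = @_;
    # Map each character back to the byte it came from...
    my $bytes = encode('ISO-8859-1', $garbled, Encode::FB_CROAK | Encode::LEAVE_SRC);
    # ...then decode those bytes as the UTF-8 they always were.
    return decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Simulate the damage, then repair it.
my $original = "ต้มยำกุ้ง";
my $garbled  = decode('ISO-8859-1', encode('UTF-8', $original));
my $fixed    = repair_latin1_mojibake($garbled);

binmode STDOUT, ':encoding(UTF-8)';
print $fixed eq $original ? "round trip ok: $fixed\n" : "round trip failed\n";
```

    Using FB_CROAK makes the helper die instead of silently substituting characters when the input was not produced by this particular mix-up.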

    However, note that rather than writing this program you can use Perl's wonderful character encoder/decoder without writing any code:

    piconv -t UTF-8 thai.txt > thai_fixed.txt
