PerlMonks  

UTF8 issues again

by ultranerds (Pilgrim)
on Sep 13, 2011 at 08:51 UTC (node #925650)
ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm trying to work out how to convert this value:

my $test = "%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81 %D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%B0 %D0%B2 %D0%95%D1%80%D0%B5";

...into its proper UTF-8 value:

"Вопрос строительства в Ере"

This is a 3rd-party script (and I'm just doing this as a favor, as it's for a charity). I've got all the other stuff they wanted done, but can't seem to work out this encoding issue. The way they save the data is a bit crappy:

29|5|lastrepoly|09/13/2011|01:48:39|Name|%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81 %D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%B0 %D0%B2 %D0%95%D1%80%D0%B5|UTF-8 test...%0A%0A%D0%92%D0%BE%D0%BF%D1||2||0
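A minimal sketch of decoding a whole record in that format, assuming fields are separated by a bare "|" that never appears inside a value, and that every field is percent-encoded UTF-8 (plain ASCII fields such as the date pass through unchanged). decode_record is a hypothetical helper name, and the sample record is a shortened copy of the one above (only the first word of the name field is kept, and the truncated message text is trimmed):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use URI::Escape qw(uri_unescape);

# Hypothetical helper: split one pipe-delimited record and turn each
# percent-encoded field into a Perl character string.
# Assumption: '|' never occurs unescaped inside a field value.
sub decode_record {
    my ($line) = @_;
    chomp $line;
    # A limit of -1 also keeps any trailing empty fields.
    my @fields = split /\|/, $line, -1;
    # %XX escapes -> UTF-8 octets -> characters
    return map { decode('UTF-8', uri_unescape($_)) } @fields;
}

# Encode characters back to UTF-8 on the way out.
binmode STDOUT, ':encoding(UTF-8)';

my $record = '29|5|lastrepoly|09/13/2011|01:48:39|Name|%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81|UTF-8 test...%0A%0A||2||0';
my @fields = decode_record($record);
print "$fields[6]\n";    # prints "Вопрос"
```

Keeping the split on the escaped text and decoding afterwards means an escaped %7C inside a value cannot be mistaken for a field separator.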

TIA - this is driving me up the wall!

Andy

Re: UTF8 issues again
by moritz (Cardinal) on Sep 13, 2011 at 08:59 UTC
Re: UTF8 issues again
by ikegami (Pope) on Sep 13, 2011 at 09:02 UTC
    use URI::Escape qw( uri_unescape );
    use Encode      qw( decode_utf8 );

    my $value = decode_utf8(uri_unescape($uri_component));

    URI::Escape, Encode.
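A small sketch of why the two calls are both needed, using only the first word of the string from the question: uri_unescape only undoes the %XX escapes and returns raw UTF-8 bytes, and decode_utf8 then turns those bytes into characters. The variable names are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use URI::Escape qw(uri_unescape);

# First word of the string from the question ("Вопрос")
my $escaped = '%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81';

# Step 1: undo the %XX escapes. Each Cyrillic letter is two
# bytes in UTF-8, so this is a 12-byte string.
my $octets = uri_unescape($escaped);

# Step 2: interpret those bytes as UTF-8, giving 6 characters.
my $text = decode_utf8($octets);

printf "octets: %d, characters: %d\n", length($octets), length($text);
# prints "octets: 12, characters: 6"
```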

Re: UTF8 issues again
by Khen1950fx (Canon) on Sep 13, 2011 at 09:18 UTC
    Don't forget binmode. Following ikegami's advice:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode;
    use URI::Escape;

    my $test = "%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81 %D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%B0 %D0%B2 %D0%95%D1%80%D0%B5";

    binmode STDOUT, ":utf8";
    my $value = decode_utf8(uri_unescape($test));
    print $value, "\n";
Re: UTF8 issues again
by graff (Chancellor) on Sep 13, 2011 at 09:49 UTC
    use Encode;

    # ... assign goofy string value to $test ...
    $test =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;   # convert hex digits to octets
    $test = decode( "utf8", $test );                # convert octets to unicode characters
    The "decode" call (provided by the Encode module) might not be necessary, depending on what you need to do with the string value. If you're just going to print it to a "raw" file handle, print it with no further ado. But to use it as utf8 text (or to print it to a file handle that has been set to utf8 mode), you need to "decode" it first.
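A sketch of the two output routes described above, using in-memory file handles so the resulting bytes can be compared. It assumes the octets came from the s/%XX/chr(hex)/ step; printing the undecoded octets to a raw handle and printing the decoded characters to a handle with a UTF-8 layer should leave the same bytes behind:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 bytes for "Вопрос", as left by the s/%XX/chr(hex)/ step
my $octets = "\xD0\x92\xD0\xBE\xD0\xBF\xD1\x80\xD0\xBE\xD1\x81";
my $text   = decode('UTF-8', $octets);    # 6 characters

# Route 1: undecoded octets to a raw handle
open my $raw, '>:raw', \my $raw_out or die $!;
print {$raw} $octets;
close $raw;

# Route 2: decoded characters to a handle with a UTF-8 layer
open my $enc, '>:encoding(UTF-8)', \my $enc_out or die $!;
print {$enc} $text;
close $enc;

# Both buffers should now hold the same 12 bytes.
print $raw_out eq $enc_out ? "same bytes\n" : "different bytes\n";
```

Mixing the routes is what goes wrong: printing decoded characters to a raw handle triggers a "Wide character in print" warning, and printing undecoded octets to a UTF-8 handle encodes them a second time.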

    UPDATE: Of course, ikegami's approach is the better way to go.

      Thanks everyone - it was me being stupid and not setting utf8 mode on STDOUT! Works a charm now :)

      Cheers

      Andy

Node Type: perlquestion [id://925650]
Approved by moritz
Front-paged by chrestomanci