
UTF8 issues again

by ultranerds (Friar)
on Sep 13, 2011 at 08:51 UTC ( #925650=perlquestion )
ultranerds has asked for the wisdom of the Perl Monks concerning the following question:


I'm trying to work out how to convert this value:

my $test = "%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81 %D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%B0 %D0%B2 %D0%95%D1%80%D0%B5";

..into its proper UTF8 value:

"Вопрос строительства в Ере"

This is a 3rd-party script (and I'm just doing this as a favor, as it's a charity)... I've got all the other stuff they wanted done, but can't seem to work out this encoding issue. The way they save the data is a bit crappy:

29|5|lastrepoly|09/13/2011|01:48:39|Name|%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81 %D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%B0 %D0%B2 %D0%95%D1%80%D0%B5|UTF-8 test...%0A%0A%D0%92%D0%BE%D0%BF%D1||2||0

TIA - this is driving me up the wall!


Replies are listed 'Best First'.
Re: UTF8 issues again
by ikegami (Pope) on Sep 13, 2011 at 09:02 UTC
    use URI::Escape qw( uri_unescape );
    use Encode      qw( decode_utf8 );

    my $value = decode_utf8(uri_unescape($uri_component));

    URI::Escape, Encode.

Re: UTF8 issues again
by Khen1950fx (Canon) on Sep 13, 2011 at 09:18 UTC
    Don't forget binmode...Following ikegami's advice:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode;
    use URI::Escape;

    my $test = "%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81 %D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%B0 %D0%B2 %D0%95%D1%80%D0%B5";

    binmode STDOUT, ":utf8";
    print my $value = decode_utf8(uri_unescape($test)), "\n";

Re: UTF8 issues again
by moritz (Cardinal) on Sep 13, 2011 at 08:59 UTC
Re: UTF8 issues again
by graff (Chancellor) on Sep 13, 2011 at 09:49 UTC
    use Encode;
    # ... assign goofy string value to $test ...
    $test =~ s/%([0-9A-F]{2})/chr(hex($1))/eg;  # convert hex digits to octets
    $test = decode( "utf8", $test );            # convert octets to unicode characters

    The "decode" call (provided by the Encode module) might not be necessary, depending on what you need to do with the string value. If you're just going to print it to a "raw" file handle, just print it with no further ado. But to use it as utf8 text (or print to a file handle that has been set to use utf8 mode), you need to "decode" it first.
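
    To illustrate the octets-versus-characters distinction, here is a minimal sketch using only the core Encode module. The two-byte sample and the variable names are illustrative assumptions, not taken from the original script; the regex also accepts lowercase hex digits, which the snippet above does not.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Percent-escaped UTF-8 for CYRILLIC SMALL LETTER VE (bytes D0 B2).
my $escaped = "%D0%B2";

# Step 1: turn each %XX escape into the single octet it names.
(my $octets = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;

# Step 2: interpret those octets as UTF-8 to get Perl characters.
my $chars = decode("UTF-8", $octets);

my $octet_count = length $octets;   # length of the undecoded byte string
my $char_count  = length $chars;    # length of the decoded character string

# A decoded string should go to a handle with an encoding layer.
binmode STDOUT, ":encoding(UTF-8)";
print "octets: $octet_count, characters: $char_count\n";
print "$chars\n";
```

    Before decoding, Perl sees two separate bytes; after decoding, it sees one character. Printing the undecoded octets to a raw handle, or the decoded string to a handle with a UTF-8 layer, should emit the same bytes. Mixing the two is what usually produces garbled output.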

    UPDATE: Of course, ikegami's approach is the better way to go.

      Thanks everyone - was me being stupid and not defining utf8 to STDOUT! Works a charm now :)
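
    For the pipe-delimited records shown in the question, a rough sketch of the whole round trip might look like the following. The helper name unescape_utf8, the shortened sample record, and the guess that the seventh field holds the subject are all assumptions made for illustration; the real script's field layout may differ.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: undo %XX escaping, then decode the octets as UTF-8.
sub unescape_utf8 {
    my ($field) = @_;
    $field =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return decode("UTF-8", $field);
}

# Shortened sample record in the same pipe-delimited layout (assumed).
my $record = "29|5|lastrepoly|09/13/2011|01:48:39|Name|"
           . "%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81 %D0%B2|body";

# Split first, unescape afterwards; the -1 limit keeps trailing empty fields.
my @fields  = split /\|/, $record, -1;
my $subject = unescape_utf8($fields[6]);

# Tell STDOUT to encode characters as UTF-8 on output.
binmode STDOUT, ":encoding(UTF-8)";
print "$subject\n";
```

    Splitting before unescaping matters if a field could ever contain an escaped pipe (%7C): unescaping first would turn it into a literal separator and shift every later field.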


