
UTF8 issues again

by ultranerds (Friar)
on Sep 13, 2011 at 08:51 UTC (#925650)
ultranerds has asked for the wisdom of the Perl Monks concerning the following question:


I'm trying to work out how to convert this value:

my $test = "%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81 %D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%B0 %D0%B2 %D0%95%D1%80%D0%B5";

..into its proper UTF8 value:

"Вопрос строительства в Ере"

This is a 3rd party script (and I'm just doing this as a favor, as it's a charity) ... I've done all the other stuff they wanted, but can't seem to work out this encoding issue. The way they save the data is a bit crappy:

29|5|lastrepoly|09/13/2011|01:48:39|Name|%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81 %D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%B0 %D0%B2 %D0%95%D1%80%D0%B5|UTF-8 test...%0A%0A%D0%92%D0%BE%D0%BF%D1||2||0

TIA - this is driving me up the wall!


Replies are listed 'Best First'.
Re: UTF8 issues again
by ikegami (Pope) on Sep 13, 2011 at 09:02 UTC
    use URI::Escape qw( uri_unescape );
    use Encode qw( decode_utf8 );

    my $value = decode_utf8(uri_unescape($uri_component));

    See URI::Escape and Encode.

Re: UTF8 issues again
by Khen1950fx (Canon) on Sep 13, 2011 at 09:18 UTC
    Don't forget binmode...Following ikegami's advice:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode;
    use URI::Escape;

    my $test = "%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81 %D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%B0 %D0%B2 %D0%95%D1%80%D0%B5";

    binmode STDOUT, ":utf8";
    print my $value = decode_utf8(uri_unescape($test)), "\n";
Re: UTF8 issues again
by moritz (Cardinal) on Sep 13, 2011 at 08:59 UTC
Re: UTF8 issues again
by graff (Chancellor) on Sep 13, 2011 at 09:49 UTC
    use Encode;

    # ... assign goofy string value to $test ...
    $test =~ s/%([0-9A-F]{2})/chr(hex($1))/eg;  # convert hex digits to octets
    $test = decode( "utf8", $test );            # convert octets to unicode characters
    The "decode" call (provided by the Encode module) might not be necessary, depending on what you need to do with the string value. If you're just going to print it to a "raw" file handle, just print it with no further ado. But to use it as utf8 text (or print to a file handle that has been set to use utf8 mode) you need to "decode" it first.
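
    To make the octets-versus-characters distinction concrete, here is a minimal sketch that uses only the first word of the sample string from the question. The variable names ($escaped, $octets, $chars) are made up for illustration, and the regex also accepts lowercase hex digits, which is an assumption beyond the snippet above.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Percent-encoded UTF-8 for the first word of the question's sample string.
my $escaped = "%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81";

# Step 1: turn each %XX escape into the octet it names (still raw bytes).
(my $octets = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;

# Step 2: interpret those octets as UTF-8, which yields Perl characters.
my $chars = decode("UTF-8", $octets);

# Each Cyrillic letter here takes two octets in UTF-8.
printf "octets: %d, characters: %d\n", length($octets), length($chars);
# prints "octets: 12, characters: 6"
```

    Before decoding, length() counts bytes; after decoding, it counts letters. The same applies to regexes, substr and friends, which is why text processing wants the decoded form.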

    UPDATE: Of course, ikegami's approach is the better way to go.

      Thanks everyone - it was me being stupid and not setting utf8 on STDOUT! Works a charm now :)
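
      For anyone landing here with the same kind of data file, a rough sketch of the whole round trip might look like the following. The record is a shortened, hypothetical version of the sample line from the question, the meaning of each field position is a guess, and unescape_field is a made-up helper built on core modules only (URI::Escape's uri_unescape, as suggested above, would do the same job).

```perl
use strict;
use warnings;
use Encode qw( decode );

# Tell STDOUT to encode characters as UTF-8 on output; without this,
# printing decoded text triggers "Wide character in print" warnings.
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: undo %XX escapes, then decode the octets as UTF-8.
sub unescape_field {
    my ($field) = @_;
    $field =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return decode('UTF-8', $field);
}

# Shortened, hypothetical record modelled on the sample in the question.
my $record = '29|5|lastrepoly|09/13/2011|01:48:39|Name|'
           . '%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81 %D0%B2 %D0%95%D1%80%D0%B5'
           . '|body||2||0';

# The -1 limit keeps trailing empty fields instead of dropping them.
my @fields  = split /\|/, $record, -1;
my @decoded = map { unescape_field($_) } @fields;

# Assumption: the seventh field (index 6) holds the subject line.
print "$decoded[6]\n";
```

      Decoding happens once on the way in and encoding once on the way out (via binmode), so everything in between works on characters rather than bytes.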


