PerlMonks  

UTF8 issues again

by ultranerds (Pilgrim)
on Sep 13, 2011 at 08:51 UTC (#925650)
ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm trying to work out how to convert this value:

my $test = "%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81 %D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%B0 %D0%B2 %D0%95%D1%80%D0%B5";

..into its proper UTF8 value:

"Вопрос строительства в Ере"

This is a 3rd party script (and I'm just doing this as a favor, as it's a charity) ... I've done all the other stuff they wanted, but can't seem to work out this encoding issue. The way they save the data is a bit crappy:

29|5|lastrepoly|09/13/2011|01:48:39|Name|%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81 %D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%B0 %D0%B2 %D0%95%D1%80%D0%B5|UTF-8 test...%0A%0A%D0%92%D0%BE%D0%BF%D1||2||0

TIA - this is driving me up the wall!

Andy

Re: UTF8 issues again
by moritz (Cardinal) on Sep 13, 2011 at 08:59 UTC
Re: UTF8 issues again
by ikegami (Pope) on Sep 13, 2011 at 09:02 UTC
    use URI::Escape qw( uri_unescape );
    use Encode qw( decode_utf8 );
    my $value = decode_utf8(uri_unescape($uri_component));

    URI::Escape, Encode.

Re: UTF8 issues again
by Khen1950fx (Canon) on Sep 13, 2011 at 09:18 UTC
    Don't forget binmode...Following ikegami's advice:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode;
    use URI::Escape;

    my $test = "%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81 %D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%B0 %D0%B2 %D0%95%D1%80%D0%B5";
    binmode STDOUT, ":utf8";
    print my $value = decode_utf8(uri_unescape($test)), "\n";
Re: UTF8 issues again
by graff (Chancellor) on Sep 13, 2011 at 09:49 UTC
    use Encode;
    # ... assign goofy string value to $test ...
    $test =~ s/%([0-9A-F]{2})/chr(hex($1))/eg;  # convert hex digits to octets
    $test = decode( "utf8", $test );  # convert octets to unicode characters
    The "decode" call (provided by the Encode module) might not be necessary, depending on what you need to do with the string value. If you're just going to print it to a "raw" file handle, just print it with no further ado. But to use it as utf8 text (or print to a file handle that has been set to use utf8 mode) you need to "decode" it first.

    UPDATE: Of course, ikegami's approach is the better way to go.
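
To make the octets-versus-characters distinction above concrete, here is a minimal sketch using only the core Encode module. It takes the first word of the sample string from the question, applies the same substitution as graff's snippet (widened here to accept lowercase hex digits, which is an assumption about what the input may contain), and compares the lengths before and after decoding. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw( decode );

# First word of the sample string from the question, still percent-encoded.
my $escaped = '%D0%92%D0%BE%D0%BF%D1%80%D0%BE%D1%81';

# Step 1: turn each %XX escape into the single octet it stands for.
(my $octets = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;

# Step 2: interpret those octets as UTF-8 and get Perl characters back.
my $chars = decode( 'UTF-8', $octets );

# Each of these Cyrillic letters takes two octets in UTF-8.
printf "octets: %d, characters: %d\n", length $octets, length $chars;
```

Before decoding, Perl sees twelve unrelated bytes; after decoding, it sees six letters, which is what matters for things like length, regex matching, or printing through a utf8-enabled handle.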

      Thanks everyone - it was me being stupid and not setting utf8 on STDOUT! Works a charm now :)

      Cheers

      Andy
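
The missing piece mentioned here, an encoding layer on the output handle, can be sketched as follows. This is only an illustration: an in-memory filehandle stands in for STDOUT so the bytes that get written can be inspected, and the `:encoding(UTF-8)` layer is used where the thread's example uses `:utf8`. With a real script, the equivalent would be a `binmode` call on STDOUT.

```perl
use strict;
use warnings;
use Encode qw( decode );

# The word "Вопрос" as raw UTF-8 octets, then decoded to characters.
my $octets = "\xD0\x92\xD0\xBE\xD0\xBF\xD1\x80\xD0\xBE\xD1\x81";
my $chars  = decode( 'UTF-8', $octets );

# In-memory handle (a stand-in for STDOUT) with a UTF-8 output layer.
my $buffer = '';
open my $fh, '>:encoding(UTF-8)', \$buffer or die "open failed: $!";
print {$fh} $chars;
close $fh;

# The layer re-encodes the characters, so the original octets come back out.
my $status = $buffer eq $octets ? 'round trip ok' : 'mismatch';
print "$status\n";
```

Without such a layer, printing decoded text that contains characters above 255 triggers Perl's "Wide character in print" warning.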
