PerlMonks  

string conversion problem

by stormbow (Initiate)
on Mar 07, 2013 at 05:48 UTC (#1022149=perlquestion)
stormbow has asked for the wisdom of the Perl Monks concerning the following question:

I have a text string like the following one: %E4%BA%8E%E6%96%AF%E5%B1%88%E8%BE%BE%E5%B0%94

I believe this is a UTF-8 encoded Chinese phrase, or more precisely, the Chinese name of a place in Turkey :) How can I get the actual Chinese characters out of this percent-encoded gibberish?

I tried URI::Escape::uri_unescape(), but it didn't produce the right result. I also tried Encode::decode(), but that didn't work either.

Can anyone help?

Re: string conversion problem
by Khen1950fx (Canon) on Mar 07, 2013 at 06:06 UTC
    How about this? Is that the correct string?
    #!/usr/bin/perl -l
    use strict;
    use warnings;
    use URI::Encode qw(uri_decode);
    my $encoded = "%E4%BA%8E%E6%96%AF%E5%B1%88%E8%BE%BE%E5%B0%94";
    print uri_decode($encoded);
Re: string conversion problem
by Your Mother (Canon) on Mar 07, 2013 at 06:07 UTC

    You tried what now? :P

    perl -MURI::Escape -le 'print uri_unescape("%E4%BA%8E%E6%96%AF%E5%B1%88%E8%BE%BE%E5%B0%94")'
    于斯屈达尔

      What the.....

      The Chinese string you got is exactly right, but I simply CANNOT get the same result: the exact same code produces "浜庢柉灞堣揪灏" on my computer. Is there some configuration I'm missing? BTW, I'm coding on a Windows XP machine with ActivePerl.

      Also, Khen1950fx's solution should probably produce the correct result, but it doesn't work for me either.

        Try this? It's along the lines of what aitap wrote about.

        perl -CSD -MEncode -MURI::Escape -le 'print decode "UTF-8", uri_unescape("%E4%BA%8E%E6%96%AF%E5%B1%88%E8%BE%BE%E5%B0%94")'

        You might need to swap the single and double quotes on your Windows box.
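Because cmd.exe does not treat single quotes as quoting characters, it may be easier to put the same steps in a script file. A minimal sketch, assuming a UTF-8 console: the regex substitution is a core-only stand-in for URI::Escape's uri_unescape (same %XX to byte step), and the binmode call plays the role of -CSD as far as STDOUT is concerned.

```perl
#!/usr/bin/perl
# Script form of the -CSD one-liner; no shell quoting involved.
use strict;
use warnings;
use Encode qw(decode);

my $encoded = '%E4%BA%8E%E6%96%AF%E5%B1%88%E8%BE%BE%E5%B0%94';

# Turn each %XX into one byte (what uri_unescape does): 15 UTF-8 bytes.
(my $bytes = $encoded) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;

# Decode the 15 bytes into 5 Perl characters.
my $name = decode('UTF-8', $bytes);

# Encode characters as UTF-8 on output, as -CS does for STDOUT.
binmode STDOUT, ':encoding(UTF-8)';
print "$name\n";
```

If the console expects something other than UTF-8, only the encoding name in the binmode layer should need to change.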

Re: string conversion problem
by aitap (Deacon) on Mar 07, 2013 at 07:36 UTC
    The result of uri_unescape contains bytes, not characters:
    $ perl -Mutf8 -MURI::Escape -E'say utf8::is_utf8(uri_unescape shift) ? "characters" : "bytes"' %E4%BA%8E%E6%96%AF%E5%B1%88%E8%BE%BE%E5%B0%94
    bytes
    Perhaps you need to decode these bytes to the internal character representation and then encode them on output as you need:
    $ perl -MEncode=decode -MData::Dumper -MURI::Escape -e 'print Dumper decode utf8 => uri_unescape shift' %E4%BA%8E%E6%96%AF%E5%B1%88%E8%BE%BE%E5%B0%94
    $VAR1 = "\x{4e8e}\x{65af}\x{5c48}\x{8fbe}\x{5c14}";
    Once you have characters instead of bytes, you can encode them however you need, using either an :encoding PerlIO layer or the encode function from the Encode module.
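To illustrate the two options, here is a minimal sketch. The five characters are written out as escapes, and an in-memory filehandle stands in for STDOUT or a file; on a real console you would use binmode STDOUT, ':encoding(...)' with whatever encoding the console expects.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The five decoded characters from the thread.
my $name = "\x{4e8e}\x{65af}\x{5c48}\x{8fbe}\x{5c14}";

# Option 1: encode() returns a byte string.
my $via_function = encode('UTF-8', $name);

# Option 2: an :encoding layer encodes as you print.
# The in-memory filehandle collects the encoded bytes in $via_layer.
open my $fh, '>:encoding(UTF-8)', \my $via_layer or die "open: $!";
print {$fh} $name;
close $fh;

# Both routes should give the same 15 bytes.
print length($via_function), ' ', length($via_layer), "\n";
```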
    Sorry if my advice was wrong.
Re: string conversion problem
by stormbow (Initiate) on Mar 08, 2013 at 02:43 UTC

    Your Mother and aitap: the exact code snippets you gave don't produce the correct string on my machine, but with your helpful discussion I finally figured out what works for me:

    perl -MEncode -MURI::Escape -le "print encode('gbk',decode('utf8',URI::Escape::uri_unescape shift))" %E4%BA%8E%E6%96%AF%E5%B1%88%E8%BE%BE%E5%B0%94

    Thank you very much for the help. Also, it would be great if any of you could explain why I need the above code on my computer while a simple uri_unescape works for YM.
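A likely explanation, assuming the Chinese-locale Windows XP console uses code page 936 (GBK) while Your Mother's terminal uses UTF-8: uri_unescape alone prints the raw UTF-8 bytes, which a UTF-8 terminal displays correctly, but a GBK console reads the same 15 bytes as two-byte GBK characters, giving garbage like 浜庢柉灞堣揪灏. Re-encoding as GBK hands the console the bytes it expects. A small sketch comparing the two byte sequences:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The five decoded characters from the thread.
my $name = "\x{4e8e}\x{65af}\x{5c48}\x{8fbe}\x{5c14}";

my $utf8_bytes = encode('UTF-8', $name);  # what uri_unescape alone yields
my $gbk_bytes  = encode('gbk',   $name);  # what a cp936 console expects

printf "UTF-8: %d bytes, GBK: %d bytes\n",
    length($utf8_bytes), length($gbk_bytes);

# Reading the UTF-8 bytes as GBK does not give the original name back;
# this mismatch is the kind of thing that produces the garbled output.
my $misread = decode('gbk', $utf8_bytes);
print $misread eq $name ? "same\n" : "garbled\n";
```

An equivalent approach would be to decode as before and then set binmode STDOUT, ':encoding(cp936)' so that characters can be printed directly.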

Node Type: perlquestion [id://1022149]
Approved by BrowserUk
Front-paged by Corion