nikolay has asked for the wisdom of the Perl Monks concerning the following question:

Hi. My script on an Apache web server receives, via the POST method, several form fields in URL-encoded form. They contain Russian characters in UTF-8 encoding, like this: %d0%be%d0%b1/%d1%81%d1%82%d0%b5%d0%bd. How do I decode those fields back to the original Russian characters? Thank you in advance.

Replies are listed 'Best First'.
Re: To decode URL-decoded UTF-8 string.
by choroba (Archbishop) on Aug 28, 2018 at 09:57 UTC
    You need URL::Encode to decode the percent notation into octets, and Encode to turn the octets into unicode characters:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };
    use open ':encoding(UTF-8)', ':std';

    use Encode;
    use URL::Encode qw{ url_decode };

    my $string  = '%d0%be%d0%b1/%d1%81%d1%82%d0%b5%d0%bd';
    my $octets  = url_decode($string);
    my $unicode = decode('UTF-8', $octets);
    say $unicode;

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      I suppose your solution is the best, but I cannot use it because I cannot find a Devuan package for it. Do you know of one, by the way? I do not want to maintain a copy downloaded from CPAN myself; it is better when it comes from the distro.
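
      Where no distro package is available, the same two steps (percent notation to octets, then octets to characters) can be sketched with core Perl only, since Encode ships with perl. This is a minimal sketch, not a replacement for a tested module: the helper name form_value_to_text is hypothetical, and it assumes the input is a single form value encoded as application/x-www-form-urlencoded UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);    # core module, no CPAN install needed

# Hypothetical helper (the name is an assumption, not part of any module):
# turn one URL-encoded form value into a string of Perl characters.
sub form_value_to_text {
    my ($raw) = @_;
    (my $octets = $raw) =~ tr/+/ /;                    # in form data "+" stands for a space
    $octets =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # each %XX becomes one octet
    return decode('UTF-8', $octets);                   # UTF-8 octets become characters
}

binmode STDOUT, ':encoding(UTF-8)';
print form_value_to_text('%d0%be%d0%b1/%d1%81%d1%82%d0%b5%d0%bd'), "\n";
```

      The "+" to space translation runs before the percent step, so an encoded %2B survives as a literal plus sign.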
Re: To decode URL-decoded UTF-8 string.
by TheloniusMonk (Sexton) on Aug 28, 2018 at 09:53 UTC
    You mean you want to decode a URL-encoded string, and the result happens to be in UTF-8:

    while(<>) {
        print &urldecode($_);
    }

    sub urlencode {
        my $s = shift;
        $s =~ s/ /+/g;
        $s =~ s/([^A-Za-z0-9\+-])/sprintf("%%%02X", ord($1))/seg;
        return $s;
    }

    sub urldecode {
        my $s = shift;
        $s =~ s/\%([A-Fa-f0-9]{2})/pack('C', hex($1))/seg;
        $s =~ s/\+/ /g;
        return $s;
    }
    Produces from your input: об/стен
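
    Note that a percent-decoding substitution like this one yields UTF-8 octets, not characters; the output looks right when the terminal interprets the printed bytes as UTF-8. A small sketch of the difference, assuming the same sample input (percent_to_octets is a hypothetical name used only for this illustration):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: only the percent-decoding step, nothing else.
sub percent_to_octets {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/pack('C', hex($1))/ge;
    return $s;
}

my $octets = percent_to_octets('%d0%be%d0%b1/%d1%81%d1%82%d0%b5%d0%bd');
my $chars  = decode('UTF-8', $octets);

# length() counts bytes in the first string and characters in the second.
printf "octets: %d\n", length $octets;    # 13 (six two-byte letters plus "/")
printf "chars:  %d\n", length $chars;     # 7
```

    Anything that works per character (length, substr, regex classes, uc/lc) needs the decoded string.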

      There are reasons we recommend modules instead of cargo-culting:

      print urldecode(urlencode("41 + 1 = 42")); # 41   1 = 42 (the literal "+" is lost)

      use URI::Escape;
      print uri_escape("41 + 1 = 42");           # 41%20%2B%201%20%3D%2042
      Wow! Awesome! That's exactly what I was looking for! I read the perlfunc man page for the pack function, but thought that I needed to use the h template rather than C, so my approach failed. Thank you very much, TheloniusMonk!
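
      For reference, a short sketch of how the pack templates mentioned here treat the two hex digits of a single escape such as %d0. C takes a number, while H and h take the hex string itself and differ in nybble order:

```perl
use strict;
use warnings;

my $hex = 'd0';    # the two hex digits taken from "%d0"

my $via_C  = pack('C',  hex($hex));    # number 208 packed as one octet: 0xD0
my $via_H2 = pack('H2', $hex);         # hex string, high nybble first:  0xD0
my $via_h2 = pack('h2', $hex);         # hex string, low nybble first:   0x0D

# Print the resulting byte for each template.
printf "%-2s => 0x%02X\n", $_->[0], ord($_->[1])
    for [ C => $via_C ], [ H2 => $via_H2 ], [ h2 => $via_h2 ];
```

      So pack('C', hex($1)) and pack('H2', $1) give the same octet, while the lowercase h template swaps the two digits.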
Re: To decode URL-decoded UTF-8 string.
by Your Mother (Bishop) on Aug 28, 2018 at 10:04 UTC

    Same answer as others but since it's a one-liner and I did it before I saw they posted and I'm just slow, I'll add it.

    perl -CSD -MEncode -MURI::Encode=uri_decode -le 'print decode("utf-8",uri_decode("%d0%be%d0%b1/%d1%81%d1%82%d0%b5%d0%bd"))'
    об/стен
    
      Thank you.
Re: To decode URL-decoded UTF-8 string.
by thanos1983 (Parson) on Aug 28, 2018 at 09:55 UTC

    Hello nikolay,

    Is this working for you?

    #!/usr/bin/perl
    use strict;
    use warnings;
    
    use Encode;
    use URI::Escape;
    
    binmode STDOUT, ":utf8";
    
    my $in = "%d0%be%d0%b1/%d1%81%d1%82%d0%b5%d0%bd";
    my $text = Encode::decode('utf8', uri_unescape($in));
    
    print $text . "\n";
    
    __END__
    
    $ perl test.pl
    об/стен
    

    Update: Some time ago there was a similar question, PDF::API2 printing non ascii characters. Although the title is not the same, check it out; it will help to review some information.

    Looking forward to your reply, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
      No. I have already tried that. It only changes the %-chars to \x-ones.

        Hello again nikolay,

        Then try this (it should work as expected):

        #!/usr/bin/perl
        use utf8;
        use strict;
        use warnings;
        use URI::Escape;
        use feature 'say';
        use Encode qw/ decode /;
        
        binmode STDOUT, ':utf8';
        
        sub decodedUri {
            return decode 'UTF-8', uri_unescape( shift );
        }
        
        say decodedUri('%d0%be%d0%b1/%d1%81%d1%82%d0%b5%d0%bd');
        
        __END__
        
        $ perl test.pl
        об/стен
        

        BR / Thanos
