http://www.perlmonks.org?node_id=1024752


in reply to Percent encoding of URIs with UTF-8 characters

The URI module does not help you because orsr.sk is not conforming to the standard RFC 3986 which clearly states to use UTF-8 encoding, not Windows-1250. So let's piece together the URI manually.

use utf8;
use URI::Escape qw(uri_escape);
use Encode qw(encode);

for my $name (
    'Ján Slota',
    'Peter Kažimír',
    'Alojz Hlina',
    'František Mikloško',
    'Ján Počiatek',
) {
    # MENO takes the first name and PR the surname, as the output below shows.
    my ($meno, $pr) = split ' ', $name;
    printf "http://www.orsr.sk/hladaj_osoba.asp?PR=%s&MENO=%s&SID=0&R=on\n",
        map { uri_escape encode('Windows-1250', $_) } $pr, $meno;
}

__END__
http://www.orsr.sk/hladaj_osoba.asp?PR=Slota&MENO=J%E1n&SID=0&R=on
http://www.orsr.sk/hladaj_osoba.asp?PR=Ka%9Eim%EDr&MENO=Peter&SID=0&R=on
http://www.orsr.sk/hladaj_osoba.asp?PR=Hlina&MENO=Alojz&SID=0&R=on
http://www.orsr.sk/hladaj_osoba.asp?PR=Miklo%9Ako&MENO=Franti%9Aek&SID=0&R=on
http://www.orsr.sk/hladaj_osoba.asp?PR=Po%E8iatek&MENO=J%E1n&SID=0&R=on
edit: Windows-1250, not -1252. choroba++
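
The choice of code page matters because encode, by default, silently substitutes a question mark for any character the target encoding cannot represent, and that would show up as %3F in the URI. A minimal sketch of guarding against this, assuming a hypothetical helper named escape_cp1250 (not part of the original script), passes Encode::FB_CROAK so the conversion dies instead of substituting:

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);
use URI::Escape qw(uri_escape);

# Hypothetical helper: percent-encode a character string as Windows-1250 octets.
# Encode::FB_CROAK makes encode() die on characters the code page lacks.
sub escape_cp1250 {
    my ($text) = @_;
    my $copy = $text;    # encode() may modify its argument when CHECK is set
    return uri_escape(encode('Windows-1250', $copy, Encode::FB_CROAK()));
}

print escape_cp1250('Počiatek'), "\n";    # č is 0xE8 in Windows-1250
```

Wrapping the call in eval lets the script skip or report a name that cannot be represented, rather than sending a query containing a literal question mark.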

Re^2: Percent encoding of URIs with UTF-8 characters
by andal (Hermit) on Mar 22, 2013 at 08:45 UTC

    Just a small note. In the example above, the strings come from the source code, so the "encode" function is used to convert them to CP1250. In the OP's script, the strings come from an external file and are already octet sequences, so one should use "from_to" instead of "encode".

    Encode::from_to($_, "UTF-8", "CP1250");   # recodes $_ in place; returns a length, not the string
    uri_escape $_;
    Of course, an alternative would be to specify an "encoding" for the file when opening it, but the current version of the script does not do so.
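
    The two approaches can be sketched side by side. This is only an illustration: the helper names escape_octets and escape_chars are hypothetical, and an in-memory filehandle stands in for the OP's external file.

```perl
use strict;
use warnings;
use Encode qw(encode from_to);
use URI::Escape qw(uri_escape);

# Variant 1: the input is raw UTF-8 octets, so recode them with from_to().
# from_to() works in place and returns a length; $octets is a private copy.
sub escape_octets {
    my ($octets) = @_;
    from_to($octets, 'UTF-8', 'cp1250');
    return uri_escape($octets);
}

# Variant 2: the input was decoded on read (by an :encoding layer),
# so it is a character string and encode() is the right call.
sub escape_chars {
    my ($chars) = @_;
    return uri_escape(encode('cp1250', $chars));
}

# Stand-in for a line read from the external file: "Ján" as UTF-8 octets.
my $raw = "J\xC3\xA1n\n";

# An in-memory filehandle replaces the real file (an assumption for the demo).
open my $fh, '<:encoding(UTF-8)', \$raw or die "open failed: $!";
chomp(my $decoded = <$fh>);
close $fh;

chomp(my $octets = $raw);
print escape_octets($octets), "\n";    # J%E1n
print escape_chars($decoded), "\n";    # J%E1n
```

    Either way the URI ends up carrying the same CP1250 octets; the only difference is whether the conversion from UTF-8 happens while reading or just before escaping.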

Re^2: Percent encoding of URIs with UTF-8 characters
by McA (Priest) on Mar 22, 2013 at 09:10 UTC

    Hi daxim,

    just stumbled on the following. You said:

    ...is not conforming to the standard RFC 3986 which clearly states to use UTF-8 encoding...

    I'm a bit surprised by the "is not conforming" part. Can you direct me to the relevant part of the RFC?

    Best regards
    McA