PerlMonks  

Re: Percent encoding of URIs with UTF-8 characters

by daxim (Chaplain)
on Mar 21, 2013 at 13:27 UTC (#1024752=note)


in reply to Percent encoding of URIs with UTF-8 characters

The URI module does not help you because orsr.sk is not conforming to the standard RFC 3986 which clearly states to use UTF-8 encoding, not Windows-1250. So let's piece together the URI manually.

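use strict;
use warnings;
use utf8;                       # the names below are UTF-8 literals in the source
use URI::Escape qw(uri_escape);
use Encode qw(encode);

for my $name (
    'Ján Slota',
    'Peter Kažimír',
    'Alojz Hlina',
    'František Mikloško',
    'Ján Počiatek',
) {
    # First name (MENO) comes first, surname (PR) second.
    my ($meno, $pr) = split ' ', $name;
    # orsr.sk expects Windows-1250 octets: encode first, then percent-encode.
    printf "http://www.orsr.sk/hladaj_osoba.asp?PR=%s&MENO=%s&SID=0&R=on\n",
        map { uri_escape encode('Windows-1250', $_) } $pr, $meno;
}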

__END__
http://www.orsr.sk/hladaj_osoba.asp?PR=Slota&MENO=J%E1n&SID=0&R=on
http://www.orsr.sk/hladaj_osoba.asp?PR=Ka%9Eim%EDr&MENO=Peter&SID=0&R=on
http://www.orsr.sk/hladaj_osoba.asp?PR=Hlina&MENO=Alojz&SID=0&R=on
http://www.orsr.sk/hladaj_osoba.asp?PR=Miklo%9Ako&MENO=Franti%9Aek&SID=0&R=on
http://www.orsr.sk/hladaj_osoba.asp?PR=Po%E8iatek&MENO=J%E1n&SID=0&R=on
Edit: Windows-1250, not Windows-1252. choroba++
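To make the difference concrete, here is a minimal sketch that percent-encodes the same names twice: once as UTF-8 octets (what RFC 3986 section 2.5 recommends for textual data) and once as Windows-1250 octets (what orsr.sk expects). `uri_escape_utf8` is part of URI::Escape; `escape_as` is a hypothetical helper introduced only for this illustration.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape uri_escape_utf8);
use Encode qw(encode);

binmode STDOUT, ':encoding(UTF-8)';   # the names themselves contain non-ASCII characters

# Hypothetical helper: encode a character string into $charset,
# then percent-encode the resulting octets.
sub escape_as {
    my ($charset, $text) = @_;
    return uri_escape(encode($charset, $text));
}

# "Ján", "Kažimír", "Počiatek", written with escapes so no "use utf8" is needed.
for my $word ("J\x{e1}n", "Ka\x{17e}im\x{ed}r", "Po\x{10d}iatek") {
    printf "%-10s UTF-8: %-18s Windows-1250: %s\n",
        $word, uri_escape_utf8($word), escape_as('Windows-1250', $word);
}
```

For example, "Ján" becomes J%C3%A1n under UTF-8 but J%E1n under Windows-1250; only the latter matches the URLs above that the site understands.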


Re^2: Percent encoding of URIs with UTF-8 characters
by andal (Friar) on Mar 22, 2013 at 08:45 UTC

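    Just a small note. In the above example, the strings come from the source code, so the "encode" function is used to convert them to CP1250. In the OP's script, the strings come from an external file and are already "octet sequences", so instead of "encode" one should use "from_to". Note that "from_to" converts its argument in place and returns a length, not the converted string, so it cannot be passed straight to "uri_escape":

    # from_to() converts $_ in place and returns the octet length, not the string
    map { Encode::from_to($_, "UTF-8", "CP1250"); uri_escape $_ }
    Of course, an alternative would be to declare the file's encoding when opening it (for example with an :encoding(UTF-8) layer), so that the strings are decoded on input and "encode" works as in the example above, but the current version of the script does not do so.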
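Both routes can be sketched side by side. The OP's script is not shown here, so an in-memory filehandle over a UTF-8 byte string stands in for the external file (an assumption), and `escape_octets` and `escape_chars` are hypothetical helper names. Keep in mind that `Encode::from_to` modifies its first argument in place and returns the length of the result (undef on error), so the variable is escaped afterwards.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape);
use Encode qw(from_to encode);

# Stand-in for the OP's external file: raw UTF-8 octets for "Ján Počiatek".
my $file_bytes = encode('UTF-8', "J\x{e1}n Po\x{10d}iatek\n");

# Route 1: the data are UTF-8 octets; convert them in place with from_to().
sub escape_octets {
    my ($octets) = @_;                     # copy, so the caller's data stay untouched
    my $len = from_to($octets, 'UTF-8', 'CP1250');
    die "cannot convert to CP1250\n" unless defined $len;
    return uri_escape($octets);            # escape the converted variable, not $len
}

# Route 2: an :encoding layer already decoded the data into characters.
sub escape_chars {
    my ($chars) = @_;
    return uri_escape(encode('CP1250', $chars));
}

# Read without a layer: each line is a string of octets.
open my $raw, '<', \$file_bytes or die $!;
chomp(my $line = <$raw>);
print join(' ', map { escape_octets($_) } split ' ', $line), "\n";

# Read with a layer: each line is a string of characters.
open my $dec, '<:encoding(UTF-8)', \$file_bytes or die $!;
chomp($line = <$dec>);
print join(' ', map { escape_chars($_) } split ' ', $line), "\n";
```

Both reads should produce the same escaped words; the layer-based route keeps the rest of the script working with characters rather than bytes, which is usually the less error-prone choice.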

Re^2: Percent encoding of URIs with UTF-8 characters
by McA (Curate) on Mar 22, 2013 at 09:10 UTC

    Hi daxim,

    just stumbled on the following. You said:

    ...is not conforming to the standard RFC 3986 which clearly states to use UTF-8 encoding...

    I'm a bit surprised by the "is not conforming" part. Can you point me to the relevant section of the RFC?

    Best regards
    McA
