PerlMonks  

How to parse URL in CGI.pm

by asafp (Novice)
on Dec 05, 2010 at 15:38 UTC ( [id://875489]=perlquestion )

asafp has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have a CGI program that receives a URL like this:

http://example.mycgi.com:9999/service?func=search&institute=DEMO&calling_app=ABC&url=http://return.link.com:8997/F?local_base=admin&func=save&base=staff

Everything after "&url=" is a backlink URL that I need to extract.
I am trying to parse the part from "&url=" to the end, but I cannot find a way of doing it.
When I use the "query_string" function, I get the following string:

func=search;func=save;institute=DEMO;calling_app=ABC;url=http%3A%2F%2Freturn.link.com%3A8997%2FF%3Flocal_base%3Dadmin;base=staff

Notice that "func=save" has been moved from its original position to directly after "func=search", so it is no longer in the right place when I rebuild the back URL.

Is there a way to simply get the full URL as is and parse it myself?

Thanks

Re: How to parse URL in CGI.pm
by kennethk (Abbot) on Dec 05, 2010 at 16:02 UTC
    If local_base=admin&func=save&base=staff is supposed to be part of the url parameter, then the source of the problem is that you are dealing with a bad URL; see Percent_encoding. That is not a legal URL: if a parameter value must contain any of the reserved characters (in this case it contains five different ones: ":", "/", "?", "=" and "&"), you must escape it. See URI::Escape. Are you generating this link in a different script, or is it coming from outside? This should really be fixed at the source. The URL should look like: http://example.mycgi.com:9999/service?func=search&institute=DEMO&calling_app=ABC&url=http%3A%2F%2Freturn.link.com%3A8997%2FF%3Flocal_base%3Dadmin%26func%3Dsave%26base%3Dstaff
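
    As a sketch of the encoding step, the snippet below builds the corrected link from the question's pieces. It uses a small hand-written escaper (the name escape_component is made up for this example) instead of URI::Escape so that it runs with core Perl only, and it assumes the backlink is a plain ASCII byte string. With a recent URI::Escape, uri_escape($backlink) should produce the same encoded value.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode every byte outside the RFC 3986
# unreserved set (A-Z a-z 0-9 - . _ ~). Assumes a byte string input.
sub escape_component {
    my ($str) = @_;
    (my $escaped = $str) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $escaped;
}

my $backlink = 'http://return.link.com:8997/F?local_base=admin&func=save&base=staff';

# Only the value of "url" is escaped; the outer "&" separators stay as-is.
my $url = 'http://example.mycgi.com:9999/service'
        . '?func=search&institute=DEMO&calling_app=ABC'
        . '&url=' . escape_component($backlink);

print "$url\n";
```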

    If this is coming from outside and there is no way to get them to fix their code, you can read the raw request URI from the environment variable $ENV{REQUEST_URI}.
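
    A minimal sketch of pulling the backlink out of the raw request URI yourself, assuming (as in the question) that url is the last parameter and that everything after the first "url=" belongs to it. The helper name backlink_from_request_uri is hypothetical; in the CGI script you would pass it $ENV{REQUEST_URI}, while the example below uses a literal copy of the question's request.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything after the first "url=" parameter,
# without splitting on "&" or ";". Returns undef if there is no such parameter.
sub backlink_from_request_uri {
    my ($request_uri) = @_;
    return undef unless defined $request_uri;

    # Keep only the query part: everything after the FIRST "?".
    # The limit of 2 leaves any later "?" (inside the backlink) intact.
    my (undef, $query) = split /\?/, $request_uri, 2;
    return undef unless defined $query;

    # "url=" must start the query or directly follow an "&".
    return $query =~ /(?:^|&)url=(.*)\z/s ? $1 : undef;
}

# In the CGI script: backlink_from_request_uri($ENV{REQUEST_URI})
my $backlink = backlink_from_request_uri(
    '/service?func=search&institute=DEMO&calling_app=ABC'
  . '&url=http://return.link.com:8997/F?local_base=admin&func=save&base=staff'
);

print "$backlink\n";
```

    Taking the rest of the string, rather than splitting on "&", is what keeps func=save and base=staff attached to the backlink.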

    For some intro material on working with Perl CGI, check out Ovid's CGI Course - Resurrected and Updated!. In particular, part of your issue is discussed in lesson 2.

      that is not a legal URL

      You are mistaken. Not only is the url legal, it is parsed identically whether those characters are escaped or not. Only "#" must be escaped in the query component of HTTP urls since no other character "would conflict with a reserved character's purpose as a delimiter" in that part of the url. Other limitations are self-imposed.

      Where it makes a difference is how the query is parsed. In this case, "&" and ";" must be escaped in addition to "#" because CGI (the module) expects the query to be a URL-encoded form (application/x-www-form-urlencoded), with the extension that ";" is equivalent to "&". (It also supports ISINDEX-style queries.)
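
      To illustrate the parsing point, here is a deliberately simplified form-query splitter that, like CGI, treats ";" the same as "&". It is not CGI.pm's code and does no decoding of "+" or %XX escapes; the name split_form_query is made up. Run on the question's query, the url value stops at the first "&" and func=save becomes a second value for func, which is consistent with the query_string() output shown in the question.

```perl
use strict;
use warnings;

# Hypothetical, simplified x-www-form-urlencoded splitter (no decoding).
# Fields are separated by "&" or ";"; name and value by the first "=".
sub split_form_query {
    my ($query) = @_;
    my @pairs;
    for my $field (split /[&;]/, $query) {
        my ($name, $value) = split /=/, $field, 2;
        push @pairs, [ $name, $value // '' ];
    }
    return @pairs;
}

my $query = 'func=search&institute=DEMO&calling_app=ABC'
          . '&url=http://return.link.com:8997/F?local_base=admin&func=save&base=staff';

my @pairs = split_form_query($query);
print "$_->[0] => $_->[1]\n" for @pairs;
```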

      If he did his own query parsing, everything after "&url=" could be treated as part of the backlink URL. But since he's using CGI's parser, only what comes after "&url=" up to the next "&" or ";" is treated as part of the backlink URL.

      Thanks for the help!

      The URL is coming from outside, and I cannot ask the sender to encode it.

      So I think I'll use the $ENV{REQUEST_URI} option.

      Is this safe? Can I count on it to always give me the correct URI?

        Is this safe? Can I count on it to always give me the correct URI?

        It's safe in the sense that it is only data.

        Since REQUEST_URI is not part of the CGI specification (RFC 3875), it is not guaranteed to be available on every server, so it's better to rely on the standard variables (PATH_INFO, QUERY_STRING, ...).
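
        One way to follow that advice is to prefer QUERY_STRING, which RFC 3875 requires the server to set, and fall back to REQUEST_URI only when it is missing or empty. The helper raw_query is hypothetical and takes the environment as a hash reference so it can be tried outside a web server; the raw string it returns can then be searched for "url=" by hand instead of going through CGI's parser.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw, unparsed query string.
# Prefers the standard QUERY_STRING; uses REQUEST_URI only as a fallback.
sub raw_query {
    my ($env) = @_;
    return $env->{QUERY_STRING}
        if defined $env->{QUERY_STRING} && length $env->{QUERY_STRING};

    if (defined $env->{REQUEST_URI}) {
        # Everything after the first "?" is the query.
        my (undef, $query) = split /\?/, $env->{REQUEST_URI}, 2;
        return $query if defined $query;
    }
    return '';
}

# In a CGI script this reads the real request; from a shell it is usually ''.
my $query = raw_query(\%ENV);
print "raw query: $query\n";
```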

Node Type: perlquestion [id://875489]
Approved by kennethk