Re: Normalizing URLs

by derby (Abbot)
on Jul 21, 2005 at 15:00 UTC ( #476867=note )

in reply to Normalizing URLs

I haven't tried it, but wouldn't URI and its canonical and eq methods work for you?
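
A minimal sketch of that suggestion, assuming the URI module from CPAN is installed. The two URLs are made up for illustration and are not from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# Hypothetical example URLs (not from the original post).
my $u1 = URI->new("HTTP://Example.COM:80/index.html");
my $u2 = URI->new("http://example.com/index.html");

# canonical() returns a normalized URI object; print it as a string.
print $u1->canonical->as_string, "\n";

# eq() compares the canonical forms of the two URIs.
print $u1->eq($u2) ? "equal\n" : "not equal\n";
```

Here the scheme and host differ only in case and the port is the default one, so this should be the kind of difference that canonical and eq smooth over.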

Update: Looks like URI will not normalize query params. Something like this should work (note: I did not check all cases, so feel free to fix!)

#!/usr/bin/perl -w
use URI;

# The two URLs to compare go here.
my $u1 = URI->new("");
my $u2 = URI->new("");

my $u1c = $u1->canonical;
my $u2c = $u2->canonical;

if( urlsEqual( $u1c, $u2c ) ) {
  print "equal\n";
} else {
  print "not equal\n";
}

sub urlsEqual {
  my( $u1, $u2 ) = @_;
  my( $q1, $q2 );

  # First try URI eq
  return 1 if( $u1->eq( $u2 ) );

  # nope ... adjust query: sort the parameters so order does not matter
  $q1 = $u1->query();
  $q2 = $u2->query();
  $q1 = join( '&', sort( split( /[&;]/, $q1 ) ) ) if $q1;
  $q2 = join( '&', sort( split( /[&;]/, $q2 ) ) ) if $q2;
  $u1->query( $q1 );
  $u2->query( $q2 );

  return $u1->eq( $u2 );
}


Replies are listed 'Best First'.
Re^2: Normalizing URLs
by ikegami (Pope) on Jul 21, 2005 at 15:45 UTC

    From what I saw, URI:

    • Lowercases the scheme.
    • Lowercases the domain name. (1)
    • Removes the port if it's the default. (2)
    • Removes port fields consisting of just ':'. (3)
    • Adds a trailing '/' if no path or query is specified. (6, partial)

    • Doesn't do (4), (5), (7) and (8), but those are easy to do.
    • Doesn't do (9) and (10), but might not be possible.
    • Doesn't set the path to '/' if no path is specified and a query is specified. (6, partial)
    • Doesn't normalize IP addresses into dotted form.
    • Doesn't remove the trailing '.' from domain names, if any.
    • Doesn't touch the query.
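
A rough sketch of how a few of the "doesn't" items above might be layered on top of canonical. The helper name normalize_more and the sample URL are hypothetical, not part of URI. It covers only the trailing '.', the empty path with a query, and the query order, and is not checked against every case.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# Hypothetical helper (not part of URI): applies a few of the steps
# that canonical() leaves alone.
sub normalize_more {
    my ($url) = @_;

    # Start from URI's own canonical form; clone so the object we edit is ours.
    my $u = URI->new($url)->canonical->clone;

    # Remove a single trailing '.' from the host name, if any.
    my $host = $u->host;
    if ( defined $host && $host =~ s/\.\z// ) {
        $u->host($host);
    }

    # Use '/' as the path when the path is empty but a query is present.
    $u->path('/') if $u->path eq '' && defined $u->query;

    # Sort the query parameters so their order no longer matters.
    my $q = $u->query;
    if ( defined $q && length $q ) {
        $u->query( join( '&', sort( split( /[&;]/, $q ) ) ) );
    }

    return $u->as_string;
}

# Made-up URL exercising all three steps.
print normalize_more('HTTP://Example.COM.:80?b=2&a=1'), "\n";
```

Sorting the raw key=value strings is the same approach as in the parent post. It treats ';' and '&' as equivalent separators, which may or may not match what the receiving server does.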
Re^2: Normalizing URLs
by Anonymous Monk on Jul 22, 2005 at 12:30 UTC
    You can't expect a module called 'URI' to normalize CGI parameters. Two URLs whose query strings list the same parameters in a different order are two different URIs. The fact that two different URIs are treated the same by the receiving server is outside the realm of URIs.
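
To illustrate the point, a small sketch with made-up URLs (assuming the URI module is installed): eq compares canonical forms, and since canonical leaves the query alone, a different parameter order means a different URI.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# Hypothetical URLs: same parameters, different order.
my $first  = URI->new("http://example.com/search?x=1&y=2");
my $second = URI->new("http://example.com/search?y=2&x=1");

# eq() sees these as distinct, whatever the server does with them.
print $first->eq($second) ? "same URI\n" : "different URIs\n";
```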