 
PerlMonks  

Determine absolute URL given a relative URL and the location where it was found.

by ehdonhon (Curate)
on Oct 30, 2003 at 20:41 UTC ( #303405=snippet )

Description:

So you are scraping URLs out of a web page (using a CPAN module, of course), and they take all sorts of different forms: some absolute, some relative, and the relative ones may or may not begin with a slash. Your job is to figure out where each of these links really goes. URI::URL is your friend: given the URL of the page where a link was found, its abs method resolves that link to an absolute URL.

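use strict;
use warnings;
use URI::URL;

# URL of the page on which the links were found.
use constant BASE => 'http://www.pair.com/pair/support/index.html';

print "BASE is ", BASE, "\n\n";

# Test for definedness rather than chomp's return value, so a final
# line without a trailing newline is not silently skipped.
while ( defined( my $path = <DATA> ) ) {
    chomp $path;
    tryit($path);
}

# Resolve one link against BASE and print the result.
sub tryit {
    my $relative = shift;
    # As I read the URI::URL docs, the true second argument lets abs()
    # treat a link that repeats the base's scheme as relative.
    my $path = URI::URL->new($relative)->abs( BASE, 1 );
    print "$relative ->\n\t$path\n\n";
}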

__DATA__
http://www.pair.com
/index.html
https://www.pairnic.com/faq.m
search/
library.html

Here's the output:

BASE is http://www.pair.com/pair/support/index.html

http://www.pair.com ->
        http://www.pair.com/

/index.html ->
        http://www.pair.com/index.html

https://www.pairnic.com/faq.m ->
        https://www.pairnic.com/faq.m

search/ ->
        http://www.pair.com/pair/support/search/

library.html ->
        http://www.pair.com/pair/support/library.html
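
For what it's worth, the URI::URL documentation describes that module as a backwards-compatibility interface, and the plain URI class offers the same kind of resolution through new_abs. Here is a minimal sketch, assuming the URI distribution is installed; the helper name absolutize and the extra ../index.html link are my own illustration and not part of the snippet above.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper (not from the original snippet): resolve $link
# against the URL of the page it was found on and return a plain string.
sub absolutize {
    my ( $link, $base ) = @_;
    return URI->new_abs( $link, $base )->as_string;
}

my $base = 'http://www.pair.com/pair/support/index.html';

# Root-relative, directory-relative, file-relative and parent-relative links.
for my $link ( '/index.html', 'search/', 'library.html', '../index.html' ) {
    print "$link ->\n\t", absolutize( $link, $base ), "\n\n";
}
```

A link that already carries its own scheme and host, such as https://www.pairnic.com/faq.m, should come back unchanged, since there is nothing left to resolve against the base.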