Description: |
So you are scraping URLs out of a web page (using a CPAN module, of course), and they take all sorts of different forms: some absolute, some relative. The relative ones may or may not begin with a slash, and so on. Your job is to figure out where all of these links really go. URI::URL is your friend: its abs method resolves each link against the URL of the page it came from. |
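use strict;
use warnings;
use URI::URL;

# URL of the page the links were scraped from
use constant BASE => 'http://www.pair.com/pair/support/index.html';

print "BASE is ", BASE, "\n\n";

# Test definedness rather than chomp's return value, so a final
# line without a trailing newline is still processed.
while ( defined( my $path = <DATA> ) ) {
    chomp $path;
    tryit($path);
}

# Resolve one link against BASE and print both forms.
sub tryit {
    my $relative = shift;
    my $path = URI::URL->new($relative)->abs( BASE, 1 );
    print "$relative ->\n\t$path\n\n";
}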
__DATA__
http://www.pair.com
/index.html
https://www.pairnic.com/faq.m
search/
library.html
Here's the output:
BASE is http://www.pair.com/pair/support/index.html

http://www.pair.com ->
	http://www.pair.com/

/index.html ->
	http://www.pair.com/index.html

https://www.pairnic.com/faq.m ->
	https://www.pairnic.com/faq.m

search/ ->
	http://www.pair.com/pair/support/search/

library.html ->
	http://www.pair.com/pair/support/library.html
|