
Re: Unable to split $ARGV[0] variable. Can it be done?

by McDarren (Abbot)
on Dec 10, 2012 at 15:50 UTC (#1008125)


in reply to [SOLVED]Unable to split $ARGV[0] variable. Can it be done?

..so that I can get the bare host name 'google.com'

Um, google.com is not a hostname; it's a domain name.
Also, you start your example with 'www.google.com', and then you say you want 'google.com'.
Is that correct, or was it a typo?

I'll assume you want to extract the Fully Qualified Domain Name.

..appreciate some advice and whether split is the right function to use or not?

Although you could get what you want with split, I wouldn't consider it the best thing to use here, especially if you're dealing with more complex URLs.
Personally, I'd use URI::Split:

use URI::Split qw/uri_split/;

my $url = 'http://www.google.com';
my ($proto, $fqdn) = uri_split($url);
print "Protocol:$proto Domain:$fqdn\n";
Prints:
Protocol:http Domain:www.google.com
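
For comparison, here is a rough sketch of the split-only approach mentioned above, using nothing outside core Perl. The helper name split_url is made up for illustration, and it assumes simple URLs of the form scheme://host/path.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): pull the scheme and
# host out of a simple URL using only split.
sub split_url {
    my ($url) = @_;

    # Split on the first '://' only; the limit of 2 keeps any later
    # '://' (e.g. inside a query string) in the remainder.
    my ($scheme, $rest) = split m{://}, $url, 2;

    # No '://' present: treat the whole string as host[/path].
    ($scheme, $rest) = (undef, $url) unless defined $rest;

    # Everything up to the first '/' is the host part.
    my ($host) = split m{/}, $rest, 2;

    return ($scheme, $host);
}

my $url = @ARGV ? $ARGV[0] : 'http://www.google.com';
my ($scheme, $host) = split_url($url);
printf "Protocol:%s Domain:%s\n", $scheme // '(none)', $host // '';
```

Note that anything like a port number or user@ prefix stays glued to the host here, which is one reason a URI-aware module is the safer choice for messier input.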

Cheers,
Darren


Re^2: Unable to split $ARGV[0] variable. Can it be done?
by Doozer (Beadle) on Dec 10, 2012 at 16:03 UTC
    Sorry, yes, domain name was what I meant. No, it wasn't a typo. 'http://www.google.com' is passed in to the script and a 'get' request is made against that URL using LWP. If the get request fails, it then tries a different prefix, 'https://www.google.com' or 'http://google.com' for example. I want to split the domain name away from the prefix so I can chop and change the combinations as I please. It may be easier to have just the domain name passed in to the script and then the script can handle ALL of the prefixes itself.

    I appreciate all the responses and am currently working through the suggestions to see what I can work with.

      It may be easier to have just the domain name passed in to the script and then the script can handle ALL of the prefixes itself.

      Yeah, that sounds sensible.
      Here is an example of how you might implement that approach:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use LWP::UserAgent;    # test_url() uses LWP::UserAgent directly

      DOMAIN:
      while (my $domain = <DATA>) {
          $domain =~ s/^\s+|\s+$//g;    # trim newline and stray whitespace
          next unless length $domain;
          for my $protocol (qw/http https/) {
              # Try the bare domain first, then the common subdomains
              next DOMAIN if test_url("$protocol://$domain");
              for my $sub (qw/www web/) {
                  next DOMAIN if test_url("$protocol://$sub.$domain");
              }
          }
          print "Couldn't get anything from $domain\n";
      }

      # Returns true if a GET on $url succeeds, false otherwise
      sub test_url {
          my $url = shift;
          print "Trying $url ...";
          my $ua = LWP::UserAgent->new(
              timeout  => 5,
              agent    => 'Mozilla/5.0',
              ssl_opts => { verify_hostname => 0 },    # skips certificate hostname checks
          );
          my $response = $ua->get($url);
          if ($response->is_success) {
              print "OK\n";
              return 1;
          }
          print "FAILED because " . $response->status_line . "\n";
          return;
      }

      __DATA__
      google.com
      apple.com
      fred.com
      dschjksdbckjqh.com
      Output:
      Trying http://google.com ...OK
      Trying http://apple.com ...OK
      Trying http://fred.com ...OK
      Trying http://dschjksdbckjqh.com ...FAILED because 500 Can't connect to dschjksdbckjqh.com:80 (Bad hostname 'dschjksdbckjqh.com')
      Trying http://www.dschjksdbckjqh.com ...FAILED because 500 Can't connect to www.dschjksdbckjqh.com:80 (Bad hostname 'www.dschjksdbckjqh.com')
      Trying http://web.dschjksdbckjqh.com ...FAILED because 500 Can't connect to web.dschjksdbckjqh.com:80 (Bad hostname 'web.dschjksdbckjqh.com')
      Trying https://dschjksdbckjqh.com ...FAILED because 500 Can't connect to dschjksdbckjqh.com:443 (getaddrinfo: nodename nor servname provided, or not known)
      Trying https://www.dschjksdbckjqh.com ...FAILED because 500 Can't connect to www.dschjksdbckjqh.com:443 (getaddrinfo: nodename nor servname provided, or not known)
      Trying https://web.dschjksdbckjqh.com ...FAILED because 500 Can't connect to web.dschjksdbckjqh.com:443 (getaddrinfo: nodename nor servname provided, or not known)
      Couldn't get anything from dschjksdbckjqh.com
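
If you'd rather see the combinations before anything hits the network, the prefix logic can be pulled out on its own. This is only a sketch: candidate_urls is a made-up helper name, and it simply lists the URLs in the same order the loop above tries them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): build every
# protocol/subdomain combination for a bare domain name.
sub candidate_urls {
    my ($domain) = @_;
    my @urls;
    for my $protocol (qw/http https/) {
        # Bare domain first, then the common subdomains
        push @urls, "$protocol://$domain";
        push @urls, map { "$protocol://$_.$domain" } qw/www web/;
    }
    return @urls;
}

print "$_\n" for candidate_urls('google.com');
```

Keeping the list separate like this makes it easy to add or reorder prefixes without touching the code that does the fetching.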

      HTH,
      Darren
