Extracting a domain name from a url

by abachus (Monk)
on Oct 15, 2006 at 14:21 UTC (#578383=perlquestion)

abachus has asked for the wisdom of the Perl Monks concerning the following question:

Good day all,

I would like to extract a domain name from an arbitrary url. A nice, easy example:

From ->
extract only ->

The data will be coming from a UserAgent request header, so I will need to grab the GET *url* HTTP/1.1 line and work from there. Any thoughts on the best way to do this?

many thanks,

Isaac Close.

Replies are listed 'Best First'.
Re: Extracting a domain name from a url
by rhesa (Vicar) on Oct 15, 2006 at 14:30 UTC
    Check out URI.
    use URI;
    my $uri = URI->new( '' );
    print $uri->host;
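
A minimal sketch of how the URI suggestion might be applied to the request line mentioned in the question. The sub name `host_from_request_line` and the example.com URL are placeholders for illustration, not anything from the thread, and the sketch assumes the URI module is installed.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: pull the request target out of a line such as
# "GET <url> HTTP/1.1" and ask URI for its host.
sub host_from_request_line {
    my ($line) = @_;

    # The request line is "METHOD target VERSION"; keep the second field.
    my (undef, $target) = split ' ', $line;
    return unless defined $target;

    my $uri = URI->new($target);

    # Not every URI class has a host method (mailto: does not, for example).
    return $uri->can('host') ? $uri->host : undef;
}

# Placeholder request line, assumed for illustration only.
my $host = host_from_request_line('GET http://www.example.com:8080/some/page.html?x=1 HTTP/1.1');
print defined $host ? "$host\n" : "no host found\n";
```

Note that many clients send only a path on the GET line and put the host name in a separate `Host:` header. In that case the sub above returns nothing and the header would have to be read instead.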
Re: Extracting a domain name from a url
by blazar (Canon) on Oct 15, 2006 at 15:27 UTC
    From ->
    extract only ->

    rhesa already suggested using a specialized module, which is generally the superior solution, but this shouldn't be hard to do with a match or split, in which case some familiarity with elementary regexen should help you. In particular, the following should be fine for you:

    my $url = '';
    my $host = (split m(/+), $url)[1];
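
To see where the split approach holds and where it does not, here is a small sketch that runs it over a few inputs. The sub name `host_by_split` and all of the example.com URLs are made up for illustration; none of them come from the thread.

```perl
use strict;
use warnings;

# Same technique as above: split on runs of slashes, take the second field.
sub host_by_split {
    my ($url) = @_;
    return (split m{/+}, $url)[1];
}

# Placeholder URLs, assumed for illustration only.
for my $url (
    'http://www.example.com/index.html',
    'http://www.example.com:8080/index.html',
    'http://user@www.example.com/index.html',
    'www.example.com/index.html',
) {
    my $got = host_by_split($url);
    printf "%-42s => %s\n", $url, defined $got ? $got : '(undef)';
}
```

The first URL gives the bare host. The second and third keep the port and the user part attached to it, and the last one, which has no scheme, yields the path segment rather than the host.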
      Yup, your solution does work for the majority of urls.

      For the record, here are some urls that wouldn't be handled properly with your regex:
        thanks monks, as always you shed light on things I cannot see :)
        This one is one I've been using in production for a while, and seems to hold up well:

        my $url = "";
        my ($host) = $url =~ /http:\/\/([^\/]+)/;


        Update: Sorry, I'm a moe - I forgot to add the point of this post in my example!

        Good judgement comes with experience. Unfortunately, the experience usually comes from bad judgement.
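
The regex above only looks for http:// and captures everything up to the next slash. As a hedged variation, assuming one also wants https and wants any user part and port dropped, something along these lines might do. The sub name `host_by_regex` and the example.com URLs are hypothetical, and bracketed IPv6 hosts are not handled.

```perl
use strict;
use warnings;

# Hypothetical variant of the regex approach; a sketch, not a full URL parser.
sub host_by_regex {
    my ($url) = @_;
    my ($host) = $url =~ m{
        ^https?://           # scheme: http or https
        (?:[^/?\#@]*@)?      # optional user part, discarded
        ([^/?\#:]+)          # host: stops at port, path, query or fragment
    }xi;
    return $host;
}

# Placeholder URLs, assumed for illustration only.
for my $url (
    'http://www.example.com/index.html',
    'https://user:secret@www.example.com:8443/a/b?c=d',
    'ftp://www.example.com/file',
) {
    my $got = host_by_regex($url);
    printf "%-50s => %s\n", $url, defined $got ? $got : 'No match';
}
```

For anything beyond simple cases, a module is still likely to be the safer route.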
Re: Extracting a domain name from a url
by fenLisesi (Priest) on Oct 16, 2006 at 09:03 UTC
    When you match your url against $RE{URI}{HTTP}{-keep}, what you want will be in $3, which you should extract immediately after the match. This sample code:
    prints:
    => => => => => No match => No match potato:// => No match => http 80 /notfound.html?fn=john&ln=doe notfound.html?fn=john&ln=doe notfound.html fn=john&ln=doe
    See Regexp::Common and Regexp::Common::URI::http.
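
A minimal sketch of the $RE{URI}{HTTP}{-keep} match described above, since the sample code is missing from this reply. It assumes Regexp::Common is installed. The sub name `http_host` and the example.com URL are placeholders, not the original sample.

```perl
use strict;
use warnings;
use Regexp::Common qw(URI);

# Hypothetical helper: return the host of an http URL, or nothing on no match.
sub http_host {
    my ($url) = @_;
    if ($url =~ /^$RE{URI}{HTTP}{-keep}$/) {
        # With -keep, $3 holds the host. Copy it straight away,
        # before another match can overwrite the capture variables.
        my $host = $3;
        return $host;
    }
    return;
}

# Placeholder inputs, assumed for illustration only.
for my $url ('http://www.example.com/notfound.html?fn=john&ln=doe', 'potato://') {
    my $host = http_host($url);
    print "$url => ", (defined $host ? $host : 'No match'), "\n";
}
```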

    Update: Added <readmore>
