
Perl RegEx (url explode)

by U_nix$_@ (Initiate)
on Nov 01, 2012 at 22:18 UTC (#1001877=perlquestion)
U_nix$_@ has asked for the wisdom of the Perl Monks concerning the following question:


I think it's a trivial bug, but I would be happy if someone could help me with this. Here's the code:

($CON,$WWW,$HOST,$SLD,$TLD,$PORT) = $conf[1] =~ m|(http(?:s?))?(?:(?:://)?(w{0,3})\.{0,1})?((.*)(?:\.)(.*))(?::(\d{0,10})?)|;

The following "types" of URLs must be accepted:


And if they come with a port, it must work too:

If a URL with a port is used, everything works fine. Without a port, nothing works.


It prints the following:

http(s) www example de 443

If something is missing:

http "empty" example de 80

There must be a small bug somewhere.

No variable gets a value if a URL without a port is given.


I guess the reason is the "?::" part: no ":", no match. If I change it, both kinds of URL are accepted, but the port is no longer split off. It stays attached to the top-level domain and ends up in the host variable.
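A minimal sketch (not part of the original thread) to reproduce the failure: it runs the question's pattern, with the stray "+" line-wrap marker removed, against one URL with a port and one without. The helper name "explode_question" and the sample URLs are hypothetical, since the original example URLs were not preserved. In "(?::(\d{0,10})?)" the "?" applies only to the digits, so a literal ":" is required somewhere after a "."; in a port-less "http://" URL the only colon comes before any dot, so the match fails and the list assignment leaves every variable undefined.

```perl
use strict;
use warnings;

# Hypothetical helper: the question's pattern, with the "+" wrap marker removed.
# In list context it returns ($CON, $WWW, $HOST, $SLD, $TLD, $PORT),
# or an empty list when the match fails.
sub explode_question {
    my ($url) = @_;
    return $url =~ m|(http(?:s?))?(?:(?:://)?(w{0,3})\.{0,1})?((.*)(?:\.)(.*))(?::(\d{0,10})?)|;
}

# Made-up sample URLs: one with a port, one without.
for my $url ('http://www.example.de:443', 'http://www.example.de') {
    my @parts = explode_question($url);
    # Show undefined captures as "-" so the two cases are easy to compare.
    print "$url => ",
        (@parts ? join(', ', map { defined $_ ? $_ : '-' } @parts) : 'no match'),
        "\n";
}
```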

Re: Perl RegEx (url explode)
by choroba (Bishop) on Nov 01, 2012 at 22:53 UTC
    I noticed just one problem: the placement of the final question mark. The whole port part, together with the colon, should be optional:
    m% (http(?:s?))?                  # http
       (?:(?:://)? (w{0,3})\.{0,1})?  # www
       ((.*)(?:\.)([^:]*))            # domains
       (?::(\d+))? %x;                # port
    Edit: . changed to [^:] in # domains.
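A runnable sketch of choroba's corrected pattern, wrapped in a hypothetical helper "explode_fixed" and tried on a few made-up URLs. With the "?" covering the colon and the digits together, a missing port simply leaves the last capture undefined instead of failing the whole match, and "[^:]*" keeps the port out of the top-level domain.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping choroba's pattern: the trailing "?" now covers
# the colon and the digits together, so the whole port part is optional.
sub explode_fixed {
    my ($url) = @_;
    return $url =~ m% (http(?:s?))?                  # http
                      (?:(?:://)? (w{0,3})\.{0,1})?  # www
                      ((.*)(?:\.)([^:]*))            # domains
                      (?::(\d+))?                    # port
                    %x;
}

# Made-up inputs: with a port, without a port, and without a scheme.
for my $url ('http://www.example.de:443', 'http://www.example.de', 'example.de:80') {
    my @parts = explode_fixed($url);
    # Undefined captures (e.g. no scheme, no port) are shown as "-".
    print "$url => ", join(', ', map { defined $_ ? $_ : '-' } @parts), "\n";
}
```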
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Perl RegEx (url explode)
by aitap (Curate) on Nov 02, 2012 at 18:54 UTC
    Isn't URI better in this case? Bigger, but simpler code:
    use URI;
    for (URI::->new($conf[1],"http")) {
        my @domain = split /\./, $_->host;
        my $tld = pop @domain;
        my $sld = join ".",@domain;
        my $www = @domain > 2 && $domain[0] eq "www" ? shift @domain : "";
        my $host = join ".",(@domain,$tld);
        print ($_->scheme,$www,$host,$sld,$tld,$_->port);
    }
    (this code will work even in weird cases like perfectly valid
    Sorry if my advice was wrong.
Re: Perl RegEx (url explode)
by U_nix$_@ (Initiate) on Nov 01, 2012 at 23:07 UTC

    Thanks. That's one of the ways I tried before. It accepts both types, with and without a port, but produces the following output:

    http
    www        //this one should be ""
    example
    de:9944    //this one should be only "de"
               ## PORT is empty ##

      The (.*) seems to ignore what comes after it if the ":" is optional.
      And the port becomes part of this:


      But how do I fix it? A fixed list of commonly used top-level domains is not flexible enough.
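A small sketch, with a hypothetical helper "explode_greedy" and a made-up URL, that reproduces the behaviour described above: the port group is optional, but the last domain group is still "(.*)". Being greedy and able to match ":", it consumes "de:9944" up to the end of the string; the optional port group then matches nothing, so the match still succeeds with an empty port.

```perl
use strict;
use warnings;

# Hypothetical helper: the question's pattern with only the port group made
# optional. The last domain group is still (.*), which is greedy and may
# match ":", so it swallows the port before the optional group is tried.
sub explode_greedy {
    my ($url) = @_;
    return $url =~ m|(http(?:s?))?(?:(?:://)?(w{0,3})\.{0,1})?((.*)(?:\.)(.*))(?::(\d{0,10}))?|;
}

my @parts = explode_greedy('http://www.example.de:9944');
# Undefined captures (here: the port) are shown as "-".
print join(', ', map { defined $_ ? $_ : '-' } @parts), "\n";
```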

        Try to match a character class that does not contain ':' (i.e. [^:]):

        use strict;
        use warnings;

        for my $uri( qw( ) ) {
            print "in ($uri):\n";
            my (@spl) = $uri =~ m|(http(?:s?))?
                                  (?:(?:://)? (w{0,3})\.{0,1})?
                                  ((.*)(?:\.)([^:/]*))   # match if it is not a ":"
                                  (?::(\d{0,10}))?
                                 |x;
            print 'out: ', join(', ', map { defined $_ ? $_ : '-' } @spl), "\n\n";
        }
        __DATA__
        in ( out: https, www,, example, de, -
        in ( out: http, www,, example, de, -
        in ( out: https, ,, example, de, -
        in ( out: http, ,, example, de, -
        in ( out: -, www,, example, de, -
        in ( out: -, ,, example, de, 123
        in ( out: http, www,, example, de, 445
        in ( out: http, www,, example, de, -
        in ( out: http, www,, example, de, 445
        Update: Added '/' to the character class and the '#foo' example.
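A sketch comparing the two character classes on a made-up URL that has a fragment but no port ("explode_with_class" is a hypothetical helper that interpolates the class into the pattern). With "[^:]*" the last label runs on into "/#foo"; adding "/" to the class stops it at "de". The pattern still anchors on the last dot it can find, so a path such as "/index.html" would still put "html" into the top-level-domain slot.

```perl
use strict;
use warnings;

# Hypothetical helper: same shape as the pattern above, but the character
# class for the last domain label is passed in so two variants can be compared.
sub explode_with_class {
    my ($url, $class) = @_;
    return $url =~ m% (http(?:s?))?
                      (?:(?:://)? (w{0,3})\.{0,1})?
                      ((.*)(?:\.)($class*))   # last label: no ":" (and maybe no "/")
                      (?::(\d{0,10}))?
                    %x;
}

my $url = 'http://www.example.de/#foo';
for my $class ('[^:]', '[^:/]') {
    my @parts = explode_with_class($url, $class);
    printf "%-6s => TLD %s\n", $class, $parts[4];
}
```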

        Ok. The spirit reached me. This fixed it:


        Your "match if not" version is the cleaner one. Thanks.
