
Perl RegEx (url explode)

by U_nix$_@ (Initiate)
on Nov 01, 2012 at 22:18 UTC (#1001877)
U_nix$_@ has asked for the wisdom of the Perl Monks concerning the following question:


I think it's a trivial bug, but I would be happy if someone could help me with this. Here's the code:

($CON,$WWW,$HOST,$SLD,$TLD,$PORT) = $conf[1] =~ m|(http(?:s?))?(?:(?:://)?(w{0,3})\.{0,1})?((.*)(?:\.)(.*))(?::(\d{0,10})?)|;

The following "types" of URLs must be accepted:


and if they come with a port, it must work too:

If a URL with a port is used, everything works fine. Without a port, nothing works.


It prints the following:

http(s) www example de 443

and if something is missing:

http "empty" example de 80

There must be a small bug somewhere.

No variable gets a value if a URL without a port is given.


I guess the reason is "?::". No ":", no match. If I change it, both URLs are accepted, but the port is no longer split off. The port stays attached to the top-level domain and is also joined to the host variable.
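The two failure modes described above can be reproduced with a small sketch. The sample URLs are assumptions (the URLs from the original post are not shown), and `explode_url` is a hypothetical helper name. Variant A is the pattern from the question; variant B only makes the whole port group optional and keeps the greedy `(.*)` for the top-level domain.

```perl
use strict;
use warnings;

# Hypothetical sample input; the URLs from the original post are not shown.
my $with_port    = 'http://www.example.de:443';
my $without_port = 'http://www.example.de';

# Variant A: pattern from the question; the ":" before the port is mandatory.
my $mandatory = qr|(http(?:s?))?(?:(?:://)?(w{0,3})\.{0,1})?((.*)(?:\.)(.*))(?::(\d{0,10})?)|;

# Variant B: whole port group optional, but the TLD is still a greedy (.*).
my $optional = qr|(http(?:s?))?(?:(?:://)?(w{0,3})\.{0,1})?((.*)(?:\.)(.*))(?::(\d{0,10}))?|;

# Return the six captures as a list (empty list when the pattern fails).
sub explode_url {
    my ($re, $url) = @_;
    return $url =~ $re;
}

for my $case ([A => $mandatory], [B => $optional]) {
    my ($name, $re) = @$case;
    for my $url ($with_port, $without_port) {
        my @parts = explode_url($re, $url);
        print "$name $url: ",
            @parts ? join(', ', map { defined $_ ? "'$_'" : 'undef' } @parts) : 'no match',
            "\n";
    }
}
```

With these inputs, variant A should fail on the URL without a port, and variant B should match both but leave the port glued to the top-level domain, because the greedy `(.*)` consumes the ":443" before the optional port group gets a chance.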

Replies are listed 'Best First'.
Re: Perl RegEx (url explode)
by choroba (Bishop) on Nov 01, 2012 at 22:53 UTC
    I noticed just one problem: the placement of the final question mark. The whole port part is optional, together with the colon:
    m% (http(?:s?))?                    # http
       (?:(?:://)? (w{0,3})\.{0,1})?    # www
       ((.*)(?:\.)([^:]*))              # domains
       (?::(\d+))? %x;                  # port
    Edit: . changed to [^:] in # domains.
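A minimal sketch of how the pattern above might be used, assuming hypothetical sample URLs. The helper name `split_url` and the hash keys are illustrative choices, not part of the original reply.

```perl
use strict;
use warnings;

# The pattern from the reply above, stored as a compiled regex.
my $re = qr%
    (http(?:s?))?                  # http
    (?:(?:://)? (w{0,3})\.{0,1})?  # www
    ((.*)(?:\.)([^:]*))            # domains
    (?::(\d+))?                    # port
%x;

# Split a URL into named parts; returns undef when the pattern does not match.
sub split_url {
    my ($url) = @_;
    my @caps = $url =~ $re or return;
    my %part;
    @part{qw(con www host sld tld port)} = @caps;
    return \%part;
}

# Hypothetical inputs, with and without a port.
for my $url ('https://www.example.de:443', 'http://example.de') {
    my $p = split_url($url) or next;
    print join(' ', map { defined $p->{$_} ? "$_=$p->{$_}" : "$_=-" }
                    qw(con www host sld tld port)), "\n";
}
```

Because `[^:]` cannot cross a colon, the greedy `(.*)` has to stop at the last dot before the port, and the optional port group then picks up the digits.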
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Perl RegEx (url explode)
by aitap (Curate) on Nov 02, 2012 at 18:54 UTC
    Isn't URI better in this case? Bigger, but simpler code:
    use URI;
    for (URI::->new($conf[1],"http")) {
        my @domain = split /\./, $_->host;
        my $tld = pop @domain;
        my $sld = join ".",@domain;
        my $www = @domain > 2 && $domain[0] eq "www" ? shift @domain : "";
        my $host = join ".",(@domain,$tld);
        print ($_->scheme,$www,$host,$sld,$tld,$_->port);
    }
    (this code will work even in weird cases like perfectly valid
    Sorry if my advice was wrong.
Re: Perl RegEx (url explode)
by cnd (Sexton) on Mar 31, 2018 at 06:12 UTC

    This answer caters for usernames and passwords too:



    #!perl
    use strict;
    use warnings;

    for my $uri( qw(
        wss:// +ine_1m/ethbtc@kline_1m/btcusdt@kline_1m +ingobang=&"
        ftp://username@hostname/
        ftp://username:password@hostname/
    ) ) {
      print "in ($uri):\n";
      my @parts=($uri=~/^(\w+):\/\/       # scheme (ftp http wss etc)
                  (?:([^:@\/]*)            # optional username
                  (?::([^@\/]+)|)          # optional password
                  \@|)                     # username and password are optional
                  (                        # group all the bits of the URL and its dots
                    (?:[a-zA-Z0-9]+\.|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9]\.)*(?:[a-zA-Z0-9]+|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])
                  )
                  (?::(\d{1,5})|)          # optional port
                  (.*)/xo);                # path and query parms come last
      for(my $i=0;$i<=$#parts;$i++) {
        print " $i: $parts[$i]\n" if($parts[$i]);
      }
    }
Re: Perl RegEx (url explode)
by U_nix$_@ (Initiate) on Nov 01, 2012 at 23:07 UTC

    Thanks. That's one of the ways I tried before. It accepts both types, with and without a port, but produces the following output:

    http
    www //this one should be ""
    example
    de:9944 //this one should be only "de"
    ## PORT is empty ##

      (.*) seems to ignore what comes after it if ":" is optional,
      and the port becomes a part of this:


      But how do I fix it? A fixed set of commonly used top-level domains is not flexible enough.

        Try to match a character class that does not contain ':' (i.e. [^:]):

        use strict;
        use warnings;

        for my $uri( qw( ) ) {
            print "in ($uri):\n";
            my (@spl) = $uri =~ m|(http(?:s?))?
                                  (?:(?:://)? (w{0,3})\.{0,1})?
                                  ((.*)(?:\.)([^:/]*))   # match if it is not a ":"
                                  (?::(\d{0,10}))?
                                 |x;
            print 'out: ', join(', ', map { defined $_ ? $_ : '-' } @spl), "\n\n";
        }
        __DATA__
        in (
        out: https, www,, example, de, -

        in (
        out: http, www,, example, de, -

        in (
        out: https, ,, example, de, -

        in (
        out: http, ,, example, de, -

        in (
        out: -, www,, example, de, -

        in (
        out: -, ,, example, de, 123

        in (
        out: http, www,, example, de, 445

        in (
        out: http, www,, example, de, -

        in (
        out: http, www,, example, de, 445
        Update: Added '/' to character class and example '#foo'
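The effect of adding '/' to the character class can be checked with a small comparison. This is only a sketch: the sample URLs and the helper name `tld_and_port` are assumptions, since the URLs from the original reply are not shown.

```perl
use strict;
use warnings;

# TLD class excludes only ":" (earlier suggestion in this thread).
my $colon_only = qr|(http(?:s?))?
                    (?:(?:://)? (w{0,3})\.{0,1})?
                    ((.*)(?:\.)([^:]*))
                    (?::(\d{0,10}))?
                   |x;

# TLD class excludes ":" and "/" (the updated version above).
my $colon_slash = qr|(http(?:s?))?
                     (?:(?:://)? (w{0,3})\.{0,1})?
                     ((.*)(?:\.)([^:/]*))
                     (?::(\d{0,10}))?
                    |x;

# Return the TLD and port captures (both undef when the pattern fails).
sub tld_and_port {
    my ($re, $url) = @_;
    my @caps = $url =~ $re;
    return @caps[4, 5];
}

# Hypothetical inputs with a trailing path or fragment.
for my $url ('http://www.example.de/#foo', 'http://www.example.de:445/#foo') {
    for my $case ([ 'not-colon'       => $colon_only ],
                  [ 'not-colon-slash' => $colon_slash ]) {
        my ($name, $re) = @$case;
        my ($tld, $port) = tld_and_port($re, $url);
        printf "%-16s %-32s tld=%s port=%s\n", $name, $url,
            defined $tld ? $tld : '-', defined $port ? $port : '-';
    }
}
```

One limitation remains with either class: the greedy `(.*)` still backtracks to the last dot in the whole string, so a path that itself contains a dot (for example a hypothetical `/index.html`) would shift the split point away from the host name.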

        Ok. The spirit reached me. This fixed it:


        Your "match if not" version is the cleaner one. Merci.
