PerlMonks  

Re^2: Perl RegEx (url explode)

by U_nix$_@ (Initiate)
on Nov 01, 2012 at 23:20 UTC (#1001884)


in reply to Re: Perl RegEx (url explode)
in thread Perl RegEx (url explode)

(.*) seems to ignore what comes after it when the ":" is optional,
so the port becomes part of the last group here:

((.*)(?:\.)(.*))

But how do I fix it? A fixed set of commonly used top-level domains is not flexible enough.
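To make the symptom concrete, here is a minimal sketch that runs only the host/port tail of the pattern on one input. The helper name `split_greedy` and the test string are made up for illustration; the optional port group is the one used later in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the host/port part of the pattern from the
# question and returns the captures (full, name, tld, port).
sub split_greedy {
    my ($host) = @_;
    return $host =~ m/((.*)(?:\.)(.*))(?::(\d{0,10}))?/;
}

my ($full, $name, $tld, $port) = split_greedy('example.de:123');

# The trailing (.*) is greedy and takes ":123" with it; the optional
# port group then matches the empty string and stays undefined.
printf "name=%s tld=%s port=%s\n", $name, $tld, defined $port ? $port : '-';
# prints: name=example tld=de:123 port=-
```

Because the port group is optional, the regex engine has no reason to backtrack out of the last (.*), so the match succeeds with the port swallowed.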


Re^3: Perl RegEx (url explode)
by Perlbotics (Abbot) on Nov 01, 2012 at 23:28 UTC

    Try to match a character class that does not contain ':' (i.e. [^:]):

    use strict;
    use warnings;

    for my $uri( qw(https://www.example.de
                    http://www.example.de
                    https://example.de
                    http://example.de
                    www.example.de
                    example.de:123
                    http://www.example.de:445/can?this=happen&too=1#lalala
                    http://www.example.de/can?this=happen&too=1#foo
                    http://www.example.de:445
                 )
    ) {
        print "in ($uri):\n";
        my (@spl) = $uri =~ m|(http(?:s?))?
                              (?:(?:://)?
                              (w{0,3})\.{0,1})?
                              ((.*)(?:\.)([^:/]*))  # match if it is not a ":"
                              (?::(\d{0,10}))?
                             |x;
        print 'out: ', join(', ', map { defined $_ ? $_ : '-' } @spl), "\n\n";
    }
    __DATA__
    in (https://www.example.de):
    out: https, www, example.de, example, de, -

    in (http://www.example.de):
    out: http, www, example.de, example, de, -

    in (https://example.de):
    out: https, , example.de, example, de, -

    in (http://example.de):
    out: http, , example.de, example, de, -

    in (www.example.de):
    out: -, www, example.de, example, de, -

    in (example.de:123):
    out: -, , example.de, example, de, 123

    in (http://www.example.de:445/can?this=happen&too=1#lalala):
    out: http, www, example.de, example, de, 445

    in (http://www.example.de/can?this=happen&too=1#foo):
    out: http, www, example.de, example, de, -

    in (http://www.example.de:445):
    out: http, www, example.de, example, de, 445

    Update: Added '/' to character class and example '#foo'

Re^3: Perl RegEx (url explode)
by U_nix$_@ (Initiate) on Nov 01, 2012 at 23:33 UTC

    Ok. The spirit reached me. This fixed it:

    ((.*)(?:\.)([a-zA-Z]*))(?::(\d{0,10}))?

    Edit:
    @Perlbotics,
    Your "match if not" version is the cleaner one. Merci.
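For anyone comparing the two fixes, here is a rough sketch that runs both character classes over the same inputs. The names `$letters_only`, `$not_delim`, and `split_with` are invented for this example, and the second host (with a punycode-style TLD) is an assumed test case that does not appear in the thread.

```perl
use strict;
use warnings;

# Host/port tail with a letters-only TLD class (the [a-zA-Z] fix).
my $letters_only = qr{((.*)(?:\.)([a-zA-Z]*))(?::(\d{0,10}))?};
# Same tail with the negated class from the "match if not" version.
my $not_delim    = qr{((.*)(?:\.)([^:/]*))(?::(\d{0,10}))?};

# Hypothetical helper: returns the captures, with '-' for undefined groups.
sub split_with {
    my ($re, $host) = @_;
    my @parts = $host =~ $re;
    return map { defined $_ ? $_ : '-' } @parts;
}

for my $host (qw(example.de:123 example.xn--p1ai:8080)) {
    print "$host\n";
    print "  [a-zA-Z]*: ", join(', ', split_with($letters_only, $host)), "\n";
    print "  [^:/]*   : ", join(', ', split_with($not_delim,    $host)), "\n";
}
```

Both variants agree on example.de:123. On the second host the letters-only class stops at the first "-", so the TLD comes back as "xn" and the port is lost, while the negated class reads up to the ":" and still finds 8080. That seems to be one practical reason the negated class is the more flexible choice when the TLD set is not known in advance.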
