Re: Perl RegEx (url explode)

by U_nix$_@ (Initiate)
on Nov 01, 2012 at 23:07 UTC ( [id://1001883]=note )


in reply to Perl RegEx (url explode)

Hi,
Thanks. That's one of the approaches I tried before. It accepts both forms, with and without a port, but produces the following output:

http
www
example.de:9944    //this one should be "example.de"
example
de:9944            //this one should be only "de"
## PORT is empty ##

Replies are listed 'Best First'.
Re^2: Perl RegEx (url explode)
by U_nix$_@ (Initiate) on Nov 01, 2012 at 23:20 UTC

    (.*) seems to ignore what comes after it when the ":" is optional,
    so the port becomes part of this group:

    ((.*)(?:\.)(.*))

    But how can I fix it? A fixed set of commonly used top-level domains is not flexible enough.
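The behaviour described above can be reproduced in isolation. This is only a minimal sketch: it uses the host part of the pattern quoted above plus the optional port group from the later posts, and `split_greedy` is a hypothetical helper name, not something from the original code. Because the second `(.*)` is greedy and the port group is optional, the engine lets `(.*)` run to the end of the string and then matches the port group as empty.

```perl
use strict;
use warnings;

# Host part as quoted above, followed by the optional port group.
# Both (.*) groups are greedy.
my $greedy = qr{((.*)(?:\.)(.*))(?::(\d{0,10}))?};

# Hypothetical helper: returns the four captures for a host string.
sub split_greedy {
    my ($host) = @_;
    return $host =~ $greedy;
}

my @parts = split_greedy('example.de:9944');
print join(', ', map { defined $_ ? $_ : '-' } @parts), "\n";
# prints: example.de:9944, example, de:9944, -
```

The last capture stays undefined because nothing is left for the port group once `(.*)` has consumed `:9944`.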

      Try to match a character class that does not contain ':' (i.e. [^:]):

      use strict;
      use warnings;

      for my $uri ( qw(
          https://www.example.de
          http://www.example.de
          https://example.de
          http://example.de
          www.example.de
          example.de:123
          http://www.example.de:445/can?this=happen&too=1#lalala
          http://www.example.de/can?this=happen&too=1#foo
          http://www.example.de:445
      ) ) {
          print "in ($uri):\n";
          my (@spl) = $uri =~ m|(http(?:s?))?
                                (?:(?:://)?
                                (w{0,3})\.{0,1})?
                                ((.*)(?:\.)([^:/]*)) # match if it is not a ":"
                                (?::(\d{0,10}))?
                               |x;
          print 'out: ', join(', ', map { defined $_ ? $_ : '-' } @spl), "\n\n";
      }

      __DATA__
      in (https://www.example.de):
      out: https, www, example.de, example, de, -

      in (http://www.example.de):
      out: http, www, example.de, example, de, -

      in (https://example.de):
      out: https, , example.de, example, de, -

      in (http://example.de):
      out: http, , example.de, example, de, -

      in (www.example.de):
      out: -, www, example.de, example, de, -

      in (example.de:123):
      out: -, , example.de, example, de, 123

      in (http://www.example.de:445/can?this=happen&too=1#lalala):
      out: http, www, example.de, example, de, 445

      in (http://www.example.de/can?this=happen&too=1#foo):
      out: http, www, example.de, example, de, -

      in (http://www.example.de:445):
      out: http, www, example.de, example, de, 445
      Update: Added '/' to character class and example '#foo'
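To illustrate why the '/' matters in the character class, here is a small sketch comparing `[^:]` with `[^:/]` on a host that is followed by a path but no port. Only the host and port part of the pattern is used, and the `captures` helper and the sample input are assumptions made for this example.

```perl
use strict;
use warnings;

# Host/port part of the pattern, once with [^:] and once with [^:/].
my $no_colon       = qr{((.*)(?:\.)([^:]*))(?::(\d{0,10}))?};
my $no_colon_slash = qr{((.*)(?:\.)([^:/]*))(?::(\d{0,10}))?};

# Hypothetical helper: captures as a list, with '-' for undefined groups.
sub captures {
    my ($re, $str) = @_;
    return map { defined $_ ? $_ : '-' } $str =~ $re;
}

my $input = 'example.de/can?x=1';
print join(', ', captures($no_colon,       $input)), "\n";
# prints: example.de/can?x=1, example, de/can?x=1, -
print join(', ', captures($no_colon_slash, $input)), "\n";
# prints: example.de, example, de, -
```

With only ':' excluded, the last host group runs on through the path; excluding '/' as well stops it at the end of the host name.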

      Ok. The spirit reached me. This fixed it:

      ((.*)(?:\.)([a-zA-Z]*))(?::(\d{0,10}))?

      Edit:
      @Perlbotics,
      Your "match if not" version is the cleaner one. Merci.
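One case where the two variants differ, sketched under the assumption that a top-level domain label may contain digits or hyphens (for example a punycode label such as `xn--p1ai`). The `captures` helper and the sample host are made up for this illustration.

```perl
use strict;
use warnings;

# The [a-zA-Z]* fix and the negated-class version, host/port part only.
my $letters_only = qr{((.*)(?:\.)([a-zA-Z]*))(?::(\d{0,10}))?};
my $negated      = qr{((.*)(?:\.)([^:/]*))(?::(\d{0,10}))?};

# Hypothetical helper: captures as a list, with '-' for undefined groups.
sub captures {
    my ($re, $str) = @_;
    return map { defined $_ ? $_ : '-' } $str =~ $re;
}

my $host = 'example.xn--p1ai:8080';
print join(', ', captures($letters_only, $host)), "\n";
# prints: example.xn, example, xn, -
print join(', ', captures($negated,      $host)), "\n";
# prints: example.xn--p1ai, example, xn--p1ai, 8080
```

The letters-only class stops at the first hyphen, so the port is never reached, while the negated class accepts anything up to the ':' or '/'.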
