PerlMonks  

Validate Ip address Regexp

by akr8986 (Initiate)
on Nov 28, 2015 at 11:48 UTC ( [id://1148750]=perlquestion )

akr8986 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am new to Perl and was trying to write a script to check whether an IP address is valid. Though I found some solutions online, I wanted to try it on my own.

#! /usr/bin/perl -w
print "Enter IP Address:";
my $ip = <STDIN>;
if ( $ip =~ /(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/ ) {
    print "yes match $1 $2 $3 $4\n";
}
else {
    print "no match\n";
}

When I enter 10.3.4.5 it says the IP address matches, as expected. But when I enter 1000.3.4.5 it also says a match is found, and the values of $1 to $4 are printed as 000 3 4 5. How can this happen, when I am saying to match only 3 digits and no more, using "\d{1,3}"? Enlighten me!

Replies are listed 'Best First'.
Re: Validate Ip address Regexp
by ww (Archbishop) on Nov 28, 2015 at 12:10 UTC

    You're correct in believing that your regex says to match one to three digits, but more precisely it says to match one to three digits followed by a dot (and so on for the remaining groups), anywhere in the string. Because nothing ties the pattern to the start of the string, the regex engine is free to begin the match at a later position: given 1000.3.4.5, it skips the leading 1 and matches 000.3.4.5. Any numbers, letters or symbols that precede (or follow) the matched portion are simply ignored.

    Illustrating anchors (where ^ marks the start of the string and $ marks the end of the string):

    # 1148750
    print "Testing IP Addresses in array, \@ip: \n";
    my @ip = ("0123.456.789.654", "a123.456.789.654", "123.456.789.6543", "111.222.333.444");

    for $ip (@ip) {
        if ( $ip =~ /(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/ ) {
            print "yes $ip has a match: $1 $2 $3 $4\n";
        }
        else {
            print "no match\n";
        }
    }

    print "\n\t whereas, with anchors in the ip, \n";

    for $ip (@ip) {
        if ( $ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/ ) {
            print "Second regex matches $1 $2 $3 $4\n";
        }
        else {
            print "no match in $ip \n";
        }
    }

    =head execution results:

    Testing IP Addresses in array, @ip:
    yes 0123.456.789.654 has a match: 123 456 789 654
    yes a123.456.789.654 has a match: 123 456 789 654
    yes 123.456.789.6543 has a match: 123 456 789 654
    yes 111.222.333.444 has a match: 111 222 333 444

             whereas, with anchors in the ip,
    no match in 0123.456.789.654
    no match in a123.456.789.654
    no match in 123.456.789.6543
    Second regex matches 111 222 333 444

    =cut

    Two Updates in para 1: s/n/in/ and edited for clarity my explanation of what happens when no anchor is specified.

      Thanks a ton!! That helped me understand a lot, including the significance of ^ and $.

        Now, and without using or looking at Regexp::Common::net or other such modules or Super Search, write, as an exercise (for it may be better in practice to use multiple regexes and separate tests), a single regex that will accept '255.1.12.123' and reject '256.1.12.123' or '300.1.12.123'. (Think of the alternate ranges (hint) of numbers involved: 0-9, 00-99, 000-199, 200-249, 250-255.) Then write a regex that will extract (or parse) valid IPv4 decimal octet addresses from an arbitrary string:
            'foo255.1.12.123bar1.2.3.4 x 11.2.33.44 y 300.123.12.1z000123.3.2.11111'
        Now you've got something.
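
For readers who want to check their own attempt at the second part of the exercise, here is one possible sketch, not the only answer. It assumes that a "valid" address must not be directly preceded or followed by another digit, and that leading zeroes are allowed; both are assumptions, as the exercise does not define them. The names $octet, $ipv4 and extract_ipv4 are made up for this illustration.

```perl
use strict;
use warnings;

# One octet, 0-255, leading zeroes allowed (assumption).
# The longest, most specific alternatives come first.
my $octet = qr/25[0-5]|2[0-4]\d|[01]?\d?\d/;

# Four octets separated by dots. The lookarounds reject candidates that
# are glued to further digits, such as the tail of "000123.3.2.11111".
my $ipv4 = qr/(?<!\d)(?:$octet\.){3}$octet(?!\d)/;

# Hypothetical helper: returns every address found in $text.
sub extract_ipv4 {
    my ($text) = @_;
    return $text =~ /($ipv4)/g;
}

my $text = 'foo255.1.12.123bar1.2.3.4 x 11.2.33.44 y 300.123.12.1z000123.3.2.11111';
print "$_\n" for extract_ipv4($text);
```

On the sample string this should print 255.1.12.123, 1.2.3.4 and 11.2.33.44. Under a stricter definition (for example, no leading zeroes, or no adjacent dots) the lookarounds and the $octet pattern would need adjusting.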

        See perlre, perlretut, and perlrequick.


        Give a man a fish:  <%-{-{-{-<

Re: Validate Ip address Regexp
by neilwatson (Priest) on Nov 28, 2015 at 13:36 UTC

    Regexp::Common already has a regex for you. Don't stress building your own.

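    use Regexp::Common qw/ net number /;

    # these become available:
    # $RE{net}{IPv6}
    # $RE{net}{IPv4}

    # Note: the pattern is not anchored, so this also succeeds when a valid
    # address is embedded in a longer string; add ^ and $ to validate.
    if ( $ip =~ m/$RE{net}{IPv4}/ ) {
        print 'match!';
    }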

    Neil Watson
    watson-wilson.ca

      Don't stress building your own.

      ... except as an exercise for building your familiarity with and confidence in regex construction.


      Give a man a fish:  <%-{-{-{-<

Re: Validate Ip address Regexp
by BillKSmith (Monsignor) on Nov 28, 2015 at 13:28 UTC
Re: Validate Ip address Regexp
by Laurent_R (Canon) on Nov 28, 2015 at 16:31 UTC
    Besides the beginning and end of string anchors missing in your regex (as pointed out by ww and possibly other monks above), which explains why your regex matched something like "1000.3.4.5", there is an additional deeper problem in your regex. The digit groups of an IPv4 address have to be octets, i.e. numbers between 0 and 255 in decimal notation.

    Your regex would validate an address such as 258.344.543.877, whereas none of the digit groups fits the definition of an IP address, as they are all larger than 255.

    How do we check that the numbers are correct?

    One possible way is to start by creating an $octet regex, which will match only numbers between 0 and 255. It could be something like this:

    my $octet = qr/\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5]/;
    which basically means: 1 or 2 digits (0-99), OR a 0 or 1 followed by 2 digits (000-199), OR a 2 followed by a digit between 0 and 4 and any other digit (200-249), OR a 25 followed by a digit between 0 and 5 (250-255).

    Once you have defined such an $octet regex, you can use it to further define an $ip regex:

    my $ip = qr/^$octet\.$octet\.$octet\.$octet$/;
    in which you may want to factor out the repetition of $octet with something like this:
    my $ip = qr/^(?:$octet\.){3}$octet$/;
    which is, at best, only marginally clearer.

    Provided I did not make any small silly mistake in the code above (I only tested a few obvious cases), this should really be able to validate any valid IPv4 address (and hopefully reject any invalid one).
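
A quick way to exercise the two regexes above on a handful of inputs is sketched below. The names $ip_re and is_ipv4 and the list of test strings are made up for this illustration. One deliberate difference from the code above: the sketch ends the pattern with \z instead of $, because $ also matches just before a trailing newline.

```perl
use strict;
use warnings;

# The octet regex from the post above, unchanged.
my $octet = qr/\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5]/;

# Anchored at both ends; \z (an assumption of this sketch) rejects a
# trailing newline, which a plain $ would tolerate.
my $ip_re = qr/^(?:$octet\.){3}$octet\z/;

# Hypothetical helper: 1 if $candidate is a dotted-quad IPv4 address.
sub is_ipv4 {
    my ($candidate) = @_;
    return $candidate =~ $ip_re ? 1 : 0;
}

for my $candidate (qw(10.3.4.5 255.1.12.123 1000.3.4.5 256.1.12.123 258.344.543.877 1.2.3)) {
    printf "%-16s %s\n", $candidate, is_ipv4($candidate) ? 'valid' : 'invalid';
}
```

Only the first two candidates should be reported as valid. Note that this regex accepts leading zeroes, so 001.012.013.225 would also pass.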

    Please note that I am providing the above only because you stated that you wanted to try on your own, and also because learning to build a regex from a sub-regex is sometimes really useful and may give you an initial idea of what Perl 6's or other regex-based grammars are about.

    For a real-life Perl 5 application, I agree with the other monks above: don't reinvent the wheel, and use the proper ready-made module (such as Regexp::Common::net), which has undoubtedly been tested more thoroughly than my quick attempt above.

Re: Validate Ip address Regexp
by VinsWorldcom (Prior) on Nov 28, 2015 at 14:11 UTC
Re: Validate Ip address Regexp
by shmem (Chancellor) on Nov 28, 2015 at 19:19 UTC

    A regular expression is the wrong approach, IMHO. An IPv4 address consists of four bytes (0..255) separated by dots.
    Have a look at split and pack. If you pack the byte values, unpack them again and reassemble the IP string, the original and the result should be equal if the original is a valid IP address.

    print "Enter IP Address:";
    chomp(my $ip = <STDIN>);    # chomp removes only a trailing newline
    my $parsed = join '.', unpack "C4", pack "C4", split /\./, $ip;
    if ($parsed eq $ip) {
        print "yes match $parsed\n";
    }
    else {
        print "no match: parsed $parsed <> input $ip\n";
    }

    This of course doesn't work for IP addresses written with leading zeroes, e.g. 001.012.013.225, which I consider bad practice anyway, because in my book a leading zero marks an octal value.
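
The round trip described above can be wrapped in a small function and tried on a few inputs. This is only a sketch: the name is_ipv4_roundtrip is made up, and warnings are switched off inside the function on the assumption that pack's complaints about out-of-range or non-numeric fields are not interesting here, since such inputs fail the string comparison anyway.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $ip is unchanged after being split on dots,
# packed into four unsigned bytes, unpacked and joined again.
sub is_ipv4_roundtrip {
    my ($ip) = @_;
    no warnings;    # pack warns on values > 255 and on non-numeric fields
    my $parsed = join '.', unpack 'C4', pack 'C4', split /\./, $ip;
    return $parsed eq $ip;
}

for my $candidate ('10.3.4.5', '1000.3.4.5', '256.1.12.123', '001.012.013.225', '1.2.3') {
    printf "%-16s %s\n", $candidate,
        is_ipv4_roundtrip($candidate) ? 'valid' : 'invalid';
}
```

Values above 255 wrap around when packed, missing fields come back as 0, and leading zeroes are lost, so in each of those cases the reassembled string differs from the input and the check fails.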

    If you want to be absolutely sure, you can isolate the four bytes with a regular expression that limits each string representation to at most 3 digit characters, i.e. (\d{1,3}), then run the four values through pack/unpack as described above. There are many other ways to do this task...

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
      Regular expression is the wrong approach IMHO.
      Well, quite possibly, but the OP wanted to practice regular expressions (which is why I gave a pure regex solution) and, besides, using split is in fact using regular expressions.

      But I agree that checking for numbers in the 0-255 range with a pure regex is somewhat unwieldy. An easier way might be something along these lines:

      print "Valid IP\n" if 4 == grep { /^\d+$/ and $_ < 256 } split /\./, $ip;
      (although this is still not entirely satisfactory, since it would validate something like "23.45.aa3.234.244", where four of the five fields pass the test, so a bit more effort might be needed).
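
One way to put in that extra effort is to check the total number of fields as well as the number of good ones. The sketch below is an illustration only; the name is_ipv4_split is made up, and limiting each field to three digits is an added assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $ip has exactly four dot-separated fields
# and every one of them is a 1-3 digit number below 256.
sub is_ipv4_split {
    my ($ip) = @_;
    # The limit of -1 keeps trailing empty fields, so "1.2.3.4." has five.
    my @parts = split /\./, $ip, -1;
    return @parts == 4
        && 4 == grep { /^\d{1,3}\z/ && $_ < 256 } @parts;
}

for my $candidate ('10.3.4.5', '23.45.aa3.234.244', '256.1.12.123', '1.2.3.4.') {
    printf "%-18s %s\n", $candidate,
        is_ipv4_split($candidate) ? 'valid' : 'invalid';
}
```

Only 10.3.4.5 should be reported as valid; the five-field example from above is now rejected because the field count is wrong.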

      Note that Perl 6's regexes allow a code assertion to be inserted within a regex, leading for example to something like this:

      my regex octet {(\d ** 1..3) <?{0 <= $0 <= 255 }> }
      my regex ip {^ <octet> ** 4 % '.' $}

Node Type: perlquestion [id://1148750]
Approved by Athanasius
Front-paged by Old_Gray_Bear