Re: Don't Use Regular Expressions To Parse IP Addresses! by atcroft (Monsignor)
on Dec 20, 2002 at 23:57 UTC
In private conversations with ybiC, we discussed one of the major problems with trying to use regex-en to test an IP address for validity: even a well-formed address may not make sense in a particular application. For instance, it would not make sense to use a class D or E address when configuring a PC. Likewise, addresses in the RFC 1918 private address space are often inappropriate, but whether they are depends on the case under examination. Truly testing an address for validity would thus seem to require knowledge specific to the application and its environment, either coded into the application or determined by some form of active testing.
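Class D and E are the easiest of these cases to filter, because the historical (classful) class depends only on the leading bits of the first octet: 224-239 is class D (multicast) and 240-255 is class E (reserved). A minimal Python sketch, assuming the string has already passed a format check; `classful_class` and `unsuitable_for_host` are hypothetical names, not code from this thread.

```python
def classful_class(dotted: str) -> str:
    """Return the historical class letter (A-E) of a dotted-quad IPv4 address.

    Assumption: `dotted` is already a well-formed a.b.c.d string.
    Only the first octet matters in the classful scheme.
    """
    first = int(dotted.split(".")[0])
    if first < 128:   # leading bit 0
        return "A"
    if first < 192:   # leading bits 10
        return "B"
    if first < 224:   # leading bits 110
        return "C"
    if first < 240:   # leading bits 1110: multicast
        return "D"
    return "E"        # leading bits 1111: reserved/experimental


def unsuitable_for_host(dotted: str) -> bool:
    """True for class D or E addresses, which should not be assigned to a PC."""
    return classful_class(dotted) in ("D", "E")
```

For example, `unsuitable_for_host("224.0.0.1")` is true while `unsuitable_for_host("192.168.1.10")` is false.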
Where appropriate, one approach is to remove the user's ability to cause errors by presenting a list of valid addresses to select from, which is the approach I have taken in one of the applications I have written for work. That list, however, depends on the addresses entered into it being valid, so the problem raises its head again.
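Since such a list is only as good as the data behind it, one option is to check each administrator-entered entry for format before offering it to users. A minimal Python sketch; `is_strict_dotted_quad` and `selectable_addresses` are hypothetical helpers, and rejecting leading zeros is my own assumption (some inet_aton()-style parsers read them as octal). It catches only malformed entries, not well-formed but inappropriate ones.

```python
def is_strict_dotted_quad(text: str) -> bool:
    """Accept only a.b.c.d with four plain decimal octets in 0..255.

    Assumption: leading zeros ("010") are rejected to avoid octal ambiguity.
    """
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # isdigit() alone accepts non-ASCII digits, so require ASCII as well;
        # an empty part (as in "1..2.3") fails isdigit() and is rejected.
        if not (part.isascii() and part.isdigit()):
            return False
        if len(part) > 1 and part[0] == "0":
            return False
        if int(part) > 255:
            return False
    return True


def selectable_addresses(entries):
    """Split admin-entered entries into (usable, rejected) lists."""
    usable, rejected = [], []
    for entry in entries:
        cleaned = entry.strip()
        if is_strict_dotted_quad(cleaned):
            usable.append(cleaned)
        else:
            rejected.append(cleaned)
    return usable, rejected
```

Rejected entries could then be reported back to the administrators rather than silently shown to users.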
For the application I mentioned, unfortunately, I can only rely on the vigilance of the administrators who add the data the application pulls from to ensure it is correct and valid, since I can only test for cases where the data is formatted incorrectly, not where it is valid but inappropriate.
In discussing this with ybiC, we noted that some ranges make useful filters, such as the aforementioned class D/E address space, the localhost address space (127.0.0.0/8), and the RFC 1918 private address space. To that end, I offer some filters that I hope will help. Assuming we have already validated the format (remembering both the "Traps and Snares" and "Multiple Representations" sections above), let us first convert the address in question to a number (my apologies if there are errors in these, as I generally use only the a.b.c.d format). Once that is done, it is much easier to filter the address, or to convert it to whichever format is needed (by essentially reversing the ip2bin? functions). Now, sample code.
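A minimal Python sketch of the approach described above, not the original Perl: `ip_to_int` and `int_to_ip` are hypothetical stand-ins for the ip2bin? functions and their reverse, and `in_cidr` and the filter names are my own. It assumes the address has already passed a format check.

```python
def ip_to_int(dotted: str) -> int:
    """Convert an already-validated a.b.c.d string to a 32-bit integer."""
    a, b, c, d = (int(octet) for octet in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def int_to_ip(value: int) -> str:
    """Reverse of ip_to_int: 32-bit integer back to a.b.c.d."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit value")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def in_cidr(addr: int, network: str, prefix_len: int) -> bool:
    """True if integer addr lies inside network/prefix_len."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return (addr & mask) == (ip_to_int(network) & mask)


def is_class_d_or_e(addr: int) -> bool:
    # 224.0.0.0/4 is class D (multicast), 240.0.0.0/4 is class E (reserved)
    return in_cidr(addr, "224.0.0.0", 4) or in_cidr(addr, "240.0.0.0", 4)


def is_localhost(addr: int) -> bool:
    return in_cidr(addr, "127.0.0.0", 8)


def is_private(addr: int) -> bool:
    # The three RFC 1918 blocks
    return (in_cidr(addr, "10.0.0.0", 8)
            or in_cidr(addr, "172.16.0.0", 12)
            or in_cidr(addr, "192.168.0.0", 16))
```

Masking both sides with the same prefix mask turns each range test into a single comparison, so another range only needs another `in_cidr` call.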
Other, similar tests could easily be written from this point; these were just examples. While there are probably modules on CPAN that perform tests of this type, I do not know them offhand, so I welcome input from others.
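For comparison outside Perl, Python's standard-library `ipaddress` module already provides predicates for several of these ranges, and `ipaddress.ip_address` raises `ValueError` on malformed input. A sketch, where `describe` is a hypothetical helper; note that `is_private` covers more than the RFC 1918 blocks (loopback and link-local, among others), so the exact ranges are worth checking against the module documentation.

```python
import ipaddress


def describe(text: str) -> list:
    """List which special-purpose ranges an IP address string falls into.

    ipaddress.ip_address() raises ValueError for malformed input, so the
    format check and the range checks come from the same library.
    """
    addr = ipaddress.ip_address(text)
    labels = []
    if addr.is_loopback:
        labels.append("loopback")
    if addr.is_multicast:
        labels.append("multicast (class D)")
    if addr.is_reserved:
        labels.append("reserved (class E)")
    if addr.is_link_local:
        labels.append("link-local")
    if addr.is_private:
        labels.append("private")
    return labels
```

An address can carry several labels at once (127.0.0.1 is both loopback and, by the module's definition, private), which is why a list is returned.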
It is important to remember that regex-en are only one part of truly validating an IP address; one must also understand the environment in which it is to be used.
Update: Extended comments in ip2bin? subroutines.
Update: Fixed bug in the test for 172.16.0.0 caused by an incorrect CIDR prefix (was /16, now /12).
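The difference between /16 and /12 is easy to see by computing the first and last address each prefix covers. A minimal Python sketch; `cidr_range` and its helpers are hypothetical names, and the input is assumed to be a valid dotted quad.

```python
def ip_to_int(dotted: str) -> int:
    """Already-validated a.b.c.d string -> 32-bit integer."""
    a, b, c, d = (int(octet) for octet in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def int_to_ip(value: int) -> str:
    """32-bit integer -> a.b.c.d string."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def cidr_range(network: str, prefix_len: int):
    """Return (first, last) dotted-quad addresses covered by network/prefix_len."""
    host_bits = 32 - prefix_len
    mask = (0xFFFFFFFF >> host_bits) << host_bits  # prefix bits set, host bits clear
    first = ip_to_int(network) & mask
    last = first | ((1 << host_bits) - 1)          # all host bits set
    return int_to_ip(first), int_to_ip(last)
```

`cidr_range("172.16.0.0", 12)` gives 172.16.0.0 through 172.31.255.255, while /16 stops at 172.16.255.255, so the old test missed 172.17.0.0 through 172.31.255.255, which are also part of the RFC 1918 block.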
Update: Added routines for Link-Local (169.254.0.0/16) and TEST-NET (192.0.2.0/24) address ranges.
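The two added ranges fit the same mask-and-compare pattern. A minimal Python sketch; `is_linklocal` and `is_testnet` borrow the names mentioned in the updates but are not the original Perl routines, and `_in_block` is a hypothetical helper. Inputs are assumed to be valid dotted quads.

```python
def ip_to_int(dotted: str) -> int:
    """Already-validated a.b.c.d string -> 32-bit integer."""
    a, b, c, d = (int(octet) for octet in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def _in_block(dotted: str, network: str, prefix_len: int) -> bool:
    """True if dotted lies inside network/prefix_len."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return (ip_to_int(dotted) & mask) == (ip_to_int(network) & mask)


def is_linklocal(dotted: str) -> bool:
    """169.254.0.0/16: typically self-assigned when no DHCP server answers."""
    return _in_block(dotted, "169.254.0.0", 16)


def is_testnet(dotted: str) -> bool:
    """192.0.2.0/24 (TEST-NET): reserved for documentation and examples."""
    return _in_block(dotted, "192.0.2.0", 24)
```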
Update: Fixed typo in code.
Update: (17 Mar 2005) Fixed missing '(' in conditions in is_linklocal and is_testnet functions.