http://www.perlmonks.org?node_id=334091

A non-Perl mailing list I'm on is tossing around Perl fragments. One, for determining whether a dotted quad IP address is valid, basically reads:

sub isValidIP { my $ip = shift; if ( ($a,$b,$c,$d) = $ip =~ m/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/ ) { if (($a|$b|$c|$d) < 256 ) { return 1; } } return 0; }

Now, without running this code, can you see how and why this will run off the rails, and why a test case that includes "0.0.0.0", "127.0.0.1", and "255.255.255.255" won't demonstrate the problem?

So as not to give away the punchline and thus deprive people of a good puzzle, reply by posting a valid dotted-quad IP address that will demonstrate this routine failing. I'll post an explanation in 24 hours. (It doesn't have anything to do with \A and \Z.)

(Updated to correct a mistranscription, but one not related to the problem.)

Replies are listed 'Best First'.
Re: Can you spot the problem?
by hossman (Prior) on Mar 05, 2004 at 06:09 UTC
    • I'm always suspicious of code that uses Bitwise operators
    • I'm suprised no one has pointed out any false positives, ie: 999.0.0.222
Re: Can you spot the problem?
by Abigail-II (Bishop) on Mar 05, 2004 at 01:47 UTC
    200.0.0.1

    Abigail

Re: Can you spot the problem? (golf)
by tye (Sage) on Mar 05, 2004 at 02:41 UTC
    All Perl puzzles turn into Perl golf.

    I can fix it with only 2 characters. (:

    - tye        

      Me too. Question is, which way is neatest. Can it be done with one char? In fact, I'm fairly convinced it can't. I can come up with several two-char changes which make it work, but no one-chars.
        How'd we compare notes / or check the solution without doing a spoiler for everyone else?

        There is a way...

        Have you ever played any Interactive Fiction?

        Or read any game-related usenet groups?

        One hint left...


        ;$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$;[-1]->();print
      Hey, you ran it in your head. Bad tye. :)
Re: Can you spot the problem?
by thelenm (Vicar) on Mar 05, 2004 at 03:23 UTC

    172.16.0.2 (but not 172.16.0.1)

    -- Mike

    --
    XML::Simpler does not require XML::Parser or a SAX parser. It does require File::Slurp.
    -- grantm, perldoc XML::Simpler

Re: Can you spot the problem? (solution)
by dws (Chancellor) on Mar 05, 2004 at 23:09 UTC

    It's no secret at this point that the "problem" is the ORing of strings instead of numbers.

    This code fragment derives from discussion about an interview question that asked for a code fragment for determining if a dotted quad IP address was valid. One straightforward solution was basically

    if ( ($a,$b,$c,$d) = m/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/ ) { if ( $a < 256 && $b < 256 && $c < 256 && $d < 256 ) { return 1; } } return 0;

    which works, though there are better ways to parse IPv4 addresses.

    In the course of "improving" the solution, someone reached into their bag of tricks and pulled out one often used in assembly language (and occassionaly in C): OR the numbers together and compare once, thus trading off 3 tests and branches against less expensive boolean operations. Unfortunately, Perl's automagic string to number conversion then doesn't happen, and the "improved" expression ORs strings instead of numbers. But, it looks like it should work! And, in one of those amusing quirks that happen now and then, you might not notice the problem if you happened to pick a simple set of test data, such as this one, which tests boundaries and then probes into the middle of the range.

    0.0.0.0: "0" | "0" | "0" | "0" eq "0" OK 255.255.255.255: "255" | "255" | "255" | "255" eq "255" OK 127.0.0.1: "127" | "0" | "0" | "1" eq "127" OK

    My solution was to force the conversion via

    return 1 if (0+$a|0+$b|0+$c|0+$d) < 256;

    which tye hinted at above (and pointed out privately) could more simply be written as

    return 1 if (0|$a|$b|$c|$d) < 256;
Re: Can you spot the problem?
by graff (Chancellor) on Mar 05, 2004 at 03:10 UTC
    (sorry... I had to run the code :P -- but in my own defense, I only had to run it a few times to see the pattern.)

    No cigar for bmann, I'm afraid -- 192.168.0.0 thru 192.168.0.16 all pass, but 192.0.0.2 does not, nor does 192.20.0.0. (update: -- having seen bmann's update -- no, 192.168.255.255 will pass, but, e.g. 192.1.2.2 will fail.)

    The key to the problem involves paying attention to the first digit in each quad, particularly in the cases where at least one quad has three digits and at least one other has fewer digits. It does make sense, and it's quite an amusing trap. Thanks.

    (another update: different digit counts is not an essential property -- 100.100.100.200 also fails.)

      So you ran the code and gave hints. Bad dog. No cookie for you.
      You are partially right and partially wrong. You found a partial correlation between the two, but 213.244.213.213 will fail, but 213.243.213.213 will pass. And that's NOT using the first digit of every set. :)
        Thank you -- I did have a sense that I must have missed something (as I often do with puzzles like this). Now I don't feel so bad about not getting the cookie.
      Doh! Good thing I don't smoke cigars (very often)...

      Well, I'll console myself with the fact that I knew why the subroutine wouldn't work - I just chose a bad IP range trying to be cute ;) I overlooked one small detail<<<

Re: Can you spot the problem?
by little (Curate) on Mar 05, 2004 at 11:35 UTC
    this
    if (($a|$b|$c|$d) < 256 ) { return 1; }
    should be
    if (($a < 256) && ($b < 256) && ($c < 256) && ($d < 256) ) { return 1; }

    Have a nice day
    All decision is left to your taste

      Or you could just change the first if from

      if ( ($a,$b,$c,$d) = $ip =~ m/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/ ) {

      to

      if ( ($a,$b,$c,$d) = map $_+0,$ip =~ m/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/ ) + {

      and the code will work as expected.

      antirice    
      The first rule of Perl club is - use Perl
      The
      ith rule of Perl club is - follow rule i - 1 for i > 1

      Exactly the opening post attempts to be clever, but instead it does something that doesn't even make sense... ($a|$b|$c|$d) is not any() from Quantum::Superpositions, it's an or operation on a series of values and then a conditional. Horribly bad form and it should have never been written that way in the first place.

      ( For the vast masses downvoting this, BTW, please explain why the OP post is exemplifying good software design... think about it... not writing maintainable working code and falling for these kind of traps is what gives Perl programmers a bad name .. and this is *not* helping. I don't mind taking the hit here, go ahead, please stand up for your right to write broken code and drag down Perl in your workplace, etc. )

        ... the opening post attempts to be clever, but instead it does something that doesn't even make sense...

        It make sense if you understand a lower-level idiom. To be fair, I left off the thread of discussion that led up to something close to that particular snippet.

        The problem, at the level of that statement, is to determine if any one of the octet numbers exceeds 255. You can compare the numbers one by one, or you can take advantage of knowing that 255 == 0xFF, and that if any one of the octet numbers is larger than that, the OR of all the numbers will have bits set above 0xFF, and will hence be greater than 255 (and conversly, a valid quad will OR to less than 256). At a raw instruction level, doing three ORs and one test/branch is faster than doing four individual test/branches if the majority of the IP addresses are going to be valid. This works well in assembler, and in C (which is a limited number of steps above assembler). If you haven't played at that level, the idiom may seem strange. And, unless performance really, really matters, it's not a technique that I'd use in production Perl. But for purposes of the puzzle, it's great, since it's so misleading.

        instead it does something that doesn't even make sense

        I disagree. Most people would expect a bitwise-or of a series of numbers. In many languages it would do exactly what the author intended.

        ...but we're moving hint territory :-)

        If you're talking about good software design, you shouldn't be parsing IPv4 addresses with regexen in the first place. I took the OP as a clever puzzle and nothing more. I think it's unfair to judge it on the basis of good practices.

        ----
        : () { :|:& };:

        Note: All code is untested, unless otherwise stated

        For the vast masses downvoting this, BTW, please explain why the OP post is exemplifying good software design...

        S'funny, but no matter how may times I re-read the OP, I can't find the reference to "good software design". From my reading, it appears to be holding the code up as an example of (at best) a gotchya, and probably bad design, but mostly, as a puzzle. Something fun to try a solve by looking rather than debugging.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        ( For the vast masses downvoting this, BTW, please explain why the OP post is exemplifying good software design... think about it... not writing maintainable working code and falling for these kind of traps is what gives Perl programmers a bad name .. and this is *not* helping. I don't mind taking the hit here, go ahead, please stand up for your right to write broken code and drag down Perl in your workplace, etc. )
        If you ever used the reason why it works for other things, it's fairly obvious. I remember seeing &sub($a,$b), sub( $a, $b ) and sub $a, $b, which are all the same for the most part, but are different formats. It was really unclear when people nested paren'-less subs. But as I got more experienced, as a programmer.. as a perl developer.. things got clearer. So the mistake in the program made me wonder, ok, this doesn't work, i know WHY it shouldn't work, lemme do some examples on paper using the logical error.

        But I don't think you got downvoted solely for your opinion, 'cause to some degree, you are right, it could be clearer, but then again, $a||='bob' is too, eh?

Re: Can you spot the problem?
by adrianh (Chancellor) on Mar 05, 2004 at 11:51 UTC

    200.050.006.000 - hopefully this won't count as a hint :-)

    Nice. I've been caught by this odd corner of Perl before.

Re: Can you spot the problem?
by duff (Parson) on Mar 05, 2004 at 16:26 UTC
    I was going to do something clever like post the smallest valid IP address that illustrates the problem, but I decided that I'd rather point out a few well-known sites that have "invalid" IPs :-)
    thinkgeek  66.35.250.160
    perl.com   208.201.239.56      
    google     64.233.161.104
    theonion   66.216.104.235
    
    I was really hoping that perlmonks.com itself would be "invalid", but alas it is not so.

    Oh, I noticed some golfing for the smallest fix. I can't come up with anything smaller than 2 characters. Unless of course, you count removing the entire sub and replacing it with Regexp::Common or something. The absolute character count would be smaller then I think :)

Re: Can you spot the problem?
by tachyon (Chancellor) on Mar 05, 2004 at 23:15 UTC

    254.3.2.1

    25 strokes which is roughly -100 strokes to fix it :-)

    sub isValidIP { #234567890123456789012345 use Socket;inet_aton(pop) }

    cheers

    tachyon

      Very good. I like that solution. But at that point you could more easily do this:

      use Socket;*isValidIP=\&inet_aton;
Re: Can you spot the problem?
by The Mad Hatter (Priest) on Mar 05, 2004 at 02:21 UTC

    Root node updated. Well, for one, that second if needs curly braces. This isn't C after all. ; )

    Just to nitpick with it, $a and $b really shouldn't be used as they're used for sort.

Re: Can you spot the problem?
by perlinux (Deacon) on Mar 05, 2004 at 13:25 UTC
    This is my regex for Ip addresses:
    (([01]?\d\d?|2[0-4]\d|25[0-5])\.){3}([01]?\d\d?|2[0-4]\d|25[0-5])

      Don't parse IP addresses with regexen in real code. The OP makes an amusing puzzle (one which I have admittedly not figured out yet), but nothing more. There are far more valid formats for IPv4 than the traditional dotted-quad one. I don't even want to think about all the formats available for IPv6.

      Look at Valid IP? for more info.

      ----
      : () { :|:& };:

      Note: All code is untested, unless otherwise stated

        You can match an IP-address with a regexp, just don't try to roll your own, do  {use Regexp::Common; /$RE{net}{IPv4}/} instead.

Re: Can you spot the problem?
by dragonchild (Archbishop) on Mar 05, 2004 at 18:36 UTC
    One cute thing is that all the false negatives (or false positives) are actually solution spaces. So, Abigail-II's solution of 200.0.0.1 actually is the solution space containing the elements 200, 0, 0, and 1. So, 1.200.0.0 also is a false negative (for the same reason). (The symmetry is caused by the bit-wise OR, not by the issue at hand.)

    As for fixes ... I would think that this should be something Perl5 should handle through dwimmery (and something it seems that Perl6 will be handling more intelligently, at least according to TheDamian's discussion on sorting ...)

    (And, if I get dinged on providing too much of a hint, at least I got someone interested in the differences between Perl5 and Perl6's sorting ... *grins*)

    ------
    We are the carpenters and bricklayers of the Information Age.

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

      Well, depends on how you decide to fix it in Perl 6. The code as written will do superpositional logic, but it will only tell you if one of the numbers is less than 256. You'd have to use & instead of | to do a conjunction.

      If, on the other hand, you try to keep the bitwise-OR semantics, Perl 6 will force you to pick between ~| and +|, and if you pick the obvious one, it numerifies the string and works as intended. So I think Perl 6 is improving things in this area...er, as long as you don't forget and try to use | or & for bit operations. Doubtless that will be a cultural issue for a while...

Re: Can you spot the problem?
by Abigail-II (Bishop) on Mar 09, 2004 at 13:02 UTC
    I've been bitten by a similar issue once:
    my $avg = 0; if (my ($x, $y) = $str =~ m!(\d+)/(\d+)!) { $avg = $x / $y if $y; }
    That died with a 'division by zero' error. Can you spot why?

    Abigail

      Because zero can have leading zeros?


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
        Yep, and that's why chanting "don't compare to 0" when someone writes $y == 0 isn't always appropriate. My code would have run fine if the test had been unless $y == 0.

        Abigail

Re: Can you spot the problem?
by bart (Canon) on Mar 05, 2004 at 21:56 UTC
    Time's almost up...

    The reason is because ($a, $b, $c, $d) are all strings. That means "|" works on strings, bytewise: "189" | "2" yields "389". That is unlike 189 | 2 which returns 191 — where the operands are treated as integers.

    Just converting one string into a number is enough to force "|" into numerical mode.

Re: Can you spot the problem?
by bmann (Priest) on Mar 05, 2004 at 02:11 UTC
    Among others, the whole network 192.168.0.0/16...

    Update In other words, any ip in the entire range from 192.168.0.0-192.168.255.255

    Update2 Nope

Re: Can you spot the problem?
by ambrus (Abbot) on Mar 05, 2004 at 16:38 UTC

    Good puzzle.

    100.200.0.1

Re: Can you spot the problem?
by Nkuvu (Priest) on Mar 05, 2004 at 14:10 UTC

    I have to admit that trying to solve puzzles like this pre-coffee are not the best idea. I looked at this for a while, and saw the problem. But I wasn't sure I saw the problem, so I started running the code.

    Even knowing the problem, I couldn't find a good test case to make this fail until I looked at some of the other notes -- I'd become convinced that I was wrong about the problem. But my initial guess was correct.

    So no cookie for me. But I'll settle for my coffee instead.

Re: Can you spot the problem?
by diotalevi (Canon) on Mar 05, 2004 at 22:00 UTC

    8.7.whatever because ("8" | "7") eq "?" and "?" == 0 so ("8" | "7") == 0.

    In fact, here's the key.

    $\ = "\n"; print chr() . ", " . unpack( "B*", chr ) for ord( "0" ) .. ord( "9" ); print; print chr() . ", " . unpack( "B*", chr ) for ord( "2" | "8" ) .. ord( +"7" | "8" ); __DATA__ 0, 00110000 1, 00110001 2, 00110010 3, 00110011 4, 00110100 5, 00110101 6, 00110110 7, 00110111 8, 00111000 9, 00111001 :, 00111010 ;, 00111011 <, 00111100 =, 00111101 >, 00111110 ?, 00111111
Re: Can you spot the problem?
by ambrus (Abbot) on Mar 06, 2004 at 20:13 UTC

    It would be nice if use integer; changed this behaviour. But it does not.

Re: Can you spot the problem?
by ambrus (Abbot) on Mar 05, 2004 at 18:59 UTC

    I wonder weather this code could possibly work in perl6, where the | operator will mean something else (see Exegesis 6).

      Works if you change the all the |s to &s. (Plus you can omit the parens if you're golfing, since junctions now bind tighter than comparisons. Well, okay, you can omit the parens even if you're not golfing.)

        I'd rather leave the |'s and change the comparison from "if ... <256" to "unless ... >255". That way the puzzle still works in perl5.