http://www.perlmonks.org?node_id=746682

Antigone has asked for the wisdom of the Perl Monks concerning the following question:

This is an unusual question, but here goes. I'll ask the question generally, and then provide a little more info below.

The question: What are some common Perl bugs--specifically related to control flow and/or pattern matching--that tend to cause run time errors or simply incorrect output? It's a long story, but I'm testing out someone's debugging abilities, and I'd like to insert a bug in my own code and see if she can weed it out.

More about my code: I have a script that reads a logfile for my company's web page and computes various statistics for each user based on the types of pages the user has visited. The log is a textfile which is made of lines that look like this:

UserIPAddress<tab>Unix system time<tab>URL

So the program reads the text file line by line. While doing so, it uses some pattern matching expressions to check what part of the site each URL belongs to, and then computes the total amount of time each visitor spends on different parts of the site.

I realize that this is very general, and that any answers will also be quite general as a result. I'd appreciate any suggestions that you may have given this general info. Thanks!

Anna

Re: I need a "non-trivial" bug for my script!
by almut (Canon) on Feb 26, 2009 at 22:09 UTC

    As you're doing regex matching, you could construct something around this feature:

    From perlop:

    m/PATTERN/cgimosx

    (...)
    If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead.

    I've seen rather experienced programmers get bitten by this...
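A minimal sketch of how the empty-pattern rule quoted from perlop can bite. The variable `$section` and the sample log lines are made up for illustration; the assumption is that a pattern fragment ends up empty by accident (say, from a missing config value). The `(?:)` form is a commonly used way to write a pattern that really is empty.

```perl
use strict;
use warnings;

# Hypothetical: $section should hold a pattern fragment but is empty.
my $section = '';

# A successful match happens earlier in the same scope.
my $first    = "10.0.0.1\t1235686140\t/products/index.html";
my $first_ok = ($first =~ /products/) ? 1 : 0;

# This pattern interpolates to the empty string, so the last successfully
# matched regex (/products/) is used in its place.
my $second = "10.0.0.2\t1235686200\t/about.html";
my $reused = ($second =~ /$section/) ? 1 : 0;

# An explicit empty group is not the empty string, so it is matched as written.
my $explicit = ($second =~ /(?:)/) ? 1 : 0;

print "interpolated empty pattern matched: $reused\n";
print "explicit empty group matched:       $explicit\n";
```

The nasty part is that the result of the second match depends on whatever regex happened to succeed earlier, which may be far away in the code.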

      Another good regex-related one is blindly assuming a match succeeded and getting "stale" results from the last successful match still sitting in $1 and friends.

      The cake is a lie.

        Hah, I hated that one! Out of curiosity, what is the best method of fixing it? Undefining $1 before the regex (not sure if that is possible), testing to see if the regex is successful, or something else?

        And you didn't even know bears could type.
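One way to answer the question above, as a hedged sketch: rather than trying to reset $1, only read it where the match is known to have succeeded, or capture in list context. The sample lines and the `user=` format are invented for the example.

```perl
use strict;
use warnings;

my @lines = ("user=anna", "malformed line", "user=bob");   # made-up sample

my @users;
for my $line (@lines) {
    # Read $1 only inside the branch where the match succeeded.
    if ($line =~ /^user=(\w+)/) {
        push @users, $1;
    }
}
print "@users\n";

# Alternative: capture in list context. A failed match returns an empty
# list, so $name is undef instead of a stale value.
my ($name) = "malformed line" =~ /^user=(\w+)/;
print defined $name ? "got $name\n" : "no match\n";
```

Either form keeps a failed match from silently reusing the captures of an earlier line.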

Re: I need a "non-trivial" bug for my script!
by GrandFather (Saint) on Feb 26, 2009 at 22:20 UTC

    Confusing the scalar context range operator (flip flop) and the list context range operator is pretty nasty. Consider:

    my @array = qw(1 2 3 4 5 6 7 8 9);
    print "$array[0..3]";

    What does it print for you? For a little more explanation see the last part of Flipin good, or a total flop?.
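For contrast, a small sketch of what was presumably intended: an array slice. With the @ sigil the subscript is evaluated in list context, so `0 .. 3` is the ordinary range operator rather than the flip-flop.

```perl
use strict;
use warnings;

my @array = qw(1 2 3 4 5 6 7 8 9);

# Slice: @ sigil, list-context subscript, 0 .. 3 expands to 0, 1, 2, 3.
my @first_four = @array[0 .. 3];
print "@first_four\n";

# Single element: $ sigil, scalar subscript.
my $fourth = $array[3];
print "$fourth\n";
```

The only visible difference from the buggy line is one sigil, which is what makes it hard to spot.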


    True laziness is hard work
Re: I need a "non-trivial" bug for my script!
by ikegami (Patriarch) on Feb 27, 2009 at 05:12 UTC

    /./ is often assumed to mean "any character", but it doesn't match newlines. Conversely, it could have been meant as a literal "." rather than the metacharacter it is in patterns, particularly where IP addresses or domain names are involved.

    Unanchored patterns are another problem: /10\.0\.0\./ will match "110.0.0.0", for example.
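A quick sketch of both pitfalls side by side. The address list is made up, and the third entry is deliberately not an IP address at all.

```perl
use strict;
use warnings;

my @ips = ("10.0.0.7", "110.0.0.0", "10x0y0z7");   # made-up sample

# Unescaped dots match any character (except newline).
my @loose = grep { /10.0.0./ } @ips;

# Escaped dots, but no anchor: still matches inside "110.0.0.0".
my @unanchored = grep { /10\.0\.0\./ } @ips;

# Escaped dots and anchored at the start of the string.
my @anchored = grep { /^10\.0\.0\./ } @ips;

print "loose:      @loose\n";
print "unanchored: @unanchored\n";
print "anchored:   @anchored\n";
```

Only the last pattern restricts the match to addresses that actually begin with 10.0.0.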

    I'm not sure how using common problems is going to be of use as a gauge for debugging skills. I would have red-flagged those instances even before starting to debug.

Re: I need a "non-trivial" bug for my script!
by Perlbotics (Archbishop) on Feb 26, 2009 at 23:12 UTC

    The silent and sporadic errors are the nasty ones (from my PoV). Some ideas:

    • Control flow: A conditional stray last or next in a subroutine that is called from within a loop can have interesting effects.
    • Variable names: Take a descriptive name but use it for something else, e.g. store user-ids in %IP_Tables. Or use something like $result_aref but don't let it refer to an array.
    • I/O: print in combination with select offers some pitfalls...
    • List context: remove or displace some parentheses ...
    • Return values: implicit return values (instead of a proper return)...
    • Side effects (global variables w. and w/o local)...
    • Implicit use of @_, e.g. by &some_sub; (see perlsub)...
    • shift can operate on @_ or @ARGV...
    • poison input: remove/add some TAB characters... swap some columns...
    • ...
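To illustrate the first item in the list above, here is a minimal sketch of a stray next inside a subroutine. The sub name `validate` and the data are hypothetical. Loop control is resolved dynamically, so the next leaves the sub and skips the rest of the caller's loop body; with warnings enabled Perl reports "Exiting subroutine via next".

```perl
use strict;
use warnings;

sub validate {
    my ($n) = @_;
    next if $n == 2;   # deliberate bug: there is no loop inside this sub
    return $n;
}

my @kept;
for my $n (1 .. 3) {
    my $value = validate($n);   # for $n == 2 control never returns here
    push @kept, $value;
}
print "@kept\n";
```

The loop silently drops one element, and without warnings there is no hint as to why.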
    Is this approach intended to test a maintenance programmer? Maybe I got something wrong, but I feel a little bit uncomfortable with the approach of laying out a code-minefield. Wouldn't it be more constructive to give the candidate a task to start from scratch? ... or to find an alternative solution to the already solved problem? ... or let her/him optimise/refactor the script concerning speed/memory/code-formatting/robustness? ... turn a script into a module? ... wrap a GUI around it? ...

Re: I need a "non-trivial" bug for my script!
by bellaire (Hermit) on Feb 26, 2009 at 22:14 UTC
    Since you're computing time spent in various places by comparing timestamps from one line to another, you could introduce a bug in the indexing of those pre-computed times. For example, comparing everything against the last match of any user, rather than the last match for that user. Or perhaps comparing the timestamp on each line with the first line on which that user's ip address ever appears, for times that continue to increase into infinity.
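As a reference point for the bug suggested above, a sketch of the correct bookkeeping: the previous timestamp is stored per IP address in a hash. The log lines are made up (same tab-separated layout as in the question), and summing the gaps between consecutive hits is an assumption about how the original script measures time. Replacing `%last_seen` with a single scalar would give the "last match of any user" bug.

```perl
use strict;
use warnings;

# Made-up sample: IP <tab> Unix time <tab> URL
my @log = (
    "10.0.0.1\t100\t/products/a",
    "10.0.0.2\t110\t/about",
    "10.0.0.1\t130\t/products/b",
    "10.0.0.2\t150\t/contact",
);

my %last_seen;   # IP => timestamp of that IP's previous hit
my %elapsed;     # IP => total seconds between that IP's consecutive hits

for my $line (@log) {
    my ($ip, $time, $url) = split /\t/, $line;
    if (exists $last_seen{$ip}) {
        $elapsed{$ip} += $time - $last_seen{$ip};
    }
    $last_seen{$ip} = $time;   # update only this user's entry
}

print "$_: $elapsed{$_}\n" for sort keys %elapsed;
```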

    Perhaps a bug in the regular expression that causes two non-identical IP addresses to match each other when they should not.

    I'm shootin' in the dark here. It would be easier if we could see the code. :)

      Actually, as soon as you start comparing time stamps, issues such as non-synced time sources, day rollover, daylight saving changes, leap seconds, time zone issues and date formats present a huge range of bug possibilities.


      True laziness is hard work
Re: I need a "non-trivial" bug for my script!
by tilly (Archbishop) on Feb 27, 2009 at 05:09 UTC
    If you want to be mean...
    sub foo {
        my $condition = shift;
        my $some_var = generate_var() if $condition;
        # Use $some_var in an interesting way
    }

    # Elsewhere in code...
    foo(0);
    foo(1);
    foo(0);
    More reasonably, take strict out, and put a typo or two in a variable name. Use a hash, and have a typo in the hash key somewhere. Generate a report, and have the header and the fields not actually match up. Deliberately create a bug in one place, then put a comment in another suggesting that something like that bug would exist for another reason. Have a log with something like a free-form text field in it (an Apache error log would work) and ask why the number of records from processing it with a correct program doesn't match the number of lines in the log.

    These are all (including the nasty at the start) based on real problems that I have encountered while debugging code.
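For anyone who has to fix the `my ... if` construct at the start of this post: its behavior when the condition is false is not well defined, so the usual repair is to separate the declaration from the conditional assignment. A sketch, where `generate_var` is a hypothetical stand-in:

```perl
use strict;
use warnings;

sub generate_var { return 'fresh' }   # hypothetical stand-in

sub foo {
    my $condition = shift;
    my $some_var;                               # always declared fresh
    $some_var = generate_var() if $condition;   # only assigned conditionally
    return defined $some_var ? $some_var : 'undef';
}

my @results = (foo(0), foo(1), foo(0));
print "@results\n";
```

With the declaration unconditional, every call starts with an undefined `$some_var`, whatever the previous call did.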

Re: I need a "non-trivial" bug for my script!
by toolic (Bishop) on Feb 26, 2009 at 22:18 UTC
Re: I need a "non-trivial" bug for my script!
by ELISHEVA (Prior) on Feb 27, 2009 at 02:17 UTC
    This isn't really a fair test unless:
    • What the code is supposed to do is clearly spelled out
    • The bug violates one of the spelled out expectations in some obvious way.

    Otherwise, you may be testing your ability to define specs and ask questions more than you are testing the other person's debugging abilities.

    Best, beth

Re: I need a "non-trivial" bug for my script!
by ruzam (Curate) on Feb 27, 2009 at 02:17 UTC
    I often create function (method) stubs along the lines of:
    sub my_method { shift->{some_parm} }
    Sometimes the function takes more parameters. Recently I made the mistake of doing something like this:
    sub my_method { shift->{shift} }
    The idea was to use the second parameter as a key to the hash ref in the first parameter. But instead I got the literal 'shift' as a key. The code should have been:
    sub my_method { shift->{shift()} }
    Took me a while to find that one and no amount of staring at it made it any easier to find.
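A sketch of one way to sidestep this class of bug entirely: unpack @_ into named variables, so no bareword ever appears inside the hash subscript. The names `buggy_method` and `$obj` and the sample data are made up; the hash deliberately contains a key called 'shift' to make the buggy lookup visible.

```perl
use strict;
use warnings;

# Buggy form from the post: {shift} is the string key 'shift'.
sub buggy_method { shift->{shift} }

# Explicit unpacking: the second argument is used as the key.
sub my_method {
    my ($self, $key) = @_;
    return $self->{$key};
}

my $obj = { color => 'red', shift => 'literal key' };

my $buggy = buggy_method($obj, 'color');
my $fixed = my_method($obj, 'color');
print "buggy: $buggy\n";
print "fixed: $fixed\n";
```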