PerlMonks  

grep trouble

by LogMiner (Novice)
on Apr 17, 2011 at 04:14 UTC (#899784=perlquestion)
LogMiner has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks! I have been chasing a bug in a large script, and finally distilled the issue into 3 lines. For the life of me, I can't figure out what's going on. Here you go:
    use strict;
    use warnings;
    print grep({"" =~ /$_/} (""))."\n";
    "foo" =~ /foo/;
    print grep({"" =~ /$_/} (""))."\n";
Basically, there are two identical grep calls, with a regex between them. This prints (at least in ActiveState Perl 5.12.3 on XP SP3):
    1
    0
Why in the world does the "foo" regex in the middle affect the second grep result?

Re: grep trouble
by davido (Archbishop) on Apr 17, 2011 at 04:32 UTC

    Passing an empty string in $_ creates an empty regexp pattern, as in "m//". This is a special case discussed in perlop, here:

    The empty pattern //

    If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead. ... If no match has previously succeeded, this will (silently) act instead as a genuine empty pattern (which will always match).

    Your first match occurs in that vacuum of acting like a genuinely empty pattern, which will always match. The final test is like asking if "" =~ m/foo/, which doesn't match.


    Dave
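A minimal sketch of the rule quoted above, assuming a reasonably modern Perl 5. The sub name `empty_pattern_demo` is made up for illustration. It performs a successful match first, so the empty interpolated pattern that follows is treated as a repeat of /foo/ rather than as a genuine empty pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how an empty interpolated pattern behaves
# once another match has already succeeded in the same scope.
sub empty_pattern_demo {
    my $pat = "";

    # A successful match; /foo/ is now the "last successfully matched" regex.
    my $primed = "foo" =~ /foo/;

    # $pat interpolates to nothing, so each match below reuses /foo/.
    my $empty_string = ( ""    =~ /$pat/ ) ? 1 : 0;    # acts like "" =~ /foo/
    my $foo_string   = ( "foo" =~ /$pat/ ) ? 1 : 0;    # acts like "foo" =~ /foo/

    return ( $empty_string, $foo_string );
}

print join( " ", empty_pattern_demo() ), "\n";
```

This should print `0 1`: the empty string no longer matches, while "foo" does, which is what you would expect from /foo/ and not from an empty pattern.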

      Are there any reasonable use cases/idioms for this feature?

      If not, shouldn't there be a pragma to disable these edge cases?

      Cheers Rolf

        It could be a convenience for a switch with fall-through or, along similar (but more oddball) lines, possibly an implementation of Duff's Device.


        Dave

        It furthermore won't match at the same place twice, being a special case for a zero-length match. So // is just perfect for splitting a string into individual characters, using split. I did that just yesterday.
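A short sketch of the split idiom mentioned above. The sub name `chars_of` is hypothetical. Per the split documentation, an empty pattern given to split always matches the empty string, so it is not replaced by the last successful match; the sub makes an unrelated successful match first to show that.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into single characters.
sub chars_of {
    my ($string) = @_;

    # Deliberately succeed at another match first; split's empty
    # pattern does not fall back to it.
    my $primed = "foo" =~ /foo/;

    return split //, $string;
}

print join( ",", chars_of("abc") ), "\n";
```

This should print `a,b,c`.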

      Thanks davido. I guess the grep-evaluated block should look like:

      {$_ eq "" or "" =~ /$_/}

      (of course the actual logic in my script is different, as there are easier ways to do the above)
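One way to package the guard above, as a sketch only. The helper name `safe_match` is made up, and treating an empty or undefined pattern as "matches anything" is an assumption about what the script wants. Wrapping the interpolated pattern in `(?:...)` keeps the pattern text non-empty, so the "last successful match" special case is never triggered.

```perl
use strict;
use warnings;

# Hypothetical helper: match $string against $pattern, treating an empty
# or undefined pattern as a genuine empty pattern (which matches anything).
sub safe_match {
    my ( $string, $pattern ) = @_;
    $pattern = "" unless defined $pattern;

    # "(?:)" is not an empty pattern as far as the match operator is
    # concerned, so no earlier regex is substituted for it.
    return ( $string =~ /(?:$pattern)/ ) ? 1 : 0;
}

my $primed      = "foo" =~ /foo/;    # a successful match that would otherwise interfere
my $match_count = grep { safe_match( "", $_ ) } ("");
print "$match_count\n";
```

With the guard in place this should print `1` even though /foo/ matched just before.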
        I'm still not sure what you're trying to achieve here, because your grep returns the PATTERNs which matched.

        Are those patterns simple words? If yes you could consider constructing an or-regex:

        DB<100> @patterns=qw#one two three#
        DB<101> $str="one two"
        DB<102> $re=join "|",@patterns
        DB<103> print $str =~ m/($re)/g
        onetwo

        UPDATE:

        just noticed there are still subtle differences:

        DB<106> $str="two one two"
        DB<107> print $str =~ m/($re)/g
        twoonetwo
        DB<108> print grep {$str=~/$_/} @patterns
        onetwo

        Cheers Rolf
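The difference shown in the debugger session can be sketched as two small subs. Both names are hypothetical, and the use of `quotemeta` assumes the patterns are literal words rather than real regexes. The alternation returns every occurrence in string order, while grep returns each word at most once, in list order.

```perl
use strict;
use warnings;

# Hypothetical helper: every occurrence of any word, in string order.
sub matches_in_string_order {
    my ( $string, @words ) = @_;
    @words = grep { defined($_) && length($_) } @words;
    return () unless @words;    # an empty alternation would be an empty pattern

    my $re   = join "|", map { quotemeta($_) } @words;
    my @hits = $string =~ m/($re)/g;
    return @hits;
}

# Hypothetical helper: each word that occurs at least once, in list order.
sub words_found {
    my ( $string, @words ) = @_;
    return grep {
        defined($_) && length($_) && index( $string, $_ ) >= 0
    } @words;
}

my @words = qw(one two three);
print join( ",", matches_in_string_order( "two one two", @words ) ), "\n";
print join( ",", words_found( "two one two", @words ) ), "\n";
```

For the string "two one two" the first line should read `two,one,two` and the second `one,two`, matching the session above.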

        you also have to check for undef:

        DB<120> print scalar grep {"a"=~/$_/} ("a","",undef)
        3

        Cheers Rolf
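A sketch of filtering the pattern list up front, so that neither "" nor undef ever reaches the match operator. The sub name `usable_patterns` is hypothetical, and dropping such entries (rather than counting them as matches) is an assumption about the desired behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only patterns that are defined and non-empty.
sub usable_patterns {
    return grep { defined($_) && length($_) } @_;
}

my @patterns      = usable_patterns( "a", "", undef );
my $pattern_count = grep { "a" =~ /$_/ } @patterns;
print "$pattern_count\n";
```

This should print `1`, since only the pattern "a" survives the filter.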

        Your links are broken. Please use [doc://grep] to link to the Perl documentation for grep. It will render like grep.

Re: grep trouble (OT)
by Eliya (Vicar) on Apr 17, 2011 at 06:57 UTC
    And for the life of me can't figure out what's going on.

    Don't worry, you're not the only one who's been bitten by this silly feature (as pointed out by davido).

    Another "favorite" one of mine is this (see perlrun):

    "If the #! line does not contain the word "perl", the program named after the #! is executed instead of the Perl interpreter. (...)"

    <rant>

    Let's say you wanted to write a wrapper something like this

        #!/bin/sh
        eval 'PATH=/usr/local/bin:$PATH \
              exec perl -x -S $0 "$@"'
          if 0;

        #!perl
        print "this is $^X, version $]\n";

    with the idea being that your preferred perl (/usr/local/bin/perl) is being used if found, or else some other (possibly older) one on the default PATH.  Also, you'd like people to be able to call the script via

    /path/to/some/other/perl script.pl

    so they can directly use whatever perl they have in some non-default location.  Similarly, your Windows users should be allowed to simply say

    perl script.pl

    (which is why you've put the eval '...' if 0; around the shell code)

    The first part works fine, i.e. if you call the script via the shebang/shell wrapper, you'd get as expected

        $ ./script.pl
        this is /usr/local/bin/perl, version 5.012002

    but if you try to use some other perl, e.g.

        $ /usr/bin/perl -v
        This is perl, v5.8.8 built for x86_64-linux-gnu-thread-multi
        $ /usr/bin/perl script.pl
        this is /usr/local/bin/perl, version 5.012002

    you still get the same perl version as before!  And even worse, your Windows users would simply get

        >perl script.pl
        Can't exec /bin/sh at script.pl line 1.

    Why is that?  Well, Perl tries to be helpful and - as it doesn't see "perl" on the first #! line - calls /bin/sh for you...

    Sure, you can make it work by saying

    perl -x script.pl

    but what mere mortal user would think of doing so?

    </rant>

Re: grep trouble
by LanX (Canon) on Apr 17, 2011 at 15:14 UTC
    Two thoughts...

    1. IMHO using $_ to hold a pattern is bad style anyway!

    a) since the default DWIM behaviour for m/PATTERN/ is to match in $_

    b) because the content of $_ could easily be affected by other side effects.

    2. So what is the behavior you expect from an empty match-pattern?

    Some will expect always true¹, others always false², and a third party would want a warning³.

    So you should take care to pass only valid patterns.

    Cheers Rolf

    1) mathematically correct

    2) "empty strings are false" and "nothing to match => no match"!

    3) 'Invalid pattern "" in line ...'

      I wanted to address a few of your points. I intended to do so earlier, but just didn't have the time until now.

      1. You mentioned that using $_ to hold a pattern is bad style. But he's using grep. If he's passing patterns to grep what choice does he have? Remember: my @found = grep { do something with $_ } @input_list;. If he really intended grep { $something =~ m/$_/ } @array; do you know of some other way to use grep that wouldn't involve $_?

        Your sub-item (a): The default behavior of the m// operator is to match against $_, but he didn't invoke its default behavior.

        Sub item (b): Yup... but not in this case.

      2. That's a better question: Why is he using an empty string as the pattern in a regexp? Additionally, why would he be interested in matching an empty pattern against an empty string literal? But I have a feeling that's not the whole story. His input list to grep probably consists of a lot of patterns, which may include an empty string. Why is he matching against the literal empty string? That's again probably just a boiled-down example of the problem. I suspect that where he showed us a literal empty string, there's probably a foreach loop with an iterator on the left-hand side of the match operator, as in the following snippet:

        my @patterns = ( .................. );
        foreach my $iterator ( @array ) {
            my @found = grep { $iterator =~ m/$_/ } @patterns;
            do_something_with( @found );
        }

        You make a good point that the behavior of a m// (empty pattern match) is impossible to intuitively predict. Though the behavior may be useful in some situations, those situations probably warrant a comment in real-world code so that it doesn't initiate a head-scratching and document-reading session when someone looks at the code six months later. Regardless of whether the feature is generally familiar to people, it seems to be accurately documented. It might be a dusty corner in the halls of pattern matching, but Perl is full of dusty corners that provide useful features when nothing else would be quite as convenient.


      Dave

        1. > But he's using grep.

        Agreed, but he could copy $_ and define fall-backs for edge cases like "". (I'm not sure if he should)

        grep {
            my $pat = ( $_ eq "" ? EMPTY_PAT : $_ );
            $string =~ m/$pat/
        } @patterns;

        If this gets more complicated he should consider using a function instead of an anonymous block.

        2. > I suspect that where he showed us a literal empty string, ...

        I suppose he is checking multiple log-files simultaneously for the same matching patterns. By grepping patterns, he is calculating the intersection of all patterns which apply to all files.

        > those situations probably warrant a comment in real-world code

        That's why I would prefer a special var $PATTERN for this behavior.

        Cheers Rolf

        UPDATE: what bothers me is not this special behavior of a literally empty match m// but that of "If the PATTERN evaluates to the empty string". IMHO that $pat="";m/$pat/ behavior is difficult to justify.

        > I'm still not sure what you're trying to achieve here, cause your grep returns the PATTERNs which matched.

        Yes, LanX, and in scalar context it returns the NUMBER of patterns that matched. That's what I'm using in my (much larger) script.

        > ...His input list to grep probably consists of a lot of patterns, which may include an empty string. ... where he showed us a literal empty string, there's probably a foreach loop with an iterator on the lefthand side of the match operator...

        That's exactly what I'm doing, davido.

        Obviously, if I posted a (much larger) block of real-life code here, it would make more sense (to those who bother comprehending it), but there would be far fewer interested people. A short, distilled version of the problem may not make real-life sense, but is much easier to discuss.
Re: grep trouble
by SimonClinch (Chaplain) on Apr 18, 2011 at 13:49 UTC
    All this reminds me of why a colleague of mine stubbornly refuses to use any built-in variables, not even $_. But of course, as this example demonstrates, ignorance is not always bliss - even avoiding them doesn't stop them sneaking up and biting you on the arse. I think on the whole it is better to understand (read up on) the impact of code on the built-ins, especially $_ and the matching variables, and vice-versa. Merely being aware of their behaviour would prompt using debug and setting a watch on these variables before testing the relevant areas of code, when encountering such "anomalies".

    One world, one people

Node Type: perlquestion [id://899784]
Approved by davido
Front-paged by wfsp