http://www.perlmonks.org?node_id=390074

bronto has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

Following the suggestions I got in this node, I started coding a one-liner, but I can't get it to work.

The problem: I have a UNIX alias file and I want to modify only root's alias, and only if it is different from a predefined one. For example:

To test whether I had understood the lesson, I created a file containing...

root: admin@somewhere.here
root: someone@somewhere.else
any: anybody@anywhere.else

...and wrote a regular expression that I would eventually put into an s/// operator; I expected it to match just the second line, but the one-liner below...

perl -ne '/^root:\s*(?!admin@somewhere.here)/ and print' alliases

actually outputs:

root: admin@somewhere.here
root: someone@somewhere.else

which looks quite odd to me, since I expected the first line not to match. I also tried quoting the @ sign with a backslash, using \Q...\E, or using strict: no luck. More oddly (to me), if I add a \s*$ at the end of the regex to match any whitespace between the address and the end of line, then no line matches!!!

I am getting a little confused: where am I going wrong?

Thanks in advance, and thanks to everyone who answered the original post

Ciao!
--bronto


The very nature of Perl to be like natural language--inconsistant and full of dwim and special cases--makes it impossible to know it all without simply memorizing the documentation (which is not complete or totally correct anyway).
--John M. Dlugosz

Replies are listed 'Best First'.
Re: On zero-width negative lookahead assertions
by ccn (Vicar) on Sep 10, 2004 at 14:13 UTC

    there are two errors in the code:

    • @ and . must be backslashed
    • your \s* allows the regexp to match when \s* matches the empty string
    You are asking for a match, and Perl finds one for you by trying all possible combinations.
      @ and . must be backslashed

      Backslashed: still matches too much

      your \s* allows the regexp to match when \s* matches empty string

      I know it, I expressly want to match 0 or more spaces before line end

      Ciao!
      --bronto


      In theory, there is no difference between theory and practice. In practice, there is.

        Note the difference:
        perl -ne '/^root:(?!\s*admin\@somewhere\.here)/ and print' alliases
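To make the difference concrete, here is a minimal sketch that runs both patterns over the three sample lines from the question. The variable names are made up for illustration; only the two regexes and the sample lines come from the thread.

```perl
use strict;
use warnings;

# Sample alias lines taken from the question.
my @lines = (
    'root: admin@somewhere.here',
    'root: someone@somewhere.else',
    'any: anybody@anywhere.else',
);

# \s* outside the lookahead: it can backtrack to zero spaces,
# so the lookahead ends up looking at ' admin...' and succeeds.
my $outside = qr/^root:\s*(?!admin\@somewhere\.here)/;

# \s* inside the lookahead: the spaces are part of what is rejected.
my $inside = qr/^root:(?!\s*admin\@somewhere\.here)/;

my @hits_outside = grep { $_ =~ $outside } @lines;
my @hits_inside  = grep { $_ =~ $inside } @lines;

print "outside: $_\n" for @hits_outside;
print "inside:  $_\n" for @hits_inside;
```

With \s* outside, both root lines are reported; with \s* inside, only the someone@somewhere.else line is.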

Re: On zero-width negative lookahead assertions
by ikegami (Patriarch) on Sep 10, 2004 at 14:24 UTC

    First, don't forget to escape the @ and the . characters.

    >perl -lne "/^root:\s*(?!admin\@somewhere\.here)(.*)/ and print $1" \
          aliases.txt
    someone@somewhere.else
     admin@somewhere.here

    Note the leading space. When the regexp engine failed using all the spaces, it backtracked to \s* matching all but one space. One way to fix it is to anchor it as follows:

    >perl -ne "/^root:\s*(?!admin\@somewhere\.here)\S/ and print;" \
          aliases.txt
    root: someone@somewhere.else

      That works, and I thank you for explaining why. Unfortunately, I can't understand why, in the first case, the parentheses match the leading space, and why putting the \S makes it match, even if there is no non-space character at the end...

      I would be glad if you (or anyone else) could explain that further. I think I'll then discover what I didn't understand about zero-width negative lookahead (zwnla) assertions.

      Thanks a lot!

      Ciao!
      --bronto


      In theory, there is no difference between theory and practice. In practice, there is.

        Note: Perl regexp matching is not necessarily implemented as described below. I'm totally ignorant as to how it is actually implemented. One could say this document describes the specs rather than the implementation.

        It has nothing to do with lookaheads, really. For example, let's look at
        /^ab*bc/

        The regexp can be read as:
        1. Starting at the beginning of the string
        2. Match an 'a'.
        3. Match as many 'b's as possible, but not matching any is ok.
        4. Match a 'b'.
        5. Match a 'c'.

        Match against 'abbbbbbc'
                       01234567
        1) ok! pos = 0. (zw)
        2) ok! Found an 'a' at pos 0. pos = 1.
        3) ok! Found 6 'b's at pos 1 through 6. pos = 7.
        4) fail! Did not find a 'b' at pos 7. Backtrack!
        3) ok! Found 5 'b's at pos 1 through 5. pos = 6.
        4) ok! Found a 'b' at pos 6. pos = 7.
        5) ok! Found a 'c' at pos 7. pos = 8.
        Match!
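The trace can be checked by capturing what b* ended up with. This is only a sketch: the capture group is added for illustration and is not part of the regexp discussed above.

```perl
use strict;
use warnings;

my $string = 'abbbbbbc';    # one 'a', six 'b's, one 'c'
my $b_run;

# Same regexp as /^ab*bc/, with b* wrapped in a capture group.
if ( $string =~ /^a(b*)bc/ ) {
    $b_run = $1;
    printf "b* kept %d of the 6 b's\n", length $b_run;
}
```

b* first grabs all six b's, then hands one back so that the literal b that follows has something to match.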

        Something similar is occurring with your
        /^root:\s*(?!email)/

        The regexp can be read as:
        1. Starting at the beginning of the string
        2. Match 'root:'.
        3. Match as many '\s's as possible, but not matching any is ok.
        4. Check that what follows is something other than 'email' (zero-width: nothing is consumed).

        Match against 'root: email'
                       01234567890
        1) ok! pos = 0. (zw)
        2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
        3) ok! Found 1 '\s' at pos 5. pos = 6.
        4) fail! Found 'email' at pos 6 through 10. Backtrack!
        3) ok! Found 0 '\s' at pos 5. pos = 5. (zw)
        4) ok! Found something other than 'email' at pos 5. pos = 5. (zwla) (found ' email')
        Match!

        Now let's look at my solution
        /^root:\s*(?!email)\S/

        The regexp can be read as:
        1. Starting at the beginning of the string
        2. Match 'root:'.
        3. Match as many '\s's as possible, but not matching any is ok.
        4. Check that what follows is something other than 'email' (zero-width: nothing is consumed).
        5. Match a '\S'.

        Match against 'root: email'
                       01234567890
        1) ok! pos = 0. (zw)
        2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
        3) ok! Found 1 '\s' at pos 5. pos = 6.
        4) fail! Found 'email' at pos 6 through 10. Backtrack!
        3) ok! Found 0 '\s' at pos 5. pos = 5. (zw)
        4) ok! Found something other than 'email' at pos 5. pos = 5. (zwla) (found ' email')
        5) fail! Did not find a '\S' at pos 5. Backtrack!
        Nothing more to try. No match!

        Match against 'root: hisemail'
                       01234567890123
        1) ok! pos = 0. (zw)
        2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
        3) ok! Found 1 '\s' at pos 5. pos = 6.
        4) ok! Found something other than 'email' at pos 6. pos = 6. (zwla) (found 'hisemail')
        5) ok! Found a '\S' at pos 6. pos = 7.
        Match!
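The end positions in these traces can be read back from Perl itself: after a successful match, $+[0] holds the offset just past the matched text. A small sketch follows; match_end is a hypothetical helper written for this illustration, and 'email' stands in for the real address as in the traces.

```perl
use strict;
use warnings;

# Hypothetical helper: offset just past the match, or undef if nothing matched.
sub match_end {
    my ( $re, $string ) = @_;
    return $string =~ $re ? $+[0] : undef;
}

my $plain    = qr/^root:\s*(?!email)/;      # lookahead only
my $anchored = qr/^root:\s*(?!email)\S/;    # lookahead followed by \S

my $plain_end = match_end( $plain,    'root: email' );
my $forbidden = match_end( $anchored, 'root: email' );
my $allowed   = match_end( $anchored, 'root: hisemail' );

printf "%-28s %s\n", 'plain, root: email',       $plain_end // 'no match';
printf "%-28s %s\n", 'anchored, root: email',    $forbidden // 'no match';
printf "%-28s %s\n", 'anchored, root: hisemail', $allowed   // 'no match';
```

The plain pattern stops at offset 5, right after the colon, which shows that \s* gave its space back. The anchored pattern finds no match for 'root: email' and ends at offset 7 for 'root: hisemail'.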

        Backtracking means: (might not be an exhaustive list)

        In the case of the first rule, look for a match further on.
        In the case of a * rule or ? rule, try matching less.
        In the case of a *? rule or ?? rule, try matching more.
        In the case of a | or [] rule, try matching the next choice.
        Else, no match, so backtrack the last matching rule.

        The regexp engine will match if it can find any way to. So what you're asking for is "root, followed by some number (possibly zero) of whitespace characters, followed by something that is not 'admin@somewhere.here'". So it matches with root, followed by zero spaces, followed by ' admin@somewhere.here' (with a leading space). Since the string ' admin@somewhere.here' isn't 'admin@somewhere.here' (without the space), the lookahead works. That's why you need the \s* inside the lookahead, making it "try to find spaces followed by admin@somewhere.here, and if you can, fail" instead of "look for spaces, but make sure it's not followed by admin@somewhere.here". Subtle, but important.
        There is a non-space character after the \s*. The (?!) part is a zero-width assertion. Zero-width means just that - it doesn't consume anything of the string to match. Instead of using the \S, one could also have used:
        /root:(?>\s*)(?!...)/
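As a sketch of the atomic-group idea, the pattern below fills in the elided lookahead with the address from the question, which is an assumption on my part. The possessive form \s*+ is not mentioned in the thread; it needs Perl 5.10 or later and is shown only as an equivalent spelling.

```perl
use strict;
use warnings;

# Sample alias lines taken from the question.
my @lines = (
    'root: admin@somewhere.here',
    'root: someone@somewhere.else',
    'any: anybody@anywhere.else',
);

# Atomic group: once \s* has taken the spaces, it never gives them back.
my $atomic = qr/root:(?>\s*)(?!admin\@somewhere\.here)/;

# Possessive quantifier (Perl 5.10+): same no-backtracking behaviour.
my $possessive = qr/root:\s*+(?!admin\@somewhere\.here)/;

my @atomic_hits     = grep { $_ =~ $atomic } @lines;
my @possessive_hits = grep { $_ =~ $possessive } @lines;

print "atomic:     $_\n" for @atomic_hits;
print "possessive: $_\n" for @possessive_hits;
```

Both forms report only the someone@somewhere.else line, because the lookahead is always tested right after the last space.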
Re: On zero-width negative lookahead assertions
by pbeckingham (Parson) on Sep 10, 2004 at 14:14 UTC

    The following works if you break it into two expressions, but I can't see why yours doesn't match.

    perl -ne '/^root:\s*/ and $_ !~ /admin\@somewhere\.here/ and print' alias
    Update: Moving it around also works:
    perl -ne '/^root:(?!\s*admin\@somewhere\.here)/ and print' alias



    pbeckingham - typist, perishable vertebrate.
      perl -ne '/^root:\s*/ and $_ !~ /admin\@somewhere\.here/ and print' alias

      That's ok, but I want to understand that blah-blah-look-ahead thing

      perl -ne '/^root:(?!\s*admin\@somewhere\.here)/ and print' alias

      This works! But I can't understand why it doesn't work if you put the \s* outside the parens, nor can I understand why it stops working if I put the \s*$ at the end of the regex :-(

      Thanks a lot

      Ciao!
      --bronto


      In theory, there is no difference between theory and practice. In practice, there is.

        > But I can't understand why it doesn't work if you put the \s* outside the parens, nor I can understand why it stops working if I put the \s*$ at the end of the regex :-(

        That is because if you have the string

        root: admin@somewhere.here
        11111233333333333333333333

        and the RE

        /^root:\s*(?!admin\@somewhere\.here)/
         ABBBBBCCC

        then the part A in the RE matches the beginning of the string, part BBBBB matches 11111 ("root:") and CCC matches an empty string (not a space, a string with zero chars in it). After this empty string follows a space, and the space is not the beginning of "admin@somewhere.here", because it is the beginning of " admin@somewhere.here".

        I hope things are getting clearer for you :-)

Re: On zero-width negative lookahead assertions
by antirice (Priest) on Sep 10, 2004 at 15:16 UTC

    A few things:

    1. Don't forget to escape your @ and . characters.
    2. I also tried escaping the @ sign with a backslash, \Q...\E or using strict: no way.

      You must escape @ in a regex no matter what. However, be careful with your escaping as it exhibits different behavior depending upon what's around it.

      > perl -l
      $,=$/;
      print 'right:', qr(a\@b), qr(a\Q@\Eb), qr(\Qa@\Eb), qr(\Qa\E\@\Qb\E);
      print 'wrong:', qr(\Qa@b\E), qr(\Qa\@b\E);
      __END__
      right:
      (?-xism:a\@b)
      (?-xism:a\@b)
      (?-xism:a\@b)
      (?-xism:a\@b)
      wrong:
      (?-xism:a)
      (?-xism:a\\\@b)

      Your regex without an escaped @ is equivalent to /^root:\s*(?!admin.here)/; that is unless @somewhere is defined within your program, of course.

    3. \s* can also match the empty string as the following code shows:
      > perl -l
      $_='root: admin@somewhere.here';
      print '(',join(")(",/^(root:)(\s*)(?!admin\@somewhere\.here)/),')';
      print qq[Postmatch contained "$'"];
      __END__
      (root:)()
      Postmatch contained " admin@somewhere.here"
    4. More oddly (to me), if I add a \s*$ at the end of the regex to match any whitespace between the address and the end of line, then no line matches!!!

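      The reason is that you basically turned your regex into /^root:\s*$/. The lookahead consumes nothing, so the \s*$ that follows it has to match everything after root: and its spaces.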
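A quick way to see this is to run both regexes over the sample lines plus one line that has only whitespace after the colon. That extra line is made up for the comparison and does not come from the question's file.

```perl
use strict;
use warnings;

# The pattern from the question with \s*$ appended, and the regex it collapses to.
my $with_tail = qr/^root:\s*(?!admin\@somewhere\.here)\s*$/;
my $collapsed = qr/^root:\s*$/;

my @samples = (
    'root: admin@somewhere.here',
    'root: someone@somewhere.else',
    'any: anybody@anywhere.else',
    'root:   ',    # made-up line: nothing but spaces after the colon
);

my %tail_matched;
for my $line (@samples) {
    my $tail  = $line =~ $with_tail ? 1 : 0;
    my $plain = $line =~ $collapsed ? 1 : 0;
    $tail_matched{$line} = $tail;
    printf "%-32s tail=%d collapsed=%d\n", "'$line'", $tail, $plain;
}
```

Both regexes agree on every sample: none of the real alias lines match, and only the whitespace-only line does.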

    How should you do it? There are a couple of ways:

    Hardcoded:
    /^root:(?!\s*admin\@somewhere\.here)/

    Variable:
    my $admin_email = 'admin@somewhere.here';
    /^root:(?!\s*\Q$admin_email\E)/

    Note that if you want more constraints on your regex, you need to add them inside the zero-width negative lookahead assertion, at its end. Hope this helps.
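Putting the pieces together for the s/// the question was aiming at, here is one possible sketch. fix_root_alias is a hypothetical helper, and placing \s*$ at the end of the lookahead is my reading of the note above, not code taken from the thread.

```perl
use strict;
use warnings;

my $admin_email = 'admin@somewhere.here';    # address from the thread

# Hypothetical helper: rewrite root's alias unless it is already $admin_email.
sub fix_root_alias {
    my ($line) = @_;

    # The extra constraint \s*$ sits inside the lookahead, so trailing
    # whitespace is tolerated, while a longer address that merely starts
    # with $admin_email is still replaced.
    $line =~ s/^root:(?!\s*\Q$admin_email\E\s*$).*/root: $admin_email/;
    return $line;
}

for my $line (
    'root: admin@somewhere.here',
    'root: someone@somewhere.else',
    'any: anybody@anywhere.else',
  )
{
    print fix_root_alias($line), "\n";
}
```

Lines for other users and a root line that already carries the expected address come back unchanged; any other root line is rewritten.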

    Update: Wow, guess that took me a lot longer than I thought it would. Everyone else already said what I did =/

    antirice    
    The first rule of Perl club is - use Perl
    The ith rule of Perl club is - follow rule i - 1 for i > 1