http://www.perlmonks.org?node_id=941881

svenXY has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm confused by a negative lookahead. Here's some example code. I want it to match on $needs and not to match on $has. Examples 3 and 4 are only there as a reference using fixed strings; what I basically want is ^($start(.*?))(?!$end)$. Any ideas?

#!/usr/bin/perl
use strict;
use warnings;

my $start = "log4j.rootLogger=";
my $startlong = "log4j.rootLogger=INFO, FILE";
my $end = ", SYSLOG";
my $needs = 'log4j.rootLogger=INFO, FILE';
my $has = $needs . ', SYSLOG';

print "needs_matched $needs\n" if $needs =~ /^($start(\w+(,\s)?)+)(?!$end)$/;
print "has_matched $has\n" if $has =~ /^($start(\w+(,\s)?)+)(?!$end)$/;
print "needs_matched $needs\n" if $needs =~ /^(log4j.rootLogger=INFO, FILE)(?!, SYSLOG)$/;
print "has_matched $has\n" if $has =~ /^(log4j.rootLogger=INFO, FILE)(?!, SYSLOG)$/;

Regards,
svenXY

Re: regex: negative lookahead
by kennethk (Abbot) on Dec 05, 2011 at 15:54 UTC
    If you take a look at what you have in $1 after the unexpected match in example 2, the issue will become more apparent, I think: print $1; yields log4j.rootLogger=INFO, FILE, SYSLOG. Essentially, you have already consumed ", SYSLOG" by the time you get to the end of the string, so that text clearly does not follow the current position. A negative look-behind will actually yield what you intended:

    print "has_matched $has\n" if $has =~ /^($start(\w+(,\s)?)+)(?<!$end)$/;
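
A minimal sketch for checking this yourself, reusing the variables from the question. The captured() helper is a hypothetical name introduced here for illustration; it returns what ended up in $1, or undef when the pattern fails.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $start = "log4j.rootLogger=";
my $end   = ", SYSLOG";
my $needs = 'log4j.rootLogger=INFO, FILE';
my $has   = $needs . ', SYSLOG';

# Hypothetical helper: return the contents of $1, or undef if there is no match.
sub captured {
    my ($string, $regex) = @_;
    return $string =~ $regex ? $1 : undef;
}

# The pattern from the question (lookahead) and the suggested fix (lookbehind).
my $lookahead  = qr/^($start(\w+(,\s)?)+)(?!$end)$/;
my $lookbehind = qr/^($start(\w+(,\s)?)+)(?<!$end)$/;

for my $string ($needs, $has) {
    my $ahead  = captured($string, $lookahead);
    my $behind = captured($string, $lookbehind);
    print "string:     $string\n";
    print "lookahead:  ", (defined $ahead  ? "matched, \$1 = '$ahead'"  : "no match"), "\n";
    print "lookbehind: ", (defined $behind ? "matched, \$1 = '$behind'" : "no match"), "\n";
}
```

With the lookahead, $1 for $has already contains ", SYSLOG", which is why the assertion has nothing left to reject.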

      Hi,
      ++kennethk - thanks for clarifying. I'm still a little bit confused by the difference between (?!...) and (?<!...) though.
      Regards,
      svenXY
        The difference is a lookahead versus a lookbehind. If you ignore the lookahead/lookbehind because they are zero width (don't consume characters), you would expect /^($start(\w+(,\s)?)+)$/ to grab an entire string between start and finish. When you get to the part of the regular expression where you have your assertion, the cursor is here:
        log4j.rootLogger=INFO, FILE, SYSLOG
                                           ^
        If you look ahead of this position, there is nothing, which clearly passes the negative lookahead check. The offending string is behind your position, thus you need a negative lookbehind.
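
The cursor position can be inspected directly with pos(). The following is only a sketch: cursor_context() is a hypothetical helper, and the final pattern is an assumption about how a lookahead could still be used, namely by testing it at the start of the string, where ", SYSLOG" is still ahead of the cursor.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $start = "log4j.rootLogger=";
my $end   = ", SYSLOG";
my $needs = 'log4j.rootLogger=INFO, FILE';
my $has   = $needs . ', SYSLOG';

# Hypothetical helper: run the part of the pattern that precedes the
# assertion, then report the cursor offset and the text on either side of it.
sub cursor_context {
    my ($string) = @_;
    # /g in scalar context records the end of the match in pos()
    return unless $string =~ /^$start(\w+(,\s)?)+/g;
    my $cursor = pos($string);
    my $ahead  = substr($string, $cursor);
    my $behind = substr($string, $cursor - length($end), length($end));
    return ($cursor, $ahead, $behind);
}

my ($cursor, $ahead, $behind) = cursor_context($has);
print "cursor at offset $cursor, ahead: '$ahead', behind: '$behind'\n";

# Assumption: a negative lookahead tested at the very start of the string.
# \Q...\E quotes the literal text and \z anchors at the end of the string.
my $lookahead_first = qr/^(?!.*\Q$end\E\z)($start(\w+(,\s)?)+)$/;

for my $string ($needs, $has) {
    printf "%-40s %s\n", $string, ($string =~ $lookahead_first ? "matched" : "no match");
}
```

Whether the lookbehind or the early lookahead reads better is a matter of taste; both reject the string for the same reason, the trailing ", SYSLOG".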
Re: regex: negative lookahead
by Khen1950fx (Canon) on Dec 05, 2011 at 20:50 UTC
    I've never had the need to use a lookaround, but your question piqued my curiosity. As an experiment, I used Regexp::Assemble to rework your example because it just does the right thing with assertions, lookaheads, and lookbehinds.
    #! /usr/bin/perl -slw
    use strict;
    use Regexp::Assemble;

    my $needs = Regexp::Assemble->new->add( qw[ FILE INFO ] );
    my $has   = Regexp::Assemble->new->add( qw[ FILE INFO SYSLOG ] );

    while( defined( $_ = <DATA> )) {
        chomp;
        if( /($needs)/ ) {
            print $needs->as_string;
        }
    }
    while( defined( $_ = <DATA> )) {
        chomp;
        if( /($has)/ ) {
            print $has->as_string
        }
    }
    __DATA__
    FILE
    INFO
    SYSLOG
    I believe, if I've understood you correctly, that it works the way that it should. What do you think?
Re: regex: negative lookahead
by vinian (Beadle) on Dec 06, 2011 at 05:01 UTC

    I don't think this has anything to do with the negative lookahead itself; the problem lies in the
    (\w+(,\s)?)+
    which will match "INFO, FILE, SYSLOG" rather than stopping at "INFO, FILE".
    "+" is greedy: it eats everything that "( ... )" can match. I don't know how to make the above regex match less, but I used this
    print "has_matched $has\n"    if $has   =~ /^($start(\w+(,\s)?)+?)$(?<!$end)/;
    and it does not match $has.
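
A quick comparison suggests the lazy "+?" is not what rejects $has here. Because the pattern is anchored with $, the quantifier has to stretch to the end of the string either way, and it is the lookbehind that fails. The sketch below runs the pattern above with both quantifiers; matches() is a hypothetical helper added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $start = "log4j.rootLogger=";
my $end   = ", SYSLOG";
my $needs = 'log4j.rootLogger=INFO, FILE';
my $has   = $needs . ', SYSLOG';

# Same pattern twice: once with the greedy "+", once with the lazy "+?".
my $greedy = qr/^($start(\w+(,\s)?)+)$(?<!$end)/;
my $lazy   = qr/^($start(\w+(,\s)?)+?)$(?<!$end)/;

# Hypothetical helper: 1 if the string matches the pattern, 0 otherwise.
sub matches {
    my ($string, $regex) = @_;
    return $string =~ $regex ? 1 : 0;
}

for my $string ($needs, $has) {
    printf "%-40s greedy: %d  lazy: %d\n",
        $string, matches($string, $greedy), matches($string, $lazy);
}
```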