PerlMonks  

regex: negative lookahead

by svenXY (Deacon)
on Dec 05, 2011 at 15:47 UTC (node #941881)
svenXY has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm confused by a negative lookahead. Here's some example code. I want it to match $needs and not match $has. Examples 3 and 4 are only there as a reference using a fixed string; what I basically want is ^($start(.*?))(?!$end)$. Any ideas?

#!/usr/bin/perl
use strict;
use warnings;

my $start     = "log4j.rootLogger=";
my $startlong = "log4j.rootLogger=INFO, FILE";
my $end       = ", SYSLOG";

my $needs = 'log4j.rootLogger=INFO, FILE';
my $has   = $needs . ', SYSLOG';

# Examples 1 and 2: pattern built from the variables
print "needs_matched $needs\n" if $needs =~ /^($start(\w+(,\s)?)+)(?!$end)$/;
print "has_matched $has\n"     if $has   =~ /^($start(\w+(,\s)?)+)(?!$end)$/;

# Examples 3 and 4: the same idea with a fixed string, for reference
print "needs_matched $needs\n" if $needs =~ /^(log4j.rootLogger=INFO, FILE)(?!, SYSLOG)$/;
print "has_matched $has\n"     if $has   =~ /^(log4j.rootLogger=INFO, FILE)(?!, SYSLOG)$/;

Regards,
svenXY

Re: regex: negative lookahead
by kennethk (Abbot) on Dec 05, 2011 at 15:54 UTC
    If you look at what is in $1 after the unwanted match in case 2, the issue should become more apparent: print $1; yields log4j.rootLogger=INFO, FILE, SYSLOG. Essentially, you have already consumed ", SYSLOG" by the time you reach the end of the string, so of course that text does not follow. A negative lookbehind does what you intended:

    print "has_matched $has\n" if $has =~ /^($start(\w+(,\s)?)+)(?<!$end)$/;
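A minimal, self-contained sketch of this point, reusing the question's $start, $end, $needs and $has: it runs the lookahead version and the lookbehind version of the same pattern against both strings. The names $ahead and $behind are introduced here for illustration only. On a stock Perl you should see the lookahead version accept both strings, while the lookbehind version rejects $has.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $start = "log4j.rootLogger=";
my $end   = ", SYSLOG";

my $needs = 'log4j.rootLogger=INFO, FILE';
my $has   = $needs . ', SYSLOG';

# Lookahead: for $ to match right after it, the group must already have
# consumed the whole line, so nothing is left ahead and (?!$end) cannot fail.
my $ahead  = qr/^($start(\w+(,\s)?)+)(?!$end)$/;

# Lookbehind: checks the text just before the end of the string instead.
# $end is fixed-width, which Perl's lookbehind requires.
my $behind = qr/^($start(\w+(,\s)?)+)(?<!$end)$/;

for my $s ($needs, $has) {
    printf "%-36s lookahead: %-9s lookbehind: %s\n", $s,
        ($s =~ $ahead  ? 'match' : 'no match'),
        ($s =~ $behind ? 'match' : 'no match');
}
```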

      Hi,
      ++kennethk - thanks for clarifying. I'm still a little bit confused by the difference between (?!...) and (?<!...) though.
      Regards,
      svenXY
        The difference is lookahead versus lookbehind: (?!...) asserts that its pattern does not match the text after the current position, while (?<!...) asserts that it does not match the text before it. Both are zero-width (they don't consume characters), so if you ignore the assertion you would expect /^($start(\w+(,\s)?)+)$/ to match the entire string from start to finish. When the engine reaches your assertion, the cursor is here:
        log4j.rootLogger=INFO, FILE, SYSLOG
                                           ^
        If you look ahead of this position, there is nothing, so the negative lookahead check trivially passes. The offending text is behind your position, so you need a negative lookbehind.
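The cursor picture also suggests the opposite fix: test the lookahead at a position where the offending text is still ahead of the cursor, i.e. right after ^. The sketch below is an alternative technique that nobody in this thread proposed, shown only for comparison and assuming the question's variables. It also wraps the variables in \Q...\E so the "." in log4j.rootLogger is matched literally rather than as "any character". The name $re is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $start = "log4j.rootLogger=";
my $end   = ", SYSLOG";

my $needs = 'log4j.rootLogger=INFO, FILE';
my $has   = $needs . ', SYSLOG';

# The negative lookahead sits directly after ^, where the whole line still
# lies ahead of the cursor, so it can check whether the line ends in $end.
# \Q...\E quotes regex metacharacters (here the "." in $start);
# \z anchors at the very end of the string.
my $re = qr/^(?!.*\Q$end\E\z)(\Q$start\E.*)\z/;

for my $s ($needs, $has) {
    my $verdict = $s =~ $re ? 'match' : 'no match';
    print "$verdict: $s\n";
}
```

Putting the assertion first means the engine rejects $has before it consumes anything, instead of relying on backtracking at the end of the line.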
Re: regex: negative lookahead
by Khen1950fx (Canon) on Dec 05, 2011 at 20:50 UTC
    I've never needed to use a lookaround, but your question piqued my curiosity. As an experiment, I used Regexp::Assemble to rework your example because it just does the right thing with assertions, lookaheads, and lookbehinds.
    #! /usr/bin/perl -slw
    use strict;
    use Regexp::Assemble;

    my $needs = Regexp::Assemble->new->add( qw[ FILE INFO ] );
    my $has   = Regexp::Assemble->new->add( qw[ FILE INFO SYSLOG ] );

    while( defined( $_ = <DATA> )) {
        chomp;
        if( /($needs)/ ) {
            print $needs->as_string;
        }
    }
    # Note: DATA is already at end-of-file here, so this second loop reads nothing.
    while( defined( $_ = <DATA> )) {
        chomp;
        if( /($has)/ ) {
            print $has->as_string;
        }
    }
    __DATA__
    FILE INFO SYSLOG
    If I've understood you correctly, I believe it works the way it should. What do you think?
Re: regex: negative lookahead
by vinian (Beadle) on Dec 06, 2011 at 05:01 UTC

    I don't think this has anything to do with the negative lookahead; the problem lies in the
    (\w+(,\s)?)+
    On $has it matches all of "INFO, FILE, SYSLOG", rather than stopping at "INFO, FILE".
    "+" is greedy, so it consumes everything the group "( ... )" can match. I don't know how to make the regex above match less, but I use this:
    print "has_matched $has\n"    if $has   =~ /^($start(\w+(,\s)?)+?)$(?<!$end)/;
    and it will not match $has.
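A small sketch that runs this pattern next to a variant which keeps the lazy +? but uses the original lookahead, assuming the same $start and $end values as in the question. Because both patterns must still reach $, the lazy quantifier ends up covering the whole line either way; in this sketch it is the (?<!$end) check at the end of the string that rejects $has. The names $lazy_behind and $lazy_ahead are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $start = "log4j.rootLogger=";
my $end   = ", SYSLOG";

my $needs = 'log4j.rootLogger=INFO, FILE';
my $has   = $needs . ', SYSLOG';

# The pattern above: lazy +? followed by $ and then a negative lookbehind.
# ($( is not interpolated inside a pattern, so "$(?<!" is the anchor
# followed by the lookbehind.)
my $lazy_behind = qr/^($start(\w+(,\s)?)+?)$(?<!$end)/;

# Same lazy quantifier, but with the question's negative lookahead instead.
my $lazy_ahead  = qr/^($start(\w+(,\s)?)+?)(?!$end)$/;

for my $s ($needs, $has) {
    printf "%-36s lazy+lookbehind: %-9s lazy+lookahead: %s\n", $s,
        ($s =~ $lazy_behind ? 'match' : 'no match'),
        ($s =~ $lazy_ahead  ? 'match' : 'no match');
}
```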
