Re: Using Look-ahead and Look-behind

Re^2: Using Look-ahead and Look-behind by Anonymous Monk on Jun 25, 2011 at 08:41 UTC
Hi, new questions go in Seekers Of Perl Wisdom, because Roy Johnson, whom you asked, hasn't been here in 6 weeks. You used code tags and put your code between them, which is awesome :) Welcome; see How do I post a question effectively? and Where should I post X?

The regex that is not working for you contains a zero-width negative look-ahead assertion, and as perlre#(?!pattern) says:

    A zero-width negative look-ahead assertion. For example `/foo(?!bar)/` matches any occurrence of "foo" that isn't followed by "bar". Note however that look-ahead and look-behind are NOT the same thing. You cannot use this for look-behind. If you are looking for a "bar" that isn't preceded by a "foo", `/(?!foo)bar/` will not do what you want. That's because the `(?!foo)` is just saying that the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will match. Use look-behind instead (see below).

So, use a look-behind. But that probably won't work as-is either, because you can't have variable-length look-behind, so you need to use a fixed-width look-behind:

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Test::More qw' no_plan ';

    Main(@ARGV);
    exit(0);

    sub Main {
        my @yesWant = (
            'equity, private equity',
            'equity',
            'private equity,equity',
            'private equity, equity',
            'equity,private equity',
        );
        my @notWant = (
            'private equity',
            'private equity',
            'mutual funds',
            'cds',
        );
        for my $not ( @notWant ){
            ok( (not TestEquity($not)), "not '$not'" );
        }
        for my $yes ( @yesWant ){
            ok( TestEquity($yes), "yes '$yes'" );
        }
    }

    sub TestEquity {
        return 1 if $_[0] =~ m/(?<!private\s)equity/;
        return 0;
    }
    __END__
    $ prove -v pm911357.lookbehind.pl
    pm911357.lookbehind.pl ..
    ok 1 - not 'private equity'
    ok 2 - not 'private equity'
    ok 3 - not 'mutual funds'
    ok 4 - not 'cds'
    ok 5 - yes 'equity, private equity'
    ok 6 - yes 'equity'
    ok 7 - yes 'private equity,equity'
    ok 8 - yes 'private equity, equity'
    ok 9 - yes 'equity,private equity'
    1..9
    ok
    All tests successful.
    Files=1, Tests=9, 0 wallclock secs ( 0.06 usr + 0.01 sys = 0.08 CPU)
    Result: PASS

If fixed-width look-behind doesn't work for you, simply do TWO tests.
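One possible reading of the "two tests" suggestion, as a minimal sketch: first blank out every 'private equity' phrase (with any amount of whitespace), then check whether an 'equity' is left. The helper name `HasBareEquity` and the sample strings are made up for illustration and do not come from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: step 1 blanks out every 'private equity' (any amount
# of whitespace), step 2 checks whether a bare 'equity' is left over.
sub HasBareEquity {
    my ($text) = @_;
    (my $rest = $text) =~ s/private\s+equity/ /g;   # work on a copy
    return $rest =~ m/equity/ ? 1 : 0;
}

print "$_ => ", HasBareEquity($_), "\n"
    for 'private   equity', 'private equity, equity', 'equity', 'cds';
```

Because the first step is an ordinary substitution, it is not bound by the fixed-width restriction that applies inside `(?<!...)`.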
Re^3: Using Look-ahead and Look-behind by AnomalousMonk (Archbishop) on Jun 25, 2011 at 19:51 UTC
Here's a solution that exactly matches the phrases specified in AnonyMonk's Re: Using Look-ahead and Look-behind post (which the code of Re^2: Using Look-ahead and Look-behind does not quite do), and also shows how to use the newfangled backtracking control verbs of 5.10 to emulate variable-width negative look-behind. Variable-width positive look-behind is emulated by 5.10's `\K` assertion.

Explanation: Any 'equity' that is preceded by either a character that is not a comma or whitespace, or by the 'private' phrase, FAILS and is skipped over (this test has first precedence); otherwise, any 'equity' that is not followed by a comma that is then followed by any non-whitespace SUCCEEDS.

    >perl -wMstrict -le
    "use Test::More 'no_plan';
     ;;
     for my $ar_vector (
       [ YES => 'equity, private equity', ],
       [ YES => 'equity',                 ],
       [ no  => 'private equity',         ],
       [ YES => 'private equity,equity',  ],
       [ YES => 'private equity, equity', ],
       [ no  => 'equity,private equity',  ],
       [ no  => 'private equity',         ],
       [ no  => 'mutual funds',           ],
       [ no  => 'cds'                     ],
       ) {
       my ($expected, $string) = @$ar_vector;
       is match($string), $expected, qq{'$string'};
       }
     ;;
     sub match {
       my ($string) = @_;
     ;;
       my $char_not_comma_or_space = qr{ [^,\s] }xms;
       my $private = qr{ private \s+ }xms;
       return 'YES' if $string =~ m{
           (?: $char_not_comma_or_space | $private) equity (*SKIP)(*FAIL)
           |
           equity (?! , \S)
           }xms;
       return 'no';
       }
    "
    ok 1 - 'equity, private equity'
    ok 2 - 'equity'
    ok 3 - 'private equity'
    ok 4 - 'private equity,equity'
    ok 5 - 'private equity, equity'
    ok 6 - 'equity,private equity'
    ok 7 - 'private equity'
    ok 8 - 'mutual funds'
    ok 9 - 'cds'
    1..9
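The `\K` remark above is not demonstrated in the post, so here is a minimal sketch of it, assuming Perl 5.10 or later. The sample string and the variable names are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical input with a variable amount of whitespace after 'private'.
my $text = "private   equity, equity";

# \K drops everything matched so far from the reported match, so
# 'private\s+' behaves like a variable-width positive look-behind and
# only the 'equity' that follows it is replaced.
(my $marked = $text) =~ s/private\s+\Kequity/EQUITY/g;

print "$marked\n";   # private   EQUITY, equity
```

The second, bare 'equity' is left alone because nothing matching `private\s+` comes immediately before it.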
Re^4: Using Look-ahead and Look-behind by JohnN (Initiate) on Oct 15, 2012 at 15:09 UTC
I have a dumb question. This code works well (THANKS Roy!) when looking for DNA string matches within a genome sequence, but not when the * is changed to {50,100}, e.g.

    /CCGG            # Match starting at DNA sequence CCGG
     (
      (?:
       (?!CCGG)      # make sure we're not finding duplicates mid-stream
       .             # accept any character
      )*?            # any number of times BUT not greedily <====
     )
     AATT            # and ending at AATT
    /x;

versus

    /CCGG
     (
      (?:
       (?!CCGG)
       .
      ){50,100}?     # <====
     )
     AATT            # and ending at AATT
    /x;

This latter one does not have dupes of CCGG but does have dupes of AATT. The previous snippet has no dupes of either CCGG or AATT.

A follow-up: The following code snippet fixes my problem, and I have NO idea why! I tried it out of desperation.

    /CCGG
     (
      (?:
       (?!AATT|CCGG) # <=============
       .             #
      ){50,100}?     # Here the "?" is not required but I'm anal
     )               #
     AATT            #
    /x;
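A likely explanation, sketched on a toy sequence: with `*?` the engine stops at the first AATT it can reach, but a minimum count forces it to consume that many characters first, so an early AATT can end up inside the capture unless the look-ahead forbids it too. The sequence below is invented, and the bounds are shrunk from {50,100} to {5,10} to keep it readable.

```perl
use strict;
use warnings;

# Toy sequence (hypothetical): an AATT sits early inside the would-be capture.
my $dna = 'CCGGTAATTGCAATT';

# Lazy star: stops at the first AATT it can reach.
my ($star)   = $dna =~ /CCGG((?:(?!CCGG).)*?)AATT/;
# Bounded, CCGG-only guard: must take 5 characters, so it runs past the early AATT.
my ($loose)  = $dna =~ /CCGG((?:(?!CCGG).){5,10}?)AATT/;
# Bounded, guard also rejects AATT: the early AATT now blocks the match.
my ($strict) = $dna =~ /CCGG((?:(?!AATT|CCGG).){5,10}?)AATT/;

print "star:   $star\n";                                           # T
print "loose:  $loose\n";                                          # TAATTGC
print 'strict: ', defined $strict ? $strict : '(no match)', "\n";  # (no match)
```

The `loose` capture contains AATT, which is the "dupes of AATT" symptom; adding AATT to the negative look-ahead rules that out at every position.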
Re^5: Using Look-ahead and Look-behind by choroba (Cardinal) on Oct 15, 2012 at 15:25 UTC
Re^5: Using Look-ahead and Look-behind by Anonymous Monk on Oct 15, 2012 at 15:28 UTC
Re^3: Using Look-ahead and Look-behind by Anonymous Monk on Jun 25, 2011 at 10:31 UTC
Nice. Very nice! You nailed it. It's working. Thanks a bunch!
Re^3: Using Look-ahead and Look-behind by heyjoec (Initiate) on Jun 19, 2014 at 11:18 UTC
I changed the sub TestEquity to allow for any text between Private and Equity, but I can't get it to work. What have I done wrong? `sub TestEquity { return 1 if $_[0] =~ m/(?<!private).*equity/; return 0; }`
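A short sketch of what probably goes wrong with that pattern: the look-behind is checked only at the position where the match starts, and `.*` is then free to consume 'private' itself. The alternative sub `TestEquityAnyGap` is a hypothetical name, and it assumes that any 'private' followed later by 'equity' should disqualify the whole string.

```perl
use strict;
use warnings;

my $text = 'private equity';

# The look-behind is checked only at the offset where the match begins.
# At offset 0 nothing precedes the match, so (?<!private) succeeds and
# .* is then free to swallow 'private ' itself.
my $original = $text =~ m/(?<!private).*equity/ ? 1 : 0;
print "original pattern matches: $original\n";   # 1

# Hypothetical alternative: reject the whole string up front if 'private'
# is followed anywhere later by 'equity'.
sub TestEquityAnyGap {
    return $_[0] =~ m/\A(?!.*private.*equity).*equity/s ? 1 : 0;
}

print "$_ => ", TestEquityAnyGap($_), "\n"
    for 'private equity', 'private placement equity', 'equity';
```

Here the negative look-ahead is anchored at the start of the string, so it can scan across an arbitrary gap, which a fixed-width look-behind cannot.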
Re^4: Using Look-ahead and Look-behind by AnomalousMonk (Archbishop) on Jun 19, 2014 at 12:09 UTC
How about just (untested, and also case-sensitive):

`sub TestEquity { return $_[0] =~ m/private.equity/ ? 0 : $_[0] =~ m/equity/ ? 1 : 0 ; }`

This could be slightly simplified if you can tolerate `""` (empty string) as a false flag in addition to or in place of 0.

BTW: "I can't get it to work" is rarely helpful as a problem description. How about some input strings and actual versus expected output?

Update: Changed `$_[0] =~ m/private.equity/` to `$_[0] =~ m/private/` because it makes more sense.

Update: ... and then changed it back to `$_[0] =~ m/private.equity/` because it actually makes even more sense that way! (sigh)
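The simplification hinted at above might look like the following sketch, where false comes back as the empty string rather than 0. The name `TestEquityBool` is made up, and this is only one way to read the remark.

```perl
use strict;
use warnings;

# Same logic as the nested ternary, but false comes back as "" rather than 0.
# scalar() keeps the result a single value even when called in list context.
sub TestEquityBool {
    return scalar( $_[0] !~ m/private.equity/ && $_[0] =~ m/equity/ );
}

for my $s ( 'equity', 'private equity', 'mutual funds' ) {
    printf "%-16s => %s\n", "'$s'", TestEquityBool($s) ? 'true' : 'false';
}
```

Callers that only test the result for truth are unaffected by the change; callers that print it or compare it with `== 0` would notice the difference.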
Re^4: Using Look-ahead and Look-behind by Anonymous Monk on Jun 19, 2014 at 23:13 UTC
"I changed the sub TestEquity to allow for any text between Private and Equity, but I can't get it to work. What have I done wrong?"

Impossible to say, although the anomalous one makes a good point.


	PerlMonks