PerlMonks  

Re^2: Using Look-ahead and Look-behind

by Anonymous Monk
on Jun 25, 2011 at 08:41 UTC (#911361)


in reply to Re: Using Look-ahead and Look-behind
in thread Using Look-ahead and Look-behind

Hi. New questions belong in Seekers Of Perl Wisdom, because Roy Johnson, to whom you addressed your question, hasn't been here in 6 weeks.

You used code tags and put your code in between, that is awesome :)

Welcome, see How do I post a question effectively?, Where should I post X?

The regex that is not working for you contains a zero-width negative look-ahead assertion, and as perlre#(?!pattern) says:

A zero-width negative look-ahead assertion. For example /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar". Note however that look-ahead and look-behind are NOT the same thing. You cannot use this for look-behind.

If you are looking for a "bar" that isn't preceded by a "foo", /(?!foo)bar/ will not do what you want. That's because the (?!foo) is just saying that the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will match. Use look-behind instead (see below).
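To make the quoted caveat concrete, here is a minimal sketch. The sub names are made up for illustration; only the two patterns come from the perlre text above.

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.

# Look-ahead placed in front of "bar": it only asserts that the text
# STARTING at the current position is not "foo". At the "bar" inside
# "foobar" that is trivially true, so the pattern still matches.
sub lookahead_hit  { return $_[0] =~ /(?!foo)bar/  ? 1 : 0 }

# Look-behind: asserts that the text ENDING at the current position is
# not "foo", which is what "bar not preceded by foo" actually needs.
sub lookbehind_hit { return $_[0] =~ /(?<!foo)bar/ ? 1 : 0 }

for my $s (qw(foobar bazbar bar)) {
    printf "%-7s look-ahead=%d look-behind=%d\n",
        $s, lookahead_hit($s), lookbehind_hit($s);
}
```

Only the look-behind version rejects "foobar"; both accept "bazbar" and a bare "bar".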

So, use a look-behind

But the look-behind you would naturally write probably won't work either, because look-behind patterns cannot be variable-length, so you need to use a fixed-width look-behind.

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Test::More qw' no_plan ';

    Main(@ARGV);
    exit(0);

    sub Main {
        my @yesWant = (
            'equity, private equity',
            'equity',
            'private equity,equity',
            'private equity, equity',
            'equity,private equity',
        );
        my @notWant = (
            'private equity',
            'private equity',
            'mutual funds',
            'cds',
        );
        for my $not ( @notWant ){
            ok( (not TestEquity($not)), "not '$not'" );
        }
        for my $yes ( @yesWant ){
            ok( TestEquity($yes), "yes '$yes'" );
        }
    }

    sub TestEquity {
        return 1 if $_[0] =~ m/(?<!private\s)equity/;
        return 0;
    }
    __END__
    $ prove -v pm911357.lookbehind.pl
    pm911357.lookbehind.pl ..
    ok 1 - not 'private equity'
    ok 2 - not 'private equity'
    ok 3 - not 'mutual funds'
    ok 4 - not 'cds'
    ok 5 - yes 'equity, private equity'
    ok 6 - yes 'equity'
    ok 7 - yes 'private equity,equity'
    ok 8 - yes 'private equity, equity'
    ok 9 - yes 'equity,private equity'
    1..9
    ok
    All tests successful.
    Files=1, Tests=9, 0 wallclock secs ( 0.06 usr + 0.01 sys = 0.08 CPU)
    Result: PASS

If fixed width lookbehind doesn't work for you, simply do TWO tests
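The post does not say which two tests it has in mind, so the following is only one possible reading: count every "equity", count every "equity" that follows "private" plus whitespace, and compare. The sub name and the choice of `\s+` as the gap are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sub name. Assumption: "private", any amount of
# whitespace, then "equity" is the phrase that should not count.
sub has_plain_equity {
    my ($text) = @_;

    # Test 1: how many times does "equity" occur at all?
    my $all = () = $text =~ /equity/g;

    # Test 2: how many of those are part of "private equity"?
    # No look-behind is involved, so the gap may be variable-width.
    my $private = () = $text =~ /private\s+equity/g;

    # At least one "equity" is left over that is not "private equity".
    return $all > $private ? 1 : 0;
}

for my $s ('equity, private equity', 'private   equity', 'mutual funds') {
    printf "%-26s => %d\n", "'$s'", has_plain_equity($s);
}
```

Because neither test uses look-behind, the width restriction never comes into play.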


Re^3: Using Look-ahead and Look-behind
by Anonymous Monk on Jun 25, 2011 at 10:31 UTC
    Nice. Very nice! You nailed it. It's working. Thanks a bunch!
Re^3: Using Look-ahead and Look-behind
by AnomalousMonk (Abbot) on Jun 25, 2011 at 19:51 UTC

    Here's a solution that exactly matches the phrases specified in AnonyMonk's Re: Using Look-ahead and Look-behind post (which the code of Re^2: Using Look-ahead and Look-behind does not quite do), and also shows how to use the newfangled backtracking control verbs of 5.10 to emulate variable-width negative look-behind. Variable-width positive look-behind is emulated by 5.10's  \K assertion.

    Explanation:

    • Any 'equity' that is preceded by
      • either a character that is not a comma or whitespace, or
      • by the 'private' phrase
      FAILS and is skipped over (this test has first precedence);
    • Otherwise, any 'equity' that is not followed by a comma that is then followed by any non-whitespace SUCCEEDS.

        >perl -wMstrict -le
        "use Test::More 'no_plan';
         ;;
         for my $ar_vector (
           [ YES => 'equity, private equity', ],
           [ YES => 'equity',                 ],
           [ no  => 'private equity',         ],
           [ YES => 'private equity,equity',  ],
           [ YES => 'private equity, equity', ],
           [ no  => 'equity,private equity',  ],
           [ no  => 'private equity',         ],
           [ no  => 'mutual funds',           ],
           [ no  => 'cds'                     ],
           ) {
           my ($expected, $string) = @$ar_vector;
           is match($string), $expected, qq{'$string'};
           }
         ;;
         sub match {
           my ($string) = @_;
         ;;
           my $char_not_comma_or_space = qr{ [^,\s] }xms;
           my $private = qr{ private \s+ }xms;
           return 'YES' if $string =~ m{
               (?: $char_not_comma_or_space | $private) equity (*SKIP)(*FAIL)
               |
               equity (?! , \S)
               }xms;
           return 'no';
           }
        "
        ok 1 - 'equity, private equity'
        ok 2 - 'equity'
        ok 3 - 'private equity'
        ok 4 - 'private equity,equity'
        ok 5 - 'private equity, equity'
        ok 6 - 'equity,private equity'
        ok 7 - 'private equity'
        ok 8 - 'mutual funds'
        ok 9 - 'cds'
        1..9
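The post above mentions that \K can stand in for variable-width positive look-behind but does not show it. A minimal sketch of the idea follows; the sub name and the uppercase replacement are invented for illustration.

```perl
use strict;
use warnings;
use 5.010;    # \K needs Perl 5.10 or later

# Hypothetical helper: uppercase "equity" only where it follows
# "private" and any amount of whitespace.
sub mark_private_equity {
    my ($text) = @_;

    # Everything matched before \K must be present, but is kept out of
    # the text that gets replaced. "private\s+" therefore behaves like
    # a positive look-behind whose width is allowed to vary.
    (my $copy = $text) =~ s/private\s+\Kequity/EQUITY/g;

    return $copy;
}

say mark_private_equity('equity, private   equity');
```

The first "equity" is left alone because nothing matching `private\s+` precedes it.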

      I have a dumb question.

      This code works well (THANKS Roy!) when looking for DNA string matches within a genome sequence but not when the * is changed to {50,100}

      e.g.
          /CCGG            # Match starting at DNA sequence CCGG
            (
              (?:
                (?!CCGG)   # make sure we're not finding duplicates mid-stream
                .          # accept any character
              )*?          # any number of times BUT not greedily <====
            )
            AATT           # and ending at AATT
          /x;

      versus

          /CCGG
            (
              (?:
                (?!CCGG)
                .
              ){50,100}?   # <====
            )
            AATT           # and ending at AATT
          /x;

      This latter one does not have dupes of CCGG but does have dupes of AATT. The previous snippet has no dupes of either CCGG or AATT.

      A follow-up: The following code snippet fixes my problem, and I have NO idea why! I tried it out of desperation

          /CCGG
            (
              (?:
                (?!AATT|CCGG)   # <=============
                .               #
              ){50,100}?        # Here the "?" is not required but I'm anal
            )                   #
            AATT                #
          /x;
        When * is changed to ^, it does not work either. Why are you changing it at all?

        But jokes aside: the *? stops as soon as it sees the first occurrence of AATT, so there are no dupes. The {50,100} must match at least 50 times, so if there is an AATT after, say, the 25th character, it cannot stop there and must match a longer string.

        Use YAPE::Regex::Explain to see what your regular expressions mean.

        Moreover, you are replying to a node that is not related to your question.

        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        Great, please take it to Seekers Of Perl Wisdom, see Re^2: Using Look-ahead and Look-behind

        You forgot to include sample input, no matter, here are clues, run these and compare

        perl -Mre=debug -le " $_ = q/foobarfoodrinkAATT/; /foo((?:(?!bar).){1,5}?)AATT/; "

        perl -Mre=debug -le " $_ = q/foobarfoodrinkAATTAATT/; /foo((?:(?!bar).){6,10}?)AATT/; "

        {50,100} means match at least 50 times but no more than 100 times

        * means match zero or more times

        In my second short example, the first AATT begins at the 6th character after foo, so it is swallowed by the six mandatory repetitions and ends up inside the match
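The two replies above can be checked without reading the re=debug trace. This is a small sketch built on the same short sample string; the helper name `between` and the third pattern (which adds AATT to the look-ahead, mirroring the poster's own fix) are additions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return what was captured between "foo" and the
# closing "AATT", or undef when the pattern does not match at all.
sub between {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

my $text = 'foobarfoodrinkAATTAATT';

# Lazy star: free to stop at the very first AATT.
my $lazy_star = qr/foo((?:(?!bar).)*?)AATT/;

# Lazy {6,10}: six characters are mandatory, so the first AATT is
# consumed as ordinary characters and the match ends at the second one.
my $min_six = qr/foo((?:(?!bar).){6,10}?)AATT/;

# Same bounds, but AATT is also forbidden inside the capture. A stretch
# that hits AATT before six characters is rejected, not extended.
my $tempered = qr/foo((?:(?!bar|AATT).){6,10}?)AATT/;

for my $pair ( [ 'lazy star' => $lazy_star ],
               [ 'min six'   => $min_six   ],
               [ 'tempered'  => $tempered  ] ) {
    my ($name, $re) = @$pair;
    my $got = between($re, $text);
    printf "%-10s %s\n", $name, defined $got ? "'$got'" : 'no match';
}
```

Putting the terminator inside the negative look-ahead is why the `(?!AATT|CCGG)` version has no embedded AATT: the repeated group can never step onto one.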

Re^3: Using Look-ahead and Look-behind
by heyjoec (Initiate) on Jun 19, 2014 at 11:18 UTC

    I changed the sub TestEquity to allow for any text between Private and Equity, but I can't get it to work. What have I done wrong?

        sub TestEquity {
            return 1 if $_[0] =~ m/(?<!private).*equity/;
            return 0;
        }

      How about just (untested, and also case-sensitive):

          sub TestEquity {
              return $_[0] =~ m/private.*equity/ ? 0
                   : $_[0] =~ m/equity/          ? 1
                   :                               0
                   ;
          }
      This could be slightly simplified if you can tolerate  "" (empty string) as a false flag in addition to or in place of 0.

      BTW: "I can't get it to work" is rarely helpful as a problem description. How about some input strings and actual versus expected output?

      Update: Changed  $_[0] =~ m/private.*equity/ to  $_[0] =~ m/private/ because it makes more sense.
      Update: ... and then changed it back to  $_[0] =~ m/private.*equity/ because it actually makes even more sense that way! (sigh)

      I changed the sub TestEquity to allow for any text between Private and Equity, but I can't get it to work. What have I done wrong?

      Impossible to say, although the anomalous one makes a good point
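One likely reason, offered as a guess since no sample input was posted: in `(?<!private).*equity` the look-behind is checked only at the position where the match starts. The engine can start at the beginning of the string, where nothing precedes, and let `.*` swallow "private" itself. A minimal sketch comparing that pattern with the two-step test suggested above (sub names are made up):

```perl
use strict;
use warnings;

# Hypothetical names. "naive" is the pattern from the question;
# "two_step" is the two-test approach suggested in the reply.
sub naive    { return $_[0] =~ /(?<!private).*equity/ ? 1 : 0 }

sub two_step {
    return $_[0] =~ /private.*equity/ ? 0
         : $_[0] =~ /equity/          ? 1
         :                              0;
}

for my $s ( 'private equity',
            'private banking and equity',
            'equity',
            'mutual funds' ) {
    printf "%-30s naive=%d two_step=%d\n", "'$s'", naive($s), two_step($s);
}
```

Under this reading, the naive pattern accepts every string that contains "equity" anywhere, which is the same as testing for `equity` alone.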
