PerlMonks  

Re^2: Using Look-ahead and Look-behind

by Anonymous Monk
on Jun 25, 2011 at 08:41 UTC (#911361)


in reply to Re: Using Look-ahead and Look-behind
in thread Using Look-ahead and Look-behind

Hi, new questions go in Seekers Of Perl Wisdom, because Roy Johnson, whom you asked, hasn't been here in 6 weeks.

You used code tags and put your code in between, that is awesome :)

Welcome, see How do I post a question effectively?, Where should I post X?

The regex that is not working for you contains a zero-width negative look-ahead assertion, and as perlre#(?!pattern) says:

A zero-width negative look-ahead assertion. For example /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar". Note however that look-ahead and look-behind are NOT the same thing. You cannot use this for look-behind.

If you are looking for a "bar" that isn't preceded by a "foo", /(?!foo)bar/ will not do what you want. That's because the (?!foo) is just saying that the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will match. Use look-behind instead (see below).
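The perlre passage quoted above can be checked with a few lines of Perl. This is only a sketch: the helper names and the two sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Both helper names are invented for this sketch.
# (?!foo) is tested where "bar" begins, so it cannot see what came before "bar".
sub ahead_matches  { return $_[0] =~ /(?!foo)bar/  ? 1 : 0 }
# (?<!foo) inspects the three characters immediately before "bar".
sub behind_matches { return $_[0] =~ /(?<!foo)bar/ ? 1 : 0 }

for my $s ( 'foobar', 'bazbar' ) {
    printf "%-6s  (?!foo)bar: %d  (?<!foo)bar: %d\n",
        $s, ahead_matches($s), behind_matches($s);
}
```

Only the look-behind version rejects "foobar"; the look-ahead version accepts both strings.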

So, use a look-behind

But that probably won't work either, because you can't have variable-length look-behind, so you need to use a fixed-width look-behind.

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Test::More qw' no_plan ';

    Main(@ARGV);
    exit(0);

    sub Main {
        my @yesWant = (
            'equity, private equity',
            'equity',
            'private equity,equity',
            'private equity, equity',
            'equity,private equity',
        );
        my @notWant = (
            'private equity',
            'private equity',
            'mutual funds',
            'cds',
        );
        for my $not ( @notWant ){
            ok( (not TestEquity($not)), "not '$not'" );
        }
        for my $yes ( @yesWant ){
            ok( TestEquity($yes), "yes '$yes'" );
        }
    }

    sub TestEquity {
        return 1 if $_[0] =~ m/(?<!private\s)equity/;
        return 0;
    }
    __END__
    $ prove -v pm911357.lookbehind.pl
    pm911357.lookbehind.pl ..
    ok 1 - not 'private equity'
    ok 2 - not 'private equity'
    ok 3 - not 'mutual funds'
    ok 4 - not 'cds'
    ok 5 - yes 'equity, private equity'
    ok 6 - yes 'equity'
    ok 7 - yes 'private equity,equity'
    ok 8 - yes 'private equity, equity'
    ok 9 - yes 'equity,private equity'
    1..9
    ok
    All tests successful.
    Files=1, Tests=9, 0 wallclock secs ( 0.06 usr + 0.01 sys = 0.08 CPU)
    Result: PASS

If fixed width lookbehind doesn't work for you, simply do TWO tests
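One possible reading of "two tests", sketched under assumptions: the helper name has_plain_equity is hypothetical, and the idea of first blanking out every "private equity" is my interpretation rather than something the post spells out.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: true when the string holds an
# "equity" that is not part of "private<whitespace>equity".
sub has_plain_equity {
    my ($text) = @_;

    # Test 1: remove every "private equity", however much whitespace
    # separates the two words (this is the variable-width part).
    ( my $rest = $text ) =~ s/private\s+equity//g;

    # Test 2: is any "equity" left over?
    return $rest =~ /equity/ ? 1 : 0;
}

print has_plain_equity('private   equity, equity') ? "yes\n" : "no\n";
print has_plain_equity("private \t equity")        ? "yes\n" : "no\n";
```

Because the whitespace is handled by an ordinary \s+ in a substitution, no look-behind is needed at all.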


Re^3: Using Look-ahead and Look-behind
by Anonymous Monk on Jun 25, 2011 at 10:31 UTC
    Nice. Very nice! You nailed it. It's working. Thanks a bunch!
Re^3: Using Look-ahead and Look-behind
by AnomalousMonk (Monsignor) on Jun 25, 2011 at 19:51 UTC

    Here's a solution that exactly matches the phrases specified in AnonyMonk's Re: Using Look-ahead and Look-behind post (which the code of Re^2: Using Look-ahead and Look-behind does not quite do), and also shows how to use the newfangled backtracking control verbs of 5.10 to emulate variable-width negative look-behind. Variable-width positive look-behind is emulated by 5.10's \K assertion.

    Explanation:

    • Any 'equity' that is preceded by
      • either a character that is not a comma or whitespace, or
      • by the 'private' phrase
      FAILS and is skipped over (this test has first precedence);
    • Otherwise, any 'equity' that is not followed by a comma that is then followed by any non-whitespace SUCCEEDS.

    >perl -wMstrict -le
    "use Test::More 'no_plan';
     ;;
     for my $ar_vector (
       [ YES => 'equity, private equity', ],
       [ YES => 'equity',                 ],
       [ no  => 'private equity',         ],
       [ YES => 'private equity,equity',  ],
       [ YES => 'private equity, equity', ],
       [ no  => 'equity,private equity',  ],
       [ no  => 'private equity',         ],
       [ no  => 'mutual funds',           ],
       [ no  => 'cds'                     ],
       ) {
       my ($expected, $string) = @$ar_vector;
       is match($string), $expected, qq{'$string'};
       }
     ;;
     sub match {
       my ($string) = @_;
       ;;
       my $char_not_comma_or_space = qr{ [^,\s] }xms;
       my $private = qr{ private \s+ }xms;
       return 'YES' if $string =~ m{
           (?: $char_not_comma_or_space | $private) equity (*SKIP)(*FAIL)
           |
           equity (?! , \S)
           }xms;
       return 'no',
       }
    "
    ok 1 - 'equity, private equity'
    ok 2 - 'equity'
    ok 3 - 'private equity'
    ok 4 - 'private equity,equity'
    ok 5 - 'private equity, equity'
    ok 6 - 'equity,private equity'
    ok 7 - 'private equity'
    ok 8 - 'mutual funds'
    ok 9 - 'cds'
    1..9
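The \K assertion mentioned in AnomalousMonk's reply is not exercised by the one-liner above, so here is a minimal sketch of it acting as a variable-width positive look-behind. The helper name shout_private_equity and the sample string are hypothetical; \K itself needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: upper-case "equity" only where "private" plus any
# amount of whitespace comes right before it.
sub shout_private_equity {
    my ($text) = @_;

    # \K discards everything matched so far from the text being replaced,
    # so "private\s+" must be present but is left untouched.
    ( my $out = $text ) =~ s/private\s+\Kequity/EQUITY/g;
    return $out;
}

print shout_private_equity("equity, private   equity"), "\n";
# prints: equity, private   EQUITY
```

A true look-behind such as (?<=private\s+) would be rejected as variable-width on the Perl versions discussed in this thread, which is why \K is the usual workaround.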

      I have a dumb question.

      This code works well (THANKS Roy!) when looking for DNA string matches within a genome sequence but not when the * is changed to {50,100}

      e.g.
      /CCGG             # Match starting at DNA sequence CCGG
       (
         (?:
           (?!CCGG)     # make sure we're not finding duplicates mid-stream
           .            # accept any character
         )*?            # any number of times BUT not greedily <====
       )
       AATT             # and ending at AATT
      /x;

      versus

      /CCGG
       (
         (?:
           (?!CCGG)
           .
         ){50,100}?     # <====
       )
       AATT             # and ending at AATT
      /x;

      This latter one does not have dupes of CCGG but does have dupes of AATT. The previous snippet has no dupes of either CCGG or AATT.

      A follow-up: The following code snippet fixes my problem, and I have NO idea why! I tried it out of desperation

      /CCGG
       (
         (?:
           (?!AATT|CCGG)   # <=============
           .               #
         ){50,100}?        # Here the "?" is not required but I'm anal
       )                   #
       AATT                #
      /x;
        When * is changed to ^, it does not work either. Why are you changing it at all?

        But jokes aside: the *? stops at the first occurrence of AATT, so there are no dupes. The {50,100} must match at least 50 times, so if there is an AATT after, say, the 25th character, it cannot stop there and must match a longer string.
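The effect of the minimum count can be reproduced on a short string. This is a sketch under assumptions: the DNA string is invented, and {50,100} is scaled down to {6,12} so the behavior is visible without a real genome.

```perl
use strict;
use warnings;

# Invented sample: the first AATT comes only 5 characters after CCGG.
my $dna = 'CCGG' . 'GCGCG' . 'AATT' . 'GG' . 'AATT';

# Lazy star: free to stop at the first AATT.
my ($lazy_star) = $dna =~ /CCGG((?:(?!CCGG).)*?)AATT/;

# Minimum of 6: must step over the first AATT to satisfy the count.
my ($bounded)   = $dna =~ /CCGG((?:(?!CCGG).){6,12}?)AATT/;

# Minimum of 6, but the dot may not start an AATT either, so the only
# candidate (5 characters before the first AATT) is too short.
my ($tempered)  = $dna =~ /CCGG((?:(?!AATT|CCGG).){6,12}?)AATT/;

print "*?               : $lazy_star\n";
print "{6,12}?          : $bounded\n";
print "{6,12}? tempered : ", ( defined $tempered ? $tempered : '(no match)' ), "\n";
```

The bounded version captures GCGCGAATTGG, which contains the first AATT. Adding AATT to the look-ahead makes that candidate fail instead of being stretched, which is why the third pattern reports no match here.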

        Use YAPE::Regex::Explain to see what your regular expressions mean.

        Moreover, you are replying to a node that is not related to your question.

        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        Great, please take it to Seekers Of Perl Wisdom, see Re^2: Using Look-ahead and Look-behind

        You forgot to include sample input, no matter, here are clues, run these and compare

        perl -Mre=debug -le " $_ = q/foobarfoodrinkAATT/; /foo((?:(?!bar).){1,5}?)AATT/; "

        perl -Mre=debug -le " $_ = q/foobarfoodrinkAATTAATT/; /foo((?:(?!bar).){6,10}?)AATT/; "

        {50,100} means match at minimum 50 times but no more than 100

        * means match zero or more times

        In my short example, the first AATT starts at the 6th character, so the minimum of 6 swallows its first letter and it ends up inside the match

Re^3: Using Look-ahead and Look-behind
by heyjoec (Initiate) on Jun 19, 2014 at 11:18 UTC

    I changed the sub TestEquity to allow for any text between Private and Equity, but I can't get it to work. What have I done wrong?

    sub TestEquity { return 1 if $_[0] =~ m/(?<!private).*equity/; return 0; }

      How about just (untested, and also case-sensitive):

      sub TestEquity { return $_[0] =~ m/private.*equity/ ? 0 : $_[0] =~ m/equity/ ? 1 : 0 ; }
      This could be slightly simplified if you can tolerate "" (empty string) as a false flag in addition to or in place of 0.

      BTW: "I can't get it to work" is rarely helpful as a problem description. How about some input strings and actual versus expected output?

      Update: Changed $_[0] =~ m/private.*equity/ to $_[0] =~ m/private/ because it makes more sense.
      Update: ... and then changed it back to $_[0] =~ m/private.*equity/ because it actually makes even more sense that way! (sigh)
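For what it's worth, here is a sketch of why m/(?<!private).*equity/ cannot reject anything, followed by the simplified two-test form hinted at above. The sample string and the name test_equity_loose are hypothetical.

```perl
use strict;
use warnings;

my $text = 'private equity';

# At offset 0 nothing precedes the match position, so (?<!private) succeeds
# there, and .* is then free to swallow "private " before "equity".
my $matched = $text =~ /(?<!private).*equity/ ? 1 : 0;
printf "matched '%s' at offset %d\n", $&, $-[0] if $matched;

# Hypothetical simplified form: returns 1 for true and "" for false.
sub test_equity_loose {
    return $_[0] !~ m/private.*equity/ && $_[0] =~ m/equity/;
}

print test_equity_loose('private fund equity') ? "yes\n" : "no\n";
print test_equity_loose('equity')              ? "yes\n" : "no\n";
```

Like the ternary version, this treats any string containing "private" followed later by "equity" as a non-match, even if a separate bare "equity" also appears.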

      I changed the sub TestEquity to allow for any text between Private and Equity, but I can't get it to work. What have I done wrong?

      Impossible to say, although the anomalous one makes a good point
