PerlMonks  

Re: Using Look-ahead and Look-behind

by Anonymous Monk
on Jun 25, 2011 at 07:49 UTC


in reply to Using Look-ahead and Look-behind

The following is just not working. Basically, I want to match a value that has "equity", but NOT "private equity". The result must be items 1, 2, 4, 5. Please check this out:
my %hash = (
    1 => 'equity, private equity',
    2 => 'equity',
    3 => 'private equity',
    4 => 'private equity,equity',
    5 => 'private equity, equity',
    6 => 'equity,private equity',
    7 => 'private equity',
    8 => 'mutual funds',
    9 => 'cds'
);

while (my ($k, $v) = each %hash) {
    next unless $v =~ m/(?!private\s+)equity/;
    printf("%d -> %s\n", $k, $v);
}

Replies are listed 'Best First'.
Re^2: Using Look-ahead and Look-behind
by Anonymous Monk on Jun 25, 2011 at 08:41 UTC

    Hi, new questions go in Seekers Of Perl Wisdom, because Roy Johnson, whom you asked a question, hasn't been here in 6 weeks.

    You used code tags and put your code in between, that is awesome :)

    Welcome, see How do I post a question effectively?, Where should I post X?

    The regex that is not working for you contains a zero-width negative look-ahead assertion, and as perlre#(?!pattern) says:

    A zero-width negative look-ahead assertion. For example /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar". Note however that look-ahead and look-behind are NOT the same thing. You cannot use this for look-behind.

    If you are looking for a "bar" that isn't preceded by a "foo", /(?!foo)bar/ will not do what you want. That's because the (?!foo) is just saying that the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will match. Use look-behind instead (see below).
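A minimal sketch of the point the quoted perlre passage makes, using its own "foo"/"bar" example. The two helper names are made up for this illustration and are not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.

# Look-ahead written in front of "bar": it inspects the same position where
# "bar" must then match, so it can never reject "foobar".
sub lookahead_matches  { return $_[0] =~ /(?!foo)bar/  ? 1 : 0 }

# Look-behind: it inspects the three characters just before "bar".
sub lookbehind_matches { return $_[0] =~ /(?<!foo)bar/ ? 1 : 0 }

for my $s ('foobar', 'bar', 'xbar') {
    printf "%-8s look-ahead=%d look-behind=%d\n",
        $s, lookahead_matches($s), lookbehind_matches($s);
}
```

Only the look-behind version should report 0 for 'foobar'; both should report 1 for 'bar' and 'xbar'.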

    So, use a look-behind

    But a look-behind with \s+ in it probably won't work either, because you can't have a variable-length lookbehind, so you need to use a fixed-width lookbehind.

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Test::More qw' no_plan ';

    Main(@ARGV);
    exit(0);

    sub Main {
        my @yesWant = (
            'equity, private equity',
            'equity',
            'private equity,equity',
            'private equity, equity',
            'equity,private equity',
        );
        my @notWant = (
            'private equity',
            'private equity',
            'mutual funds',
            'cds',
        );
        for my $not ( @notWant ){
            ok( (not TestEquity($not)), "not '$not'" );
        }
        for my $yes ( @yesWant ){
            ok( TestEquity($yes), "yes '$yes'" );
        }
    }

    sub TestEquity {
        return 1 if $_[0] =~ m/(?<!private\s)equity/;
        return 0;
    }
    __END__
    $ prove -v pm911357.lookbehind.pl
    pm911357.lookbehind.pl ..
    ok 1 - not 'private equity'
    ok 2 - not 'private equity'
    ok 3 - not 'mutual funds'
    ok 4 - not 'cds'
    ok 5 - yes 'equity, private equity'
    ok 6 - yes 'equity'
    ok 7 - yes 'private equity,equity'
    ok 8 - yes 'private equity, equity'
    ok 9 - yes 'equity,private equity'
    1..9
    ok
    All tests successful.
    Files=1, Tests=9,  0 wallclock secs ( 0.06 usr +  0.01 sys =  0.08 CPU)
    Result: PASS

    If fixed width lookbehind doesn't work for you, simply do TWO tests
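One possible reading of "two tests", sketched under an assumption the post does not spell out: first strip every "private equity" phrase (with any amount of whitespace), then check whether an "equity" is still left. The helper name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if $string contains an "equity" that is not
# preceded by "private" plus whitespace, false (0) otherwise.
sub has_plain_equity {
    my ($string) = @_;

    # Step 1: work on a copy and drop every "private equity" phrase,
    # however much whitespace separates the two words.
    (my $rest = $string) =~ s/private\s+equity//g;

    # Step 2: is there any "equity" left?
    return $rest =~ /equity/ ? 1 : 0;
}

for my $s ('equity, private equity', 'private   equity', 'private equity,equity') {
    printf "%d -> '%s'\n", has_plain_equity($s), $s;
}
```

Unlike the fixed-width (?<!private\s) version, this also rejects "private" followed by several spaces and then "equity".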

      Here's a solution that exactly matches the phrases specified in AnonyMonk's Re: Using Look-ahead and Look-behind post (which the code of Re^2: Using Look-ahead and Look-behind does not quite do), and also shows how to use the newfangled backtracking control verbs of 5.10 to emulate variable-width negative look-behind. Variable-width positive look-behind is emulated by 5.10's \K assertion.

      Explanation:

      • Any 'equity' that is preceded by
        • either a character that is not a comma or whitespace, or
        • by the 'private' phrase
        FAILS and is skipped over (this test has first precedence);
      • Otherwise, any 'equity' that is not followed by a comma that is then followed by any non-whitespace SUCCEEDS.

      >perl -wMstrict -le
      "use Test::More 'no_plan';
       ;;
       for my $ar_vector (
         [ YES => 'equity, private equity', ],
         [ YES => 'equity',                 ],
         [ no  => 'private equity',         ],
         [ YES => 'private equity,equity',  ],
         [ YES => 'private equity, equity', ],
         [ no  => 'equity,private equity',  ],
         [ no  => 'private equity',         ],
         [ no  => 'mutual funds',           ],
         [ no  => 'cds'                     ],
         ) {
         my ($expected, $string) = @$ar_vector;
         is match($string), $expected, qq{'$string'};
         }
       ;;
       sub match {
         my ($string) = @_;
       ;;
         my $char_not_comma_or_space = qr{ [^,\s] }xms;
         my $private = qr{ private \s+ }xms;
         return 'YES' if $string =~ m{
             (?: $char_not_comma_or_space | $private) equity (*SKIP)(*FAIL)
             |
             equity (?! , \S)
             }xms;
         return 'no',
         }
      "
      ok 1 - 'equity, private equity'
      ok 2 - 'equity'
      ok 3 - 'private equity'
      ok 4 - 'private equity,equity'
      ok 5 - 'private equity, equity'
      ok 6 - 'equity,private equity'
      ok 7 - 'private equity'
      ok 8 - 'mutual funds'
      ok 9 - 'cds'
      1..9
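The post mentions \K as the way to emulate variable-width positive look-behind but does not show it. A small sketch follows, assuming Perl 5.10 or later; the helper name and the upper-casing are made up for the demonstration. A look-behind written as (?<=private\s+) is not accepted because \s+ has no upper bound on its length, whereas \K simply discards everything matched so far from the final match.

```perl
use strict;
use warnings;

# Hypothetical helper: upper-case each "equity" that directly follows
# "private" and any amount of whitespace, leaving "private" untouched.
sub mark_private_equity {
    my ($string) = @_;

    # "private\s+" must match, but \K keeps it out of the matched text,
    # so only "equity" is replaced.
    (my $copy = $string) =~ s/private\s+\Kequity/EQUITY/g;
    return $copy;
}

print mark_private_equity('equity, private   equity'), "\n";
# prints: equity, private   EQUITY
```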

        I have a dumb question.

        This code works well (THANKS Roy!) when looking for DNA string matches within a genome sequence but not when the * is changed to {50,100}

        e.g.
        /CCGG             # Match starting at DNA sequence CCGG
         (
           (?:
             (?!CCGG)     # make sure we're not finding duplicates mid-stream
             .            # accept any character
           )*?            # any number of times BUT not greedily <====
         )
         AATT             # and ending at AATT
        /x;

        versus

        /CCGG
         (
           (?:
             (?!CCGG)
             .
           ){50,100}?     # <====
         )
         AATT             # and ending at AATT
        /x;

        This latter one does not have dupes of CCGG but does have dupes of AATT. The previous snippet has no dupes of either CCGG or AATT.

        A follow-up: The following code snippet fixes my problem, and I have NO idea why! I tried it out of desperation

        /CCGG
         (
           (?:
             (?!AATT|CCGG)   # <=============
             .               #
           ){50,100}?        # Here the "?" is not required but I'm anal
         )                   #
         AATT                #
        /x;
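A likely explanation, offered as a sketch rather than a definitive answer: with *? the engine looks for the closing AATT before consuming each character, so it stops at the first one. With a minimum count it has to consume at least that many characters first, and nothing stops an AATT from sitting inside them unless the look-ahead forbids it. The demo below uses a made-up short sequence and {3,10} in place of {50,100} so the effect is easy to see; the sub names are hypothetical.

```perl
use strict;
use warnings;

# Made-up, shortened sequence (not real genome data).
my $dna = 'CCGGxAATTyyyAATT';

# Original *? version: stops at the first AATT.
sub capture_star {
    return $_[0] =~ /CCGG((?:(?!CCGG).)*?)AATT/ ? $1 : undef;
}

# {min,max}? version whose look-ahead guards CCGG only: the first three
# characters are consumed regardless, so an AATT can end up inside $1.
sub capture_range {
    return $_[0] =~ /CCGG((?:(?!CCGG).){3,10}?)AATT/ ? $1 : undef;
}

# {min,max}? version whose look-ahead guards AATT as well: the capture can
# never contain AATT, so here there is no match at all.
sub capture_guarded {
    return $_[0] =~ /CCGG((?:(?!AATT|CCGG).){3,10}?)AATT/ ? $1 : undef;
}

printf "%-8s %s\n", $_->[0], $_->[1] // 'no match'
    for [ star    => capture_star($dna)    ],
        [ range   => capture_range($dna)   ],
        [ guarded => capture_guarded($dna) ];
```

On this input the three subs should give 'x', 'xAATTyyy' and no match respectively, which mirrors the "dupes of AATT" seen with {50,100}.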
      Nice. Very nice! You nailed it. It's working. Thanks a bunch!

      I changed the sub TestEquity to allow for any text between Private and Equity, but I can't get it to work. What have I done wrong?

      sub TestEquity {
          return 1 if $_[0] =~ m/(?<!private).*equity/;
          return 0;
      }

        How about just (untested, and also case-sensitive):

        sub TestEquity {
            return $_[0] =~ m/private.*equity/ ? 0
                 : $_[0] =~ m/equity/          ? 1
                 :                               0
                 ;
        }
        This could be slightly simplified if you can tolerate "" (empty string) as a false flag in addition to or in place of 0.

        BTW: "I can't get it to work" is rarely helpful as a problem description. How about some input strings and actual versus expected output?

        Update: Changed $_[0] =~ m/private.*equity/ to $_[0] =~ m/private/ because it makes more sense.
        Update: ... and then changed it back to $_[0] =~ m/private.*equity/ because it actually makes even more sense that way! (sigh)
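The simplification hinted at above might look like the sketch below. It is equally case-sensitive, and the sub is renamed (a hypothetical name) to keep it apart from the original. The double negation forces a plain boolean, so a failure comes back as "" (empty string) instead of 0.

```perl
use strict;
use warnings;

# Hypothetical simplified variant: returns 1 when the string contains
# "equity" but no "private" anywhere before an "equity", and "" otherwise.
sub TestEquitySimple {
    my ($string) = @_;
    return !!( $string !~ m/private.*equity/ && $string =~ m/equity/ );
}

for my $s ('equity', 'private foo equity', 'mutual funds') {
    printf "'%s' -> '%s'\n", $s, TestEquitySimple($s);
}
```

The !! also keeps the result the same in list context, where a bare failed match would otherwise return an empty list.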

        I changed the sub TestEquity to allow for any text between Private and Equity, but I can't get it to work. What have I done wrong?

        Impossible to say, although the anomalous one makes a good point
