PerlMonks  

Re: Using Look-ahead and Look-behind

by Anonymous Monk
on Jun 25, 2011 at 07:49 UTC (#911357)


in reply to Using Look-ahead and Look-behind

The following is just not working. Basically, I want to match a value that has "equity", but NOT "private equity". The result must be items 1, 2, 4, 5. Please check this out:

my %hash = (
    1 => 'equity, private equity',
    2 => 'equity',
    3 => 'private equity',
    4 => 'private equity,equity',
    5 => 'private equity, equity',
    6 => 'equity,private equity',
    7 => 'private equity',
    8 => 'mutual funds',
    9 => 'cds'
);

while (my ($k, $v) = each %hash) {
    next unless $v =~ m/(?!private\s+)equity/;
    printf("%d -> %s\n", $k, $v);
}


Re^2: Using Look-ahead and Look-behind
by Anonymous Monk on Jun 25, 2011 at 08:41 UTC

    Hi, new questions go in Seekers Of Perl Wisdom, because Roy Johnson, whom you asked, hasn't been here in 6 weeks.

    You used code tags and put your code in between, that is awesome :)

    Welcome, see How do I post a question effectively?, Where should I post X?

    The regex that is not working for you contains a zero-width negative look-ahead assertion, and as perlre#(?!pattern) says:

    A zero-width negative look-ahead assertion. For example /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar". Note however that look-ahead and look-behind are NOT the same thing. You cannot use this for look-behind.

    If you are looking for a "bar" that isn't preceded by a "foo", /(?!foo)bar/ will not do what you want. That's because the (?!foo) is just saying that the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will match. Use look-behind instead (see below).
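The difference described in the perlre quote can be checked with a minimal sketch. The helper names `lookahead_matches` and `lookbehind_matches` are made up for this illustration and appear nowhere in the thread.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.
# (?!foo)bar : "the next thing is not foo" is checked at the position where
#              "bar" starts, so it is trivially true and "foobar" matches.
sub lookahead_matches  { return $_[0] =~ /(?!foo)bar/  ? 1 : 0 }

# (?<!foo)bar : "the preceding thing is not foo" is what was actually wanted.
sub lookbehind_matches { return $_[0] =~ /(?<!foo)bar/ ? 1 : 0 }

for my $s ( 'foobar', 'bazbar' ) {
    printf "%-6s  look-ahead: %d  look-behind: %d\n",
        $s, lookahead_matches($s), lookbehind_matches($s);
}
```

Only the look-behind version rejects "foobar"; both accept "bazbar".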

    So, use a look-behind.

    But a direct translation probably won't work either, because you can't have variable-length look-behind, so you need to use a fixed-width look-behind.

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Test::More qw' no_plan ';

    Main(@ARGV);
    exit(0);

    sub Main {
        my @yesWant = (
            'equity, private equity',
            'equity',
            'private equity,equity',
            'private equity, equity',
            'equity,private equity',
        );
        my @notWant = (
            'private equity',
            'private equity',
            'mutual funds',
            'cds',
        );
        for my $not ( @notWant ){
            ok( (not TestEquity($not)), "not '$not'" );
        }
        for my $yes ( @yesWant ){
            ok( TestEquity($yes), "yes '$yes'" );
        }
    }

    sub TestEquity {
        return 1 if $_[0] =~ m/(?<!private\s)equity/;
        return 0;
    }
    __END__
    $ prove -v pm911357.lookbehind.pl
    pm911357.lookbehind.pl ..
    ok 1 - not 'private equity'
    ok 2 - not 'private equity'
    ok 3 - not 'mutual funds'
    ok 4 - not 'cds'
    ok 5 - yes 'equity, private equity'
    ok 6 - yes 'equity'
    ok 7 - yes 'private equity,equity'
    ok 8 - yes 'private equity, equity'
    ok 9 - yes 'equity,private equity'
    1..9
    ok
    All tests successful.
    Files=1, Tests=9,  0 wallclock secs ( 0.06 usr +  0.01 sys =  0.08 CPU)
    Result: PASS

    If fixed width lookbehind doesn't work for you, simply do TWO tests
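The post does not spell out what the two tests would be. One possible reading, sketched here under that assumption, is to strip every "private equity" phrase first and then test whatever is left for "equity". The name `has_plain_equity` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: an assumed interpretation of "do TWO tests",
# not code from the thread.
sub has_plain_equity {
    my ($text) = @_;

    # Test 1: no "equity" at all means there is nothing to find.
    return 0 unless $text =~ /equity/;

    # Test 2: drop every "private<whitespace>equity" phrase (any amount of
    # whitespace, so no fixed-width restriction), then look again.
    ( my $rest = $text ) =~ s/private\s+equity//g;
    return $rest =~ /equity/ ? 1 : 0;
}

for my $s ( 'equity, private equity', 'private equity', 'private equity,equity' ) {
    printf "%-26s => %d\n", "'$s'", has_plain_equity($s);
}
```

On the nine strings used in the test script above, this sketch gives the same yes/no answers as the fixed-width look-behind.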

      Nice. Very nice! You nailed it. It's working. Thanks a bunch!

      Here's a solution that exactly matches the phrases specified in AnonyMonk's Re: Using Look-ahead and Look-behind post (which the code of Re^2: Using Look-ahead and Look-behind does not quite do), and also shows how to use the newfangled backtracking control verbs of 5.10 to emulate variable-width negative look-behind. Variable-width positive look-behind is emulated by 5.10's \K assertion.

      Explanation:

      • Any 'equity' that is preceded by
        • either a character that is not a comma or whitespace, or
        • by the 'private' phrase
        FAILS and is skipped over (this test has first precedence);
      • Otherwise, any 'equity' that is not followed by a comma that is then followed by any non-whitespace SUCCEEDS.

      >perl -wMstrict -le
      "use Test::More 'no_plan';
       ;;
       for my $ar_vector (
         [ YES => 'equity, private equity', ],
         [ YES => 'equity',                 ],
         [ no  => 'private equity',         ],
         [ YES => 'private equity,equity',  ],
         [ YES => 'private equity, equity', ],
         [ no  => 'equity,private equity',  ],
         [ no  => 'private equity',         ],
         [ no  => 'mutual funds',           ],
         [ no  => 'cds'                     ],
         ) {
         my ($expected, $string) = @$ar_vector;
         is match($string), $expected, qq{'$string'};
         }
       ;;
       sub match {
         my ($string) = @_;
       ;;
         my $char_not_comma_or_space = qr{ [^,\s] }xms;
         my $private = qr{ private \s+ }xms;
         return 'YES' if $string =~ m{
             (?: $char_not_comma_or_space | $private) equity (*SKIP)(*FAIL)
             |
             equity (?! , \S)
             }xms;
         return 'no',
         }
      "
      ok 1 - 'equity, private equity'
      ok 2 - 'equity'
      ok 3 - 'private equity'
      ok 4 - 'private equity,equity'
      ok 5 - 'private equity, equity'
      ok 6 - 'equity,private equity'
      ok 7 - 'private equity'
      ok 8 - 'mutual funds'
      ok 9 - 'cds'
      1..9
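The \K assertion mentioned in that post is not exercised by its code. A minimal sketch of how \K can stand in for a variable-width positive look-behind might look like the following; it assumes Perl 5.10 or later, and the helper name `mark_private_equity` is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only (needs Perl 5.10+ for \K).
# \K discards everything matched so far from the reported match, so
# "private\s+" behaves like a look-behind of any width: only the
# "equity" that follows it gets replaced.
sub mark_private_equity {
    my ($text) = @_;
    ( my $copy = $text ) =~ s/private\s+\Kequity/EQUITY/g;
    return $copy;
}

print mark_private_equity("private   equity, equity"), "\n";
# prints: private   EQUITY, equity
```

The second, unqualified "equity" is left alone because nothing matching private\s+ precedes it.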

        I have a dumb question.

        This code works well (THANKS Roy!) when looking for DNA string matches within a genome sequence but not when the * is changed to {50,100}

        e.g.
        /CCGG               # Match starting at DNA sequence CCGG
         (
           (?:
             (?!CCGG)       # make sure we're not finding duplicates mid-stream
             .              # accept any character
           )*?              # any number of times BUT not greedily <====
         )
         AATT               # and ending at AATT
        /x;

        versus

        /CCGG
         (
           (?:
             (?!CCGG)
             .
           ){50,100}?       # <====
         )
         AATT               # and ending at AATT
        /x;

        This latter one does not have dupes of CCGG but does have dupes of AATT. The previous snippet has no dupes of either CCGG or AATT.

        A follow-up: The following code snippet fixes my problem, and I have NO idea why! I tried it out of desperation

        /CCGG
         (
           (?:
             (?!AATT|CCGG)  # <=============
             .              #
           ){50,100}?       # Here the "?" is not required but I'm anal
         )                  #
         AATT               #
        /x;
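A likely explanation for the behaviour described above: with `*?` the lazy group stops at the first AATT it can reach, but a minimum count forces the group to swallow that many characters first, and nothing in `(?!CCGG)` stops an AATT from being among them. The sketch below shrinks {50,100} to {5,20} so a short string shows the effect. The helper names and the sample string are made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helpers; {5,20} stands in for the {50,100} of the post.
sub capture_loose {     # only CCGG is excluded from the captured stretch
    my ($dna) = @_;
    return $dna =~ /CCGG((?:(?!CCGG).){5,20}?)AATT/ ? $1 : undef;
}

sub capture_strict {    # AATT is excluded as well
    my ($dna) = @_;
    return $dna =~ /CCGG((?:(?!AATT|CCGG).){5,20}?)AATT/ ? $1 : undef;
}

my $dna = 'CCGGxAATTyyyyyyAATT';    # made-up sequence; first AATT comes too early

my $loose  = capture_loose($dna);
my $strict = capture_strict($dna);

print 'loose:  ', ( defined $loose  ? $loose  : 'no match' ), "\n";
# prints: loose:  xAATTyyyyyy
print 'strict: ', ( defined $strict ? $strict : 'no match' ), "\n";
# prints: strict: no match
```

The loose pattern has to consume at least five characters, so it runs straight over the early AATT and ends at the later one. The strict pattern cannot step onto an AATT at all, so its capture never contains one, which also explains why the trailing `?` no longer matters.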

      I changed the sub TestEquity to allow for any text between Private and Equity, but I can't get it to work. What have I done wrong?

      sub TestEquity {
          return 1 if $_[0] =~ m/(?<!private).*equity/;
          return 0;
      }

        How about just (untested, and also case-sensitive):

        sub TestEquity {
            return $_[0] =~ m/private.*equity/ ? 0
                 : $_[0] =~ m/equity/          ? 1
                 :                               0
                 ;
        }

        This could be slightly simplified if you can tolerate "" (empty string) as a false flag in addition to or in place of 0.

        BTW: "I can't get it to work" is rarely helpful as a problem description. How about some input strings and actual versus expected output?

        Update: Changed $_[0] =~ m/private.*equity/ to $_[0] =~ m/private/ because it makes more sense.
        Update: ... and then changed it back to $_[0] =~ m/private.*equity/ because it actually makes even more sense that way! (sigh)
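The simplification hinted at in that reply, with "" as the false value, could look something like this. It is a sketch of one way to write it, not the poster's code, and the name `TestEquitySimple` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical variant: returns 1 for true and "" (empty string) for false.
# The !! forces scalar context, so a failed match never turns into an
# empty list when the sub is called in list context.
sub TestEquitySimple {
    my ($text) = @_;
    return !!( $text =~ m/equity/ && $text !~ m/private.*equity/ );
}

for my $s ( 'equity', 'private placement equity', 'cds' ) {
    printf "%-26s => %s\n", "'$s'", TestEquitySimple($s) ? 'yes' : 'no';
}
```

Like the original, it is case-sensitive and rejects any string in which "private" appears anywhere before an "equity".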

        I changed the sub TestEquity to allow for any text between Private and Equity, but I can't get it to work. What have I done wrong?

        Impossible to say, although the anomalous one makes a good point
