Re^3: Using Look-ahead and Look-behind

in reply to Re^2: Using Look-ahead and Look-behind
in thread Using Look-ahead and Look-behind

Here's a solution that exactly matches the phrases specified in AnonyMonk's Re: Using Look-ahead and Look-behind post (which the code of Re^2: Using Look-ahead and Look-behind does not quite do), and also shows how to use the newfangled backtracking control verbs of 5.10 to emulate variable-width negative look-behind. Variable-width positive look-behind is emulated by 5.10's \K assertion.

Explanation:

Any 'equity' that is preceded by
- either a character that is neither a comma nor whitespace, or
- the 'private' phrase ('private' followed by whitespace)
FAILS and is skipped over (this alternative is tried first);
Otherwise, any 'equity' that is not followed by a comma and then a non-whitespace character SUCCEEDS.

>perl -wMstrict -le
"use Test::More 'no_plan';
 ;;
 for my $ar_vector (
   [ YES => 'equity, private equity', ],
   [ YES => 'equity',                 ],
   [ no  => 'private equity',         ],
   [ YES => 'private equity,equity',  ],
   [ YES => 'private equity, equity', ],
   [ no  => 'equity,private equity',  ],
   [ no  => 'private equity',         ],
   [ no  => 'mutual funds',           ],
   [ no  => 'cds'                     ],
   ) {
   my ($expected, $string) = @$ar_vector;
   is match($string), $expected, qq{'$string'};
   }
 ;;
 sub match {
   my ($string) = @_;
   ;;
   my $char_not_comma_or_space = qr{ [^,\s]      }xms;
   my $private                 = qr{ private \s+ }xms;
   return 'YES' if $string =~
     m{ (?: $char_not_comma_or_space | $private) equity (*SKIP)(*FAIL)
        |
        equity (?! , \S)
      }xms;
   return 'no';
   }
"
ok 1 - 'equity, private equity'
ok 2 - 'equity'
ok 3 - 'private equity'
ok 4 - 'private equity,equity'
ok 5 - 'private equity, equity'
ok 6 - 'equity,private equity'
ok 7 - 'private equity'
ok 8 - 'mutual funds'
ok 9 - 'cds'
1..9
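
The \K emulation of variable-width positive look-behind mentioned at the top of this post is not exercised by the test above, so here is a minimal sketch of it. The sub name upcase_after_private and the sample string are made up for illustration; only \K itself (perl 5.10 and later) is taken from the post.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original post: upper-case every
# 'equity' that follows 'private' and a variable amount of whitespace.
sub upcase_after_private {
    my ($string) = @_;

    # Text to the left of \K must match, but \K drops it from the reported
    # match, so the substitution replaces only 'equity'. A true look-behind
    # such as (?<= private \s+ ) is not accepted, because \s+ has no fixed
    # width.
    (my $copy = $string) =~ s{ private \s+ \K equity }{EQUITY}xmsg;

    return $copy;
}

print upcase_after_private('private   equity, equity'), "\n";
```

Unlike a real look-behind, the text before \K is consumed by the match, so it cannot overlap the end of a previous match when the pattern is applied repeatedly with /g.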


Replies are listed 'Best First'.
Re^4: Using Look-ahead and Look-behind by JohnN (Initiate) on Oct 15, 2012 at 15:09 UTC
I have a dumb question. This code works well (THANKS Roy!) when looking for DNA string matches within a genome sequence, but not when the * is changed to {50,100}, e.g.

    /CCGG               # Match starting at DNA sequence CCGG
     (
       (?:
         (?!CCGG)       # make sure we're not finding duplicates mid-stream
         .              # accept any character
       )*?              # any number of times BUT not greedily <====
     )
     AATT               # and ending at AATT
    /x;

versus

    /CCGG
     (
       (?:
         (?!CCGG)
         .
       ){50,100}?       # <====
     )
     AATT               # and ending at AATT
    /x;

This latter one does not have dupes of CCGG but does have dupes of AATT. The previous snippet has no dupes of either CCGG or AATT.

A follow-up: The following code snippet fixes my problem, and I have NO idea why! I tried it out of desperation.

    /CCGG
     (
       (?:
         (?!AATT|CCGG)  # <=============
         .              #
       ){50,100}?       # Here the "?" is not required but I'm anal
     )                  #
     AATT               #
    /x;
Re^5: Using Look-ahead and Look-behind by choroba (Cardinal) on Oct 15, 2012 at 15:25 UTC
When `*` is changed to `^`, it does not work either. Why are you changing it at all?

But jokes aside: the `*?` stops matching as soon as it sees the first occurrence of AATT, so there are no dupes. The `{50,100}` must match at least 50 times, so if there is an AATT after, say, the 25th character, it cannot stop there and must match a larger string. Use YAPE::Regex::Explain to see what your regular expressions mean.

Moreover, you are replying to a node that is not related to your question.

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^5: Using Look-ahead and Look-behind by Anonymous Monk on Oct 15, 2012 at 15:28 UTC
Great, please take it to Seekers Of Perl Wisdom; see Re^2: Using Look-ahead and Look-behind. You forgot to include sample input. No matter, here are some clues; run these and compare:

    perl -Mre=debug -le " $_ = q/foobarfoodrinkAATT/; /foo((?:(?!bar).){1,5}?)AATT/; "

    perl -Mre=debug -le " $_ = q/foobarfoodrinkAATTAATT/; /foo((?:(?!bar).){6,10}?)AATT/; "

`{50,100}` means match at minimum 50 but no more than 100 times; `.*` means match at least zero times. In my short example, the first AATT appears at 6, so it is included in the match.
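
The behaviour discussed in these replies can be reproduced with a scaled-down sketch. Everything here is an assumption made for illustration: the quantifier is shrunk from {50,100} to {6,10}, the DNA strings are invented, and the names %pattern and capture are hypothetical. It compares the lazy star, the bounded lazy quantifier, and the variant whose look-ahead also rejects AATT.

```perl
use strict;
use warnings;

# Three variants of the pattern from the question, with {50,100}
# scaled down to {6,10} so that short sample strings suffice.
my %pattern = (
    lazy_star => qr{ CCGG ( (?: (?!CCGG)      . )*?      ) AATT }xms,
    bounded   => qr{ CCGG ( (?: (?!CCGG)      . ){6,10}? ) AATT }xms,
    fixed     => qr{ CCGG ( (?: (?!AATT|CCGG) . ){6,10}? ) AATT }xms,
);

# Hypothetical helper: return the text captured between CCGG and AATT,
# or undef when the named pattern does not match.
sub capture {
    my ($name, $dna) = @_;
    return $dna =~ $pattern{$name} ? $1 : undef;
}

# Invented sample: the first AATT sits only 4 characters after CCGG,
# which is fewer than the 6 characters the bounded quantifier demands.
my $sample = 'CCGGTTTTAATTGGAATT';

for my $name (qw(lazy_star bounded fixed)) {
    my $got = capture($name, $sample) // '(no match)';
    print "$name: $got\n";
}
```

On this sample the lazy star stops at the first AATT and captures TTTT. The bounded form has to consume at least six characters, so it runs through the first AATT and captures TTTTAATTGG. The third form refuses to step onto any AATT, so it cannot reach six characters and does not match here at all; it only matches where the first AATT is 6 to 10 characters past CCGG, for example in CCGGTTTTTTAATT.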
