
Re: Using negative lookahead

by tybalt89 (Curate)
on Oct 19, 2017 at 01:44 UTC (#1201643)


in reply to Using negative lookahead

#!/usr/bin/perl

# http://perlmonks.org/?node_id=1201640

use strict;
use warnings;

my @cases = (
    q{'abc"def'},
    q{'abc'},
    q{"abc"},
    q{''},
    q{'abc'def'},                # Want this to fail matching
    q{'This shouldn't match'},   # Want this to fail matching
    q{"This isn't a problem"},
    q{"abc},
    q{abc"},
    q{abc},
    q{'abc"},
    q{'ab''},                    # Want this to fail matching
);

strip_quotes($_) for @cases;

# If we can remove a matching pair of single or double quotes from
# a string, without the quote symbol also appearing within the string,
# do so. Otherwise don't change the string.
sub strip_quotes {
    my $line = shift;
    print "\n$line\n";

    # NO NEGATIVE LOOKAHEAD
    # This works except it allows an embedded delimiter
    if ( $line =~ m{^           # anchor
                    (           # capture delimiter in pos 1
                      ["']      # delim is single or double quote
                    )
                    (.*)        # anything
                    \g1$}x      # finally, the delim
       ) {
        print " 1- Got a match: delimiter was {$1}, body was {$2}\n";
    }
    else {
        print " 1- No match.\n";
    }

    # ATTEMPTING NEGATIVE LOOKAHEAD
    # This should fail if the delimiter is found in non-terminal pos.
    if ( $line =~ m{^           # anchor
                    (           # capture delimiter in pos 1
                      ["']      # delim is single or double quote
                    )
                    #(.*(?!\g1))     # neg lookahead for delim
                    ((?!.*\g1.).*)   # neg lookahead for delim
                    \g1$}x      # finally, the delim
       ) {
        print " 2- Got a match: delimiter was {$1}, body was {$2}\n";
    }
    else {
        print " 2- No match.\n";
    }
}

Re^2: Using negative lookahead
by ibm1620 (Monk) on Oct 19, 2017 at 22:57 UTC
    I'm still having trouble grasping this. Let me try and restate what your solution is doing:
    ^
    (                 # capture delimiter in pos 1
      ["']            # delim is single or double quote
    )
    ((?!.*\g1.).*)    # neg lookahead for delim
    \g1$              # finally, the delim
    Pick up the delimiter character from pos 1, if there is one (otherwise fail)

    Capture this in $2:

    -- As many characters as possible (could be none)

    -- that are NOT followed by the delimiter character and another character (which is the case of an embedded delimiter, which should be a fail)

    -- and are followed by zero or more characters.

    Beyond this should be the delimiter, at the end of the string.

    I'm confused by the '.*' that's the last part of capture group $2. Why is it needed at all? Hasn't the preceding already consumed the payload that I want? I guess I'm not understanding the precise role that the negative lookahead is playing. Is it simply saying what the string picked up by .* must look like? Is there any significance to (?!.*\g1.) appearing *before*, rather than after, .* ?

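      Zero-width negative lookaheads do not advance the match point; that is what "zero-width" means. The lookahead only tests what follows the opening delimiter, so the trailing .* is what actually consumes the body captured in $2.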
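A minimal sketch of the point above, for anyone who wants to see it run. The helper name quoted_body is hypothetical (it is not from the thread), and the pattern is the one posted above with whitespace added under /x. The first match uses the lookahead with no trailing .* and shows that group 2 comes back empty; the loop then runs the full pattern on a few of the original test cases.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the body when the string is wrapped in a
# matching pair of quotes with no embedded delimiter, otherwise undef.
sub quoted_body {
    my ($line) = @_;
    return $line =~ m{
        ^ (["'])            # group 1: the opening quote
        ( (?! .* \g1 . )    # assertion only: no delimiter followed by another char
          .*                # this part consumes the body
        )
        \g1 $               # closing quote at the end of the string
    }x ? $2 : undef;
}

# Lookahead with nothing after it: the match succeeds, but group 2 is
# empty because the assertion consumed no characters.
if ( q{'abc'} =~ m{ ^ (["']) ( (?! .* \g1 . ) ) }x ) {
    print "lookahead only: body is {$2}\n";
}

for my $case ( q{'abc'}, q{'abc"def'}, q{'abc'def'}, q{'ab''}, q{''} ) {
    my $body = quoted_body($case);
    printf "%-12s => %s\n", $case, defined $body ? "{$body}" : 'no match';
}
```

On the question of placement: because the lookahead sits immediately after the opening quote, it is tested once, at that position, and its own .* scans the whole remainder of the string for a delimiter that has at least one character after it.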
