Re^2: On zero-width negative lookahead assertions

How backtracking works in regular expressions by ikegami (Patriarch) on Sep 10, 2004 at 15:33 UTC
Note: Perl regexp matching is not necessarily implemented as described below. I'm totally ignorant as to how it is actually implemented. One could say this document describes the specs rather than the implementation. It has nothing to do with lookaheads, really.

For example, let's look at `/^ab*bc/`

The regexp can be read as:
1. Start at the beginning of the string.
2. Match an 'a'.
3. Match as many 'b's as possible, but not matching any is ok.
4. Match a 'b'.
5. Match a 'c'.

`Match against 'abbbbbbc'
                01234567
1) ok! pos = 0. (zw)
2) ok! Found an 'a' at pos 0. pos = 1.
3) ok! Found 6 'b's at pos 1 through 6. pos = 7.
4) fail! Did not find a 'b' at pos 7. Backtrack!
3) ok! Found 5 'b's at pos 1 through 5. pos = 6.
4) ok! Found a 'b' at pos 6. pos = 7.
5) ok! Found a 'c' at pos 7. pos = 8.
Match!`

Something similar is occurring with your `/^root:\s*(?!email)/`

The regexp can be read as:
1. Start at the beginning of the string.
2. Match 'root:'.
3. Match as many '\s's as possible, but not matching any is ok.
4. Match something other than 'email'.

`Match against 'root: email'
                01234567890
1) ok! pos = 0. (zw)
2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
3) ok! Found 1 '\s' at pos 5. pos = 6.
4) fail! Found 'email' at pos 6 through 10. Backtrack!
3) ok! Found 0 '\s' at pos 5. pos = 5. (zw)
4) ok! Found something other than 'email' at pos 5. pos = 5. (zwla) (found ' email')
Match!`

Now let's look at my solution `/^root:\s*(?!email)\S/`

The regexp can be read as:
1. Start at the beginning of the string.
2. Match 'root:'.
3. Match as many '\s's as possible, but not matching any is ok.
4. Match something other than 'email'.
5. Match a '\S'.

`Match against 'root: email'
                01234567890
1) ok! pos = 0. (zw)
2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
3) ok! Found 1 '\s' at pos 5. pos = 6.
4) fail! Found 'email' at pos 6 through 10. Backtrack!
3) ok! Found 0 '\s' at pos 5. pos = 5. (zw)
4) ok! Found something other than 'email' at pos 5. pos = 5. (zwla) (found ' email')
5) fail! Did not find a '\S' at pos 5. Backtrack!
Nothing more to try.
No match!`

`Match against 'root: hisemail'
                01234567890123
1) ok! pos = 0. (zw)
2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
3) ok! Found 1 '\s' at pos 5. pos = 6.
4) ok! Found something other than 'email' at pos 6. pos = 6. (zwla) (found 'hisemail')
5) ok! Found a '\S' at pos 6. pos = 7.
Match!`

Backtracking means (this might not be an exhaustive list):
In the case of the first rule, look for a match further on.
In the case of a `*` rule or `?` rule, try matching less.
In the case of a `*?` rule or `??` rule, try matching more.
In the case of a `|` or `[]` rule, try matching the next choice.
Otherwise, no match, so backtrack the last matching rule.
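For anyone who wants to check the traces above against a real perl, here is a minimal sketch. It assumes the patterns are written with the `*` quantifier that the step descriptions imply (`b*`, `\s*`); the helper name `matched_text` is made up for this example and is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text the pattern matched ($&),
# or undef when the pattern fails.
sub matched_text {
    my ($re, $str) = @_;
    return $str =~ $re ? $& : undef;
}

my @cases = (
    [ qr/^ab*bc/,               'abbbbbbc'       ],
    [ qr/^root:\s*(?!email)/,   'root: email'    ],
    [ qr/^root:\s*(?!email)\S/, 'root: email'    ],
    [ qr/^root:\s*(?!email)\S/, 'root: hisemail' ],
);

for my $case (@cases) {
    my ($re, $str) = @$case;
    my $m = matched_text($re, $str);
    printf "%-32s %-18s %s\n", $re, "'$str'",
        defined $m ? "matched '$m'" : 'no match';
}
```

The second case matching only 'root:' (with the space left unconsumed) is the visible sign that `\s*` was backtracked to zero width so the lookahead could succeed.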
Re^3: On zero-width negative lookahead assertions by Eimi Metamorphoumai (Deacon) on Sep 10, 2004 at 14:46 UTC
The regexp engine will match if it can find any way to. So what you're asking for is "root, followed by some number (possibly zero) of whitespace characters, followed by something that is not 'admin@somewhere.here'". So it matches with root, followed by zero spaces, followed by ' admin@somewhere.here' (with a leading space). Since the string ' admin@somewhere.here' isn't 'admin@somewhere.here' (without the space), the lookahead works. That's why you need the \s* inside the lookahead, making it "try to find spaces followed by admin@somewhere.here, and if you can, fail" instead of "look for spaces, but make sure it's not followed by admin@somewhere.here". Subtle, but important.
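A small sketch of the difference, assuming the address from the post and a second, made-up address (`someone@else.example`) for contrast. The variable names `$outside` and `$inside` are only labels for this example.

```perl
use strict;
use warnings;

# \s* outside the lookahead: the engine may back \s* off to zero spaces,
# after which the lookahead sees ' admin@...' and succeeds.
my $outside = qr/^root:\s*(?!admin\@somewhere\.here)/;

# \s* inside the lookahead: "spaces then the address" is tested as one
# unit, so there is nothing outside to back off.
my $inside  = qr/^root:(?!\s*admin\@somewhere\.here)/;

for my $line ('root: admin@somewhere.here', 'root: someone@else.example') {
    printf "%-30s outside: %-8s inside: %s\n", "'$line'",
        ($line =~ $outside ? 'match' : 'no match'),
        ($line =~ $inside  ? 'match' : 'no match');
}
```

Only the `$inside` form rejects the line that really does point root at the expected address.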
Re^4: On zero-width negative lookahead assertions by Crian (Curate) on Sep 10, 2004 at 14:54 UTC
Not exactly. It is not "followed by something that is not 'admin@somewhere.here'"; it is "not followed by 'admin@somewhere.here'". That is a difference, because it matches if nothing follows at all.
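The distinction shows up on lines where nothing follows 'root:'. This is a rough sketch using the shortened 'email' pattern from elsewhere in the thread, with `\s*` assumed; the variable names are invented for the example.

```perl
use strict;
use warnings;

# "not followed by 'email'": succeeds even at the end of the string.
my $not_followed_by_email = qr/^root:\s*(?!email)/;

# "followed by a non-space character that does not begin 'email'":
# requires that something is actually there.
my $followed_by_non_email = qr/^root:\s*(?!email)\S/;

for my $line ('root:', 'root: ', 'root: email', 'root: other') {
    printf "%-16s lookahead only: %-8s lookahead + \\S: %s\n", "'$line'",
        ($line =~ $not_followed_by_email ? 'match' : 'no match'),
        ($line =~ $followed_by_non_email ? 'match' : 'no match');
}
```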
Re^4: On zero-width negative lookahead assertions by bronto (Priest) on Sep 10, 2004 at 15:32 UTC
Uhmmmmm... so the old adage that "* is greedy" has an exception when zwnlaa come into play; I expected the \s* to have eaten all the whitespace before the e-mail address. Ok. Now I still have to understand why that \S thing works...

Oh, by the way, I am doing:

`perl -i.bak -pe 'BEGIN { $status = 0 } /^root:(?!\s*admin\@somewhere\.here\s*$)/ and $status = 1 ; END { exit $status }' aliases`

and it seems to work great!

Ciao!
`--bronto`

In theory, there is no difference between theory and practice. In practice, there is.
Re^5: On zero-width negative lookahead assertions by Roy Johnson (Monsignor) on Sep 10, 2004 at 15:51 UTC
"so the old adage that "* is greedy" has an exception"

No, it is always greedy, but its greed is not absolute. It will eat as much as it can, but if that results in failure to match, then it will relinquish some of what it ate (try not to picture that) to allow the whole expression to match. Greed (and the anti-greed of minimal-matching) is tempered by persistence in regexen.

Recently, hv wrote a tutorial explaining the rules the regex engine uses in trying to find a match.

Caution: Contents may have been coded under pressure.
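Capture groups make the "eat, then relinquish" behaviour visible. A minimal sketch, reusing the 'abbbbbbc' string from ikegami's post; the patterns and variable names here are chosen for illustration only.

```perl
use strict;
use warnings;

my $str = 'abbbbbbc';

# Greedy b* first grabs all six 'b's, then hands one back so that the
# following (b) and c can match.
my ($greedy, $last_b) = $str =~ /^a(b*)(b)c/;

# Two greedy groups: the first keeps everything, because the second is
# happy to match nothing.
my ($g1, $g2) = $str =~ /^a(b*)(b*)c/;

# Minimal b*? starts with nothing and is only extended if the rest fails;
# here the rest succeeds at once, so it stays empty.
my ($m1, $m2) = $str =~ /^a(b*?)(b*)c/;

printf "(b*)(b)   : '%s' + '%s'\n", $greedy, $last_b;
printf "(b*)(b*)  : '%s' + '%s'\n", $g1, $g2;
printf "(b*?)(b*) : '%s' + '%s'\n", $m1, $m2;
```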
Re^5: On zero-width negative lookahead assertions by ysth (Canon) on Sep 10, 2004 at 17:39 UTC
To make it behave as you describe, use (?>\s*). The (?> ) says whatever is in it will match whatever it would match at that point in the string as an independent expression. So if matching all the spaces makes something later on fail, it won't backtrack and try having the \s* match fewer spaces.

(It's really time to unmark all of the extensions as experimental, except perhaps how variables in (?{}) and (??{}) bind.)
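A quick comparison, as a sketch only: it assumes the atomic group wraps `\s*` and uses the shortened 'email' pattern from earlier in the thread. The variable names are invented for the example.

```perl
use strict;
use warnings;

# Plain \s*: may give back whitespace so the lookahead can succeed.
my $backtracking = qr/^root:\s*(?!email)/;

# Atomic (?>\s*): once the whitespace is taken it is never given back.
my $atomic = qr/^root:(?>\s*)(?!email)/;

for my $line ('root: email', 'root:   email', 'root: hisemail') {
    printf "%-18s plain: %-8s atomic: %s\n", "'$line'",
        ($line =~ $backtracking ? 'match' : 'no match'),
        ($line =~ $atomic       ? 'match' : 'no match');
}
```

With the atomic group the lookahead is always tested right after the last whitespace character, so 'root: email' is rejected however many spaces precede 'email'.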
Re^3: On zero-width negative lookahead assertions by Anonymous Monk on Sep 10, 2004 at 14:54 UTC
There is a non-space character after the \s*. The (?!) part is a zero-width assertion. Zero-width means just that: it doesn't consume anything of the string to match. Instead of using the \S, one could also have used:

`/root:(?>\s*)(?!...)/`

