in reply to Why do these regex variants behave as they do?
Why does regex 1, with the error, produce the desired output, while regex 2 fails to capture the terminal "g" in the desired link-text and regex 3 fails almost entirely?
- / > ( (?: \s | \w )+ ) (?! <\td> ) /mx
This works because the error makes the lookahead irrelevant, and redundant, to what is captured.
In <\td> the \t is a tab, so the assertion only forbids '<', tab, 'd>' after the match, and nothing like that follows the captured text.
Without the lookahead, the resultant regex / > ( (?: \s | \w )+ ) /mx still works exactly the same.
The only place in the string where '>' is immediately followed by a space (\s) or word (\w) character starts at '>Moving'.
And the run of 1 or more space or word characters ends just before the next '<'.
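- / > ( (?: \s| \w ) + ) (?! <\/td> ) /mx says that the last captured character in the string must not be followed by </td>.
The greedy + first takes everything up to and including the 'g', but there the lookahead fails, so the engine backtracks one character. The capture then ends just before the 'g', where what follows is 'g</td>' rather than '</td>', and the match succeeds without it.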
- / > (?: ( \s | \w ) + ) (?! <\/td> ) /mx only captures a single character because that's what it asks for.
( \s | \w ) says 'capture either a single space, or a single word character', so it does.
The presence of the quantifier '+' outside the capturing parens does not change that: the group is repeated, but each repetition overwrites $1, so only the last character matched is left in it.
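
If you want to see the three behaviours side by side, here is a minimal sketch. The input string is my own assumption (the original string is not quoted in this reply); it is a made-up table row whose cell text starts with 'Moving' and ends with a 'g' directly before </td>. The variable names are likewise just for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: an assumption standing in for the original string.
my $html = '<tr><td>Moving the thing</td></tr>';

# Regex 1: \t is a tab, so the lookahead tests for '<', TAB, 'd>'.
# That sequence never follows the match, so the assertion never blocks it.
my ($cap1) = $html =~ / > ( (?: \s | \w )+ ) (?! <\td> ) /mx;

# Regex 1 with the lookahead dropped: the capture is the same.
my ($cap1b) = $html =~ / > ( (?: \s | \w )+ ) /mx;

# Regex 2: the lookahead now really means "not followed by </td>",
# so the engine backtracks and gives up the final 'g'.
my ($cap2) = $html =~ / > ( (?: \s | \w )+ ) (?! <\/td> ) /mx;

# Regex 3: the capturing parens sit inside the quantified group,
# so $1 keeps only the last single character matched.
my ($cap3) = $html =~ / > (?: ( \s | \w )+ ) (?! <\/td> ) /mx;

print "regex 1:              [$cap1]\n";
print "regex 1, no lookahead: [$cap1b]\n";
print "regex 2:              [$cap2]\n";
print "regex 3:              [$cap3]\n";
```

With that assumed input, regex 1 and its lookahead-free form both capture 'Moving the thing', regex 2 captures 'Moving the thin', and regex 3 captures the single character 'n'.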
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.