Re^2: Why do these regex variants behave as they do?

by AnomalousMonk (Chancellor)
on Oct 02, 2011 at 20:34 UTC

in reply to Re: Why do these regex variants behave as they do?
in thread Why do these regex variants behave as they do?

Further to BrowserUk's reply, it may be helpful, particularly in the third example, to see the entirety of what is matched (seen in $&, a naughty fellow whom we normally shun) versus what is captured (to $1 from the first capture group).
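Because $& is normally shunned, here is a minimal sketch of the same 'matched versus captured' comparison that avoids it. It recovers the whole-match text from the @- and @+ offset arrays. The pattern is the one from example 2b without the redundant space, and the variable names are illustrative only.

```perl
use strict;
use warnings;

my $s = '<t>Abcd efgh ijK<t>';

my ( $whole, $captured ) = ( '', '' );
if ( $s =~ m{ > ( [\s\w]+ ) (?!<t>) }xms ) {
    # $-[0] and $+[0] hold the start and end offsets of the entire match,
    # so substr() recovers the text that $& would report.
    $whole    = substr $s, $-[0], $+[0] - $-[0];
    # $1 is only the part inside the first capture group.
    $captured = $1;
}
print "'$whole' ($captured)\n";    # prints '>Abcd efgh ij' (Abcd efgh ij)
```

On Perl 5.10 and later, the /p modifier together with ${^MATCH} is another way to get the whole match. From Perl 5.20 on, copy-on-write strings removed the old global performance penalty of $&.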

Note: in the examples below, I use the character set [\s \w] as equivalent to (?:\s|\w) to emphasize the character-set nature of the grouping. The extra space in the character set is an attempt, possibly ill-conceived, to get everything to 'line up right'. It is not ignored under /x, because whitespace inside a bracketed character class stays literal, but it is redundant because a space is already included in the \s 'whitespace' set.
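A quick sketch that checks both claims about [\s \w]. The space inside the brackets survives /x, and the class accepts exactly the same single characters as (?:\s|\w). Only ASCII characters are tested here, which is an assumption chosen to keep the check small.

```perl
use strict;
use warnings;

# If /x stripped whitespace inside [...], the class [ a] would become [a]
# and a lone space would fail to match. It matches, so the space is literal.
my $space_is_literal = ( ' ' =~ m{ \A [ a] \z }x ) ? 1 : 0;

# Compare [\s \w] with (?:\s|\w) on every ASCII character.
my $mismatches = grep {
    my $by_class = /\A[\s \w]\z/x   ? 1 : 0;
    my $by_alt   = /\A(?:\s|\w)\z/x ? 1 : 0;
    $by_class != $by_alt;
} map { chr } 0 .. 127;

print "space literal in class under /x: $space_is_literal\n";
print "mismatches between the two forms: $mismatches\n";
```

Perl 5.26 added the /xx modifier, under which unescaped spaces and tabs inside bracketed classes are ignored too. With plain /x, as used here, they are still significant.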

Note also that I have used a simplified string in the examples: the 'closing' tag is just '<t>', and '<X>' stands in for an incorrect closing tag (the forward and backward slashes just confuse the issue).

>perl -wMstrict -le
"my $s = '<t>Abcd efgh ijK<t>';
 ;;
 print qq{1a '$&' ($1)} if $s =~ m{ > ( (?:\s|\w)+ ) (?!<X>) }xms;
 print qq{1b '$&' ($1)} if $s =~ m{ > ( [\s \w]+   ) (?!<X>) }xms;
 ;;
 print qq{2a '$&' ($1)} if $s =~ m{ > ( (?:\s|\w)+ ) (?!<t>) }xms;
 print qq{2b '$&' ($1)} if $s =~ m{ > ( [\s \w]+   ) (?!<t>) }xms;
 ;;
 print qq{3a '$&' ($1)} if $s =~ m{ > (?: ( \s|\w )+ ) (?!<t>) }xms;
 print qq{3b '$&' ($1)} if $s =~ m{ > (?: ([\s \w])+ ) (?!<t>) }xms;
"
1a '>Abcd efgh ijK' (Abcd efgh ijK)
1b '>Abcd efgh ijK' (Abcd efgh ijK)
2a '>Abcd efgh ij' (Abcd efgh ij)
2b '>Abcd efgh ij' (Abcd efgh ij)
3a '>Abcd efgh ij' (j)
3b '>Abcd efgh ij' (j)
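The (j) in examples 3a and 3b comes from where the quantifier sits. Here is a small sketch contrasting the two placements, written as an ordinary script rather than a one-liner. When the quantifier applies to the capture group itself, the group is re-entered once per character and each pass overwrites $1. The greedy run first takes 'K', the (?!<t>) lookahead then fails on '<t>', and the engine gives back that last iteration. That leaves 'j' as the surviving capture.

```perl
use strict;
use warnings;

my $s = '<t>Abcd efgh ijK<t>';

# As in 3b: the quantifier applies to the capture group itself, so the group
# is re-entered once per character and each pass overwrites $1.
my ($last) = $s =~ m{ > (?: ( [\s\w] )+ ) (?!<t>) }x;

# As in 2b: the quantifier sits inside the group, so one capture spans the run.
my ($run) = $s =~ m{ > ( [\s\w]+ ) (?!<t>) }x;

print "quantified group:  ($last)\n";    # (j)
print "quantifier inside: ($run)\n";     # (Abcd efgh ij)
```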

Update: Changed example code to print $& first (in single quotes), then $1 (in parentheses, symbolic of capture) to match the order of their discussion in the text.

