ww:
Further to BrowserUk's reply, it may be helpful, particularly in the third example, to see the entirety of what is matched (available in $&, a naughty fellow whom we normally shun) versus what is captured (in $1, by the first capture group).
Note: in the examples below, I use the character set [\s \w] as equivalent to (?:\s|\w) to emphasize the character-set nature of the grouping. The extra space in the character set is there in an attempt, possibly ill-conceived, to get everything to 'line up right'. It is a literal space even under /x (which does not ignore whitespace inside a character set), but it is redundant because \s already matches a space.
Note also that I have used a simplified string in the examples: the 'closing' tag is just '<t>', and '<X>' stands in for the incorrect closing tag (the forward/backward slashes just confuse the issue).
>perl -wMstrict -le
"my $s = '<t>Abcd efgh ijK<t>';
;;
print qq{1a '$&' ($1)} if $s =~ m{ > ( (?:\s|\w)+ ) (?!<X>) }xms;
print qq{1b '$&' ($1)} if $s =~ m{ > ( [\s \w]+ ) (?!<X>) }xms;
;;
print qq{2a '$&' ($1)} if $s =~ m{ > ( (?:\s|\w)+ ) (?!<t>) }xms;
print qq{2b '$&' ($1)} if $s =~ m{ > ( [\s \w]+ ) (?!<t>) }xms;
;;
print qq{3a '$&' ($1)} if $s =~ m{ > (?: ( \s|\w )+ ) (?!<t>) }xms;
print qq{3b '$&' ($1)} if $s =~ m{ > (?: ([\s \w])+ ) (?!<t>) }xms;
"
1a '>Abcd efgh ijK' (Abcd efgh ijK)
1b '>Abcd efgh ijK' (Abcd efgh ijK)
2a '>Abcd efgh ij' (Abcd efgh ij)
2b '>Abcd efgh ij' (Abcd efgh ij)
3a '>Abcd efgh ij' (j)
3b '>Abcd efgh ij' (j)
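The difference between examples 2 and 3 comes down to where the + quantifier sits relative to the capture group. The following is only a minimal sketch using the same simplified string; the variable names $run and $last are made up for illustration.

```perl
use strict;
use warnings;

my $s = '<t>Abcd efgh ijK<t>';

# Quantifier inside the capture group: $1 holds the whole run.
my ($run) = $s =~ m{ > ( [\s\w]+ ) (?!<t>) }xms;

# Quantifier outside the capture group: the group is re-captured on
# every iteration, so only the last character matched survives.
my ($last) = $s =~ m{ > (?: ( [\s\w] )+ ) (?!<t>) }xms;

print "run:  ($run)\n";     # run:  (Abcd efgh ij)
print "last: ($last)\n";    # last: (j)
```

In both cases the overall match is the same ('>Abcd efgh ij'); only the content of $1 differs.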
Update: Changed example code to print $& first (in single-quotes), then $1 (in parentheses, symbolic of capture) to match the order of their discussion in the text.
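For anyone who would rather not touch $& at all, here is a hedged sketch of two alternatives: the /p modifier with ${^MATCH} (available from Perl 5.10 on), and the match offsets in @- and @+. The variable names are hypothetical, and the pattern is the one from example 2b minus the redundant space.

```perl
use strict;
use warnings;

my $s = '<t>Abcd efgh ijK<t>';
my ($via_p, $via_offsets, $captured);

if ($s =~ m{ > ( [\s\w]+ ) (?!<t>) }xmsp) {
    $via_p       = ${^MATCH};                         # requires the /p modifier
    $via_offsets = substr $s, $-[0], $+[0] - $-[0];   # works without /p
    $captured    = $1;
}

printf "'%s' (%s)\n", $via_p,       $captured;   # '>Abcd efgh ij' (Abcd efgh ij)
printf "'%s' (%s)\n", $via_offsets, $captured;   # '>Abcd efgh ij' (Abcd efgh ij)
```

Both approaches should give the same text that $& shows in the examples above.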