in reply to Regex code assertion should be able to fail

Well here's a first cut at a patch against 5.9.5.

It seems to test OK and I think the documentation is probably minimally acceptable.

The implementation is a bit messy because I'm storing the distinction between (?{...}?) and (?{...}) in the same way as that between (??{...}) and (?{...}) (i.e. an extra LOGICAL op in the complied regex) and that's probably not the right way (in either of those cases). This distinction should in the EVAL op but I didn't feel my understanding of the code was deep enough to attempt that.

Update: Use TRUE and FALSE with bool variable.

diff -u --recursive orig\perl-5.9.5/pod/perlre.pod perl-5.9.5/pod/perl +re.pod --- orig\perl-5.9.5\pod\perlre.pod Sat Jul 07 15:40:28 2007 +++ perl-5.9.5\pod\perlre.pod Mon Sep 03 09:59:35 2007 @@ -880,6 +880,8 @@ may be used instead of C<< \k<NAME> >> in Perl 5.10 or later. =item C<(?{ code })> + +=item C<(?{ code }?)> X<(?{})> X<regex, code in> X<regexp, code in> X<regular expression, c +ode in> B<WARNING>: This extended regular expression feature is considered @@ -887,18 +889,24 @@ has side effects may not perform identically from version to version due to the effect of future optimisations in the regex engine. -This zero-width assertion evaluates any embedded Perl code. It -always succeeds, and its C<code> is not interpolated. Currently, -the rules to determine where the C<code> ends are somewhat convoluted +. +These zero-width assertions evaluate any embedded Perl code. In the +form without the terminal C<?> the assertion always succeeds. The +C<(?{ code }?)> form the assertion only succeeds if C<code> returns +true. In either case its C<code> is not interpolated. Currently, the +rules to determine where the C<code> ends are somewhat convoluted. This feature can be used together with the special variable C<$^N> to -capture the results of submatches in variables without having to keep +restrict or save the results of submatches without having to keep track of the number of nested parentheses. For example: $_ = "The brown fox jumps over the lazy dog"; /the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i; print "color = $color, animal = $animal\n"; + @known_animal{ qw( cat dog fox horse rabbit rat ) } = (); + @animals = /\b(\w++)(?{ exists $known_animal{$^N} }?)/g; + print "@animals\n"; + Inside the C<(?{...})> block, C<$_> refers to the string the regular expression is matching against. You can also use C<pos()> to know wha +t is the current position of matching within this string. @@ -925,11 +933,12 @@ introduced value, because the scopes that restrict C<local> operators are unwound. -This assertion may be used as a C<(?(condition)yes-pattern|no-pattern +)> -switch. If I<not> used in this way, the result of evaluation of -C<code> is put into the special variable C<$^R>. This happens -immediately, so C<$^R> can be used from other C<(?{ code })> assertio +ns -inside the same regular expression. +These assertions may be used as a +C<(?(condition)yes-pattern|no-pattern)> switch and the two forms are +equivalent when used in this way. If I<not> used in this way, the +result of evaluation of C<code> is put into the special variable +C<$^R>. This happens immediately, so C<$^R> can be used from other +C<(?{ code })> assertions inside the same regular expression. The assignment to C<$^R> above is properly localized, so the old value of C<$^R> is restored if the assertion is backtracked; compare diff -u --recursive orig\perl-5.9.5/regcomp.c perl-5.9.5/regcomp.c --- orig\perl-5.9.5/regcomp.c Sat Jul 07 15:40:26 2007 +++ perl-5.9.5/regcomp.c Sun Sep 02 22:48:48 2007 @@ -5690,6 +5690,7 @@ U32 n = 0; char c; char *s = RExC_parse; + bool is_failable = FALSE; RExC_seen_zerolen++; RExC_seen |= REG_SEEN_EVAL; @@ -5704,6 +5705,12 @@ count--; RExC_parse++; } + + if (!is_logical && *RExC_parse == '?') { + /* (?{...}?) but not (??{...}?) */ + is_failable = TRUE; + RExC_parse++; + } if (*RExC_parse != ')') { RExC_parse = s; vFAIL("Sequence (?{...}) not terminated or not {}-balance +d"); @@ -5711,7 +5718,7 @@ if (!SIZE_ONLY) { PAD *pad; OP_4tree *sop, *rop; - SV * const sv = newSVpvn(s, RExC_parse - 1 - s); + SV * const sv = newSVpvn(s, RExC_parse - 1 - is_failable +- s); ENTER; Perl_save_re_context(aTHX); @@ -5742,10 +5749,14 @@ } nextchar(pRExC_state); - if (is_logical) { + if (is_logical || is_failable ) { + /* (??{...}} or (?{...}?) but, if we could manage +, + not (?(?{ ... }?)...). That's rather difficult + so instead work round it in regmatch. + -- nobull */ ret = reg_node(pRExC_state, LOGICAL); if (!SIZE_ONLY) - ret->flags = 2; + ret->flags = is_logical ? 2 : 3; REGTAIL(pRExC_state, ret, reganode(pRExC_state, E +VAL, n)); /* deal with the length of this later - MJD */ return ret; @@ -5767,6 +5778,9 @@ ret = reg_node(pRExC_state, LOGICAL); if (!SIZE_ONLY) ret->flags = 1; + /* Need somehow to tell /(?{...)?)/ that it + should behave like /(?{...})/ as part of + if construct -- nobull * +/ REGTAIL(pRExC_state, ret, reg(pRExC_state, 1, + &flag,depth+1)); goto insert_if; } diff -u --recursive orig\perl-5.9.5/regexec.c perl-5.9.5/regexec.c --- orig\perl-5.9.5/regexec.c Sat Jul 07 15:40:26 2007 +++ perl-5.9.5/regexec.c Sun Sep 02 21:19:53 2007 @@ -2713,6 +2713,7 @@ 0: (?{...}) 1: (?(?{...})X|Y) 2: (??{...}) + 3: (?{...}?) or the following IFMATCH/UNLESSM is: false: plain (?=foo) true: used as a condition: (?(?=foo)) @@ -3669,9 +3670,13 @@ PL_op = oop; PAD_RESTORE_LOCAL(old_comppad); PL_curcop = ocurcop; - if (!logical) { - /* /(?{...})/ */ + if (!logical || logical == 3 ) { + /* /(?{...})/ or /(?{...}?) */ sv_setsv(save_scalar(PL_replgv), ret); + if ( logical && !SvTRUE(ret) ) { + /* (?{...}?/ assertion failed */ + sayNO; + } break; } } @@ -3875,7 +3880,12 @@ } break; case LOGICAL: - logical = scan->flags; + /* (?(?{...}?)...) will insert a second redundant LOGICAL + op which we here ignore. This is messy, sorry + --nobull */ + if ( !logical ) { + logical = scan->flags; + } break; /******************************************************************* diff -u --recursive orig\perl-5.9.5/t/op/re_tests perl-5.9.5/t/op/re_t +ests --- orig\perl-5.9.5/t/op/re_tests Sat Jul 07 15:40:24 2007 +++ perl-5.9.5/t/op/re_tests Mon Sep 03 08:57:01 2007 @@ -1338,3 +1338,10 @@ .*\z foo\n y - - ^(?:(\d)x)?\d$ 1 y ${\(defined($1)?1:0)} 0 .*?(?:(\w)|(\w))x abx y $1-$2 b- +x(??{1}?) x c - Sequence (?{...}) not terminated +x(?{1}?) x y $& x +x(?{0}?) x n - - +(?(?{1}?)x|y) x y $& x +(?(?{0}?)x|y) x n - - +(?(?{1}?)y|x) x n - - +(?(?{0}?)y|x) x y $& x \ No newline at end of file