go ahead... be a heretic PerlMonks

### Re: Why machine-generated solutions will never cease to amaze me

by tmoertel (Chaplain)
 on Nov 10, 2004 at 18:27 UTC ( #406746=note: print w/replies, xml ) Need Help??

(Updated 2004-11-11: Pointed out all-or-none matching constraint that applies here. If you read this post, please also read the follow-on thread started by hv for more regexp-probing fun.)

Regarding:

And try as I might, I can't find a string that behaves differently between the two. I've tried all the possibilities I can think of, like sadad salalad saladad slad sal, but they behave identically. I still have a nagging doubt that there may be a difference; if anyone can point it out I be most appreciative.
There's no difference because we can prove that the regular expressions are equivalent for any atomic regexes x and y (when constrained to all-or-none matching, which is the case in the original poster's question):
(x|y)? === (x?|y) === (x|y?)
The proof goes as follows. (Note: I use (x) instead of (?:x) to denote grouping.)

2. We can rewrite as (x|y)|E because, by definition, r? is equivalent to r|E where E matches the empty string.
3. We can rewrite as x|(y|E) because regex union | is associative.
4. And rewrite as x|(y?), again by definition of ?
5. And as x|(y?|0) because the regex that never matches anything – 0 – is the identity for union.
6. And as (x|y?)|0 by associativity.
7. And finally as (x|y?), again, because 0 is the identity for union.

(The proof for the (x?|y) case follows immediately from regex union being commutative.)

To use your example, let x = (?:al) and y = (?:pre) and the same proof applies. That's why you can't find a string that matches s(?:al|pre)?ad differently than s(?:(?:al)?|pre)ad – there isn't one.

Sometimes it's easier to go after the proof than the counterexample.   : )

Cheers,
Tom

Replies are listed 'Best First'.
Re^2: Why machine-generated solutions will never cease to amaze me
by hv (Parson) on Nov 11, 2004 at 13:34 UTC

The proof for the (x?|y) case follows immediately from regex union being commutative.

Regexp union is not commutative when one of the alternates is a leading substring of another: then order becomes important - (E|x) will always match E in preference to x.

It is the presence of the outer anchors in the original pattern that disambiguates and thus makes it commutative.

Hugo

hv wrote:
Regexp union is not commutative when one of the alternates is a leading substring of another: then order becomes important - (E|x) will always match E in preference to x.
Excellent point. (However, union is commutative even in the face of overlapping operands when the union is forced to match all of the target string or not match at all.)
It is the presence of the outer anchors in the original pattern that disambiguates and thus makes (regexp union) commutative.
Even anchoring isn't sufficient. What if one of the anchors matches the difference between the union operands? Consider (a|aa) in:
```  "aaaa" =~ /a(a|aa)a/

Fortunately, in the case of the OP's regexes – s(?:al|pre)?ad and s(?:(?:al)?|pre)ad) – the union operands don't overlap, and so classical equivalence (do the regexps generate the same language?) predicts the equivalence of Perl's matching behavior (do the regexps match the same strings identically?)

Cheers,
Tom

Create A New User
Node Status?
node history
Node Type: note [id://406746]
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (6)
As of 2018-03-21 21:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
When I think of a mole I think of:

Results (270 votes). Check out past polls.

Notices?