Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re: Why machine-generated solutions will never cease to amaze me

by tmoertel (Chaplain)
on Nov 10, 2004 at 18:27 UTC ( #406746=note: print w/ replies, xml ) Need Help??


in reply to Why machine-generated solutions will never cease to amaze me

(Updated 2004-11-11: Pointed out all-or-none matching constraint that applies here. If you read this post, please also read the follow-on thread started by hv for more regexp-probing fun.)

(Updated: Added quotation for context; added conclusion.)

Regarding:

And try as I might, I can't find a string that behaves differently between the two. I've tried all the possibilities I can think of, like sadad salalad saladad slad sal, but they behave identically. I still have a nagging doubt that there may be a difference; if anyone can point it out I be most appreciative.
There's no difference because we can prove that the regular expressions are equivalent for any atomic regexes x and y (when constrained to all-or-none matching, which is the case in the original poster's question):
(x|y)? === (x?|y) === (x|y?)
The proof goes as follows. (Note: I use (x) instead of (?:x) to denote grouping.)

  1. We start with (x|y)?
  2. We can rewrite as (x|y)|E because, by definition, r? is equivalent to r|E where E matches the empty string.
  3. We can rewrite as x|(y|E) because regex union | is associative.
  4. And rewrite as x|(y?), again by definition of ?
  5. And as x|(y?|0) because the regex that never matches anything – 0 – is the identity for union.
  6. And as (x|y?)|0 by associativity.
  7. And finally as (x|y?), again, because 0 is the identity for union.

(The proof for the (x?|y) case follows immediately from regex union being commutative.)

To use your example, let x = (?:al) and y = (?:pre) and the same proof applies. That's why you can't find a string that matches s(?:al|pre)?ad differently than s(?:(?:al)?|pre)ad – there isn't one.

Sometimes it's easier to go after the proof than the counterexample.   : )

Cheers,
Tom


Comment on Re: Why machine-generated solutions will never cease to amaze me
Select or Download Code
Re^2: Why machine-generated solutions will never cease to amaze me
by hv (Parson) on Nov 11, 2004 at 13:34 UTC

    The proof for the (x?|y) case follows immediately from regex union being commutative.

    Regexp union is not commutative when one of the alternates is a leading substring of another: then order becomes important - (E|x) will always match E in preference to x.

    It is the presence of the outer anchors in the original pattern that disambiguates and thus makes it commutative.

    Hugo

      hv wrote:
      Regexp union is not commutative when one of the alternates is a leading substring of another: then order becomes important - (E|x) will always match E in preference to x.
      Excellent point. (However, union is commutative even in the face of overlapping operands when the union is forced to match all of the target string or not match at all.)
      It is the presence of the outer anchors in the original pattern that disambiguates and thus makes (regexp union) commutative.
      Even anchoring isn't sufficient. What if one of the anchors matches the difference between the union operands? Consider (a|aa) in:
      "aaaa" =~ /a(a|aa)a/

      Fortunately, in the case of the OP's regexes – s(?:al|pre)?ad and s(?:(?:al)?|pre)ad) – the union operands don't overlap, and so classical equivalence (do the regexps generate the same language?) predicts the equivalence of Perl's matching behavior (do the regexps match the same strings identically?)

      Cheers,
      Tom

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://406746]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (12)
As of 2014-12-19 13:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (83 votes), past polls