PerlMonks  

Re: Regex fun

by JadeNB (Chaplain)
on Dec 15, 2009 at 19:01 UTC ( #812918=note )


in reply to Regex fun

I would assume that quantifier cannot be a '\x' variable, but don't really know.

I think it's important to note that \1 is not a variable * (which is why you can't use it outside of a regex); the variable that contains the contents of the first capture group is $1, but that doesn't take on its new value *** until the capture has completed ** (this originally read “that's empty until the capture has completed”).

I think that the reason that ($rx){\1} isn't allowed is that the regex engine wants to compile the regex before running it. Since the contents of \1, hence the number of times that $rx is supposed to be captured, aren't known until run-time, this interferes with the compilation. For example, /\+32767.{32767}/ is rejected at compile time, but a '+32767' =~ /\+([0-9]*).{\1}/ construct would circumvent this restriction. (“Why, then,” you ask, “is something like /(.)\1/, which suffers from the same compilation problem, OK?” I dunno. :-) )

* Not a Perl variable, anyway. See Re^3: Regex fun, and probably Re^2: Regex fun as well.
** Except that (?{ print $1 }) works correctly, which is somewhat miraculous to me and very very helpful for debugging regexes.
UPDATE: *** Still false (see Re^6: Regex fun for where realisation finally dawns). It takes on its new value as soon as the capture completes (which explains the miracle referenced above); it's just that the interpolation in the text of the regex has already happened, so that the quantifier doesn't ‘see’ the new value.
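A minimal sketch of the two behaviours described in the footnotes, assuming a reasonably recent Perl 5; the names @seen and $ok and the sample strings are made up for illustration. The first match shows a (?{ }) block reading $1 while the match is still in progress; the second shows that a $1 written in the pattern text is interpolated before matching starts, so the quantifier uses the value left over from the previous match.

```perl
use strict;
use warnings;

# (?{ ... }) runs during the match, so it can read $1 as soon as
# group 1 has closed.
my @seen;
'3abc' =~ /([0-9]+)(?{ push @seen, $1 })[a-z]+/;

# A $1 in the pattern text is interpolated before matching begins,
# so the quantifier below uses the value left by the previous match.
'2' =~ /([0-9])/;                          # leaves $1 set to '2'
my $ok = '3aa' =~ /^([0-9])a{$1}$/;        # pattern is built as /^([0-9])a{2}$/
```

Here @seen should end up holding the digit captured during the first match, and $ok should be true even though the digit captured by the last pattern is 3, because the quantifier was already fixed at {2} when the pattern was built.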

Replies are listed 'Best First'.
Re^2: Regex fun
by JavaFan (Canon) on Dec 15, 2009 at 19:40 UTC
    I think it's important to note that \1 is not a variable (which is why you can't use it outside of a regex);
    But you can, sometimes, use it in the replacement part.
    I think it's important to note that \1 is not a variable (which is why you can't use it outside of a regex); the variable that contains the contents of the first capture group is $1, but that's empty until the capture has completed.
    But in /([0-9]+){$1}/, the first capture is completed before the quantifier. So, that's not the reason.
    For example, /\+32767.{32767}/ is rejected at compile time
    Yes, but that's considered a bug. It's a restriction that should have been removed after the regexp engine was no longer recursive.
    “Why, then,” you ask, “is something like /(.)\1/, which suffers from the same compilation problem, OK?”
    That's not the same problem. {...} is one of the mini-languages inside regular expressions. Compare it with [...]. [\1] doesn't refer back to something else either.
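To illustrate the comparison with [...], here is a small sketch, assuming a reasonably recent Perl 5 (the variable names are invented for the example). Outside a character class \1 is a backreference; inside one it is read as an octal escape for the character with code 1, so it never refers back to group 1.

```perl
use strict;
use warnings;

# Outside a character class, \1 refers back to group 1.
my $backref = 'aa' =~ /(a)\1/ ? 1 : 0;

# Inside [...], \1 is an octal escape for chr(1), so 'aa' does not match...
my $class_vs_aa = 'aa' =~ /(a)[\1]/ ? 1 : 0;

# ...but 'a' followed by the character with code 1 does.
my $class_vs_ctrl = "a\x01" =~ /(a)[\1]/ ? 1 : 0;
```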

    But one can defer a subpattern. The syntax is (??{ }). This is what the OP wants, and this is what the OP ought to use.
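A hedged sketch of how (??{ }) could be used for a count that comes from an earlier capture. The helper name count_prefixed and the "digits followed by that many characters" format are assumptions for illustration, not the OP's actual data.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $str is a decimal count followed by
# exactly that many characters, e.g. '3abc'.
sub count_prefixed {
    my ($str) = @_;
    # (??{ ... }) is evaluated during the match, after group 1 has closed;
    # the string it returns is compiled and matched as a subpattern.
    return $str =~ /\A([0-9]+)(??{ ".{$1}" })\z/s ? 1 : 0;
}

print count_prefixed('3abc'),  "\n";
print count_prefixed('3abcd'), "\n";
```

Because the quantifier is built while the match is running, it sees the value group 1 captured in this very match. Note that the engine may still backtrack into the digits, so a string such as '12' is accepted as "count 1, then one character".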

      But you can, sometimes, use it in the replacement part.
      Sure, but you're not supposed to: Warning on \1 Instead of $1.
      But in /([0-9]+){$1}/, the first capture is completed before the quantifier. So, that's not the reason.
      Sorry, I don't understand—not the reason for what?
      It's a restriction that should have been removed after the regexp engine was no longer recursive.
      Sorry, I don't understand this, either. Do you mean ‘re-entrant’? (UPDATE: Nope, just my internals-ignorance revealed. Thanks, ikegami!)
        Regarding the last point, the engine was re-engineered for 5.10. It used to use the C stack, so limits were imposed to prevent stack overflows. Now, the stack it uses is on the heap. The implementation moved away from a recursive model as part of the change.
        Sorry, I don't understand—not the reason for what?
        Quoting myself where I am quoting you:
        the variable that contains the contents of the first capture group is $1, but that's empty until the capture has completed.
        You're claiming $1 is "empty" until the capture has completed. I'm pointing out that, in the case of the OP, said first capture has completed.
        Do you mean ‘re-entrant’?
        No, I don't. The current regexp-engine isn't re-entrant.
