
Regular Expressions

by y8 (Novice)
on May 17, 2005 at 20:10 UTC (#457984=perlquestion)

y8 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, all! Today I tried to use this regex:


to catch some data from this string:


Why does Perl give me '1' in \1 at position 13-14? Perhaps ^ doesn't work?
And finally, Perl slows down on something like this:



Replies are listed 'Best First'.
Re: Regular Expressions
by davido (Cardinal) on May 17, 2005 at 20:14 UTC

    You are essentially asking Perl to match one or more digits, followed by another single digit that is equal to what it just matched. That means for the overall match to work, it needs to find two consecutive identical digits. The only place you have two equal digits is near the end of your string: '111'. The '$' anchors \1 to the right-hand side of the string, so basically the last two 1's are being matched.
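
A minimal sketch of which strings the regex from this thread accepts, assuming the pattern ^(?:(\d+)|::)*\1$ quoted in the replies. Note that \1 is whatever (\d+) captured last, so it may be longer than a single digit. The test strings '1234' and '1212' are made up for illustration; only the long number comes from the thread.

```perl
use strict;
use warnings;

# The regex under discussion: a repeated group, then a backreference at the end.
my $re = qr/^(?:(\d+)|::)*\1$/;

my %outcome;
for my $s ('1234', '1212', '3601825932618111') {
    # The string matches only if it ends with some digit run repeated twice.
    $outcome{$s} = $s =~ $re ? "match, \$1 = $1" : 'no match';
    print "$s: $outcome{$s}\n";
}
```

The string '1234' has no repeated block at its end and is rejected, while '1212' is accepted because it ends in '12' followed by '12'.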


      (We're just discussing this with y8 via IRC...)

      Yes, but ^ anchors \1 to the left-hand side. So the first '3' should be in \1... '\d+' can't match nothing ...

      Nick <znick at inbox dot ru>
Re: Regular Expressions
by sh1tn (Priest) on May 17, 2005 at 21:24 UTC
    It's a perfectly valid result.
    m/^(?:(\d+)|::)*\1$/x
    # which means the same as:
    m/^
      (\d+)*  # capture digits from the beginning of the string
      \1      # then match the same digit as $1 at the end
              # and the backtracking (in reverse order) ends happily
              # where $1 is the digit before the last one
    $/x;
    So, for example, if you have a string like '12312377', then $1 will be 7.
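
A small check of the claim above, as a sketch: it runs both the original regex and the simplified (\d+)* form against '12312377' and records what ends up in $1. The two forms are only expected to agree on all-digit strings, since the simplified one drops the '::' alternative.

```perl
use strict;
use warnings;

my $string = '12312377';

# Original form from the thread.
my $original;
if ($string =~ /^(?:(\d+)|::)*\1$/) {
    $original = $1;
}

# Simplified form, written with /x so the layout can be spaced out.
my $simplified;
if ($string =~ /^ (\d+)* \1 $/x) {
    $simplified = $1;
}

print "original:   ", (defined $original   ? $original   : 'no match'), "\n";
print "simplified: ", (defined $simplified ? $simplified : 'no match'), "\n";
```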

Re: Regular Expressions
by mrborisguy (Hermit) on May 17, 2005 at 20:15 UTC
    what exactly are you trying to do?
    according to perlre, if you use the ?:, it won't store your match, so \1 shouldn't be available. (well, i guess it says $1 won't be stored, so i don't know if \1 will be available, can somebody clear this up?)
      Given the code:
      #!/usr/bin/perl
      use strict;
      use warnings;

      my $test_string = "This is not a waffle";
      print "Match: $1\n" if $test_string =~ /(?:f)\1/;
      I get the following error (Win2K, perl v5.8.6):
      Reference to nonexistent group in regex; marked by <-- HERE in m/(?:f)\1 <-- HERE / at C:\Documents\ line 8. (where line 8 is the last line in the snippet)

      Note that this is actually referring to the lack of capturing parens. If I change the last line to
      print "Match: $1\n" if $test_string =~ /((?:f)\1)/;
      there are no errors but also no match. Interesting, if you ask me.

        In the first case, there is nothing stored in \1 so there shouldn't be any match (and it is an error.) In the latter case, \1 probably refers to the outer parentheses, but that hasn't taken effect yet, so there is no match and no error (perhaps it really doesn't make sense but perl tries to interpret it that way...)
        If you instead do
        print "Match: $1\n" if $test_string =~ /((?:f))\1/;
        the result is: Match: f
        (Doing that doesn't seem especially useful, though...)
        hmm... interesting. thanks for tryin' that out for me!
        perl -e '$_="aaaa"; print "Match: $1\n" if /(?:(a)\1)/;'

        prints 'Match: a'
        because you asked perl to grab the letter 'a' and then another 'a' after it.

        perl -e '$_="aaaa"; print "Match: $1\n" if /((?:a)\1)/;'
        prints nothing because at the point of \1 there must already be a captured group, which isn't the case, because \1 is preceded only by non-capturing parens.

        As for the absence of a warning in ((?:a)\1) - maybe the regexp engine only issues the warning at compile time if there are no capturing parens in the regex at all. (Maybe it just doesn't perform complicated compile-time checking of the regexp to see whether capturing parens really come before \1.)
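
The four cases tried in this subthread can be run side by side. This is only a sketch: the patterns are built from strings and compiled inside eval so that the "Reference to nonexistent group" compile error can be caught instead of aborting the script. The %result hash is a name introduced here for illustration.

```perl
use strict;
use warnings;

# Pattern source and the text it was tried on in the posts above.
my @cases = (
    [ '(?:f)\1',   'This is not a waffle' ],
    [ '((?:f))\1', 'This is not a waffle' ],
    [ '(?:(a)\1)', 'aaaa' ],
    [ '((?:a)\1)', 'aaaa' ],
);

my %result;
for my $case (@cases) {
    my ($pattern, $text) = @$case;

    # Compile at run time so a bad backreference dies inside eval.
    my $re = eval { qr/$pattern/ };

    if (!defined $re) {
        $result{$pattern} = 'compile error';
    }
    elsif ($text =~ $re) {
        $result{$pattern} = "match, \$1 = $1";
    }
    else {
        $result{$pattern} = 'no match';
    }
    print "/$pattern/ on '$text': $result{$pattern}\n";
}
```

Only the pattern with no capturing group at all fails to compile; ((?:a)\1) compiles but cannot match, because group 1 has not finished capturing when \1 is reached.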

      (\d+) this should be in \1

      Nick <znick at inbox dot ru>
Re: Regular Expressions
by runrig (Abbot) on May 17, 2005 at 23:27 UTC

    Logically, $1 can be anything that exists at the end of the string because (?:(\d+)|::)* matches a zero-length string. So whatever is in (\d+) is irrelevant. (this is wrong)

    I thought the regex engine had been "fixed" to set inner captured matches to undef when outer matches were zero length, but I still wouldn't depend on it. (this doesn't apply here)

    Update: The non-capturing parens along with the "*" do match multiple digits, but the inner parens match many times, and only the last thing is captured.

Re: Regular Expressions
by Anonymous Monk on May 17, 2005 at 20:49 UTC
    It's interesting that...

    In Java, the string 3601825932618111 doesn't match the regex ^(?:(\d+)|::)*\1$ .
    It seems to me that ^ really doesn't work in Perl (in this regex), or is there an error in the syntax?

    "This is perl, v5.8.4 built for i386-linux-thread-multi"

    Nick <znick at inbox dot ru>
      There's an error in your understanding, if you think that the ^ should cause the expression not to match. It matches the beginning of the string, then multiple digits any number of times. Then the last capture has to match at the end of string.

      You see, the * allows the group to match multiple times, and each time through, the capture group gets overwritten. So it can match everything up to "81" the first pass, then throw away that capture, capture "1" instead, and then the \1 matches, and the whole match succeeds.
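
      The overwriting described above is easier to see on a smaller input. A minimal sketch, using a made-up string 'abc': a capture group inside a repeated group keeps only the value from its last iteration, and a list-context match with /g is one common way to collect every value instead.

```perl
use strict;
use warnings;

# The inner group matches three times; only the final value survives in $1.
my $last;
if ('abc' =~ /^(?:(\w))*$/) {
    $last = $1;
}

# To keep every value, match repeatedly with /g in list context.
my @all = 'abc' =~ /(\w)/g;

print "last capture: $last\n";
print "all captures: @all\n";
```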

      Sounds like Java's regex engine just doesn't try hard enough.

      Caution: Contents may have been coded under pressure.
        I understand that ^ here means "from the beginning of the string".
        I didn't know that each match would overwrite the capture group. Thanks. :)
        I can't find anything about overwriting capture groups in perlretut (am I a bad finder?). Are there any docs about it?
        Thanks. :)

        Nick <znick at inbox dot ru>
      does java's regex engine support the (?:) feature? that may also be why it's not matching.
        Yes, it supports (?:). But it probably doesn't overwrite the capture group on each match of (?:(\d+)|::).

        Nick <znick at inbox dot ru>
