PerlMonks  

Backref in a regex quantifier

by Anonymous Monk
on Jul 09, 2007 at 12:27 UTC ( [id://625595]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear wise ones,

I am trying to retrieve a numeric value in a regex, and then use it as the quantifier to the next capture. Something like this (simplified):

    /^(\d+?) (.{\1}) /

But it doesn't work. Nothing ever matches. It seems as if it does not like variables in a quantifier, perhaps?? And yet this doesn't seem like an outrageous thing to want to do; but I daresay I am missing something very obvious. Hopefully someone can tell me wherein my dullness lies!

Thanks in advance!

GRS

Replies are listed 'Best First'.
Re: Backref in a regex quantifier
by almut (Canon) on Jul 09, 2007 at 13:13 UTC

    I can't tell you why the backref doesn't work in a quantifier... but you could use the "postponed regex" construct (??{ code }):

    my ($n, $v) = "3 aaa " =~ /^(\d+?) ((??{".{$1}"})) /;
    print "n=$n, v=$v\n";   # prints "n=3, v=aaa"
      A small tweak would be to use the predefined variable $^N, which "contains whatever was matched by the most-recently closed group (submatch)". This makes the sub-expression less sensitive to its position in a possibly complex regex.

      perl -wMstrict -e "my $re = qr/ ^ (\d+?) ( (??{ qq(.{$^N}) }) ) /x; /$re/, print qq($_ [$1] [$2] \n) for @ARGV" 43fffxxx 54321234 12xxxxxxxxxxx 12xxxxxxxxxxxx
      43fffxxx [4] [3fff]
      54321234 [5] [43212]
      12xxxxxxxxxxx [1] [2]
      12xxxxxxxxxxxx [1] [2]
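      To make the point about $^N concrete, here is a minimal sketch. The "tag:count payload" record format and the sub name counted_payload are made up for illustration. The count ends up in group 2 rather than group 1, yet the postponed subexpression does not need to know that, because $^N refers to whichever group closed last. The quantifier is built by concatenation simply to keep the string interpolation unambiguous.

```perl
use strict;
use warnings;

# Hypothetical record format: "<tag>:<count> <payload>".
# The count is capture group 2 here, but $^N still refers to it, because
# it is the group that closed most recently before the (??{ ... }) block.
sub counted_payload {
    my ($record) = @_;
    if ( $record =~ /^(\w+):(\d+) ((??{ '.{' . $^N . '}' }))/ ) {
        return ( $1, $2, $3 );
    }
    return;    # empty list when the record does not match
}

my ( $tag, $count, $payload ) = counted_payload('id:3 abcdef');
print "tag=$tag count=$count payload=$payload\n";
```

      If another capture group were added in front of the tag, the (??{ ... }) part would not need renumbering, which is the advantage described above.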
Re: Backref in a regex quantifier
by jasonk (Parson) on Jul 09, 2007 at 13:13 UTC

    I don't think you will be able to do this in one step; you will probably have to do something like this:

    if ( /^(\d+?)/ ) {
        my $len = $1;
        if ( /^$len(.{$len})/ ) {
            my $value = $1;
            # do something with $len and $value
        }
    }

    We're not surrounded, we're in a target-rich environment!

      Here's a marginally more clever version of your solution, which relies on the placeholder capabilities of a /g search. I mention it because I think it's a tiny bit more readable than the version you wrote. But ... you know ... personal preference and all that.

      use strict;
      use warnings;

      my $test_string = "3 xyz";

      # Start a global search.
      # It looks like the expected pattern includes a space. Let's put it here.
      if ( $test_string =~ m/^(\d+) /g ) {
          my $length = $1;

          # Pick up the search where our first match left off.
          if ( $test_string =~ m/\G(.{$length})/ ) {
              my $result = $1;
              # Do stuff with $result ...
          }
      }
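      For anyone unsure what "picking up where the first match left off" means mechanically, here is a small sketch of the same idea wrapped in a sub (read_counted is a hypothetical name). A successful scalar-context /g match leaves pos() just past the matched text, and \G anchors the next match at that offset.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (length, value, offset where the second
# match resumed), or an empty list if either step fails.
sub read_counted {
    my ($string) = @_;

    # Scalar-context /g match: on success, pos($string) is left just
    # past the matched "<digits> " prefix.
    return unless $string =~ m/^(\d+) /g;
    my $length    = $1;
    my $resume_at = pos($string);

    # \G anchors this match at pos($string), i.e. right after the prefix.
    return unless $string =~ m/\G(.{$length})/;
    return ( $length, $1, $resume_at );
}

my ( $length, $value, $offset ) = read_counted('3 xyz');
print "length=$length value=$value resumed at offset $offset\n";
```

      Because the quantifier is interpolated before the second regex is compiled, this needs nothing experimental.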
Re: Backref in a regex quantifier
by RMGir (Prior) on Jul 09, 2007 at 16:34 UTC
    I'm impressed w/ almut's solution, but you're likely safer using jasonk's. I think ??{ code } has had some odd issues in the past, so it's likely better not to use anything that exotic in production code.

    Mike
      ...it's likely better not to use anything that exotic in production code.

      You're absolutely right, and I probably should've added a "still experimental" warning. (In fact, I wouldn't use it myself, if my well-being depended on it :)  OTOH, if people never really use those more exotic features (and report bugs, if found!), they'll remain experimental forever...

      As to the original problem, does anyone know if backrefs are supposed to work within the {} quantifier, or not? (In the docs, I couldn't find anything explicit either way.)  I can well imagine that it's non-trivial to implement, but as the OP said, it doesn't seem too far off to attempt to use them that way, in particular as they do work outside of the quantifier.

        does anyone know if backrefs are supposed to work within the {} quantifier, or not?

        Good question.

        My guess would be that they're not supposed to work -- text matched in a backreference is matched textually, without metacharacters. So doing /^(.)\1$/ won't match a wildcard as a second character if the first is a literal ".", and won't match one or more "+" if the first character is a literal "+".
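        A quick sketch to check that claim (the sub name is_doubled_char is only for illustration): if \1 were re-interpreted as a pattern, ".a" and "+++" would match below, but they should not.

```perl
use strict;
use warnings;

# True when the string is exactly one character repeated twice.
# \1 matches the captured text literally, not as a pattern.
sub is_doubled_char {
    my ($string) = @_;
    return $string =~ /^(.)\1$/ ? 1 : 0;
}

for my $candidate ( '..', '.a', '++', '+++' ) {
    my $verdict = is_doubled_char($candidate) ? 'match' : 'no match';
    print "$candidate: $verdict\n";
}
```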

        Keeping to that pattern, it makes sense that if a backreference appears between {}, what's parsed is a literal "{", the backreference, and a literal "}".

        It would obviously complicate regexen something fierce if the possibility of recompiling the regex based on backreferences had to be handled in the general case, and not just in the ??{} case.


        Mike
Re: Backref in a regex quantifier
by monarch (Priest) on Jul 09, 2007 at 12:56 UTC
    The following works (note the lack of curly braces in the back-reference):
    use strict;

    foreach ( "123  123 ", "1 1 ", "45 &45 " ) {
        print( "\"$_\"\n " );
        if ( /^(\d+?) (.\1) / ) {
            printf( "Match 1 => \"%s\", Match 2 => \"%s\"\n", $1, $2 );
        }
        else {
            print( "No match\n" );
        }
    }

    Outputs:

    "123  123 "
     Match 1 => "123", Match 2 => " 123"
    "1 1 "
     No match
    "45 &45 "
     Match 1 => "45", Match 2 => "&45"

    Update: corrected spelling in sentence.

      That doesn't do what the OP wants though. You're just matching the back reference again.

      If I understand it correctly, what the OP wants is to have a match on the string "3 56789" return 3 in $1 and 567 in $2
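      To spell out the difference on that input, here is a rough sketch. The helper count_then_take is hypothetical, and it drops the trailing space from the OP's pattern, since "3 56789" has none. The pattern from the parent post is tried first, then a plain two-step match that reads the count and takes that many characters.

```perl
use strict;
use warnings;

my $input = '3 56789';

# The pattern from the parent post: any one character followed by a
# repeat of group 1's text. It does not use group 1 as a count.
my $backref_matched = $input =~ /^(\d+?) (.\1) / ? 1 : 0;

# Two-step sketch: read the count first, then interpolate it into the
# quantifier of a second regex.
sub count_then_take {
    my ($string) = @_;
    my ($count) = $string =~ /^(\d+) / or return;
    my ($taken) = $string =~ /^\d+ (.{$count})/ or return;
    return ( $count, $taken );
}

my ( $count, $taken ) = count_then_take($input);
print 'backref pattern matched: ', ( $backref_matched ? 'yes' : 'no' ), "\n";
print "count=$count taken=$taken\n";
```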

Re: Backref in a regex quantifier
by sfink (Deacon) on Jul 11, 2007 at 03:31 UTC
    No, backreferences are not allowed there. It could be implemented, but I don't think it comes up often enough to be needed.

    However, if I can simplify your example even further and assume that the first (\d+?) is always of fixed length (say 4 characters), then the following will give you what you want:

    unpack("a4/A*", $_)
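    A worked example of that template, assuming a made-up record whose first four characters are a zero-padded count (the sub name and sample data are illustrative only). As far as I understand the "/" template, the count itself is not returned, only the string it measures, and "A" strips trailing whitespace from the result.

```perl
use strict;
use warnings;

# Hypothetical helper: the first 4 characters are a count, and that
# many characters of data follow.
sub take_fixed_width_counted {
    my ($record) = @_;
    my ($value) = unpack( 'a4/A*', $record );
    return $value;
}

my $value = take_fixed_width_counted('0003abcdef');
print "value=$value\n";
```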

Node Type: perlquestion [id://625595]
Approved by Corion
Front-paged by grinder