PerlMonks  

Re: Regular expressions: interpolate the variable in the value of the number of repetitions for the group

by kcott (Abbot)
on Aug 03, 2013 at 10:30 UTC (#1047691)


in reply to Regular expressions: interpolate the variable in the value of the number of repetitions for the group

G'day 0day,

The perlre documentation, in the Extended Patterns section, says this about the (??{ code }) construct:

"During the matching of this sub-pattern, it has its own set of captures which are valid during the sub-match, but are discarded once control returns to the main pattern."

So, while /(\d)((??{'.*?\n' x $1}))/ captures $2 as you state, attempting to capture sub-patterns within (??{ code }) to $3, $4, etc. won't work. Furthermore, unless your target string always begins with '3' you won't know how many $<integer> variables will be available.
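To see the discarded captures in action, here is a minimal sketch using the same target string as the script further down. The variable names ($matched, $count, $block, $third) are mine, purely for illustration. The parentheses inside the string returned by (??{ code }) belong to the sub-pattern, so the main pattern still has only two groups and $3 should come back undefined.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $x = "3aaaa\nbbbb\nccccc\nddddd";

# The inner (...) groups are part of the string returned by (??{ ... }).
# They capture during the sub-match only and are discarded afterwards.
my $matched = $x =~ /(\d)((??{ '(.*?\n)' x $1 }))/;

my $count = $1;    # the leading digit
my $block = $2;    # everything the sub-pattern matched
my $third = $3;    # the main pattern has no third group

printf "matched: %s\n", $matched ? 'yes' : 'no';
printf "\$1 = %s\n", $count;
printf "\$2 is %d characters long\n", length $block;
printf "\$3 is %s\n", defined $third ? 'defined' : 'undefined';
```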

The following script achieves the result you're after by storing the sub-pattern captures in an array.

    #!/usr/bin/env perl -T

    use 5.010;
    use strict;
    use warnings;

    use re qw{taint eval};

    my $x = "3aaaa\nbbbb\nccccc\nddddd";
    my @captures;

    $x =~ /\A (\d+) ( (??{ '(.+?(?:\n|$))(?{ push @captures, $^N })' x $1 }) )/mx;

    say '[1]', $1, '[1]';
    say '[2]', $2, '[2]';
    say '[', $_ + 3, ']', $captures[$_], '[', $_ + 3, ']' for 0 .. $#captures;

Here's the output which, as you can see, is somewhat fudged to give a sense of what $3, $4, etc. would have been:

    $ pm_1047667_regex.pl
    [1]3[1]
    [2]aaaa
    bbbb
    ccccc
    [2]
    [3]aaaa
    [3]
    [4]bbbb
    [4]
    [5]ccccc
    [5]

Changing the first character of the target string to '2':

    $ pm_1047667_regex.pl
    [1]2[1]
    [2]aaaa
    bbbb
    [2]
    [3]aaaa
    [3]
    [4]bbbb
    [4]

And to '4':

    $ pm_1047667_regex.pl
    [1]4[1]
    [2]aaaa
    bbbb
    ccccc
    ddddd[2]
    [3]aaaa
    [3]
    [4]bbbb
    [4]
    [5]ccccc
    [5]
    [6]ddddd[6]

Notes:

  • Read the security concerns regarding (?{ code }) and (??{ code }) mentioned in both the re (pragma) and perlre documentation. The script I've posted uses both the -T switch and use re 'taint': this is more to highlight the issue than out of any real worry about matching parts of "3aaaa\nbbbb\nccccc\nddddd".
  • Also read the experimental warnings for both (?{ code }) and (??{ code }) in perlre. I'm using 5.18.0; you may get different results with another version.
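If those security or experimental caveats are a concern, one alternative (a different technique from the script above, sketched here on the assumption that the count always leads the string) is to do the work in separate steps: capture the count with an ordinary match, interpolate it into a plain {n} quantifier, then split the matched block with a global match. The names $count and $block are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $x = "3aaaa\nbbbb\nccccc\nddddd";
my ($block, @captures);

# Step 1: read the leading count with an ordinary capture.
if ( my ($count) = $x =~ /\A(\d+)/ ) {
    # Step 2: interpolate the count into a plain {n} quantifier.
    # $count holds only digits, so it needs no quoting here.
    if ( $x =~ /\A$count((?:.+?(?:\n|\z)){$count})/ ) {
        $block = $1;
        # Step 3: split the matched block into its individual lines.
        @captures = $block =~ /(.+?(?:\n|\z))/g;
    }
}

print "block: $block" if defined $block;
print "line: $_" for @captures;
```

This is ordinary variable interpolation, so it needs neither use re 'eval' nor the embedded-code constructs, at the cost of running more than one match.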

-- Ken


Re^2: Regular expressions: interpolate the variable in the value of the number of repetitions for the group
by 0day (Sexton) on Aug 03, 2013 at 14:46 UTC
    Wow...
    Thank you very much, kcott. Thanks for the explanation; your solution is the most elegant.

    Many thanks to all who tried to help.
