http://www.perlmonks.org?node_id=1047667

0day has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

It is difficult to explain in words (my English is not very good), so here is the code:
open (my $file, '<', 'data.bin');
binmode $file;
my $binary_data = <$file>;
$binary_data =~ /(.)(.){$1}/;   # how can I do something like that?
# or better yet
$binary_data =~ /(.)(.){(?{unpack 'H*', $1})}/;   # ?
# or maybe
/(.)(?{unpack 'H*', $1})(.){$^R}/;   # ???
I would like to perform this kind of trick inside a regular expression. Is it possible?
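For reference, here is a minimal sketch of the idea in the question, assuming the data is a one-byte length prefix followed by the payload (the sample string `"\x{03}abcdef"` is made up). The count byte has to be converted with `ord` before it can serve as a repetition count, and `/s` lets `.` match a length or payload byte that happens to be `"\n"` (0x0A). For comparison, `unpack` with a `C/a` template reads a count byte and then that many bytes without any regex. Note also that `<$file>` stops at the first newline byte, so a binary file would normally be slurped with `local $/;` first.

```perl
use strict;
use warnings;

# Hypothetical record: one length byte (3) followed by payload bytes.
my $binary_data = "\x{03}abcdef";

# Regex version: (??{ ... }) builds the sub-pattern at match time.
# ord() turns the captured length byte into a number; /s and (?s:...)
# let "." match any byte, including "\n" (0x0A).
my ($len_byte, $payload) =
    $binary_data =~ /\A(.)((??{ '(?s:.{' . ord($1) . '})' }))/s
    or die "record is shorter than its length prefix\n";

# Non-regex version: "C/a" reads an unsigned byte as a count,
# then takes that many bytes as a string.
my ($payload2) = unpack 'C/a', $binary_data;

printf "length=%d regex=%s unpack=%s\n", ord($len_byte), $payload, $payload2;
```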

Update

There is a solution (a friend helped me):
/(.)((??{'.' x $1}))/;
But how can I, in this example, assign the values to three different groups?
$_ = "3aaaa\nbbbb\nccccc\nddddd";
/(\d)((??{'.*?\n' x $1}))/;
print $2;   # aaaa\nbbbb\nccccc\n
# how to assign aaaa\n to $3, bbbb\n to $4, etc.?
Are there other solutions?

Re: Regular expressions: interpolate the variable in the value of the number of repetitions for the group
by rjt (Curate) on Aug 03, 2013 at 01:40 UTC

    The problem you want to solve is certainly solvable. However, the approach you want to take is going to be very ugly, if it's even possible at all. Here's what I would do instead:

    s/(\d)//;
    say for (split /\n/)[0..$1-1];
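A self-contained version of the two lines above, as a sketch for experimenting. It differs from rjt's code in two assumed details: the count is captured into a lexical `$count` instead of being read from `$1` inside the slice, and `\d+` accepts multi-digit counts.

```perl
use strict;
use warnings;
use feature 'say';

my $data = "3aaaa\nbbbb\nccccc\nddddd";

# Peel off the leading count; the rest of the string holds the records.
my ($count, $rest) = $data =~ /\A(\d+)(.*)\z/s
    or die "no leading count\n";

# split drops the "\n" separators; the slice keeps the first $count records.
my @records = (split /\n/, $rest)[0 .. $count - 1];

say for @records;
```

If the count is larger than the number of lines, the slice yields `undef` entries, which `say` will warn about under `use warnings`.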

    However, you haven't mentioned what context this is in, and the data does seem highly contrived. What problem are you really trying to solve?

    If you need to match an expression like this several times in a larger chunk of text, you'll need to split that text somehow, but you haven't given me enough information to help you there. (For example, split /(\d)/ first would give you an array of digits and strings to loop over.)
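The `split /(\d)/` suggestion can be sketched as follows, assuming (hypothetically) that count digits never occur inside the records and that every record ends with `"\n"`. Because the pattern has a capturing group, `split` also returns the captured digits, interleaved with the text between them; the leading empty field comes from the match at the very start of the string.

```perl
use strict;
use warnings;

# Hypothetical input: two count-prefixed chunks back to back.
my $text = "2aa\nbb\n1cc\n";

# With a capturing group, split keeps the separators (the digits).
my @parts = split /(\d)/, $text;    # ('', '2', "aa\nbb\n", '1', "cc\n")
shift @parts if @parts && $parts[0] eq '';    # drop the leading empty field

my @chunks;
while (my ($count, $body) = splice @parts, 0, 2) {
    # Keep the first $count newline-terminated records of this chunk.
    my @records = $body =~ /(.*?\n)/g;
    push @chunks, [ @records[0 .. $count - 1] ];
}
```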

      Heh... Thanks, but I want to do it using only the power of the regular expression (without external code).

      The problem has practical applications (which are easily solved with external code), but I want to do it another way.

      Thanks.
        Heh... Thanks, but I want to do it using only the power of the regular expression (without external code).

        The really funny thing is, all of the broken examples you cite as somehow desirable in the root node execute "external code" (which I take to mean non-regex Perl code), via the experimental features (?{ ... }) and (??{ ... }):

        0day's code:

        $binary_data =~ /(.)(.){(?{unpack 'H*', $1})}/;   # ?
        /(.)(?{unpack 'H*', $1})(.){$^R}/;                # ???
        /(.)((??{'.' x $1}))/;
        /(\d)((??{'.*?\n' x $1}))/;

        how to assign aaaa\n to $3, bbbb\n to $4, etc.?

        Short answer: you can't, unless you're OK with this:

        /(\d)
         ([^\d]+?\n)? ([^\d]+?\n)? ([^\d]+?\n)?
         ([^\d]+?\n)? ([^\d]+?\n)? ([^\d]+?\n)?
         ([^\d]+?\n)? ([^\d]+?\n)? ([^\d]+?\n)?
        /x;
        printf "%d: <%s>\n", $_, eval '$'.$_ for 2..$1+1;

        But no doubt you already thought of that. I want to help you, and to do that I need a complete description of the actual problem you need our help with, as well as some real examples of input and expected output.
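The `eval '$'.$_` in the one-liner above can be avoided. A sketch, shortened to four optional groups: the `@-` and `@+` arrays (see perlvar) hold the start and end offsets of each group in the last successful match, `$#+` is the number of groups in the pattern, and a group that did not take part has an undefined start offset, so it is skipped.

```perl
use strict;
use warnings;

my $str = "3aaaa\nbbbb\nccccc\nddddd";

# Fixed number of optional groups; groups that cannot match stay unset.
$str =~ /(\d) ([^\d]+?\n)? ([^\d]+?\n)? ([^\d]+?\n)? ([^\d]+?\n)? /x
    or die "no match\n";

my @records;
for my $i (2 .. $#+) {
    next unless defined $-[$i];    # group $i did not participate
    my $text = substr($str, $-[$i], $+[$i] - $-[$i]);
    push @records, $text;
    printf "%d: <%s>\n", $i, $text;
}
```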

        I do not know whether the solution proposed by rjt solves your problem, because you haven't described your problem and your data in sufficient detail, but rjt's solution does not use any external code, only Perl core functions and operators. And, BTW, the split function uses regular expressions.

Re: Regular expressions: interpolate the variable in the value of the number of repetitions for the group
by kcott (Archbishop) on Aug 03, 2013 at 10:30 UTC

    G'day 0day,

    The perlre documentation, in the Extended Patterns section, says this about the (??{ code }) construct:

    "During the matching of this sub-pattern, it has its own set of captures which are valid during the sub-match, but are discarded once control returns to the main pattern."

    So, while /(\d)((??{'.*?\n' x $1}))/ captures $2 as you state, attempting to capture sub-patterns within (??{ code }) to $3, $4, etc. won't work. Furthermore, unless your target string always begins with '3' you won't know how many $<integer> variables will be available.
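The discarding of inner captures is easy to observe. In this small sketch (the target string `"2ab"` is made up), the generated sub-pattern `(.)(.)` contains two groups of its own, yet after the match `$2` holds the whole sub-match and `$3` is undefined.

```perl
use strict;
use warnings;

"2ab" =~ /\A(\d)((??{ '(.)' x $1 }))/ or die "no match\n";

my $whole = $2;    # "ab": everything the generated sub-pattern matched
my $third = $3;    # undef: the inner (.) captures were discarded

printf "\$2=%s, \$3 is %s\n", $whole, defined $third ? 'defined' : 'undefined';
```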

    The following script achieves the result you're after by storing the sub-pattern captures in an array.

    #!/usr/bin/env perl -T

    use 5.010;
    use strict;
    use warnings;
    use re qw{taint eval};

    my $x = "3aaaa\nbbbb\nccccc\nddddd";
    my @captures;

    $x =~ /\A (\d+) ( (??{ '(.+?(?:\n|$))(?{ push @captures, $^N })' x $1 }) )/mx;

    say '[1]', $1, '[1]';
    say '[2]', $2, '[2]';
    say '[', $_ + 3, ']', $captures[$_], '[', $_ + 3, ']' for 0 .. $#captures;

    Here's the output which, as you can see, is somewhat fudged to give a sense of what $3, $4, etc. would have been:

    $ pm_1047667_regex.pl
    [1]3[1]
    [2]aaaa
    bbbb
    ccccc
    [2]
    [3]aaaa
    [3]
    [4]bbbb
    [4]
    [5]ccccc
    [5]

    Changing the first character of the target string to '2':

    $ pm_1047667_regex.pl
    [1]2[1]
    [2]aaaa
    bbbb
    [2]
    [3]aaaa
    [3]
    [4]bbbb
    [4]

    And to '4':

    $ pm_1047667_regex.pl
    [1]4[1]
    [2]aaaa
    bbbb
    ccccc
    ddddd[2]
    [3]aaaa
    [3]
    [4]bbbb
    [4]
    [5]ccccc
    [5]
    [6]ddddd[6]

    Notes:

    • Read the security concerns regarding (?{ code }) and (??{ code }) mentioned in both the re (pragma) and perlre documentation. The script I've posted uses both the -T switch and use re 'taint': this is more to highlight the issue than out of any real worry about matching parts of "3aaaa\nbbbb\nccccc\nddddd".
    • Also read the experimental warnings for both (?{ code }) and (??{ code }) in perlre. I'm using 5.18.0; you may get different results with another version.
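For comparison, a sketch of a variant that avoids the experimental code-block constructs entirely, at the cost of a loop outside the regex: a scalar-context `/g` match leaves `pos($x)` just after the count, and each `\G` match then resumes where the previous one stopped until `$count` records are collected. As in the script above, each record keeps its trailing `"\n"`, and the last one may instead end at the end of the string.

```perl
use strict;
use warnings;

my $x = "3aaaa\nbbbb\nccccc\nddddd";

# Scalar-context //g leaves pos($x) just after the leading count.
$x =~ /\A(\d+)/g or die "no leading count\n";
my $count = $1;

# Each \G match resumes at pos($x), one record per iteration.
my @captures;
while (@captures < $count && $x =~ /\G(.+?(?:\n|\z))/g) {
    push @captures, $1;
}

printf "[%d]%s[%d]\n", $_ + 3, $captures[$_], $_ + 3 for 0 .. $#captures;
```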

    -- Ken

      Wow...
      Thank you very much, kcott, for the explanation; your solution is the most elegant.

      Many thanks to all who tried to help.