
Interpolation of capture buffers not working when regex stored in variable

by jacklh (Initiate)
on Jun 05, 2013 at 20:01 UTC
jacklh has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to add functionality to an existing script that sends files to customers. The problem is that we have dozens, possibly hundreds in the near future, of customers who want the files we send them to match their specific naming convention. As such, I'm trying to make our script generic and allow a series of regex patterns to be defined in an INI per client. We have it working EXCEPT for capture groups, which are not interpolated at all: the result always contains the literal $1, $2 ... $n. Here is a script showing the basic concept of what we've tried (and failed) when dealing with capture groups.
    my $s = "foo1bar";
    $s =~ /(\w+)\d(\w+)/;
    print "var1: $1\tvar2: $2\n";
    my $p = "$2$1";        # works and prints "p is barfoo"
    # call script as "perl $2$1"
    #my $p = "$ARGV[0]";   # doesn't work, prints "p is $2$1"
    print "p is $p\n";
Results when hardcoded into script:
    var1: foo       var2: bar
    p is barfoo
Results when using ARGV[0] method as "perl $2$1":
    var1: foo       var2: bar
    p is $2$1
Here is another conceptual version illustrating the issue we've run into.
    use strict;
    use warnings;
    use Config::IniFiles;

    my %args;
    my $global_ini_data = Config::IniFiles->new( -file => "test.ini" );
    my $regex = $global_ini_data->val( 'client1', 'rename' );
    my $filename = "201306051200foobar.dat";

    print "REGEX: $regex\n";
    print "BEFORE: $filename\n";

    # Uncomment each method individually to test

    # Method 1: doesn't work
    #$filename =~ qr($regex);

    # Method 2: doesn't work
    #$filename =~ $regex ;

    # Method 3: doesn't work
    #$filename = modify($filename, sub { $_[0] =~ $regex });

    # Method 4: doesn't work (not global)
    #my ($before, $after) = $regex =~ m{/(.*)/(.*)/};
    #$filename = replace($filename, $before, sub { $after }, 0);

    # Method 5: doesn't work (global)
    #my ($before, $after) = $regex =~ m{/(.*)/(.*)/};
    #$filename = replace($filename, $before, sub { $after }, 1);

    # Method 6: works, but hardcoded and not in INI; not scalable
    $filename =~ s/(\d{2})(\d{2})(\d{4})(.*)/$3$2\.$4/;

    # End Methods test

    print "AFTER: $filename\n";
    exit;

    sub replace {
        my($string, $find, $replace, $global) = @_;
        unless($global) {
            $string =~ s($find){ $replace->() }e;
        } else {
            $string =~ s($find){ $replace->() }ge;
        }
        return $string;
    }

    sub modify {
        my($text, $code) = @_;
        $code->($text);
        return $text;
    }
The test.ini config file
    [client1]
    # PURPOSE: Convert YYYYMMDDhhmmss to MMDDYY.hhmmss
    rename=s/(\d{2})(\d{2})(\d{4})(.*)/$3$2\.$4/
The results: Methods 1 - 3 (not working)
    BEFORE: 201306051200foobar.dat
    AFTER: 201306051200foobar.dat
Methods 4 - 5 (not working)
    BEFORE: 201306051200foobar.dat
    AFTER: $3$2\.$4
Method 6 works, but hardcoded:
    BEFORE: 201306051200foobar.dat
    AFTER: 060513.1200foobar.dat

Any ideas to make capture groups work when patterns defined in INI?


(1) I know the test.ini format leaves a security risk of code injection; the above script is just for illustrative purposes. In the prod script, we actually just parse the before and after regex patterns from the INI and build the correct substitution regex.

(2) I consulted these posts, which provided valuable insight (and used in above methods) and ALMOST have the solution, but either don't work for capture groups defined in INI or when it does work, it's too rigid by hardcoding the number of capture groups in source code rather than allowing it to be dynamically defined in an INI, again necessary because we may potentially be holding hundreds of unique client configurations and need this to scale:

Re: Interpolation of capture buffers not working when regex stored in variable
by ikegami (Pope) on Jun 05, 2013 at 20:14 UTC

    You're confusing string literals (code) and strings (values).

    In the source, "$2$1" is a string literal. It results in a string made up of the values of $1 and $2.

        my $p = "$2$1";
        # my $p = $2.$1;
        # Doesn't change the value of $2 or $1.

    In the source, "$ARGV[0]" is a string literal. It results in a string made up of the value of $ARGV[0].

        my $p = "$ARGV[0]";
        # my $p = "".$ARGV[0];
        # Doesn't change the value of $ARGV[0].
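The distinction can be checked directly. The following is a minimal sketch, not part of the original reply; the variable names are made up for illustration, and the single-quoted string stands in for a value that arrives from @ARGV or an INI file.

```perl
use strict;
use warnings;

my $s = "foo1bar";
$s =~ /(\w+)\d(\w+)/ or die "no match";

# A double-quoted literal in the source: Perl compiles it into $2 . $1.
my $from_literal = "$2$1";

# The same characters arriving as data (stand-in for @ARGV or an INI value).
my $template = '$2$1';

# Interpolating $template copies its characters; they are not scanned again.
my $from_value = "$template";

print "literal: $from_literal\n";
print "value:   $from_value\n";
```

Only the first assignment is code that mentions $1 and $2; the second string merely contains a dollar sign and a digit as characters.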

    If you want to process the value of a variable, finding $1 and $2 in it and replacing them, you are trying to implement a templating system. There's one that matches your specs called String::Interpolate.

        use String::Interpolate qw( interpolate );
        my $t = '$1$2';    # or: my $t = $ARGV[0];
        my $p = interpolate($t);
      Thank you very much! String::Interpolate worked in conjunction with the "replace()" code used by methods 4 & 5. We knew about the string literal vs string value distinction, but we didn't know how to "cast" it (not mentioned in any of the docs), so we'd hoped Perl would figure it out from context like it does for so many other type conversions. You have saved us. Many thanks again.
        Here's the modified code for others' reference:
            use String::Interpolate qw( interpolate );
            ...[snip]...
            # Method 4: worked! (not global)
            #my ($before, $after) = $regex =~ m{/(.*)/(.*)/};
            #$filename = replace($filename, $before, sub {interpolate($after)},0);
            # Method 5: worked! (global)
            my ($before, $after) = $regex =~ m{/(.*)/(.*)/};
            $filename = replace($filename, $before, sub {interpolate($after)},1);
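For readers who would rather not evaluate the replacement text at all, the same idea can be sketched with a small hand-rolled template expander. This is an illustration only, not what was used in the thread: expand_template and rename_file are hypothetical names, and the expander understands only $N placeholders and backslash escapes, which is an assumption about what the INI templates need.

```perl
use strict;
use warnings;

# Expand $1, $2, ... in a template from a list of captured values.
# A backslash makes the next character literal, so '\.' becomes '.'.
sub expand_template {
    my ($template, @captures) = @_;
    (my $out = $template) =~ s{\\(.)|\$(\d+)}{
        defined $1 ? $1
      : $2 >= 1    ? ($captures[$2 - 1] // '')
      :              ''
    }ge;
    return $out;
}

# Replace the first match of $pattern in $name with the expanded template.
sub rename_file {
    my ($name, $pattern, $template) = @_;
    my @captures = $name =~ /$pattern/ or return $name;
    my ($from, $to) = ($-[0], $+[0]);    # offsets of the whole match
    substr($name, $from, $to - $from, expand_template($template, @captures));
    return $name;
}

# Split an INI-style rule into its two halves, as methods 4 and 5 do.
my $rule = 's/(\d{2})(\d{2})(\d{4})(.*)/$3$2\.$4/';
my ($before, $after) = $rule =~ m{/(.*)/(.*)/};

print rename_file("201306051200foobar.dat", $before, $after), "\n";
```

Because the template is treated purely as data, nothing in the INI value is ever executed as Perl; the search pattern is still compiled as a regex, so it deserves the same care as any user-supplied pattern.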
Re: Interpolation of capture buffers not working when regex stored in variable
by shmem (Canon) on Jun 05, 2013 at 20:15 UTC
    Any ideas to make capture groups work when patterns defined in INI?


        use Config::IniFiles;
        my $global_ini_data = Config::IniFiles->new( -file => "test.ini" );
        my $regex = $global_ini_data->val( 'client1', 'rename' );
        my $filename = "201306051200foobar.dat";
        eval "sub vodoo { \$_[0] =~ $regex }";
        vodoo ( $filename ); # this does in-place edit.
        print $filename,$/;
        __END__
        060520.1200foobar.dat

    If you want a string to be a command, you have to compile it - with string eval. Just the same way as any other perl source is compiled.


    I admit to not having read through all of your code. If you have only one substitution per filename, you could get by with

    eval "\$filename =~ $regex"; die $@ if $@;

    Having multiple patterns and clients, you surely don't want to have a named subroutine for each of them. You could construct a dispatch table with

        sub makesub {
            # my ($client, $regex) = @_; update: wrong, we have 1 param
            my $regex = shift;
            my $sub = eval "sub { \$_[0] =~ $regex }";
            die $@ if $@;
            return $sub;
        }

    and populate a hash with the client identifier as key and the resulting anonymous sub as value, which you then call. Like so:

        $hash{$client} = makesub($regex);
        $hash{$client}->($filename); # $filename changed
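Put together, the dispatch table might look like the sketch below. The %rules hash is a hypothetical stand-in for values read from the INI file, and client2 with its rule is invented for illustration. Since string eval runs whatever the rule contains, this is only appropriate when the INI file is trusted.

```perl
use strict;
use warnings;

# Hypothetical rules, standing in for values read from the INI file.
my %rules = (
    client1 => 's/(\d{2})(\d{2})(\d{4})(.*)/$3$2\.$4/',
    client2 => 's/foobar/report/',
);

# Compile one rule into an anonymous sub that edits its argument in place.
sub makesub {
    my $regex = shift;
    my $sub = eval "sub { \$_[0] =~ $regex }";
    die "cannot compile rule '$regex': $@" unless $sub;
    return $sub;
}

# Dispatch table: client identifier => compiled substitution.
my %rename = map { $_ => makesub($rules{$_}) } keys %rules;

for my $client (sort keys %rename) {
    my $filename = "201306051200foobar.dat";
    $rename{$client}->($filename);    # @_ aliases $filename, so it is modified
    print "$client: $filename\n";
}
```

Each rule is compiled once when the table is built, so renaming many files for the same client does not repeat the eval.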
    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
      We'd seen dispatch tables mentioned in our design patterns book, and it was one of the ideas we thought we'd pursue if we couldn't figure out the interpolation issue. Not having done a dispatch table before, your code example will help--thanks. I'll read up on this for future reference.
Re: Interpolation of capture buffers not working when regex stored in variable
by rjt (Deacon) on Jun 06, 2013 at 01:22 UTC

    As an additional suggestion, for this sort of problem, you might also wish to make use of the named capture buffers feature available since Perl 5.10. This way you are not stuck trying to keep brackets in the same relative position for each regexp across many config files. For example, parsing a date:

        my $re = qr/(?<yyyy>\d{4})(?<mm>\d{2})(?<dd>\d{2})/;
        if ("201306051200foobar.dat" =~ $re) {   # %+ is only set by a successful match
            printf "The date was %04d-%02d-%02d\n", $+{yyyy}, $+{mm}, $+{dd};
        }

    If you have to support a different format that is mm-dd-yyyy, for example, you're fine as long as you put the (?<name>...) capture buffers in the correct places.
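As a rough sketch of how that could look with more than one input format: the pattern names, the sample filenames and the normalize helper below are all hypothetical, and the output layout is just one possible choice.

```perl
use strict;
use warnings;

# Two hypothetical per-client patterns; the group names stay the same
# even though the groups appear in a different order.
my %pattern = (
    ymd => qr/^(?<yyyy>\d{4})(?<mm>\d{2})(?<dd>\d{2})_(?<rest>.*)$/,
    mdy => qr/^(?<mm>\d{2})-(?<dd>\d{2})-(?<yyyy>\d{4})_(?<rest>.*)$/,
);

# Build the new name from named captures, independent of their position.
sub normalize {
    my ($name, $re) = @_;
    return unless $name =~ $re;
    return sprintf "%04d-%02d-%02d_%s", $+{yyyy}, $+{mm}, $+{dd}, $+{rest};
}

print normalize("20130605_foobar.dat",   $pattern{ymd}), "\n";
print normalize("06-05-2013_foobar.dat", $pattern{mdy}), "\n";
```

The code that builds the output never needs to know which bracket came first in a given client's pattern.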
