Re: Regexp substitution using variables
by choroba (Cardinal) on Nov 25, 2020 at 19:56 UTC
|
Some of the flags can be moved to a non-capturing group:
#!/usr/bin/perl
use warnings;
use strict;
my $string = 'abc';
my $pattern = 'B';
my $replacement = 'X';
my $flags = 'i';
$string =~ s/(?$flags:$pattern)/$replacement/;
print $string; # aXc
But you can't do that for the /g, /o, /r, and /e flags.
Update: Even string eval doesn't help, as plain interpolation of the $replacement can break if it contains a slash.
eval "s/\$pattern/\$replacement/$flags"
doesn't work either, as you can't put $1 into $replacement unless you always use /ee which makes it unsafe again.
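Both failure modes can be reproduced with a small sketch. The sample values are made up for illustration; the exact compile error text varies between Perl versions, so only its presence is reported.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $flags = 'i';

# Case 1: the replacement text is pasted into the source that eval compiles,
# so a slash inside it ends the s/// early and the code fails to compile.
my $string      = 'abc';
my $pattern     = 'B';
my $replacement = 'X/Y';
eval "\$string =~ s/$pattern/$replacement/$flags";
print "case 1 error: ", ($@ ? 'yes' : 'no'), ", string: $string\n";

# Case 2: escaping the variables defers interpolation to run time. The slash
# problem goes away, but '$1' in the replacement is now just literal text.
$string      = 'abc';
$pattern     = '(B)';
$replacement = '[$1]';
eval "\$string =~ s/\$pattern/\$replacement/$flags";
print "case 2 error: ", ($@ ? 'yes' : 'no'), ", string: $string\n";
```

In case 1 the string is left untouched; in case 2 the substitution runs, but the result contains the characters `$1` rather than the captured text.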
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
|
Thank you to all of you who have suggested the form s/(?$flags:$pattern)/$replacement/. I think this will get me much of what I need. The fact that the global-replace flag "g" doesn't work in this position is an annoying wrinkle, but I am going to take a deep breath and code up the with-g and without-g cases separately, depending on whether or not $flags =~ s/g// succeeds.
|
but I am going to take a deep breath and code up the with-g and without-g cases separately
Why are people so determined to solve a complex problem like this in a single line of code, just because Perl makes that possible? I don't mean you specifically, but people in general, apparently including most of those who replied to this thread.
Splitting this into two parts makes sense: /g is not a modifier of the pattern (as it is in several other languages) but of the substitution.
Something like this looks acceptable to me as the (obvious) redundancy is actually quite limited:
if ($flags =~ s/g//) {
    s/(?$flags:$pattern)/replacement($replacement)/ge;
}
else {
    s/(?$flags:$pattern)/replacement($replacement)/e;
}
where you still have to provide the sub replacement.
Other flags cannot really be coded this way, but there's no need to provide for /o or /r at all, and allowing people to use the /e flag in a config file simply looks dangerous to me.
If people really want to use /e, it will likely be for just a handful of specific cases, and you can instead code a simpler solution for those cases (simpler for the user, not necessarily for you) explicitly in your script, rather than having them write convoluted Perl code.
The real danger of allowing ordinary users to run arbitrary code is also why I really don't like the use of eval. It also means special care must be taken when writing the sub replacement. You can mitigate the danger by using a module like String::Interpolate to embed captured values while disallowing access to the rest of the intestines of the script.
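One possible shape for the replacement sub and the two-branch dispatch is sketched below. The names expand_template and apply_regsub are hypothetical, and the sketch assumes that only $1 through $9 need expanding and that i, m, s and x are the only inline flags worth allowing. The replacement text is treated as a template, never as code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: expand $1..$9 in a replacement template from the
# capture values passed in. Groups that did not match expand to ''.
sub expand_template {
    my ($template, @groups) = @_;    # $groups[0] holds $1, $groups[1] holds $2, ...
    $template =~ s/\$([1-9])/ defined $groups[$1 - 1] ? $groups[$1 - 1] : '' /ge;
    return $template;
}

# Hypothetical driver: strip g from the flags, put the rest inside the
# pattern, and choose the global or single-shot branch.
sub apply_regsub {
    my ($string, $pattern, $replacement, $flags) = @_;
    my $global = ($flags =~ s/g//g) ? 1 : 0;
    die "unsupported flags '$flags'\n" if $flags =~ /[^imsx]/;
    my $re = qr/(?$flags:$pattern)/;
    if ($global) {
        $string =~ s/$re/expand_template($replacement, $1, $2, $3, $4, $5, $6, $7, $8, $9)/ge;
    }
    else {
        $string =~ s/$re/expand_template($replacement, $1, $2, $3, $4, $5, $6, $7, $8, $9)/e;
    }
    return $string;
}

print apply_regsub('foo bar foo', '(f)(o+)', '<$2$1>', 'g'), "\n";        # <oof> bar <oof>
print apply_regsub('My Testing Text', '(testing)', 'New $1', 'i'), "\n";  # My New Testing Text
```

The /e here only runs the fixed call to expand_template, so nothing from the config is compiled as Perl. Escapes such as \U are not handled by this sketch.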
|
Well, this gets me much of what I need ... but I can't get back-references, either with $1 or \1. In either case, they appear as literals. Any ideas, other than eval?
|
Even string eval doesn't help, as plain interpolation of the $replacement can break if it contains a slash
My first thought was to use eval but I hit a brick wall when I tried...
I was thinking of first changing a slash in $replacement for a double slash then using eval to do the substitution but I got stuck getting the result of the substitution.
$replacement =~ s/\\/\\\\/g;
$string =~ eval "s/\$pattern/\$replacement/$flags";
But that doesn't do it...
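A likely reason the snippet above gets stuck: the evaluated s/// has no target of its own, so it runs against $_, and eval hands back the substitution count, which $string is then matched against. A minimal sketch of one way around that, with made-up sample values, is to put the binding inside the evaluated code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string      = 'abc abc';
my $pattern     = 'b';
my $replacement = 'X';
my $flags       = 'g';

# The binding to $string is part of the evaluated code. eval returns
# whatever s/// returns, which is the number of substitutions made.
my $count = eval "\$string =~ s/\$pattern/\$replacement/$flags";
die "substitution failed to compile: $@" if $@;

print "count: $count, string: $string\n";    # count: 2, string: aXc aXc
```

This gets the result back, but with the variables escaped like this a $1 in the replacement is still not expanded.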
Re: Regexp substitution using variables
by jwkrahn (Monsignor) on Nov 25, 2020 at 19:49 UTC
|
$value =~ s/(?$flags:$pattern)/$replacement/;
|
$value =~ s/(?$flags:$pattern)/$replacement/;
In my answer I used a subtle variation:
$value =~ s/(?$flags)$pattern/$replacement/;
which worked as expected in my test code. So I went off to the documentation and sure enough it shows both but it does not (at least to my eyes) show what the difference is between them. Can anyone explain if there is a difference and when it practically matters? It doesn't seem to matter here.
On a different note: is it preferred by other Monks that questions like this get asked in the thread, or should they have their own new thread?
|
> does not (at least to my eyes) show what the difference is between them.
You are comparing (?flags) with (?flags:pattern).
In your example there is no difference, but in the second form, (?flags:pattern), the reach of the modifiers is limited to the group.
DB<24> p 'xX' =~ /(?i:X)X/
1
DB<25> p 'xX' =~ /(?i:X)x/
DB<26> p 'xX' =~ /(?i)Xx/
1
DB<27>
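The same comparison as a standalone script, using an interpolated flags variable as in the rest of the thread. The variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $flags = 'i';
my ($head, $tail) = ('X', 'x');

# (?flags:...) applies the flags only inside the group;
# (?flags) applies them from that point to the end of the enclosing group.
my $grouped = 'xX' =~ /(?$flags:$head)$tail/ ? 'match' : 'no match';
my $inline  = 'xX' =~ /(?$flags)$head$tail/  ? 'match' : 'no match';

print "grouped: $grouped\n";    # grouped: no match
print "inline:  $inline\n";     # inline:  match
```

When the whole pattern sits inside the group, as in s/(?$flags:$pattern)/.../, the two forms behave the same.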
Re: Regexp substitution using variables
by kcott (Archbishop) on Nov 25, 2020 at 22:11 UTC
|
G'day MikeTaylor,
Or perhaps something like this, if I could only find the right class name:
my $re = new Regexp($pattern);
my $value = $re->substitute($value, replacement, $flags);
About 15 or 20 years ago, I read a book by Damian Conway
called "Object Oriented Perl".
In it, he shows the creation of blessed objects using various things including regular expressions:
your post reminded me of this.
I don't own the book.
If you can get a copy (you possibly already own one) it's certainly worth reading even though it's now quite old.
If you follow the link I provided, you'll see a free PDF copy is offered;
however, it looks like you need to "add to cart" which probably also means you have to "create an account"
— I didn't follow through on this.
Here's a very quick-and-dirty implementation of a class which blesses regular expressions.
package Regex;
use strict;
use warnings;
sub new {
my ($class, $pattern, $flags) = @_;
my $flag_part = defined $flags ? "(?$flags)" : '';
my $re_part = "\Q$pattern";
return bless qr{$flag_part$re_part}, $class;
}
sub match {
my ($self, $str) = @_;
return $str =~ $self ? 'YES' : 'NO';
}
sub replace {
my ($self, $str, $new) = @_;
$str =~ s/$self/$new/;
return $str;
}
1;
If you want to use something like this in production code, it'll need a lot more work.
What I've provided is only intended to demonstrate the basic principles involved.
The book would probably provide a lot more information;
but I don't remember details of something I read about two decades ago.
Here's a test of that module:
#!/usr/bin/env perl
use strict;
use warnings;
use FindBin;
use lib "$FindBin::Bin/../lib";
use Regex;
my $pat = 'b';
my $case_sens_re_obj = Regex::->new($pat);
my $case_insens_re_obj = Regex::->new($pat, 'i');
my $test_string = 'ABC';
print 'case_sens_re_obj match: ',
$case_sens_re_obj->match($test_string), "\n";
print 'case_insens_re_obj match: ',
$case_insens_re_obj->match($test_string), "\n";
print 'case_sens_re_obj substitution: ',
$case_sens_re_obj->replace($test_string, '_'), "\n";
print 'case_insens_re_obj substitution: ',
$case_insens_re_obj->replace($test_string, '_'), "\n";
Output:
case_sens_re_obj match: NO
case_insens_re_obj match: YES
case_sens_re_obj substitution: ABC
case_insens_re_obj substitution: A_C
Unrelated but important:
Please avoid indirect object syntax; e.g. new Regexp($pattern).
See "perlobj: Indirect Object Syntax"
for a discussion of problems with this syntax.
The above example would be much better as Regexp::->new($pattern)
— "perlobj: Invoking Class Methods" explains that.
P.S. When checking links prior to posting, I noticed "PDF for FREE"
has been replaced with the text, "pBook + PDF".
I don't know what that means and whether the PDF is still free or not
(there would have only been a matter of minutes between finding the link in the first place
and checking I had correctly included it in my post).
|
Thanks, Ken, this is helpful. Point taken on Class->new, too.
The problem with this class, like the solutions above that use s/(?$flags:$pattern)/$replacement/ directly, is that it doesn't handle back-references. (That's true even with the \Q removed from the definition of $re_part in the class constructor.)
Re: Regexp substitution using variables
by Fletch (Bishop) on Nov 25, 2020 at 20:38 UTC
|
I'm trying to think of some application where you'd reasonably need to accommodate random substitutions with possible /g modifiers but I'm coming up blank (but probably need more caffeine to boot . . .). I started to post something mentioning string eval (which as has been pointed out isn't the answer there either) but something about the original question has a not-too-faint whiff of "XY problem" about it.
Could you step back a hair more and explain why you think you need to run substitutions with arbitrary modifier flags? It may be that you don't actually and you could really get by with one of the prior suggestions (like moving compatible flags onto the front of the pattern). Or maybe you could work with some sort of (handwaving vigorously here) plugin / module system where you write substitution classes which implement a specific role that . . . /shrug
The cake is a lie.
|
I understand your scepticism; this does indeed feel like one of those "How do I do X?" questions where the answer is "Don't do X, do Y instead". (Is that what you meant by an "XY problem"?)
My situation is basically that I need to run a config file that specifies regular-expression substitutions. Specifically, my program is generating USMARC-format bibliographic records, and a config file says things like "in the 245$a field, replace /foo/ with 'bar' globally". In fact, the config looks like this:
"245$a": [
{ "op": "regsub", "from": "foo", "to": "bar", "flags": "g" }
]
If you can think of a better way to do this, I am all ears; but bear in mind I do need the full power of regexp substitutions, e.g. the ability to include parenthesized sub-expressions in the "from" part and $1 back-references in the "to" part.
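Whatever substitution mechanism ends up being used, rules in this format can be checked when the config is loaded. The sketch below is only an illustration: check_rule is a hypothetical helper, the second and third rules are invented to trigger the checks, and the allowed flag set (g, i, m, s, x) is an assumption. JSON::PP ships with Perl from 5.14 on.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;    # core module since Perl 5.14

my $config_text = <<'JSON';
{
  "245$a": [
    { "op": "regsub", "from": "(foo)", "to": "bar", "flags": "gi" },
    { "op": "regsub", "from": "(unclosed", "to": "x", "flags": "" },
    { "op": "regsub", "from": "foo", "to": "bar", "flags": "e" }
  ]
}
JSON

# Hypothetical checker: returns an error string, or undef if the rule looks usable.
sub check_rule {
    my ($rule) = @_;
    return "unknown op" unless ($rule->{op} // '') eq 'regsub';
    my $flags = $rule->{flags} // '';
    return "unsupported flags '$flags'" if $flags =~ /[^gimsx]/;
    # Compiling inside a block eval catches malformed patterns up front.
    my $from = $rule->{from};
    my $ok   = eval { my $re = qr/$from/; 1 };
    return "bad pattern" unless $ok;
    return;
}

my $config = decode_json($config_text);
for my $field (sort keys %$config) {
    for my $rule (@{ $config->{$field} }) {
        my $error = check_rule($rule);
        print "$field /$rule->{from}/: ", (defined $error ? $error : 'ok'), "\n";
    }
}
```

Rejecting /e at load time keeps the config a description of substitutions instead of a place to put Perl code.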
|
This is interesting. Can you provide some additional examples, including more esoteric ones, and possibly a little sample text? I just wanted to look at the challenges you're facing more pragmatically. Test cases would be fantastic.
|
"245$a": [
{ "op": "regsub", "from": "foo", "to": "bar", "flags": "g" }
]
This seems like a good starting point. See neilwatson's article
How to ask better questions using Test::More and sample data for the way forward. Once you have a few working test cases
defined, the only thing left is to define about a million more,
including generous edge and corner cases and exception cases! No
problem. :)
Give a man a fish: <%-{-{-{-<
|
"245$a": [
{ "regexp": 's/(foo|bar)/He said "$1"/' }
]
There is no way to "safely" abstract the capture-var away, it has to be compiled into the regex and this needs an eval or /ee with all connected security issues.
> but bear in mind I do need the full power of regexp substitutions,
I have the impression your JSON format is an attempt to make it language agnostic. But the "full power" means you will be stuck with Perl.
And full power means that security becomes an illusion.
DB<111> $_="abc"
DB<112> s/(.)/@{[print "what? --> $1\n"]}/g
what? --> a
what? --> b
what? --> c
DB<113>
Re: Regexp substitution using variables
by Bod (Vicar) on Nov 25, 2020 at 19:53 UTC
|
use strict;
my $pattern = 'test';
my $replacement = 'New';
my $flags = 'i';
my $value = 'My Test Text';
$value =~ s/(?$flags)$pattern/$replacement/;
print "$value\n";
This will print:
My New Text
Re: Regexp substitution using variables
by MikeTaylor (Acolyte) on Nov 25, 2020 at 23:33 UTC
|
Here is what I am doing at the moment:
$replacement =~ s/\\/\\\\/g;
eval "\$res =~ s/$pattern/$replacement/$flags";
It's working, and crucially supports back-references — unlike the $res =~ s/(?$flags:$pattern)/$replacement/ solution.
Of course, the use of eval gives me the heebie-jeebies; but I'm not going to lose too much sleep as we already need to trust the people who write the configuration files that will contain the values used in the eval.
|
Partly in answer to choroba's challenge, here's an
approach that works with forward/backslashes, escape sequences and
capture variables in replacement strings. Whether it will answer your
needs is another question. A fixup step for forward slashes is necessary. Works under
Perl versions 5.8.9 and 5.30.3.
Win8 Strawberry 5.8.9.5 (32) Thu 11/26/2020 4:08:05
C:\@Work\Perl\monks
>perl
use strict;
use warnings;
my $pattern = '(\\\\tEs/Ti//N\x67\\\)';
my $replacement = '\\\Fr/es//h\\\\ \U$1';
my $flags = 'i';
# $got_g is true if /g modifier present in flags.
# ($flags, my $got_g) = sanitize_flags_detect_g($flags);
fixup_forward_slashes($pattern, $replacement);
my $value = 'My \Tes/ti//ng\ Text';
print "replacement '$replacement' \n";
my $eval_string = "\$value =~ s/$pattern/$replacement/$flags";
print "eval_string '$eval_string' \n";
eval $eval_string;
print "eval err '$@' \n";
print "output '$value' \n";
sub fixup_forward_slashes { s{/}'\/'g for @_; }
^Z
replacement '\\Fr\/es\/\/h\\ \U$1'
eval_string '$value =~ s/(\\tEs\/Ti\/\/N\x67\\)/\\Fr\/es\/\/h\\ \U$1/i'
eval err ''
output 'My \Fr/es//h\ \TES/TI//NG\ Text'
It's awkward that a \ single literal backslash in the input/output
string must be represented by a \\ double backslash in the
substitution and by \\\ triple or \\\\ quadruple backslashes in the
single-quoted pattern/replacement strings, but that's single/double-quotish
backslash handling for ya. If the pattern/replacement strings were
taken from a file, it would be possible to just use double backslashes.
Give a man a fish: <%-{-{-{-<
|
It's working, and crucially supports back-references
That's interesting as I cannot get this to support back-references...this is very similar to my initial attempt. So I have attempted to replicate it:
use strict;
my $pattern = '(testing)';
my $replacement = 'New \1';
my $flags = 'i';
my $value = 'My Testing Text';
$replacement =~ s/\\/\\\\/g;
eval "\$value =~ s/$pattern/$replacement/$flags";
print "$value\n";
This prints
My New \1 Text
It doesn't substitute the capture.
|
Win8 Strawberry 5.8.9.5 (32) Wed 11/25/2020 22:12:13
C:\@Work\Perl\monks
>perl
use strict;
use warnings;
my $pattern = '(testing)';
my $replacement = 'New \U$1';
my $flags = 'i';
my $value = 'My Testing Text';
### $replacement =~ s/\\/\\\\/g;
print "replacement '$replacement' \n";
eval "\$value =~ s/$pattern/$replacement/$flags";
print "$value\n";
^Z
replacement 'New \U$1'
My New TESTING Text
(An escaped backreference \1 is not kosher in a replacement
string anyway; it should be in $1 form.)
Update: Here's a version of the example code that better
illustrates the process of building the evaluation string:
Win8 Strawberry 5.8.9.5 (32) Wed 11/25/2020 22:45:44
C:\@Work\Perl\monks
>perl
use strict;
use warnings;
my $pattern = '(testing)';
my $replacement = 'New \U$1';
my $flags = 'i';
my $value = 'My Testing Text';
print "replacement '$replacement' \n";
my $eval_string = "\$value =~ s/$pattern/$replacement/$flags";
print "eval_string '$eval_string' \n";
eval $eval_string;
print "$value\n";
^Z
replacement 'New \U$1'
eval_string '$value =~ s/(testing)/New \U$1/i'
My New TESTING Text
Give a man a fish: <%-{-{-{-<
Re: Regexp substitution using variables
by BillKSmith (Monsignor) on Nov 26, 2020 at 15:36 UTC
|
Here is a solution using eval. Some care is required in using escapes. It works with or without the OO interface.
use strict;
use warnings;
use Test::More tests => 2;
my $pattern = '\Aabc\/';
my $replacement = '123\/';
my $flags = 'i';
my $value = 'ABC/def';
my $expected = '123/def';
my $command = "\$value =~ s/$pattern/$replacement/$flags";
diag $command;
eval $command;
ok( $value eq $expected, 'use eval directly' );
$value = 'ABC/def';
my $re = Regexp->new($pattern);
$value = $re->substitute( $value, $replacement, $flags );
ok( $value eq $expected, 'use eval in class' );
package Regexp;
sub new {
my ( $class, $pattern ) = @_;
my $new_object = bless \$pattern, $class;
return $new_object;
}
sub substitute {
my ( $self, $value, $replacement, $flags ) = @_;
my $pattern = $$self;
my $command = "\$value =~ s/$pattern/$replacement/$flags";
main::diag $command;
eval $command;
return $value;
}
OUTPUT:
1..2
# $value =~ s/\Aabc\//123\//i
ok 1 - use eval directly
# $value =~ s/\Aabc\//123\//i
ok 2 - use eval in class