Regexp::Common balanced curlies substitution

iaw4 has asked for the wisdom of the Perl Monks concerning the following question:

Thanks everyone for all the help getting me to this (hopefully) step. perl monks are truly wonderful. now, I know how to solve most of my problem. There is only one remaining problem: I want my substitution of balanced regexs not to contain the parens themselves. Of course, as always, this is easier to show than to describe.

/usr/bin/perl -w
use strict;
use warnings;
use re 'eval';
use Regexp::Common;

my $text= "start \\chapter{argument1}{argument2} end";
print "INPUT: $text\n";

################################################################
my $arg1= qr/\s*($RE{balanced}{-parens=>'{ }'})/;
my $args2= $arg1.$arg1;

my $macronamereadfromfile="chapter";
my $patternreadfromfile= "<h1 tag=\"\$2\">\$1</h1>";

my $inpattern = "\\\\$macronamereadfromfile$args2";

my $outpattern = qq{ qq{$patternreadfromfile} };

#print "PATTERNS: '$inpattern'.  to='$patternreadfromfile'.  became='$
+outpattern'\n";

$text =~ s/$inpattern/$outpattern/gee;

################################################################

print "OUTPUT: '$text'\n\n";
[download]

I really want

OUTPUT: 'start <h1 tag="argument1">argument1</h1> end'
[download]

and not

OUTPUT: 'start <h1 tag="{argument1}">{argument1}</h1> end'
[download]

I could replace my pattern with a really unusual surrounding string in step 1 (e.g., @@$1@@) and then remove all '@@{' and '}@@' but this seems very ugly. there must be a simpler way. advice appreciated. regards, /iaw

Comment on Regexp::Common balanced curlies substitution Select or Download Code

Replies are listed 'Best First'.
Re: Regexp::Common balanced curlies substitution by kyle (Abbot) on Aug 16, 2008 at 03:09 UTC
I can't say I recommend this, but a sort of minimal change would be: `my $patternreadfromfile = q{<h1 tag="@{[substr "$2",1,-1]}">@{[substr "$1",1,-1]}</h1>};` [download] It is probably more comprehensible to do something like this: `my $patternreadfromfile= q{do_what_i_want( $1, $2 )}; sub do_what_i_want { my ($one, $two) = @_; for ( $one, $two ) { s/\A.(.).\z/$1/; } return qq{<h1 tag="$one">$two</h1>}; } my $outpattern = $patternreadfromfile;` [download] Of course, you'll want to replace my silly names with your own silly names. If you don't want your namespace polluted with this sub, you could make it an anonymous sub in a lexical that will go away when you're done with it. In that case, it looks like this: `my $doit = sub { my ($one, $two) = @_; for ( $one, $two ) { s/\A.(.).\z/$1/; } return qq{<h1 tag="$one">$two</h1>}; }; my $patternreadfromfile= q{$doit->( $1, $2 )};` [download]	[reply] [d/l] [select]
Re: Regexp::Common balanced curlies substitution by JavaFan (Canon) on Aug 16, 2008 at 09:52 UTC
It's easy if you do it two lines: `$str =~ /$RE{balanced}{-parens=>'{}'}/ and substr $str. $-[0] + 1, $+[0] - $-[0] - 2, 'REPLACEMENT'` [download]	[reply] [d/l]
Re: Regexp::Common balanced curlies substitution by massa (Hermit) on Aug 16, 2008 at 02:59 UTC
`my $arg1 = qr/\s*{($RE{balanced}{-parens=>'{ }'})}/;` []s, HTH, Massa (κς,πμ,πλ)	[reply] [d/l]
Re^2: Regexp::Common balanced curlies substitution by kyle (Abbot) on Aug 16, 2008 at 03:12 UTC
That doesn't work when I try it. With the extra braces there, the pattern no longer matches at all, and the `s///` replacement just doesn't do anything. The `$RE{balanced}{-parens=>'{ }'}` pattern already includes braces, so your suggestion would only match a `{{double braced}}` string, which isn't what the OP has.	[reply] [d/l] [select]
Re: Regexp::Common balanced curlies substitution by hexcoder (Curate) on Aug 16, 2008 at 14:18 UTC
I tried to minimally change your prog, but I think it is also ugly yet. Unfortunately Regexp::Common::balanced always quotemeta its begin and end settings, so you cannot change behaviour right there. `... while ($text =~ m/$inpattern/xmsg) { my $first = substr $1, 1, -1; my $second = substr $2, 1, -1; $outpattern =~ s/\$1/$first/xmsg; $outpattern =~ s/\$2/$second/xmsg; } $text =~ s/$inpattern/$outpattern/gee; ...` [download]	[reply] [d/l]
Re: Regexp::Common balanced curlies substitution by AnomalousMonk (Archbishop) on Aug 16, 2008 at 19:26 UTC
I would tend to approach the problem differently, factoring out regex elements instead of trying to sink them beneath multiple levels of interpolation, and using explicit functions to operate on extracted substrings. Although there's more code on a character-count basis, this likely makes maintenance easier six months (or two weeks) from now when someone -- maybe even you! -- tries to make sense of it all. Of course, there is still the problematic `eval qq{ qq{$replacement_pattern} }` expression to reckon with, but you can't suck all the fun out of life. Note that the placeholder strings in the replacement pattern specifier taken from the file are changed to `$tag` and `$body` from the, to my mind, more fragile `$2` and `$1`. Getting rid of `$inpattern` and inlining its pattern elements in the substitution regex is, to me, desirable because it puts regex captures 'closer' to the point at which they are used. (The example code works the same either way.) BTW: This would probably be a nice place to use 5.10 named captures. use warnings; use strict; use Regexp::Common; my $text = q{start \chapter{argument1}{argument2} end}; print "INPUT: '$text' \n"; # macro name and replacement pattern taken from a file, sans newlines. chomp(my $macro_name = <DATA>); chomp(my $replacement_pattern = <DATA>); my $parens = '{}'; my $arg = qr{ $RE{balanced}{-parens => $parens} }xms; my $intro = qr{ \\ }xms; my $inpattern = qr{ $intro $macro_name \s* ($arg) \s* ($arg) }xms; # print "inpattern $inpattern \n"; # FOR DEBUG $text =~ # s{ $inpattern } s{ $intro $macro_name \s* ($arg) \s* ($arg) } { my ($body, $tag) = ($1, $2); # nail down captures soonest validate_paired($parens, $body, $tag); trim_paired ($parens, $body, $tag); eval qq{ qq{$replacement_pattern} } or die "replacement failed: '$replacement_pattern'"; }xmsge; print "OUTPUT: '$text' \n"; print "\n"; sub trim_paired { my $pair = shift; # opening/closing pair of chars to trim my ($opener, $closer) = $pair =~ m{ \S }xmsg; die "pair of opening and closing chars not supplied: $pair" unless defined $opener and defined $closer; s{ \A $opener \| $closer \z }''xmsg for @_; } sub validate_paired { my $pair = shift; # opening/closing pair of bracing chars my ($opener, $closer) = $pair =~ m{ \S }xmsg; die "pair of opening and closing chars not supplied" unless defined $opener and defined $closer; m{ \A $opener .* $closer \z }xms or die "not braced: '$_'" for @_; } __DATA__ chapter <h1 tag="$tag">$body</h1> [download] Output (both with and without `$inpattern`): `INPUT: 'start \chapter{argument1}{argument2} end' OUTPUT: 'start <h1 tag="argument2">argument1</h1> end'` [download]	[reply] [d/l] [select]


Just another Perl shrine
	PerlMonks