Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery

Help with regex - find captured pattern twice

by pachydermic (Beadle)
on Apr 09, 2013 at 21:24 UTC ( #1027848=perlquestion: print w/replies, xml ) Need Help??

pachydermic has asked for the wisdom of the Perl Monks concerning the following question:

Hey guys, my problem involves rewriting equations with regular expressions. I have a list of special "endogenous" variables which need to be changed from "x" to "xSS*exp(x)" everywhere they occur in an equation. I'm working with time series, so "x(-1)" or "x(+1)" needs to be changed to "xSS*exp(x(-1))" or "xSS*exp(x(+1))", as well.

For example, let's suppose that my endogenous variables are:

y, c, k, a, and h.

I need to take this:

y = a*(k(-1)ˆalpha)*(hˆ(1-alpha));

and turn it into this:

(ySS*exp(y)) = (aSS*exp(a))*((kSS*exp(k(-1)))ˆalpha)*((hSS*exp(h))ˆ(1-alpha));

Now I've already gotten started on this. Assume I already have a list of variables and equations to work with (contained in a single string which also has new lines). My code is:

my $model = "c*theta*hˆ(1+psi)=(1-alpha)*y;\n1 = beta*(c/c(+1)*alpha*y +(+1)/k+(1-delta));\ny = a*(k(-1)ˆalpha)*(hˆ(1-alpha));\nk = y-c+(1-de +lta)*k(-1);\nln(a) = rho*ln(a(-1))+ e;\n"; my @varlist = ('y','c','k','a','h'); my ($replacement,$search_string); foreach my $var (@varlist){ $search_string = quotemeta($var); $replacement = "$steady_states{$var}*exp(${var})"; $model =~ s/(\W)$search_string(\W)/$1($replacement)$2/g; $model =~ s/^$search_string(\W)/($replacement)$1/g; $model =~ s/exp\($search_string(\))+(\([\+\-]1\))/exp($search_st +ring$2$1)/g; } print $model;

And that's mostly okay except for the equation where you find "c/c(+1)" because the "/" character is already captured when the code finds "c/" - therefore "/c(+1)" isn't replaced. You can't just ignore the first (\W) because then parameters like "theta" would be problematic.

I know my regexs aren't very good anyways, so I'd like some advice about how to handle this problem.

Thank you very much for your time - hopefully I've explained everything clearly.

Replies are listed 'Best First'.
Re: Help with regex - find captured pattern twice
by roboticus (Chancellor) on Apr 09, 2013 at 23:22 UTC


    If you'll be doing a lot of this type of work, you'll probably do well to write an equation parser to parse the expression. Then you can do a replacement of the variables, and then write the results. (For parsing, see Parse::RecDescent or Marpa::R2. The example code for both includes the start of a basic expression parser.)

    Having said that, you can definitely hack something together. I wouldn't try it the way you're going about it because dealing with parenthesis is rather difficult (at least to me) in regular expressions. Instead, I'd try something to consume your input expression chunk by chunk and build a new expression. Something like this:

    #!/usr/bin/perl use strict; use warnings; my %replace = ( x=>"xSS*exp", y=>"ySS*exp", c=>"cSS*exp", k=>"kSS*exp", a=>"aSS*exp", h=>"hSS*exp", ); while (<DATA>) { s/\s+$//; print $_, " --> ", plonk($_), "\n"; } sub plonk { my $exp = shift; my $rv = ""; my @stack; while (length($exp)) { my ($var, $rest); if ($exp =~ /^(\w+)(.*)$/) { ($var, $rest) = ($1, $2); $var = $replace{$var} if exists $replace{$var}; $rv .= $var; } else { ($var, $rest) = ( substr($exp,0,1), substr($exp,1) ); if ($var eq '(') { push @stack, $rv; $rv = ""; } elsif ($var eq ')') { $rv = (pop @stack) . "($rv)"; } else { $rv .= $var; } } $exp = $rest; } return $rv; } __DATA__ y = a*(k(-1)^alpha)*(h^(1-alpha)) c*theta*hˆ(1+psi)=(1-alpha)*y 1 = beta*(c/c(+1)*alpha*y(+1)/k+(1-delta)) y = a*(k(-1)ˆalpha)*(hˆ(1-alpha)) k = y-c+(1-delta)*k(-1) ln(a) = rho*ln(a(-1))+ e

    This gives:

    $ perl y = a*(k(-1)^alpha)*(h^(1-alpha)) --> ySS*exp = aSS*exp*(kSS*exp(-1)^a +lpha)*(hSS*exp^(1-alpha)) c*theta*hˆ(1+psi)=(1-alpha)*y --> cSS*exp*theta*hSS*expˆ(1+psi)=(1-alp +ha)*ySS*exp 1 = beta*(c/c(+1)*alpha*y(+1)/k+(1-delta)) --> 1 = beta*(cSS*exp/cSS*e +xp(+1)*alpha*ySS*exp(+1)/kSS*exp+(1-delta)) y = a*(k(-1)ˆalpha)*(hˆ(1-alpha)) --> ySS*exp = aSS*exp*(kSS*exp(-1)ˆa +lpha)*(hSS*expˆ(1-alpha)) k = y-c+(1-delta)*k(-1) --> kSS*exp = ySS*exp-cSS*exp+(1-delta)*kSS*ex +p(-1) ln(a) = rho*ln(a(-1))+ e --> ln(aSS*exp) = rho*ln(aSS*exp(-1))+ e

    This has some obvious problems: First, it generates gibberish if the input isn't formed correctly. Second, bare variables have the same replacement as function calls. If you need to handle that, you'll have to add some special case handling for it. There may also be some other special cases you would need to handle. But it serves well enough as a demonstration.


    When your only tool is a hammer, all problems look like your thumb.

Re: Help with regex - find captured pattern twice
by hbm (Hermit) on Apr 09, 2013 at 23:37 UTC

    A bit broader than you specified - this replaces any single letter and optional parenthetical value.

    $_= "c*theta*h^(1+psi)=(1-alpha)*y;\n1 = beta*(c/c(+1)*alpha*y(+1)/k+( +1-delta));\ny = a*(k(-1)^alpha)*(h^(1-alpha));\nk = y-c+(1-delta)*k(- +1);\nln(a) = rho*ln(a(-1))+ e;\n"; s/(?<!\pL)(\pL)(\([\d+-]+\))?(?!\pL)/(${1}SS*exp($1$2))/g; print; #prints (cSS*exp(c))*theta*(hSS*exp(h))^(1+psi)=(1-alpha)*(ySS*exp(y)); 1 = beta*((cSS*exp(c))/(cSS*exp(c(+1)))*alpha*(ySS*exp(y(+1)))/(kSS*ex +p(k))+(1-delta)); (ySS*exp(y)) = (aSS*exp(a))*((kSS*exp(k(-1)))^alpha)*((hSS*exp(h))^(1- +alpha)); (kSS*exp(k)) = (ySS*exp(y))-(cSS*exp(c))+(1-delta)*(kSS*exp(k(-1))); ln((aSS*exp(a))) = rho*ln((aSS*exp(a(-1))))+ (eSS*exp(e));

      Wow. What the heck is that monstrosity?! Well done.

      Would you mind explaining that regex? I think that \pL references a unicode character class (from this page: but I'm still kind of baffled by what's going on...

        You might read it as this:

        s/ (?<! # 2. NOT preceded by \pL # a letter ) ( # 1. Capture \pL # a letter ) ( # 3. Capture \( # open paren; [\d+-]+ # one or more digits, plus, minus; \) # close paren )? # (Zero or one time) (?! # 4. NOT followed by \pL # a letter ) /(${1}SS*exp($1$2))/gx;
Re: Help with regex - find captured pattern twice
by hdb (Monsignor) on Apr 10, 2013 at 06:55 UTC

    For your problem with "c/c(+1)" there is are Look-Around Assertions, check perlre. For example, (?=pattern) matches the pattern but does not include it in the match.

    Also, in your code snippet, can you please include the missing $steady_states{$var}, so that the code becomes executable.

      That might not be the best solution, since my design is kind of flawed, but it is most certainly the easiest. Thank you. I learned yet another thing about regex today! Of course I will probably forget it by the next time I need it... Sorry about not including the hash - I cut this from a much larger program.
Re: Help with regex - find captured pattern twice
by Anonymous Monk on Apr 10, 2013 at 07:54 UTC

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1027848]
Approved by igelkott
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (3)
As of 2021-10-21 22:23 GMT
Find Nodes?
    Voting Booth?
    My first memorable Perl project was:

    Results (85 votes). Check out past polls.