Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw

Re: substitute with recursive regexps?

by Anonymous Monk
on Sep 03, 2011 at 14:22 UTC ( #923993=note: print w/replies, xml ) Need Help??

in reply to substitute with recursive regexps?

Perl regexes currently don't provide a reasonable way to extract or modify a complete parse tree from a recursive regex match. In other words, because the independent subpatterns in your (?(DEFINE)...) block don't keep the substrings they capture, it's extremely difficult to extract that hierarchical information and convert it to something else using substitutions involving vanilla recursive regexes. Basically, all you can do is match the regex and extract the entire match as a single string...or at best, extract just the top-level components.

If you want to do something more sophisticated than that using regexes, you need the extra features provided by the Regexp::Grammars module. Have a look at the distribution's demo/ subdirectory for some examples of parsing and then manipulating arithmetic expressions (in particular: the various demo_calc*.pl files).

And, yes, it's unfortunate that Perl now has the equivalent of full grammar-based matching, without any facility for extracting useful information from such matches. I know that Damian Conway and Nicholas Clark were discussing that very limitation (and what might possibly be done about it in a future version of Perl) at YAPC::EU a few weeks ago, so maybe Perl will eventually get a native implementation of that missing feature as well.

But in the meantime, have a look at Regexp::Grammars (or Marpa or Parse::Yapp). Perl 5 regexes aren't (yet) full grammars and that seems to be what you want here. Any of those modules can give you that, and Regexp::Grammars can give it to you in (decidedly non-vanilla) regexes.

Replies are listed 'Best First'.
Re^2: substitute with recursive regexps?
by hexcoder (Chaplain) on Sep 04, 2011 at 19:19 UTC
    Thanks for the clarification and the pointers! I will have a look. Regarding building a parse tree with a recursive regexp, I found an inspiring example at ppp from demerphq (which seems a tiny bit hackish though). It uses a different technique with assertions that evaluate Perl code:
    use strict; use warnings; use Data::Dumper; $^R=undef; my $re = qr/ (?{[$^R]}) # initialize a new node in the par +setree # preserving the old as the 0th +element \( # match a paren (?> # atomic match ([^()]+) # capture (?{ push @{$^R},$1; # push the capture string into the + current node $^R # propagate the current node }) | (?0) # recurse (?{ push @{$^R->[0]},$^R; # push the current node into the o +ld node shift @{$^R} # and restore the old node to be t +he current # (while removing it from the new) }) )* \) /x; "(a(b)(c(d)e)(f)g)" =~ m/$re/x and print "matched\n"; print Dumper $^R;
    which gives (with perl 5.10.0):
    matched $VAR1 = [ undef, 'a', [ 'b' ], [ 'c', [ 'd' ], 'e' ], [ 'f' ], 'g' ];
    But my current goal is not parsing, it is optimization of stringified output from parsed productions. It would have been cool to do optimizing transforms with recursive pattern substitutions. Then optimizations could be applied on the most nested level first, and propagate to higher levels.
      Yes, Regexp::Grammars basically just automates the same technique as demerphq's example uses: injection of a distributed stack-based tree-constructor via embedded code blocks.

      But you really want that automation if the regex gets much more complex than the above example. Apart from the obvious maintainabilty issues of sprinkling dozens of (?{...}) blocks through your regex, there's a much more fundamental problem: if the regex can ever backtrack a repetition, that will almost certainly screw up the synchronization between the stack and the parser state, unless you take special precautions to unwind the stack correctly (which is what the module does for you).

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://923993]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (7)
As of 2018-06-22 19:50 GMT
Find Nodes?
    Voting Booth?
    Should cpanminus be part of the standard Perl release?

    Results (124 votes). Check out past polls.