
Re^4: Perl regexp matching is slow??

by educated_foo (Vicar)
on Feb 27, 2007 at 18:20 UTC (#602361=note)

in reply to Re^3: Perl regexp matching is slow??
in thread Perl regexp matching is slow??

The problem is that it's still tricky to get at the parse tree. All the parts are there -- named captures, relative backreferences, recursion -- but you still have to litter your regexp with hairy uses of
(?{ local $FOO{blah} = $^N })
to get the data back.
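A minimal sketch of that workaround, assuming a simple space-separated list of words (the names `@words`, `@result` and `$word_list` are made up for illustration). Each `(?{ })` block appends `$^N`, the text of the most recently closed capture group, to a localized package array, so entries added on a path the engine later backtracks out of are undone. A final block copies the list out, because the localization is unwound once the match ends.

```perl
use strict;
use warnings;

# Package (not lexical) variables, because local() cannot be applied to
# lexicals; localized changes are undone automatically on backtracking.
our @words;
our @result;

my $word_list = qr{
    \A \s*
    (?{ local @words = () })                  # start with an empty list
    (?:
        (\w+)
        (?{ local @words = (@words, $^N) })   # append the group that just closed
        \s*
    )+
    \z
    (?{ @result = @words })                   # copy out before localization unwinds
}x;

if ('alpha beta gamma' =~ $word_list) {
    print "@result\n";    # prints "alpha beta gamma"
}
```

Even for a flat list this takes two code blocks and a global; for nested structures every level needs the same bookkeeping, which is the hairiness described above.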

It should be possible to whip something up with re::engine and Regexp::Parser, but a brief attempt showed me that this is somewhat tricky to get right except in simple cases. It would be much more convenient to have this support built into the regex engine. Perhaps for 5.12...

Re^5: Perl regexp matching is slow??
by demerphq (Chancellor) on Feb 27, 2007 at 22:07 UTC

    abigail mentioned something similar, but I haven't gotten my head around what is really needed. If I had a better idea, I'd be able to give a better estimate of when it could be done. For instance, there is a considerable amount of data on the stack after a match that might be useful for this purpose; I just don't have a good enough grip on the problem to make any useful suggestions.


      The inspiration comes from Parse::RecDescent's "autotree" directive, which basically constructs a parse tree by collecting submatches into a hash (keyed by submatch name) and blessing that into a package named according to the parent rule, e.g. A ::= B C creates an object of type A with two fields B and C.

      I think a much less blessed scheme would be appropriate for the regex engine, in which numbered submatches are collected into match variables @/ and %/. For @/, each capturing group generates a capture:

      • If the capturing group is quantified, it is an arrayref of captures, one for each repetition.
      • Otherwise, if the capturing group contains no submatches, the capture is a string containing the captured text.
      • Otherwise, $/[0] contains the captured text, and @/[1..N] contain the captures for nested groups $1 ... $N, where the indices are numbered starting from the first nested submatch.
      Similarly, for %/, each capturing group yields a hash capture:
      • $/{0} is the entire text.
      • If (?'NAME'...) has no subcaptures, $/{NAME} is the captured text.
      • Otherwise if (?'NAME'...) is quantified, $/{NAME} is an arrayref of hash captures, one entry for each repetition.
      • Otherwise, $/{NAME} is a hash capture.
      • (maybe) $/{1} through $/{N} are like $/{NAME}, but for all captures, not just named ones.
      For example, I believe the above rules should yield:
      $sexp = qr/ (?<sexp> \s* \( \s* (?&sexp)* \s* \) \s* | \s* (?<atom>\w+) \s* )/x;
      '(A (B C))' =~ /$sexp/;
      ## AFTERWARDS
      %/ = (sexp => { 0 => '(A (B C))',
                      sexp => [{ 0 => 'A', atom => 'A' },
                               { 0 => '(B C)',
                                 sexp => [{ 0 => 'B', atom => 'B' },
                                          { 0 => 'C', atom => 'C' }] }] });
      @/ = ('(A (B C))',
            [['A', 'A'],
             ['(B C)', [['B', 'B'], ['C', 'C']]]]);
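Perl has no `%/` or `@/` match variables; they are only a proposal here. To make the proposed shapes concrete, the following sketch builds them by hand for the same input. It uses a small recursive-descent parser written with `\G` and `/gc`, not the recursive regex and not any engine feature. `parse_sexp`, `to_array`, `%slash` and `@slash` are hypothetical names. Like the expected output in the example, it records each span without surrounding whitespace. It assumes Perl 5.10 or later for the `//` operator.

```perl
use 5.010;    # for the // operator
use strict;
use warnings;

# Hypothetical helper: parse one s-expression starting at pos($$sref) and
# return a "hash capture": {0 => text, atom => ...} or {0 => text, sexp => [...]}.
sub parse_sexp {
    my ($sref) = @_;
    $$sref =~ /\G\s+/gc;                 # skip whitespace; /c keeps pos on failure
    my $start = pos($$sref) // 0;
    my %cap;
    if ($$sref =~ /\G(\w+)/gc) {         # atom branch
        $cap{atom} = $1;
    }
    elsif ($$sref =~ /\G\(/gc) {         # list branch: zero or more nested sexps
        my @kids;
        while (1) {
            $$sref =~ /\G\s+/gc;
            last if $$sref =~ /\G\)/gc;
            push @kids, parse_sexp($sref);
        }
        $cap{sexp} = \@kids;             # quantified group => arrayref of captures
    }
    else {
        die "parse error at offset $start\n";
    }
    $cap{0} = substr($$sref, $start, pos($$sref) - $start);   # text of this span
    return \%cap;
}

# Hypothetical converter from the hash form to the proposed @/ array form:
# [text, atom] for an atom, [text, [child captures...]] for a list.
sub to_array {
    my ($cap) = @_;
    return [ $cap->{0},
             exists $cap->{atom} ? $cap->{atom}
                                 : [ map { to_array($_) } @{ $cap->{sexp} } ] ];
}

my $text  = '(A (B C))';
my %slash = (sexp => parse_sexp(\$text));    # stands in for %/
my @slash = @{ to_array($slash{sexp}) };     # stands in for @/
```

Dumping `%slash` and `@slash` (for example with Data::Dumper) shows the same nesting as the expected `%/` and `@/` values above.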

      There are probably some obvious oversights here, but I'll try to get the Regexp::Parser version of this working to shake the bugs out.
