Re: RFC: Regexp::AllMatches

by blokhead (Monsignor)
on Aug 07, 2007 at 02:07 UTC ( [id://630940]=note )


in reply to RFC: Regexp::AllMatches

I'm interested in how this is implemented as an iterator. I can imagine two ways it might be done:
  • You exhaustively generate all matches up front, and the iterator is just layered on top to provide a nice interface. That would be surprising, because when you see an iterator interface you usually assume it generates its return values "on demand" rather than pre-computing them all.
  • The iterator is implemented by somehow jumping out of the regex engine in the middle of a match that involves lots of backtracking. If this is the case, I wonder whether re-entrancy would be a problem. Could the regex engine be used while this iterator object is "active"? Implementing an iterator (as opposed to a callback) this way seems highly non-trivial to me, so if this is what you did, I'd be interested to see the implementation details.
Either way, I think it would be nice if the documentation made clear what was going on with respect to iterators.
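
To make the first option concrete, here is a minimal sketch of a pre-computing iterator. It is not the module's implementation, which is not shown in this thread; the names eager_iterator and @found are made up for illustration, and the pattern is hard-coded to /.+/. It relies on the idiom from perlre of putting a (?{ ... }) code block in front of (*FAIL) (Perl 5.10 or later), so the block acts as a callback for every candidate match while the engine backtracks:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Regexp::AllMatches.
# A package variable is used because lexicals captured inside (?{ ... })
# blocks behaved unreliably on perls older than 5.18.
our @found;

sub eager_iterator {
    my ($str) = @_;
    local @found = ();

    # The code block runs once per candidate match and records it;
    # (*FAIL) then rejects the candidate, forcing the engine to backtrack
    # and eventually to try the next starting position.
    $str =~ /.+(?{ push @found, $& })(*FAIL)/;

    # Everything has been computed by now; the closure only hands the
    # stored results out one at a time.
    my @queue = @found;
    return sub { return shift @queue };
}

my $next = eager_iterator('abc');
while (defined(my $match = $next->())) {
    print "$match\n";
}
```

Here the regex engine drives the loop and the code block is only a callback, so all the work happens before the first value is returned. The second option needs the opposite arrangement, with the caller in control and the engine suspended between calls, which is why it looks so much harder.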

I like how the interface provides a way to get at the $1, $2, etc. match variables.

blokhead

Re^2: RFC: Regexp::AllMatches
by lodin (Hermit) on Aug 07, 2007 at 02:37 UTC

    Often if you see an iterator interface you assume that it is only generating the return values "on demand" and not pre-computing them all.

    ... and that is also true here. Since backtracking patterns quickly generate a very large number of matches, I don't dare precompute them.

    Could the regex engine be used while this iterator object is "active?"

    The following code works, which makes me believe there won't be any other re-entrancy issues. But I know very little about the internals of the engine, or about what might blow up under certain circumstances.

    use Test::More 'no_plan';
    use Regexp::AllMatches;

    my $str = 'abc';

    my $m1 = Regexp::AllMatches::->new($str => qr/.+/);
    is($m1->next, 'abc');

    my $m2 = $m1->clone;
    is($m1->next, 'ab');

    is_deeply([ $str =~ /./g ], [qw/ a b c /]);

    is($m1->next, 'a');
    is($m2->next, 'ab');

    __END__
    ok 1
    ok 2
    ok 3
    ok 4
    ok 5
    1..5

    lodin
