
Regex from XS

by creamygoodness (Curate)
on Apr 09, 2006 at 18:44 UTC (#542164=perlquestion)
creamygoodness has asked for the wisdom of the Perl Monks concerning the following question:

I know this isn't legal perlapi, but I want to run a regex from XS.

This code may be a bottleneck and I'd like to see how much of one it is.

    # accumulate token start_offsets and end_offsets
    my ( @starts, @ends );
    1 while ( m/$separator_re/g and push @starts, pos
          and m/$token_re/g and push @ends, pos );

I know that doesn't look like it'll gain much from going to XS, but this app has to deal with gobs and gobs of tokens.
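To put a number on the pure-Perl cost before reaching for XS, the loop can be wrapped in a sub and timed with the core Benchmark module. This is only a sketch: the two patterns, the `token_offsets` name, and the sample text are assumptions made for illustration, since the post does not show the real `$separator_re` and `$token_re`.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Assumed patterns; the real ones are not shown in the post.
my $separator_re = qr/\W*/;
my $token_re     = qr/\w+/;

# Hypothetical wrapper around the loop from the question.
sub token_offsets {
    my ($text) = @_;
    my ( @starts, @ends );
    for ($text) {    # alias $_ so the bare m//g and pos() act on $text
        1 while ( m/$separator_re/g
            and push @starts, pos()
            and m/$token_re/g
            and push @ends, pos() );
    }
    # A separator match with no token after it can leave an extra start;
    # drop it so the two arrays pair up.
    $#starts = $#ends;
    return ( \@starts, \@ends );
}

# Time the loop over a synthetic document of 30,000 tokens.
my $doc = join ' ', ('lorem ipsum dolor') x 10_000;
my $t = timeit( 20, sub { token_offsets($doc) } );
print "20 passes: ", timestr($t), "\n";
```

Comparing that figure against the total run time of the app (or profiling the whole thing) should show whether the regex loop is worth the trouble of an XS rewrite.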

What's the hack?

Marvin Humphrey
Rectangular Research ―

Replies are listed 'Best First'.
Re: Regex from XS
by hv (Parson) on Apr 10, 2006 at 00:37 UTC

    I haven't tried to do this, but I'd start by unravelling the code in pp_hot.c:pp_match(), and see if you can extract from that the infrastructure you need.

    The guts are wrapped in the CALLREGEXEC() macro, which (via a pluggable hook, used by the re module) makes its way to regexec.c:Perl_regexec_flags(), where the actual match happens. The regexp has already been compiled by that point, so setting things up correctly before that call is the critical bit.

    It would certainly help if you can find prior art - if you have a greppable CPAN mirror handy, a search for CALLREGEXEC and/or regexec_flags should tell you if there is any to be found there.

    Time permitting, I'd be happy to help debug any problems you run into, feel free to give me a shout.



      Thanks for the kind offer. I understand a bit more about the Perl executable now, and I understand why, in the only previous info I could find on the subject (a Nick Ing-Simmons p5p post), it's stated that you can't do this without "faking an op".

      There doesn't appear to be prior art for this. While I don't happen to have a greppable mirror, Google indexes .pm, .xs, .c, and .h files, and a Google search for those terms limited to doesn't turn up anything useful.

      I banged my head against pp_hot.c for a while, but there are just too many things I don't understand yet. It will be a happy day when I figure out what a "shrieking SV" is. :) An awful lot happens in pp_match, so the right route is presumably to "fake an op" and call pp_match directly, rather than duplicate its innards.

      However, manipulation of the op tree is beyond me right now. It doesn't look like this is as simple as calling a couple of macros that aren't listed in perlapi, and while I'm very interested in continuing to study the subject, I haven't got time right now to dig this deep in one go. Thanks a bunch for helping me to frame the problem.

      Marvin Humphrey
      Rectangular Research ―

Node Type: perlquestion [id://542164]
Approved by kvale