PerlMonks  

Regex from XS

by creamygoodness (Curate)
on Apr 09, 2006 at 18:44 UTC (#542164, perlquestion)
creamygoodness has asked for the wisdom of the Perl Monks concerning the following question:

I know this isn't legal perlapi, but I want to run a regex from XS.

This code may be a bottleneck and I'd like to see how much of one it is.

    # accumulate token start_offsets and end_offsets
    my ( @starts, @ends );
    1 while ( m/$separator_re/g and push @starts, pos
          and m/$token_re/g     and push @ends, pos );

I know that doesn't look like it'll gain much from going to XS, but this app has to deal with gobs and gobs of tokens.
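For anyone who wants to run the loop above in isolation, here is a minimal pure-Perl sketch. The wrapper name `token_offsets`, the sample string, and the two patterns are assumptions made for the demonstration, not taken from the original application. So is the final trimming step, which drops an unpaired trailing start offset when a separator matches but no token follows.

```perl
use strict;
use warnings;

# Collect the end offset of each separator match (the token start) and of
# each token match (the token end), using the same m//g + pos loop as above.
sub token_offsets {
    my ( $text, $separator_re, $token_re ) = @_;
    my ( @starts, @ends );
    for ($text) {    # alias $_ to $text so m//g and pos act on it
        1 while ( m/$separator_re/g and push @starts, pos
              and m/$token_re/g     and push @ends, pos );
    }

    # Assumption: if a separator matched but no token followed it, drop the
    # unpaired trailing start so the two arrays stay the same length.
    $#starts = $#ends;
    return ( \@starts, \@ends );
}

# Hypothetical input and patterns, chosen only for illustration.
my ( $starts, $ends ) = token_offsets( 'foo bar', qr/\W*/, qr/\w+/ );
print "starts: @$starts\n";
print "ends:   @$ends\n";
```

Wrapping the loop in a function like this also gives a fixed baseline to time before deciding whether an XS port is worth the effort.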

What's the hack?

--
Marvin Humphrey
Rectangular Research ― http://www.rectangular.com

Replies are listed 'Best First'.
Re: Regex from XS
by hv (Parson) on Apr 10, 2006 at 00:37 UTC

    I haven't tried to do this, but I'd start by unravelling the code in pp_hot.c:pp_match(), and see if you can extract from that the infrastructure you need.

    The guts are in the CALLREGEXEC() macro, which (via a pluggable hook, used by the re module) makes its way to regexec.c:Perl_regexec_flags(), where the actual match happens. The regexp has already been compiled by that point, so the critical bit is setting things up correctly before that call.

    It would certainly help if you can find prior art - if you have a greppable CPAN mirror handy, a search for CALLREGEXEC and/or regexec_flags should tell you if there is any to be found there.
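The search for prior art suggested above could be sketched in Perl roughly as follows, assuming an unpacked local mirror. The function name `find_regexec_callers` is hypothetical, and restricting the scan to `.xs`, `.c`, and `.h` files is an assumption about where such calls would live.

```perl
use strict;
use warnings;
use File::Find;

# Return sorted "path:line" hits for the two identifiers named above,
# scanning only XS and C sources under $root.
sub find_regexec_callers {
    my ($root) = @_;
    my @hits;
    find(
        {
            no_chdir => 1,    # keep $_ as the full path
            wanted   => sub {
                return unless -f $_ && /\.(?:xs|c|h)\z/;
                open my $fh, '<', $_ or return;    # skip unreadable files
                while ( my $line = <$fh> ) {
                    # Substring match, so Perl_regexec_flags is caught too.
                    push @hits, join ':', $File::Find::name, $.
                        if $line =~ /CALLREGEXEC|regexec_flags/;
                }
                close $fh;
            },
        },
        $root
    );
    my @sorted = sort @hits;
    return @sorted;
}

# Usage: pass the root of the unpacked mirror as the first argument.
if (@ARGV) {
    print "$_\n" for find_regexec_callers( $ARGV[0] );
}
```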

    Time permitting, I'd be happy to help debug any problems you run into, feel free to give me a shout.

    Hugo

      Hugo,

      Thanks for the kind offer. I understand a bit more about the Perl executable now, and I understand why, in the only previous info I could find on the subject (a Nick Ing-Simmons p5p post), it's stated that you can't do this without "faking an op".

      There doesn't appear to be prior art for this. While I don't happen to have a greppable mirror, Google indexes .pm, .xs, .c, and .h files, and a Google search for those terms limited to search.cpan.org doesn't turn up anything useful.

      I banged my head against pp_hot.c for a while, but there are just too many things I don't understand yet. It will be a happy day when I figure out what a "shrieking SV" is. :) An awful lot happens in pp_match, so the right route is presumably to "fake an op" and call pp_match directly, rather than duplicate its innards.

      However, manipulation of the op tree is beyond me right now. It doesn't look like this is as simple as calling a couple macros that aren't listed in perlapi, and while I'm very interested in continuing to study the subject, I haven't got time right now to dig this deep in one go. Thanks a bunch for helping me to frame the problem.

      --
      Marvin Humphrey
      Rectangular Research ― http://www.rectangular.com
