As tybalt89 wrote,
@{^CAPTURE} is what you're looking for, but don't forget named captures and
%+. From the perlvar documentation:
For example, $+{foo} is equivalent to $1 after the following match:
'foo' =~ /(?<foo>foo)/;
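To make the two capture styles concrete, here is a minimal sketch that reads the same match both ways. The input string and the group names key and value are made up for illustration, and @{^CAPTURE} needs perl 5.26 or newer.

```perl
use 5.026;    # @{^CAPTURE} first shipped in the 5.26 release
use strict;
use warnings;

my $line = 'width=42';    # hypothetical input, not from the original question

my (@positional, %named);
if ($line =~ /(?<key>\w+)=(?<value>\d+)/) {
    @positional = @{^CAPTURE};    # same values as ($1, $2)
    %named      = %+;             # copy of the named captures
}

print "positional: @positional\n";
print "named: $named{key} -> $named{value}\n";
```

Copying %+ into a plain hash right after the match is a deliberate choice here: %+ always reflects the last successful match, so later regexes would overwrite it.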
The next cool feature of Perl for parsing that you should probably be aware of is the combination of pos, \G, and the /c regex switch. As it happens, you're in luck, because David Raab just wrote a blog post fully explaining it! (I just saw it in the Perl Weekly email earlier today.)
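In case a quick illustration helps before reading that post, this is a rough sketch of a hand-written tokenizer built on pos, \G, and /gc. The token names and the tokenize function are invented for the example; they are not taken from the blog post or from your code.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: each /\G.../gc tries to match exactly at pos().
# With /c, a failed match leaves pos() alone, so the next branch can try
# again at the same spot.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    for ($text) {    # alias $_ to $text so the matches and pos() use it
        while ((pos() // 0) < length) {
            if    (/\G\s+/gc)   { next }    # skip whitespace
            elsif (/\G(\d+)/gc) { push @tokens, [ NUM   => $1 ] }
            elsif (/\G(\w+)/gc) { push @tokens, [ IDENT => $1 ] }
            elsif (/\G(=)/gc)   { push @tokens, [ OP    => $1 ] }
            else {
                die "Syntax error at '" . substr($_, pos() // 0, 10) . "'\n";
            }
        }
    }
    return @tokens;
}

printf "%s(%s)\n", @$_ for tokenize('x = 12');
```

The NUM branch comes before IDENT because \w also matches digits; the order of the branches is the priority order.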
And if that wasn't enough, along your parsing journey you might discover that it's a bit slow to iterate through a bunch of @syntax items at each point along the parse (as in dozens or more; fewer than 10 is probably fine the way you are doing it). When you run into this problem, the solution is to dynamically build a string of code that looks like this:
sub {
    /\G (?:
        ... (?{ code1(...); }) # pattern 1, handler for pattern 1
      | ... (?{ code2(...); }) # pattern 2, handler for pattern 2
      | ... (?{ code3(...); }) # and so on
    )/gcx;
}
You then need to eval that to ensure perl compiles it. (qr// notation is not guaranteed to compile it, and usually doesn't)
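sub parse {
    my $input= shift;
    my $code= ... # assemble regex sub text like above
    my $lexer= eval $code
        or die "BUG: syntax error in generated code: $@";
    local $_= $input;
    # pos() needs its parens here: a bare "pos <" is parsed as the start
    # of a <...> read. pos is also undef before the first match, hence // 0.
    $lexer->() || die "Syntax error at '" . substr($_, pos() // 0, 10) . "'"
        while (pos() // 0) < length;
}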
And then you've reached about the highest performance Perl can give you for parsing! The final speedup is to let perl do the looping for you by wrapping the regex you built in (...)++ (the possessive ++ ensures that perl doesn't try to backtrack), but then you lose the ability to stop the loop, and it runs until all input is exhausted.
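To tie the generated-lexer idea together, here is a small end-to-end sketch. The @syntax table, the token names, and the build_lexer and lex_all functions are all assumptions made up for this example, so adapt them to whatever your real @syntax items look like. It uses m{...} rather than /.../ as the delimiter so that a pattern containing a slash doesn't end the regex early.

```perl
use strict;
use warnings;

# Hypothetical rule table: [ token name, regex source text ].
my @syntax = (
    [ NUM   => '\d+'          ],
    [ IDENT => '[A-Za-z_]\w*' ],
    [ OP    => '[-+*/=]'      ],
    [ WS    => '\s+'          ],
);

our @tokens;    # the generated (?{ ... }) blocks push onto this

# Assemble the source text of one big alternation and string-eval it,
# so perl compiles the patterns and their code blocks together.
sub build_lexer {
    my @alts;
    for my $rule (@syntax) {
        my ($name, $pattern) = @$rule;
        push @alts, $name eq 'WS'
            ? $pattern    # whitespace: match it, emit nothing
            : "($pattern) (?{ push \@tokens, ['$name', \$^N] })";
    }
    my $code = "sub { m{\\G (?: " . join("\n | ", @alts) . " )}gcx }";
    my $lexer = eval $code
        or die "BUG: syntax error in generated code: $@";
    return $lexer;
}

sub lex_all {
    my ($input) = @_;
    my $lexer = build_lexer();    # in real code, build once and reuse
    local @tokens;
    local $_ = $input;
    while ((pos() // 0) < length) {
        $lexer->()
            or die "Syntax error at '" . substr($_, pos() // 0, 10) . "'\n";
    }
    return @tokens;
}

printf "%s(%s)\n", @$_ for lex_all('x = 12 + y');
```

$^N holds the text of the most recently closed capture group, which saves having to number the groups while generating the code. Each code block sits at the very end of its alternative, so by the time it runs the overall match is already going to succeed.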