PerlMonks  

Re: Parser Performance Question

by kcott (Archbishop)
on Oct 05, 2017 at 00:48 UTC ( [id://1200706]=note )


in reply to Parser Performance Question

G'day songmaster,

As LanX alluded to in his update, I suspect the /o modifier may be the issue: it certainly stood out as I read through the code fragments you provided.

In "perlre: Modifiers" you'll see:

"o  - pretend to optimize your code, but actually introduce bugs"

That provides a link to further information in "perlop: Regexp Quote-Like Operators" but the fragment identifier (#s%2fPATTERN%2fREPLACEMENT%2fmsixpodualngcer) is wrong. The closest to that is probably #s/_PATTERN_/_REPLACEMENT_/msixpodualngcer; however, the one with the most information about /o, and probably more appropriate given the code you've shown, is #m/_PATTERN_/msixpodualngc, which culminates in:

"The bottom line is that using /o is almost never a good idea."
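A minimal sketch of the kind of bug /o can introduce, assuming a helper that is called with a different pattern each time. The names count_with_o and count_without_o and the sample data are hypothetical, not taken from the original code:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Count the items matching $pattern, with /o on the match.
# The interpolated pattern is compiled the first time this match
# operator runs; later values of $pattern are silently ignored.
sub count_with_o {
    my ($pattern, $items) = @_;
    return scalar grep { /$pattern/o } @$items;
}

# Same match without /o: the current value of $pattern is always used.
sub count_without_o {
    my ($pattern, $items) = @_;
    return scalar grep { /$pattern/ } @$items;
}

my @items = qw(menu1 menu2 menu3 driver1 other1);

for my $pattern (qw(menu driver other)) {
    printf "%-6s  with /o: %d  without /o: %d\n",
        $pattern,
        count_with_o($pattern, \@items),
        count_without_o($pattern, \@items);
}
```

The /o column stays at 3 for every pattern because the match is frozen at "menu", while the plain match reports 3, 1 and 1.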

I probably would have created all of those regexes at compile time, and I would have used my instead of our variables. A dispatch table with actions based on matches may also be appropriate.

You don't show sufficient code to make any direct modification recommendations. The following script simply suggests a technique you could adapt to your needs.

#!/usr/bin/env perl

use strict;
use warnings;

my %capture;

BEGIN {
    my $RXone   = qr{(?x: 1 )};
    my $RXthree = qr{(?x: 3 )};
    my $RXnum   = qr{(?x: $RXone | $RXthree )};
    my $RXstr   = qr{(?x: ( [a-z]+ $RXnum ) )};

    %capture = (
        menu => {
            regexp => qr{(?x: ^ menu \s+ $RXstr $ )},
            action => sub { parse_menu(@_) },
        },
        driver => {
            regexp => qr{(?x: ^ driver \s+ $RXstr $ )},
            action => sub { parse_driver(@_) },
        },
    );
}

my @capture_keys = keys %capture;

while (<DATA>) {
    for my $capture_key (@capture_keys) {
        if (/$capture{$capture_key}{regexp}/) {
            $capture{$capture_key}{action}->($1);
            last;
        }
    }
}

sub parse_menu { print "MENU: @_\n" }
sub parse_driver { print "DRIVER: @_\n" }

__DATA__
menu menu1
driver driver1
other other1
menu menu2
driver driver2
other other2
menu menu3
driver driver3
other other3

You may have sufficient, up-front knowledge about those "capture keys" to predefine an ordered @capture_keys rather than relying on the random list returned by keys.
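As a rough sketch of that idea, assuming "driver" records are the most common and so should be tried first. The simplified regexes, the handler bodies and the dispatch_line helper are illustrative only:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical dispatch table in the same shape as %capture in the
# script above; the patterns here are simplified for illustration.
my %capture = (
    menu => {
        regexp => qr{^menu\s+(\S+)$},
        action => sub { "MENU: $_[0]" },
    },
    driver => {
        regexp => qr{^driver\s+(\S+)$},
        action => sub { "DRIVER: $_[0]" },
    },
);

# Fixed, hand-chosen order instead of whatever keys() happens to return.
# Assumption: 'driver' lines are the most frequent, so test them first.
my @capture_keys = qw(driver menu);

# Try each regexp in order; return the handler's result,
# or undef if no entry matches.
sub dispatch_line {
    my ($line) = @_;
    for my $key (@capture_keys) {
        if ($line =~ $capture{$key}{regexp}) {
            return $capture{$key}{action}->($1);
        }
    }
    return;
}

for my $line ('menu menu1', 'driver driver1', 'other other1') {
    my $result = dispatch_line($line);
    print defined $result ? "$result\n" : "SKIPPED: $line\n";
}
```

With a fixed list the matching order is the same on every run, and any key you leave out of @capture_keys is simply never tried.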

Output from a sample run of that script:

MENU: menu1
DRIVER: driver1
MENU: menu3
DRIVER: driver3

Update (minor code alteration): My original code had $capure_key (missing "t") throughout. I've changed that to $capture_key globally; retested; output unchanged.

— Ken

Replies are listed 'Best First'.
Re^2: Parser Performance Question
by Anonymous Monk on Oct 05, 2017 at 15:05 UTC
    It turns out that qr// is actually the worst way to build regexes, but adding /o mostly fixes the problem.
    use Benchmark qw( cmpthese );
    print $], "\n";
    open my $W, '<', '/usr/share/dict/words' or die;
    my @words = <$W>;
    close $W;
    my $s  = '(?<![cC])[eE][iI]';
    my $re = qr/(?<![cC])[eE][iI]/;
    cmpthese(-5, {
        re  => sub { grep /(?<![cC])[eE][iI]/, @words },
        qr  => sub { grep /$re/, @words },
        qro => sub { grep /$re/o, @words },
        s   => sub { grep /$s/, @words },
        so  => sub { grep /$s/o, @words },
    });
    __END__
    5.026001
          Rate   qr    s  qro   so   re
    qr  7.14/s   -- -57% -67% -68% -69%
    s   16.8/s 135%   -- -22% -26% -26%
    qro 21.4/s 200%  28%   --  -5%  -6%
    so  22.5/s 215%  34%   5%   --  -1%
    re  22.7/s 218%  36%   6%   1%   --
      Update: sorry please ignore, misread benchmark


      Well maybe you used the worst way to ask the question.

      qr is meant to precompile, so why should it be used in a loop?

      The OP is building a parser; his grammar doesn't change on the fly.

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!
