PerlMonks  

Re: tight loop regex optimization

by ikegami (Pope)
on Nov 01, 2011 at 05:34 UTC (node #935031)


in reply to tight loop regex optimization

You're only going to get limited benefit from tweaking the regex patterns. Shaving 20% off of 60 seconds still leaves 48 seconds.

One thing I see is that you're scanning the file from top to bottom for each s/// operator, and often more than once per pattern. Your efforts might be better spent avoiding that. Some examples are self-contained:

    while ($contents =~ s/\05([^\n\05]+)\05/$1\05\05/gs) {}

            |
            |
            v

    $contents =~ s/\05([^\n\05]+)(?=\05)/$1\05/gs;

But that's not going to help you remove this waste in general.
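One general way to cut down on repeated scans, when several substitutions are plain literal replacements, is to fold them into a single pass using an alternation and a lookup table. Below is a minimal sketch of that technique; `%replace` and its entity-style contents are hypothetical stand-ins, not taken from the original code:

```perl
use strict;
use warnings;

# Hypothetical literal replacements that would otherwise each need
# their own s///g pass over the whole string.
my %replace = (
    '&amp;' => '&',
    '&lt;'  => '<',
    '&gt;'  => '>',
);

# One alternation built from the keys; longest keys first so that a
# longer key wins over a shorter key that happens to be its prefix.
my $alt = join '|', map { quotemeta } sort { length $b <=> length $a } keys %replace;
my $re  = qr/($alt)/;

my $contents = 'a &lt;b&gt; &amp; c';
$contents =~ s/$re/$replace{$1}/g;    # one scan instead of three
print "$contents\n";                  # a <b> & c
```

This is only equivalent to separate passes when no replacement's output can be matched by a later pattern. For example, running the &amp; pass and then the &lt; pass turns "&amp;lt;" into "<", while the single pass leaves "&lt;".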


Re^2: tight loop regex optimization
by superawesome (Initiate) on Nov 02, 2011 at 05:08 UTC

    Hmm... going to have to get help here. I'm not familiar enough with this code or its expected output to really tell when it's working or when I might introduce a subtle bug somewhere. This seems more like the latter scenario. Thanks for pointing this out!

    On this particular example, I'm not understanding how the two things are identical. The substitution pattern in your version has one less \05. Is that intentional? If so, how does that work? I don't really grok the look-around assertion, and/or how that's relevant to not needing the extra \05 in the substitution pattern.

    For that matter, I'm not sure what \05 even is. Most places seem to say that an octal character code requires exactly 3 digits, but some say you can get away with less, as long as the leading digit is zero. But, \05 interpreted as octal is non-printing... don't know what it is, or why it would be in these files. Any thoughts on that?

    Thanks!
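On the \05 question: in Perl double-quoted strings and in regexes, a backslash followed by 0 and then up to two more octal digits is an octal escape, so \05, \005 and \x05 all denote chr(5), the ASCII control character ENQ (enquiry). A quick check using core Perl only:

```perl
use strict;
use warnings;

my $c = "\05";                        # octal escape: character number 5
printf "ord = %d\n", ord $c;          # ord = 5

# Octal with an extra leading zero, hex, and chr() all give the same character.
print "all equal\n"
    if $c eq "\005" && $c eq "\x05" && $c eq chr(5);

# The same escape works inside a regex, including inside a character class.
print "regex ok\n" if "a\05b" =~ /a\05b/ && "\05" =~ /^[\05]$/;
```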

      I don't really grok the look-around assertion

      Think of it as a subroutine call. The engine tries to match the sub-pattern at the current location, but the current location doesn't change.

                     +------------- Matched at pos 0.
                     |   +--------- Matched at pos 1.
                     |   |+-------- Matched at pos 2.
                     |   || +------ Matched at pos 1.
                     |   || |+----- Matched at pos 2.
                     |   || ||+---- Matched at pos 3.
                     |   || |||
                     v   vv vvv
          'abcd' =~ /a(?=bc)..d/    # Matches

      Since the position in the outer regex isn't affected, the text matched by the lookahead's sub-pattern is not part of the overall match, so s/// doesn't replace it.

          my $s = 'abcd';
          $s =~ s/a(?=bc)/x/;    # xbcd

          my $t = 'aefg';
          $t =~ s/a(?=bc)/x/;    # aefg (no match, so unchanged)
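The claim that a lookahead leaves the match position alone can be checked directly with the special arrays @- and @+, which after a successful match hold the start and end offsets of the whole match (index 0) and of each capture group. A small demo with arbitrary strings:

```perl
use strict;
use warnings;

# The lookahead has to succeed, but the "bc" it checks is not part of the match.
if ('abcd' =~ /a(?=bc)/) {
    printf "matched '%s' at offsets %d..%d\n", $&, $-[0], $+[0];    # 'a' at 0..1
}

# After the lookahead the engine is still at offset 1, so (..) reads "bc" again.
if ('abcd' =~ /a(?=bc)(..)d/) {
    printf "captured '%s' at offsets %d..%d\n", $1, $-[1], $+[1];   # 'bc' at 1..3
}
```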

      The substitution pattern in your version has one less \05. Is that intentional?

      Yes. The lookahead only checks for the closing \05 without consuming it, so one fewer \05 is being replaced, and one fewer \05 needs to be added back.
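To see the two forms side by side, here is a minimal sketch that wraps each substitution in a sub and runs both on a few made-up inputs. `loop_version`, `lookahead_version` and `show` are names invented for this illustration:

```perl
use strict;
use warnings;

# Original form: repeats the global substitution until it stops matching.
sub loop_version {
    my ($s) = @_;
    while ($s =~ s/\05([^\n\05]+)\05/$1\05\05/gs) {}
    return $s;
}

# Lookahead form: a single pass; the closing \05 is checked but not
# consumed, so only one \05 is put back.
sub lookahead_version {
    my ($s) = @_;
    $s =~ s/\05([^\n\05]+)(?=\05)/$1\05/gs;
    return $s;
}

# Make the control character visible: chr(5) is shown as "<05>".
sub show { my ($s) = @_; $s =~ s/\05/<05>/g; return $s }

for my $input ("\05foo\05", "x\05foo\05y", "\05a\05b\05") {
    my ($old, $new) = (loop_version($input), lookahead_version($input));
    printf "%-16s loop=%-18s lookahead=%-18s %s\n",
        show($input), show($old), show($new),
        ($old eq $new ? 'same' : 'DIFFERENT');
}
```

On the first two inputs both forms give the same result. On "\05a\05b\05" they differ: the `while` loop keeps rescanning until no \05, text, \05 sequence is left and ends with "ab" followed by three \05s, while the single pass moves each \05 only once and gives "a\05b\05\05". Whether inputs like that occur in the real files is exactly what test cases need to establish.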

      I'm not familiar enough with this code or its expected output to really tell when it's working or when I might introduce a subtle bug somewhere

      Then I guess the next order of business is to figure out the code and write test cases.
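A minimal sketch of such test cases using the core Test::More module, assuming the transformation can be wrapped in a sub. `transform` is a hypothetical wrapper, and the expected values simply record what the existing `while` loop produces on made-up inputs, so that any faster rewrite can be checked against them:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical wrapper around the existing (slow but trusted) code.
sub transform {
    my ($contents) = @_;
    while ($contents =~ s/\05([^\n\05]+)\05/$1\05\05/gs) {}
    return $contents;
}

# Each case: input, expected output under the current code, test name.
my @cases = (
    [ "\05foo\05",   "foo\05\05",   'single delimited run' ],
    [ "x\05foo\05y", "xfoo\05\05y", 'run with surrounding text' ],
    [ "\05a\05b\05", "ab\05\05\05", 'adjacent runs' ],
    [ "no markers",  "no markers",  'nothing to replace' ],
    [ "\05a\nb\05",  "\05a\nb\05",  'newline stops the run' ],
);

for my $case (@cases) {
    my ($input, $expected, $name) = @$case;
    is( transform($input), $expected, $name );
}

done_testing();
```

Replacing the loop with the single-pass lookahead substitution makes the 'adjacent runs' case fail, which is the kind of subtle change these tests exist to catch.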
