Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Re^4: Question on Regex

by Athanasius (Archbishop)
on Nov 19, 2012 at 03:39 UTC ( [id://1004470]=note: print w/replies, xml ) Need Help??


in reply to Re^3: Question on Regex
in thread Question on Regex

Which shows that sub athanasius is up to 5 times faster than sub karlgoethebier. Even easier to see when supplying a positive COUNT value to timethese:

#! perl use strict; use warnings; use Benchmark qw( :hireswallclock cmpthese timethese ); my $string = ".co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind +/?APath=2.21.0.0.0"; cmpthese( timethese ( 1_000_000, { 'karlgoethebier' => sub { $string =~ m/.+\/.+\/.+\/.+\/(.+)\/.+/; retu +rn $1; }, 'athanasius' => sub { return (split '/', $string)[4] }, } ) );

Output:

13:09 >perl 390_SoPW.pl Benchmark: timing 1000000 iterations of athanasius, karlgoethebier... athanasius: 5.19894 wallclock secs ( 5.15 usr + 0.00 sys = 5.15 CPU) + @ 194250.19/s (n=1000000) karlgoethebier: 20.2485 wallclock secs (20.08 usr + 0.00 sys = 20.08 +CPU) @ 49808.24/s (n=1000000) Rate karlgoethebier athanasius karlgoethebier 49808/s -- -74% athanasius 194250/s 290% -- 13:16 >

Not really surprising, since regexen with quantifiers can be expensive:

Avoid regular expressions with many quantifiers.... Such patterns can result in exponentially slow backtracking behavior unless the quantified subpatterns match on their first “pass”.
The Camel Book, 4th Edition, p. 693.

So, what happens if we limit the backtracking?

#! perl use strict; use warnings; use Benchmark qw( :hireswallclock cmpthese timethese ); my $string = ".co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind +/?APath=2.21.0.0.0"; cmpthese( timethese ( 1_000_000, { 'karlgoethebier' => sub { $string =~ m/.+\/.+\/.+\/.+\/(.+)\/.+/; retu +rn $1; }, 'karlgoethebier2' => sub { $string =~ m/.+?\/.+?\/.+?\/.+?\/(.+?)\/.+/; + return $1; }, 'athanasius' => sub { return (split '/', $string)[4] }, } ) );

Result:

13:28 >perl 390_SoPW.pl Benchmark: timing 1000000 iterations of athanasius, karlgoethebier, ka +rlgoethebier2... athanasius: 4.91799 wallclock secs ( 4.88 usr + 0.00 sys = 4.88 CPU) + @ 204792.14/s (n=1000000) karlgoethebier: 20.1721 wallclock secs (20.05 usr + 0.00 sys = 20.05 +CPU) @ 49885.26/s (n=1000000) karlgoethebier2: 2.55908 wallclock secs ( 2.53 usr + 0.00 sys = 2.53 + CPU) @ 395569.62/s (n=1000000) Rate karlgoethebier athanasius karlgoethebie +r2 karlgoethebier 49885/s -- -76% -8 +7% athanasius 204792/s 311% -- -4 +8% karlgoethebier2 395570/s 693% 93% +-- 13:29 >perl 390_SoPW.pl

The regex is now significantly faster than split-with-subscript. Interesting!

Athanasius <°(((><contra mundum

Replies are listed 'Best First'.
Re^5: Question on Regex
by karlgoethebier (Abbot) on Nov 19, 2012 at 14:15 UTC

    Thats's nice! It's not a good habit to be too greedy. Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      Hello, one important thing i learned in the short time i'm on PM is: there is no reason to be ashamed to ask something that i don't know or understand. So i must confess that i have a problem to interpret the output of Benchmark in the right way. I didn't use this module very much yet.

      I know when i use cmpthese, the results are sorted from slow to fast. When i say $count = -10, my code runs for 10 wall clock seconds.

      Then, in the rate column i can see the amount of iterations per second for each of the subs that i benchmarked - in our (first) case:

      karlgoethebier 105189/s athanasius 627362/s

      So i try:

      Karls-Mac-mini:Desktop karl $ perl -e 'printf ("%.0f\n", 627362/105189 +);' 6

      You wrote:...Which shows that sub athanasius is up to 5 times faster than sub karlgoethebier. Perhaps i have a mental block at the moment or i miss something really essential, sorry.

      Thanks for advice and regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

        You’re quite right, I should have said “up to 6 times faster”. I was looking at the figure of 496%, and thinking “that’s almost 5,” but that figure is actually the increase in rate expressed as a percentage of the original. That is,

        (627362 - 105189) / 105189 = 4.9641 = 496%

        The problem is, that concept — the increase in rate expressed as a percentage of the original — doesn’t mean much to me. Which may be why I have such a hard time remembering it!

        i have a problem to interpret the output of Benchmark in the right way.

        Me too! :-)

        I know when i use cmpthese, the results are sorted from slow to fast.

        Exactly, that’s the main point to keep in mind.

        Athanasius <°(((><contra mundum

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1004470]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (6)
As of 2024-03-19 03:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found