<?xml version="1.0" encoding="windows-1252"?>
<node id="1004470" title="Re^4: Question on Regex" created="2012-11-18 22:39:25" updated="2012-11-18 22:39:25">
<type id="11">
note</type>
<author id="968231">
Athanasius</author>
<data>
<field name="doctext">
&lt;p&gt;Which shows that &lt;tt&gt;sub athanasius&lt;/tt&gt; is up to 5 times &lt;i&gt;faster&lt;/i&gt; than &lt;tt&gt;sub karlgoethebier&lt;/tt&gt;. Even easier to see when supplying a positive &lt;tt&gt;COUNT&lt;/tt&gt; value to &lt;tt&gt;timethese&lt;/tt&gt;:&lt;/p&gt;

&lt;code&gt;
#! perl
use strict;
use warnings;
use Benchmark qw( :hireswallclock cmpthese timethese );

my $string = ".co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind/?APath=2.21.0.0.0";

cmpthese(
            timethese
            (
                1_000_000,
                {
                    'karlgoethebier' =&gt;
                    sub { $string =~ m/.+\/.+\/.+\/.+\/(.+)\/.+/; return $1; },

                    'athanasius'     =&gt;
                    sub { return (split '/', $string)[4] },
                }
            )
        );
&lt;/code&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;code&gt;
13:09 &gt;perl 390_SoPW.pl
Benchmark: timing 1000000 iterations of athanasius, karlgoethebier...
athanasius: 5.19894 wallclock secs ( 5.15 usr +  0.00 sys =  5.15 CPU) @ 194250.19/s (n=1000000)
karlgoethebier: 20.2485 wallclock secs (20.08 usr +  0.00 sys = 20.08 CPU) @ 49808.24/s (n=1000000)
                   Rate karlgoethebier     athanasius
karlgoethebier  49808/s             --           -74%
athanasius     194250/s           290%             --

13:16 &gt;
&lt;/code&gt;

&lt;p&gt;Not really surprising, since regexen with quantifiers can be expensive:&lt;/p&gt;

&lt;blockquote&gt;
Avoid regular expressions with many quantifiers.... Such patterns can result in exponentially slow backtracking behavior unless the quantified subpatterns match on their first &amp;ldquo;pass&amp;rdquo;.
&lt;br /&gt;
&amp;mdash; &lt;i&gt;The Camel Book&lt;/i&gt;, 4&lt;sup&gt;th&lt;/sup&gt; Edition, p. 693.
&lt;/blockquote&gt;

&lt;p&gt;So, what happens if we limit the backtracking?&lt;/p&gt;

&lt;code&gt;
#! perl
use strict;
use warnings;
use Benchmark qw( :hireswallclock cmpthese timethese );

my $string = ".co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind/?APath=2.21.0.0.0";

cmpthese(
            timethese
            (
                1_000_000,
                {
                    'karlgoethebier'  =&gt;
                    sub { $string =~ m/.+\/.+\/.+\/.+\/(.+)\/.+/; return $1; },

                    'karlgoethebier2' =&gt;
                    sub { $string =~ m/.+?\/.+?\/.+?\/.+?\/(.+?)\/.+/; return $1; },

                    'athanasius'      =&gt;
                    sub { return (split '/', $string)[4] },
                }
            )
        );
&lt;/code&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;code&gt;
13:28 &gt;perl 390_SoPW.pl
Benchmark: timing 1000000 iterations of athanasius, karlgoethebier, karlgoethebier2...
athanasius: 4.91799 wallclock secs ( 4.88 usr +  0.00 sys =  4.88 CPU) @ 204792.14/s (n=1000000)
karlgoethebier: 20.1721 wallclock secs (20.05 usr +  0.00 sys = 20.05 CPU) @ 49885.26/s (n=1000000)
karlgoethebier2: 2.55908 wallclock secs ( 2.53 usr +  0.00 sys =  2.53 CPU) @ 395569.62/s (n=1000000)
                    Rate  karlgoethebier      athanasius karlgoethebier2
karlgoethebier   49885/s              --            -76%            -87%
athanasius      204792/s            311%              --            -48%
karlgoethebier2 395570/s            693%             93%              --

13:29 &gt;perl 390_SoPW.pl
&lt;/code&gt;

&lt;p&gt;The regex is now significantly faster than &lt;tt&gt;split&lt;/tt&gt;-with-subscript. Interesting!&lt;/p&gt;

&lt;!-- Node text goes above. Div tags should contain sig only --&gt;
&lt;div class="pmsig"&gt;&lt;div class="pmsig-968231"&gt;
&lt;p&gt;Athanasius&amp;emsp;&lt;font color=#008000&gt;&amp;lt;&amp;deg;(((&amp;gt;&amp;lt;&lt;/font&gt;&amp;emsp;&lt;i&gt;contra mundum&lt;/i&gt;&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;</field>
<field name="root_node">
1004410</field>
<field name="parent_node">
1004437</field>
</data>
</node>
