<?xml version="1.0" encoding="windows-1252"?>
<node id="810581" title="Re^2: better (faster) way of writing regexp" created="2009-12-02 10:04:03" updated="2009-12-02 10:04:03">
<type id="11">
note</type>
<author id="763880">
vitoco</author>
<data>
<field name="doctext">
&lt;p&gt;I'd expect &lt;c&gt;unpack&lt;/c&gt; to be the fastest way to split the fields, but after adding the following code to the above benchmark, "dirunpk" (direct unpack) took, on average, about the same amount of time as the "repeat" test.&lt;/p&gt;

&lt;code&gt;
#!perl
use v5.10;
use strict;
use warnings;
use Benchmark qw(:all);

my $results = timethese( 1e6,
  {
    repeat =&gt; sub{
      my $t1 = '20090123';
      $t1 =~ /(\d\d\d\d)(\d\d)(\d\d)/;
      my ($y1,$m1,$d1) = ($1,$2,$3);
    },
    range =&gt; sub{
      my $t2 = '20090123';
      $t2 =~ /(\d{4})(\d{2})(\d{2})/;
      my ($y2,$m2,$d2) = ($1,$2,$3);
    },
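    # validate with a regex first, then split the match with unpack
    chkunpk =&gt; sub{
      my $t3 = '20090123';
      $t3 =~ m/([0-9]{8})/;
      my ($y3,$m3,$d3) = unpack "A4 A2 A2", $1;
    },
    # split with unpack directly, no validation at all
    dirunpk =&gt; sub{
      my $t4 = '20090123';
      my ($y4,$m4,$d4) = unpack "A4 A2 A2", $t4;
    },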
    isook =&gt; sub{
      my $t5 = '20090123';
      $t5 =~ /(....)(..)(..)/;
      my ($y5,$m5,$d5) = ($1,$2,$3);
    },
  } );

cmpthese( $results ) ;

__END__

1st run:

Benchmark: timing 1000000 iterations of chkunpk, dirunpk, isook, range, repeat...
   chkunpk:  4 wallclock secs ( 3.11 usr +  0.00 sys =  3.11 CPU) @ 321646.83/s (n=1000000)
   dirunpk:  2 wallclock secs ( 2.06 usr +  0.00 sys =  2.06 CPU) @ 484966.05/s (n=1000000)
     isook:  1 wallclock secs ( 1.95 usr +  0.00 sys =  1.95 CPU) @ 512032.77/s (n=1000000)
     range:  3 wallclock secs ( 2.16 usr +  0.00 sys =  2.16 CPU) @ 463821.89/s (n=1000000)
    repeat:  2 wallclock secs ( 1.97 usr +  0.00 sys =  1.97 CPU) @ 508130.08/s (n=1000000)
            Rate chkunpk   range dirunpk  repeat   isook
chkunpk 321647/s      --    -31%    -34%    -37%    -37%
range   463822/s     44%      --     -4%     -9%     -9%
dirunpk 484966/s     51%      5%      --     -5%     -5%
repeat  508130/s     58%     10%      5%      --     -1%
isook   512033/s     59%     10%      6%      1%      --

2nd run:

Benchmark: timing 1000000 iterations of chkunpk, dirunpk, isook, range, repeat...
   chkunpk:  2 wallclock secs ( 3.11 usr +  0.00 sys =  3.11 CPU) @ 321646.83/s (n=1000000)
   dirunpk:  3 wallclock secs ( 2.05 usr +  0.00 sys =  2.05 CPU) @ 488519.79/s (n=1000000)
     isook:  2 wallclock secs ( 1.98 usr +  0.00 sys =  1.98 CPU) @ 504032.26/s (n=1000000)
     range:  1 wallclock secs ( 2.11 usr +  0.00 sys =  2.11 CPU) @ 474158.37/s (n=1000000)
    repeat:  3 wallclock secs ( 2.06 usr +  0.00 sys =  2.06 CPU) @ 484966.05/s (n=1000000)
            Rate chkunpk   range  repeat dirunpk   isook
chkunpk 321647/s      --    -32%    -34%    -34%    -36%
range   474158/s     47%      --     -2%     -3%     -6%
repeat  484966/s     51%      2%      --     -1%     -4%
dirunpk 488520/s     52%      3%      1%      --     -3%
isook   504032/s     57%      6%      4%      3%      --
&lt;/code&gt;

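&lt;p&gt;The fastest way seems to be the capture made of &lt;i&gt;dots&lt;/i&gt;, as in "isook", although it leads "repeat" and "dirunpk" by only a few percent. Skipping the digit check is acceptable here because the OP said that the string IS a date in ISO format.&lt;/p&gt;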
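
&lt;p&gt;To illustrate what is given up by not checking for digits, here is a minimal sketch. The helper names &lt;c&gt;split_dots&lt;/c&gt; and &lt;c&gt;split_digits&lt;/c&gt; and the second input string are made up for this example and are not part of the benchmark above.&lt;/p&gt;

&lt;code&gt;
#!perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.

# Same pattern as "isook": any 8 characters are accepted.
sub split_dots {
    my ($s) = @_;
    return $s =~ /(....)(..)(..)/;
}

# Anchored digit pattern: exactly 8 digits or no match at all.
sub split_digits {
    my ($s) = @_;
    return $s =~ /^(\d{4})(\d{2})(\d{2})$/;
}

for my $input ( '20090123', '2009-1-2' ) {
    my @dots   = split_dots($input);    # empty list if no match
    my @digits = split_digits($input);  # empty list if no match
    print "$input dots=[@dots] digits=[@digits]\n";
}

__END__

20090123 dots=[2009 01 23] digits=[2009 01 23]
2009-1-2 dots=[2009 -1 -2] digits=[]
&lt;/code&gt;

&lt;p&gt;So the dots (and a direct &lt;c&gt;unpack&lt;/c&gt;) will happily split anything that is long enough, which is only safe when the input is already known to be well-formed.&lt;/p&gt;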

&lt;p&gt;&lt;b&gt;UPDATE:&lt;/b&gt; Test names changed (for readability) and comparison table added. Results are from two consecutive runs.&lt;/p&gt;
</field>
<field name="root_node">
810549</field>
<field name="parent_node">
810556</field>
</data>
</node>
