<?xml version="1.0" encoding="windows-1252"?>
<node id="846761" title="Regexp optimization - /o option better than precompiled regexp?" created="2010-06-27 05:56:08" updated="2010-06-27 05:56:08">
<type id="115">
perlquestion</type>
<author id="733795">
Hessu</author>
<data>
<field name="doctext">
&lt;p&gt;
Hello,&lt;/p&gt;

&lt;p&gt;While playing around with [mod://Devel::NYTProf] version 4.03, I started to wonder about regexp matching optimization. At least in the tests below, the /o option appears to perform better than a precompiled regexp. The numbers vary somewhat between runs, but the gap between /o and the precompiled regexp remains.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Readonly constant with /o option - CORE:regcomp, avg 675ns/call&lt;/li&gt;
  &lt;li&gt;Regexp match operator with local variable - CORE:regcomp, avg 718ns/call&lt;/li&gt;
  &lt;li&gt;Use constant in match operator - CORE:regcomp, avg 721ns/call&lt;/li&gt;
  &lt;li&gt;Precompiled regexp - CORE:regcomp, avg 1µs/call&lt;/li&gt;
  &lt;li&gt;Readonly constant in match operator - CORE:regcomp, avg 6µs/call (Updated)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There are two things that I am wondering:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;It seems that CORE:regcomp is called every time through the loop whenever a variable or constant is interpolated into the match operator; only the time spent there varies with the regexp. From [doc://perlfaq6#What-is-/o-really-for?] I had assumed that the regexp would be compiled only once. Is that not the case?&lt;/li&gt;
  &lt;li&gt;It was also a surprise that the /o option is actually faster, at least in this case, than a precompiled regexp. Is this expected, or am I missing something?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The tests were run with ActiveState Perl version 5.10.1 Binary build 1006 291086 - Aug 24 2009 13:48:26, on Windows 7 with 4GB RAM, an SSD, and an Intel Core i7 920 2.6GHz.&lt;/p&gt;

&lt;p&gt;Thank you.&lt;/p&gt;

&lt;c&gt;
#!/usr/bin/perl -w

#####################################################################
# Test regexp matching
#
# &gt; perl -MDevel::NYTProf=savesrc=1 optimize_regexp.pl
# &gt; nytprofhtml
#####################################################################
use strict;
use warnings;

use Cwd;
use Readonly;
use Path::Class qw(file dir);
use Date::Calc qw(Today_and_Now);
use Fcntl qw(O_WRONLY O_CREAT O_TRUNC O_RDONLY);
use IO::File;

#####################################################################
##
##  CONSTANTS
##
#####################################################################
Readonly my $EMPTY          =&gt; q{};
Readonly my $TOOL_ROOT      =&gt; getcwd;
Readonly my $TEMP_FILE_NAME =&gt; 'temp_file.txt';
Readonly my $TEMP_FILE_SIZE =&gt; 1000000;

#####################################################################
##
##  MAIN
##
#####################################################################
my $l_line      = $EMPTY;
my $l_temp_file = $EMPTY;
my $l_file_h    = $EMPTY;

# Create temporary file that is read for tests.
$l_temp_file = file($TOOL_ROOT, $TEMP_FILE_NAME);
create_temp_file($l_temp_file);

#####################################################################
# Readonly constant in match operator - regcomp avg 6µs/call
#####################################################################
Readonly my $REGEXP_READONLY =&gt; '999986';
$l_file_h = IO::File-&gt;new($l_temp_file, O_RDONLY);
while( $l_line = $l_file_h-&gt;getline() ) {
  if( $l_line =~ m/$REGEXP_READONLY/ ) {
    # 5.78s - 1000001 calls to main::CORE:regcomp, avg 6µs/call
    # 4.88s - 1000001 calls to IO::Handle::getline, avg 5µs/call
    # 1.83s - 1000001 calls to Readonly::Scalar::FETCH, avg 2µs/call
    # 909ms - 1000001 calls to main::CORE:match, avg 859ns/call
    chomp $l_line;
    LOG("Regexp 01 - matched line ($l_line)");
  }
}
$l_file_h-&gt;close();

#####################################################################
# Use constant in match operator - regcomp avg 721ns/call
#####################################################################
use constant REGEXP_CONSTANT =&gt; '999986';
$l_file_h = IO::File-&gt;new($l_temp_file, O_RDONLY);
while( $l_line = $l_file_h-&gt;getline() ) {
  if( $l_line =~ m/${\REGEXP_CONSTANT}/ ) {
    # 4.83s - 1000001 calls to IO::Handle::getline, avg 5µs/call
    # 745ms - 1000001 calls to main::CORE:match, avg 729ns/call
    # 735ms - 1000001 calls to main::CORE:regcomp, avg 721ns/call
    chomp $l_line;
    LOG("Regexp 02 - matched line ($l_line)");
  }
}
$l_file_h-&gt;close();

#####################################################################
# Literal pattern in match operator - regcomp is not called
#####################################################################
$l_file_h = IO::File-&gt;new($l_temp_file, O_RDONLY);
while( $l_line = $l_file_h-&gt;getline() ) {
  if( $l_line =~ m/999986/ ) {
    # spent 4.78s - 1000001 calls to IO::Handle::getline, avg 5µs/call
    # spent 838ms - 1000001 calls to main::CORE:match, avg 838ns/call
    chomp $l_line;
    LOG("Regexp 03 - matched line ($l_line)");
  }
}
$l_file_h-&gt;close();

#####################################################################
# Readonly constant with /o option - regcomp, avg 675ns/call
#####################################################################
$l_file_h = IO::File-&gt;new($l_temp_file, O_RDONLY);
while( $l_line = $l_file_h-&gt;getline() ) {
  if( $l_line =~ m/$REGEXP_READONLY/o ) {
    # 4.84s - 1000001 calls to IO::Handle::getline, avg 5µs/call
    # 754ms - 1000001 calls to main::CORE:match, avg 754ns/call
    # 732ms - 1000001 calls to main::CORE:regcomp, avg 675ns/call
    # 0s    - 2 calls to Readonly::Scalar::FETCH, avg 0s/call
    chomp $l_line;
    LOG("Regexp 04 - matched line ($l_line)");
  }
}
$l_file_h-&gt;close();

#####################################################################
# Precompiled regexp - regcomp, avg 1µs/call
#####################################################################
my $l_search_r = qr/$REGEXP_READONLY/;
$l_file_h = IO::File-&gt;new($l_temp_file, O_RDONLY);
while( $l_line = $l_file_h-&gt;getline() ) {
  if( $l_line =~ $l_search_r ) {
    # 4.77s - 1000001 calls to IO::Handle::getline, avg 5µs/call
    # 1.33s - 1000001 calls to main::CORE:regcomp, avg 1µs/call
    # 776ms - 1000001 calls to main::CORE:match, avg 776ns/call
    chomp $l_line;
    LOG("Regexp 05 - matched line ($l_line)");
  }
}
$l_file_h-&gt;close();

#####################################################################
# Regexp match operator with local variable - regcomp, avg 718ns/call
#####################################################################
my $l_search = $REGEXP_READONLY;
$l_file_h = IO::File-&gt;new($l_temp_file, O_RDONLY);
while( $l_line = $l_file_h-&gt;getline() ) {
  if( $l_line =~ m/$l_search/ ) {
    # 4.73s - 1000001 calls to IO::Handle::getline, avg 5µs/call
    # 759ms - 1000001 calls to main::CORE:match, avg 766ns/call
    # 741ms - 1000001 calls to main::CORE:regcomp, avg 718ns/call
    chomp $l_line;
    LOG("Regexp 06 - matched line ($l_line)");
  }
}
$l_file_h-&gt;close();

#####################################################################
# Regexp match with variable and /o option - regcomp, avg 690ns/call
#####################################################################
$l_search = $REGEXP_READONLY;
$l_file_h = IO::File-&gt;new($l_temp_file, O_RDONLY);
while( $l_line = $l_file_h-&gt;getline() ) {
  if( $l_line =~ m/$l_search/o ) {
    # 4.86s - 1000001 calls to IO::Handle::getline, avg 5µs/call
    # 758ms - 1000001 calls to main::CORE:match, avg 758ns/call
    # 690ms - 1000001 calls to main::CORE:regcomp, avg 690ns/call
    chomp $l_line;
    LOG("Regexp 07 - matched line ($l_line)");
  }
}
$l_file_h-&gt;close();

exit 0;

#####################################################################
##
##  SUBROUTINES
##
#####################################################################
sub create_temp_file{
  my $p_file                  = shift;
  
  my $l_file_h                = $EMPTY;
  
  LOG("print file ($p_file)");
  $l_file_h = IO::File-&gt;new($p_file, O_WRONLY|O_TRUNC|O_CREAT);
  for( 0 .. $TEMP_FILE_SIZE ) {
    print {$l_file_h} 'Line number is = ' . $_ . "\n";
  }
  $l_file_h-&gt;close();
 
  return; 
}

sub LOG{

    my $l_time   = [Today_and_Now()];
    my $l_string = sprintf('%d-%02d-%02d %02d:%02d:%02d', @{$l_time});
    $l_string .= q{ - } . $_[0];
    print "$l_string\n";

    return;
}

&lt;/c&gt;
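
&lt;p&gt;To check whether the differences above hold up without the profiler attached, the same match variants could be timed with the core Benchmark module. The following is only a minimal sketch, not part of the profiled script: the in-memory list @lines, the search string '9986' and the variant labels are assumptions chosen for illustration, and file I/O is deliberately left out so that only the match operators are compared.&lt;/p&gt;

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Assumption: a small in-memory data set stands in for temp_file.txt,
# so that file I/O does not dominate the timings.
my @lines   = map { "Line number is = $_\n" } 0 .. 9_999;
my $pattern = '9986';          # hypothetical search string
my $qr      = qr/$pattern/;    # compiled once, outside the timed code

my %hits;                      # match count per variant, as a sanity check

# Run each variant for at least one CPU second; 'none' suppresses the
# per-variant timing lines so that only the comparison table is shown.
my $results = timethese(-1, {
    literal      => sub { $hits{literal}      = grep { /9986/ }      @lines },
    interpolated => sub { $hits{interpolated} = grep { /$pattern/ }  @lines },
    with_o       => sub { $hits{with_o}       = grep { /$pattern/o } @lines },
    precompiled  => sub { $hits{precompiled}  = grep { $_ =~ $qr }   @lines },
}, 'none');

# Print a table of rates and relative differences between the variants.
cmpthese($results);

# Every variant should report the same number of matching lines.
print "$_: $hits{$_} matching line(s)\n" for sort keys %hits;
```

&lt;p&gt;If the ordering in the resulting table differs from the profiler's per-call averages, that would suggest the gap is at least partly a measurement effect rather than a property of /o itself.&lt;/p&gt;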
</field>
</data>
</node>
