Hello,

While I was playing around with Devel::NYTProf version 4.03, I started to wonder about regexp matching optimization. At least in the tests below, the /o option seems to perform better than a precompiled regexp. There is of course some variation between measurements, but the difference between the /o option and the precompiled regexp remains.

  1. Readonly constant with /o option - CORE:regcomp, avg 675ns/call
  2. Regexp match operator with local variable - CORE:regcomp, avg 718ns/call
  3. Use constant in match operator - CORE:regcomp, avg 721ns/call
  4. Precompiled regexp - CORE:regcomp, avg 1µs/call
  5. Readonly constant in match operator - CORE:regcomp, avg 6µs/call (Updated)

There are two things that I am wondering:

  1. It seems that CORE:regcomp is called every time through the loop whenever the match operator contains a variable or a constant. Only the time spent there varies, depending on how the regexp is written. After reading "What is /o really for?" I had assumed that the regexp would be compiled only once. Is that not the case?
  2. It was also a surprise that the /o option is actually faster, at least in this case, than the precompiled regexp. Is this how it should be, or am I missing something?
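One way to cross-check the per-call figures outside the profiler is the core Benchmark module. The following is only a minimal sketch under some assumptions: the lines are kept in memory instead of being read from the temp file, so getline time is excluded, and the sub names (count_literal and so on) are made up for this illustration. It prints a relative comparison table; the numbers will differ between machines and Perl versions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumption: in-memory lines stand in for the temp file, so no I/O is timed.
my @lines    = map { "Line number is = $_\n" } 0 .. 9_999;
my $pattern  = '9986';
my $compiled = qr/$pattern/;

# Literal pattern: compiled at compile time, no interpolation at run time.
sub count_literal {
    my $n = 0;
    for my $line (@lines) { $n++ if $line =~ m/9986/ }
    return $n;
}

# Interpolated variable without /o.
sub count_interpolated {
    my $n = 0;
    for my $line (@lines) { $n++ if $line =~ m/$pattern/ }
    return $n;
}

# Interpolated variable with /o.
sub count_with_o {
    my $n = 0;
    for my $line (@lines) { $n++ if $line =~ m/$pattern/o }
    return $n;
}

# Precompiled qr// object used directly as the pattern.
sub count_precompiled {
    my $n = 0;
    for my $line (@lines) { $n++ if $line =~ $compiled }
    return $n;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    literal      => \&count_literal,
    interpolated => \&count_interpolated,
    with_o       => \&count_with_o,
    precompiled  => \&count_precompiled,
});
```

All four subs should find the same single matching line, so any difference in the table comes from how the pattern is supplied, not from the amount of matching work.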

The tests were run with ActiveState Perl version 5.10.1, binary build 1006 291086 (Aug 24 2009 13:48:26). The hardware was a Win7 machine with 4GB RAM, an SSD and an Intel Core7 920 2.6GHz.

Thank You
#!/usr/bin/perl -w
#####################################################################
# Test regexp matching
#
# > perl -MDevel::NYTProf=savesrc=1 optimize_regexp.pl
# > nytprofhtml
#####################################################################
use strict;
use warnings;
use Cwd;
use Readonly;
use IO::File;    # loaded explicitly instead of relying on Path::Class
use Path::Class qw(file dir);
use Date::Calc qw(Today_and_Now);
use Fcntl qw(O_WRONLY O_CREAT O_TRUNC O_RDONLY);

#####################################################################
##
## CONSTANTS
##
#####################################################################
Readonly my $EMPTY          => q{};
Readonly my $TOOL_ROOT      => getcwd;
Readonly my $TEMP_FILE_NAME => 'temp_file.txt';
Readonly my $TEMP_FILE_SIZE => 1000000;

#####################################################################
##
## MAIN
##
#####################################################################
my $l_line      = $EMPTY;
my $l_temp_file = $EMPTY;
my $l_file_h    = $EMPTY;

# Create temporary file that is read for tests.
$l_temp_file = file($TOOL_ROOT, $TEMP_FILE_NAME);
create_temp_file($l_temp_file);

#####################################################################
# Readonly constant in match operator - regcomp avg 6µs/call
#####################################################################
Readonly my $REGEXP_READONLY => '999986';
$l_file_h = IO::File->new($l_temp_file, O_RDONLY);
while( $l_line = $l_file_h->getline() ) {
    if( $l_line =~ m/$REGEXP_READONLY/ ) {
        # 5.78s - 1000001 calls to main::CORE:regcomp, avg 6µs/call
        # 4.88s - 1000001 calls to IO::Handle::getline, avg 5µs/call
        # 1.83s - 1000001 calls to Readonly::Scalar::FETCH, avg 2µs/call
        # 909ms - 1000001 calls to main::CORE:match, avg 859ns/call
        chomp $l_line;
        LOG("Regexp 01 - matched line ($l_line)");
    }
}
$l_file_h->close();

#####################################################################
# Use constant in match operator - regcomp avg 721ns/call
#####################################################################
use constant REGEXP_CONSTANT => '999986';
$l_file_h = IO::File->new($l_temp_file, O_RDONLY);
while( $l_line = $l_file_h->getline() ) {
    if( $l_line =~ m/${\REGEXP_CONSTANT}/ ) {
        # 4.83s - 1000001 calls to IO::Handle::getline, avg 5µs/call
        # 745ms - 1000001 calls to main::CORE:match, avg 729ns/call
        # 735ms - 1000001 calls to main::CORE:regcomp, avg 721ns/call
        chomp $l_line;
        LOG("Regexp 02 - matched line ($l_line)");
    }
}
$l_file_h->close();

#####################################################################
# No constant in match operator - no regcomp called
#####################################################################
$l_file_h = IO::File->new($l_temp_file, O_RDONLY);
while( $l_line = $l_file_h->getline() ) {
    if( $l_line =~ m/999986/ ) {
        # spent 4.78s - 1000001 calls to IO::Handle::getline, avg 5µs/call
        # spent 838ms - 1000001 calls to main::CORE:match, avg 838ns/call
        chomp $l_line;
        LOG("Regexp 03 - matched line ($l_line)");
    }
}
$l_file_h->close();

#####################################################################
# Readonly constant with /o option - regcomp, avg 675ns/call
#####################################################################
$l_file_h = IO::File->new($l_temp_file, O_RDONLY);
while( $l_line = $l_file_h->getline() ) {
    if( $l_line =~ m/$REGEXP_READONLY/o ) {
        # 4.84s - 1000001 calls to IO::Handle::getline, avg 5µs/call
        # 754ms - 1000001 calls to main::CORE:match, avg 754ns/call
        # 732ms - 1000001 calls to main::CORE:regcomp, avg 675ns/call
        # 0s - 2 calls to Readonly::Scalar::FETCH, avg 0s/call
        chomp $l_line;
        LOG("Regexp 04 - matched line ($l_line)");
    }
}
$l_file_h->close();

#####################################################################
# Precompiled regexp - regcomp, avg 1µs/call
#####################################################################
my $l_search_r = qr/$REGEXP_READONLY/;
$l_file_h = IO::File->new($l_temp_file, O_RDONLY);
while( $l_line = $l_file_h->getline() ) {
    if( $l_line =~ $l_search_r ) {
        # 4.77s - 1000001 calls to IO::Handle::getline, avg 5µs/call
        # 1.33s - 1000001 calls to main::CORE:regcomp, avg 1µs/call
        # 776ms - 1000001 calls to main::CORE:match, avg 776ns/call
        chomp $l_line;
        LOG("Regexp 05 - matched line ($l_line)");
    }
}
$l_file_h->close();

#####################################################################
# Regexp match operator with local variable - regcomp, avg 718ns/call
#####################################################################
my $l_search = $REGEXP_READONLY;
$l_file_h = IO::File->new($l_temp_file, O_RDONLY);
while( $l_line = $l_file_h->getline() ) {
    if( $l_line =~ m/$l_search/ ) {
        # 4.73s - 1000001 calls to IO::Handle::getline, avg 5µs/call
        # 759ms - 1000001 calls to main::CORE:match, avg 766ns/call
        # 741ms - 1000001 calls to main::CORE:regcomp, avg 718ns/call
        chomp $l_line;
        LOG("Regexp 06 - matched line ($l_line)");
    }
}
$l_file_h->close();

#####################################################################
# Regexp match with variable and /o option - regcomp, avg 690ns/call
#####################################################################
$l_search = $REGEXP_READONLY;
$l_file_h = IO::File->new($l_temp_file, O_RDONLY);
while( $l_line = $l_file_h->getline() ) {
    if( $l_line =~ m/$l_search/o ) {
        # 4.86s - 1000001 calls to IO::Handle::getline, avg 5µs/call
        # 758ms - 1000001 calls to main::CORE:match, avg 758ns/call
        # 690ms - 1000001 calls to main::CORE:regcomp, avg 690ns/call
        chomp $l_line;
        LOG("Regexp 07 - matched line ($l_line)");
    }
}
$l_file_h->close();

exit 0;

#####################################################################
##
## SUBROUTINES
##
#####################################################################
sub create_temp_file{
    my $p_file   = shift;
    my $l_file_h = $EMPTY;

    LOG("print file ($p_file)");
    $l_file_h = IO::File->new($p_file, O_WRONLY|O_TRUNC|O_CREAT);
    for( 0 .. $TEMP_FILE_SIZE ) {
        print {$l_file_h} 'Line number is = ' . $_ . "\n";
    }
    $l_file_h->close();
    return;
}

sub LOG{
    my $l_time   = [Today_and_Now()];
    my $l_string = sprintf('%d-%02d-%02d %02d:%02d:%02d', @{$l_time});
    $l_string = $l_string . q{ - } . $_[0];
    print sprintf("%s\n", $l_string);
    return;
}
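The profile comments above show 1000001 calls to Readonly::Scalar::FETCH without /o but only 2 with /o. That part can be reproduced without Readonly or the profiler. The sketch below is only an illustration under assumptions: CountingScalar is a hypothetical tied-scalar class written for this example (it is not part of Readonly), and the pattern and line count are shortened. It counts how often the interpolated variable is read with and without /o.

```perl
use strict;
use warnings;

# Hypothetical helper class: a tied scalar that counts how often it is read.
package CountingScalar;

sub TIESCALAR {
    my ($class, $value) = @_;
    return bless { value => $value, fetches => 0 }, $class;
}

sub FETCH {
    my ($self) = @_;
    $self->{fetches}++;
    return $self->{value};
}

sub STORE {
    my ($self, $value) = @_;
    $self->{value} = $value;
    return;
}

sub fetches { return $_[0]{fetches} }

package main;

my @lines = map { "Line number is = $_\n" } 0 .. 999;

# Interpolate the tied variable on every iteration.
sub fetches_without_o {
    tie my $pattern, 'CountingScalar', '986';
    my $matched = 0;
    for my $line (@lines) {
        $matched++ if $line =~ m/$pattern/;
    }
    return tied($pattern)->fetches;
}

# Same loop, but /o tells Perl to interpolate the variable only once.
sub fetches_with_o {
    tie my $pattern, 'CountingScalar', '986';
    my $matched = 0;
    for my $line (@lines) {
        $matched++ if $line =~ m/$pattern/o;
    }
    return tied($pattern)->fetches;
}

my $plain  = fetches_without_o();
my $with_o = fetches_with_o();
printf "without /o: %d fetches, with /o: %d fetches\n", $plain, $with_o;
```

If the FETCH overhead accounts for most of the 6µs/call in the Readonly case, copying the value into a plain variable first (as in the "local variable" test) should avoid it just as well as /o does.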

In reply to Regexp optimization - /o option better than precompiled regexp? by Hessu
