PerlMonks  

Slowdown when using Moose::Util::TypeConstraints with Regexp::Common

by kevbot (Hermit)
on Sep 02, 2008 at 05:16 UTC (#708375)
kevbot has asked for the wisdom of the Perl Monks concerning the following question:

This is my first post to the monastery, so please take it easy on me :) I use Perl mostly as a hobby, and I have been slowly learning by reading various books and lurking around here. So, I think I finally have a question worth asking (most of the time a Super Search reveals that my question has already been answered). I have started to use Moose for anything object oriented and have found it to be very nice.

I recently wrote an application that creates a few thousand Moose objects. It seemed to run a bit slowly, so I used Devel::DProf to help locate the bottleneck in my code. I was convinced that one of my methods would be involved. To my surprise, it seemed to be the combination of using Moose::Util::TypeConstraints with Regexp::Common.

I have tried to reduce this down to a simple case based on the example given in Moose::Cookbook::Basics::Recipe4.

If I create the subtype using the technique in the recipe:

    subtype USZipCodeA
        => as Value
        => where { /^$RE{zip}{US}{-extended => 'allow'}$/ };

it seems to run slower than if I create the subtype like this:

    my $zip_re = qr/^$RE{zip}{US}{-extended => 'allow'}$/;

    subtype USZipCodeB
        => as Value
        => where { $_ =~ $zip_re; };

The code that I use for benchmarking and profiling the difference between these two techniques is behind the readmore tags.

In AddressA.pm:

    package AddressA;
    use Moose;
    use Moose::Util::TypeConstraints;
    use Regexp::Common 'zip';

    subtype USZipCodeA
        => as Value
        => where { /^$RE{zip}{US}{-extended => 'allow'}$/ };

    has 'zip_code' => (is => 'rw', isa => 'USZipCodeA');

    1;

In AddressB.pm:

    package AddressB;
    use Moose;
    use Moose::Util::TypeConstraints;
    use Regexp::Common 'zip';

    my $zip_re = qr/^$RE{zip}{US}{-extended => 'allow'}$/;

    subtype USZipCodeB
        => as Value
        => where { $_ =~ $zip_re; };

    has 'zip_code' => (is => 'rw', isa => 'USZipCodeB');

    1;

Comparing A and B:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);
    use AddressA;
    use AddressB;

    my $zip = 12345;

    cmpthese(-5, {
        A => sub { AddressA->new(zip_code => $zip); },
        B => sub { AddressB->new(zip_code => $zip); },
    });

    exit;

I'm using the following:

  • Windows XP
  • perl -v output: This is perl, v5.10.0 built for MSWin32-x86-multi-thread (Strawberry Perl)
  • Moose v0.55
  • Regexp::Common v2.122

Typical results of the comparison are:

        Rate    A    B
    A 2976/s   -- -43%
    B 5187/s  74%   --

I made a couple simple programs to create 10,000 objects using either technique. Then I used Devel::DProf to profile the code. The two highest values of 'ExclSec' for the creation of 10,000 AddressA objects were from these methods:

    Regexp::Common::_decache
    Regexp::Common::new

For AddressB objects, none of the Regexp::Common methods show up; instead, these two methods have the highest values of 'ExclSec':

    Class::MOP::Instance::new
    Class::MOP::Instance::set_slot_value

I don't want to focus too much on the DProf results, since dprofpp is giving me negative elapsed time values and some negative 'CumulS' values. I have no idea why.

I just tried this out on Mac OS X with perl 5.8.8 (MacPorts), using the same Moose and Regexp::Common versions. The results of the comparison are:

        Rate    A    B
    A 2955/s   --  -7%
    B 3174/s   7%   --

So, it's not as noticeable on Mac OS X. I'm not sure why this differs so much from Strawberry Perl.

My question has a few parts:

  • Is this slowdown expected behavior?
  • Am I doing something wrong?
  • Should I suggest that the Cookbook Recipe be updated?
In my own application, I had five or six subtypes, and changing to the faster technique made my application noticeably faster. Thanks in advance for any insight you can offer. This experience resulted in me using Benchmark and Devel::DProf for the first time.

Re: Slowdown when using Moose::Util::TypeConstraints with Regexp::Common
by CountZero (Bishop) on Sep 02, 2008 at 05:51 UTC
    Using
        my $zip_re = qr/^$RE{zip}{US}{-extended => 'allow'}$/;
    builds and compiles your regex once, when the module is loaded, and reuses that precompiled pattern in all your object constructions, whereas using
        /^$RE{zip}{US}{-extended => 'allow'}$/
    in your Moose code has to re-evaluate the interpolated $RE{...} lookup, and possibly recompile the regex, every time the constraint is checked.

    It doesn't explain why Perl on Mac OS X doesn't show as big a slowdown as Strawberry Perl.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
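
One way to check what actually repeats on each call is to count lookups on a tied hash. The sketch below uses only core Perl. `CountingPatterns` and `%PAT` are hypothetical stand-ins for Regexp::Common's `%RE` (the DProf output in the question suggests the lookup itself does work on every access), and the ZIP pattern is a simplified assumption, not the one Regexp::Common generates.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %RE: a tied hash that counts every lookup.
package CountingPatterns;
our $fetches = 0;
sub TIEHASH { my ($class, %patterns) = @_; return bless {%patterns}, $class }
sub FETCH   { my ($self, $key) = @_; $fetches++; return $self->{$key} }

package main;

# Simplified ZIP pattern (an assumption for illustration only).
tie my %PAT, 'CountingPatterns', zip => '\d{5}(?:-\d{4})?';

# Like USZipCodeA: the hash lookup is interpolated inside the match.
sub check_inline {
    my ($value) = @_;
    return $value =~ /^$PAT{zip}$/ ? 1 : 0;
}

# Like USZipCodeB: the lookup happens once, when qr// is evaluated.
my $zip_re = qr/^$PAT{zip}$/;
sub check_precompiled {
    my ($value) = @_;
    return $value =~ $zip_re ? 1 : 0;
}

my $before = $CountingPatterns::fetches;
check_inline('12345') for 1 .. 1000;
my $inline_fetches = $CountingPatterns::fetches - $before;

$before = $CountingPatterns::fetches;
check_precompiled('12345') for 1 .. 1000;
my $precompiled_fetches = $CountingPatterns::fetches - $before;

print "lookups during 1000 inline checks:      $inline_fetches\n";
print "lookups during 1000 precompiled checks: $precompiled_fetches\n";
```

The inline form performs the lookup at least once per check, while the precompiled form performs none after the qr// has been built. perlop notes that Perl does not recompile a pattern unless an interpolated variable changes, so the repeated lookup, rather than regex compilation alone, may account for much of the difference.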

      I added a couple of other tests (AddressC and AddressD) to the mix. Basically, I wanted to take Regexp::Common out of the code to see if a standard regexp showed the same behavior. So, I grabbed the regular expression from Regexp::Common and placed it in the code directly. In AddressC, I put it directly in the subtype (similar to AddressA):

          subtype USZipCodeC
              => as Value
              => where {
                  $_ =~ /(?-xism:^(?:(?:(?:USA?)-){0,1}(?:(?:(?:[0-9]{3})(?:[0-9]{2}))(?:(?:-)(?:(?:[0-9]{2})(?:[0-9]{2}))){0,1}))$)/;
              };

          has 'zip_code' => (is => 'rw', isa => 'USZipCodeC');

      For AddressD, I compile the regexp (similar to AddressB):

          my $zip_re = qr/(?-xism:^(?:(?:(?:USA?)-){0,1}(?:(?:(?:[0-9]{3})(?:[0-9]{2}))(?:(?:-)(?:(?:[0-9]{2})(?:[0-9]{2}))){0,1}))$)/;

          subtype USZipCodeD
              => as Value
              => where { $_ =~ $zip_re; };

          has 'zip_code' => (is => 'rw', isa => 'USZipCodeD');

      So, I would expect that AddressB and AddressD would be the fastest. Here is the result of a benchmark on my WinXP/Strawberry Perl 5.10 setup:

              Rate    A    D    C    B
          A 2960/s   -- -42% -42% -42%
          D 5061/s  71%   --  -1%  -1%
          C 5099/s  72%   1%   --  -0%
          B 5108/s  73%   1%   0%   --

      The method using Regexp::Common directly in the Subtype is the slowest of all the techniques (AddressA). Basically, the other three are tied. So, the compiled regexp doesn't seem to make a difference as long as I avoid Regexp::Common.

        Basically, all your versions except AddressA now use a single static regex, so they get compiled once at compile time, hence few or no differences. The Regexp::Common version cannot be optimized in that way, unless you force it to become static by using qr//. Such is the price you pay for flexibility.

        CountZero

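
The four variants can also be compared without Moose, to see how much of the gap comes from the pattern lookup alone. This is a rough sketch using only core Perl: `SlowPatterns` and `%PAT` are hypothetical stand-ins for Regexp::Common's `%RE`, the ZIP pattern is a simplified assumption, and the numbers it prints will vary by platform and Perl build.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for %RE: FETCH rebuilds the pattern string each time.
package SlowPatterns;
sub TIEHASH { my ($class) = @_; return bless {}, $class }
sub FETCH {
    my ($self, $key) = @_;
    return join '', '(?:', '[0-9]{5}', '(?:-[0-9]{4})?', ')';
}

package main;

tie my %PAT, 'SlowPatterns';

my $zip       = '12345';
my $from_hash = qr/^$PAT{zip}$/;                   # lookup done once, like AddressB
my $literal   = qr/^(?:[0-9]{5}(?:-[0-9]{4})?)$/;  # no lookup at all, like AddressD

my %variants = (
    A => sub { $zip =~ /^$PAT{zip}$/ ? 1 : 0 },                    # lookup per call
    B => sub { $zip =~ $from_hash ? 1 : 0 },
    C => sub { $zip =~ /^(?:[0-9]{5}(?:-[0-9]{4})?)$/ ? 1 : 0 },   # literal, inline
    D => sub { $zip =~ $literal ? 1 : 0 },
);

# A fixed iteration count keeps the run short; use a negative number
# (CPU seconds, as in the original script) for steadier figures.
cmpthese(200_000, \%variants);
```

If the pattern seen here matches the one in the thread, only variant A should stand apart, since it is the only one that repeats the lookup on every call.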

Re: Slowdown when using Moose::Util::TypeConstraints with Regexp::Common
by karavelov (Monk) on Sep 02, 2008 at 06:20 UTC

    You could try to minimize the difference with:

    => where { /^$RE{zip}{US}{-extended => "allow"}$/o };

    Here on linux with perl 5.10 it gives:

        Rate    A    B
    A 2402/s   -- -22%
    B 3080/s  28%   --
    

    The original benchmarked as:

        Rate    A    B
    A 1851/s   -- -39%
    B 3024/s  63%   --
    
    Best regards
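
One caveat with /o is worth keeping in mind: the pattern is fixed the first time the match runs, so later changes to an interpolated variable are ignored by that match operator. The sketch below illustrates this with core Perl only; `$pattern`, `matches_once` and `matches_live` are hypothetical names chosen for the example.

```perl
use strict;
use warnings;

my $pattern = '[0-9]{5}';

# With /o, the pattern is compiled on first use and then kept.
sub matches_once {
    my ($value) = @_;
    return $value =~ /^$pattern$/o ? 1 : 0;
}

# Without /o, the current value of $pattern is used on every call.
sub matches_live {
    my ($value) = @_;
    return $value =~ /^$pattern$/ ? 1 : 0;
}

my @first = (matches_once('12345'), matches_live('12345'));

$pattern = '[a-z]+';    # change the pattern after the first use

my @second = (matches_once('abc'),   matches_live('abc'));
my @third  = (matches_once('12345'), matches_live('12345'));

print "before change, '12345': once=$first[0] live=$first[1]\n";
print "after change,  'abc':   once=$second[0] live=$second[1]\n";
print "after change,  '12345': once=$third[0] live=$third[1]\n";
```

For a type constraint whose pattern never changes, this freezing is harmless, but a qr// object stored in a variable makes the same intent explicit.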
