Re^2: Never (qr//)

by tye (Sage)
on May 13, 2003 at 14:25 UTC ( #257755=note )

in reply to Re: Never
in thread Never-to-use Perl features?

Um, I guess it helps if you know how to use qr// properly. If you want it fast, you don't write /$qr/; you write $qr!
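For readers unfamiliar with the two spellings, here is a minimal sketch. The pattern is the one from the benchmark further down; the sample strings are made up for illustration and are not the benchmark data.

```perl
use strict;
use warnings;

my $alpha = '[a-zA-Z]';
my $alnum = '[a-zA-Z0-9]';
my $qr    = qr/^$alpha$alnum+$/;    # compiled once, here

# Hypothetical sample data, not taken from the thread.
my @samples = ( 'abc123', '9lives', 'x', 'Perl5' );

# Bind directly to the compiled regex object.
my @direct = grep { $_ =~ $qr } @samples;

# Interpolate the compiled regex into a match operator.
my @interp = grep { /$qr/ } @samples;

print "direct: @direct\n";
print "interp: @interp\n";
```

Both spellings select the same strings; the only question in this thread is whether one of them is faster.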

I get qr// faster than even /o, though your benchmark is testing such micro operations that the results can be rather unstable. The most "likely looking" result I got (early on) was:

              Rate Without /o With /o With qr
Without /o 42725/s         --    -26%    -39%
With /o    57636/s        35%      --    -18%
With qr    70185/s        64%     22%      --

But a more typical result was:

      Rate  2//  2/o  1//  1/o  1qr  2qr
2// 31.4/s   --  -0%  -0%  -1% -25% -26%
2/o 31.5/s   0%   --  -0%  -0% -25% -25%
1// 31.6/s   0%   0%   --  -0% -25% -25%
1/o 31.6/s   1%   0%   0%   -- -25% -25%
1qr 41.9/s  33%  33%  33%  33%   --  -1%
2qr 42.2/s  34%  34%  34%  34%   1%   --

Yes, that's right, /o was so close that it even ran slower than // on occasion.

Note that I didn't change any of the code in the subroutines being benchmarked between these two runs (I did change the data used several times, but even other runs with the same data never gave me results very similar to that first result above). It is just that Benchmark has to do some interesting work to try to measure such micro operations and so can easily show differences of around 20% between successive runs of identical code. That is why I usually make sure I have the benchmarking code run each case twice (otherwise you are rather likely to give a 20% disadvantage to the case that gets run first, for example).

Also, always verify that all of your benchmarked cases are doing the same thing:

Without /o:2200
With /o:2200
With qr:2200

So I stand by my assertion that you should never use /o!

                - tye

#!/usr/bin/perl -w
use strict;
use Benchmark qw( cmpthese );

chomp( my @words = <DATA> );
push @words, map $_ x 100, @words;
for( @words ) {
    if( s/ /0/g ) {
        s/0/ /;
        $_= reverse $_;
    }
}
seek DATA, 0, 0;
push @words, grep chomp, <DATA>;
@words= ( @words ) x 100;

my $alpha = '[a-zA-Z]';
my $alnum = '[a-zA-Z0-9]';
my $qr= qr/^$alpha$alnum+$/;

print
    'Without /o:' => testsub(), $/,
    'With /o:' => testsubo(), $/,
    'With qr:' => testsubqr(), $/;

cmpthese( -3, {
    '1//' => \&testsub,
    '1/o' => \&testsubo,
    '1qr' => \&testsubqr,
    '2//' => \&testsub,
    '2/o' => \&testsubo,
    '2qr' => \&testsubqr,
});

sub testsub {
    my $count = 0;
    foreach (@words) {
        $count++ if(/^$alpha$alnum+$/);
    }
    return $count;
}

sub testsubo {
    my $count = 0;
    foreach (@words) {
        $count++ if(/^$alpha$alnum+$/o);
    }
    return $count;
}

sub testsubqr {
    my $count = 0;
    foreach (@words) {
        $count++ if $_ =~ $qr;
    }
    return $count;
}

__DATA__
include the real test data or code to generate it when posting benchmarks

Replies are listed 'Best First'.
Re: Re^2: Never (qr//)
by diotalevi (Canon) on May 13, 2003 at 14:44 UTC

    And I recall from looking at the generated optree that =~ /$qr/ and =~ $qr are 100% identical. I don't think anyone here is actually measuring any real difference.
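One way to look at the optree yourself is B::Concise, which ships with Perl. The following is only a rough sketch: the pattern and the sub names with_slashes, bare_qr and dump_ops are hypothetical, and the dumps contain addresses and sequence numbers, so compare the op names rather than the raw text.

```perl
use strict;
use warnings;
use B::Concise ();

my $qr = qr/^[a-zA-Z][a-zA-Z0-9]+$/;    # hypothetical pattern

sub with_slashes { return $_[0] =~ /$qr/ }
sub bare_qr      { return $_[0] =~ $qr }

# Render the optree of a sub into a string, in execution order.
sub dump_ops {
    my ($code) = @_;
    my $out = '';
    B::Concise::walk_output(\$out);     # send the dump to a scalar
    B::Concise::reset_sequence();       # restart op numbering per dump
    B::Concise::compile('-exec', $code)->();
    return $out;
}

my $slashes = dump_ops(\&with_slashes);
my $bare    = dump_ops(\&bare_qr);

print "=~ /\$qr/:\n$slashes\n";
print "=~ \$qr:\n$bare\n";
```

The exact output depends on the Perl version, so no expected dump is given here.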

      I'd hoped that. I'd heard enough people claiming otherwise that I guessed that perhaps Perl wasn't that smart.

      I also noticed that none of the benchmarking code in this thread was making a straight substitution of /.../o with $qr between the cases. So I picked the code that had the most similar test cases and added one, got results even better than I expected, and quadruple-checked things because "better than expected benchmarks" almost always means "mistake made".

      Adding another case for /$qr/, I get it being nearly identical to my $_ =~ $qr case (which is faster than /.../o, perhaps just because the qr/.../ part is done outside the scope of the benchmarking; but "fixing" that would be more work than I care to invest at this point).

      So I suspect that diotalevi is correct on both counts: that =~ /$qr/ and =~ $qr produce identical code, and that the benchmark results showing qr// to be slower than //o have to do with other code differences between the cases and/or the order in which operations get run (or chance).

      Thanks, diotalevi.

                      - tye
Re: Re^2: Never (qr//)
by grantm (Parson) on May 14, 2003 at 09:46 UTC

    You're right, publishing a benchmark without the test data is pretty meaningless. Here's a revised version that uses the individual words output from 'perldoc -t perlfunc' as the test data.

    #!/usr/local/bin/perl -w

    use Benchmark;

    my (@words, $count);

    open(TESTDATA, "perldoc -t perlfunc|") || die $!;
    while(<TESTDATA>) {
        chomp;
        push @words, /(\S+)/g
    }
    print @words . " words\n";

    my $alpha = '[a-zA-Z]';
    my $alnum = '[a-zA-Z0-9]';
    my $qr = qr/^$alpha$alnum+$/;

    timethese(100, {
        '/^$alpha$alnum+$/ ' => \&testsub,
        '/^$alpha$alnum+$/o' => \&testsubo,
        '/$qr/             ' => \&testsubqr1,
        '$qr               ' => \&testsubqr2,
        '/$qr/o            ' => \&testsubqro,
    });

    sub testsub    { foreach (@words) { $count++ if(/^$alpha$alnum+$/);  } }
    sub testsubo   { foreach (@words) { $count++ if(/^$alpha$alnum+$/o); } }
    sub testsubqr1 { foreach (@words) { $count++ if(/$qr/);              } }
    sub testsubqr2 { foreach (@words) { $count++ if($_ =~ $qr);          } }
    sub testsubqro { foreach (@words) { $count++ if(/$qr/o);             } }

    This is probably a fairer test than the original (fewer iterations of more data) and the output looks like this:

    /^$alpha$alnum+$/ : 20 wallclock secs (20.41 usr +  0.00 sys = 20.41 CPU) @  4.90/s (n=100)
    /^$alpha$alnum+$/o:  9 wallclock secs ( 8.34 usr +  0.00 sys =  8.34 CPU) @ 11.99/s (n=100)
    /$qr/             :  9 wallclock secs ( 9.59 usr +  0.00 sys =  9.59 CPU) @ 10.43/s (n=100)
    $qr               : 10 wallclock secs ( 9.94 usr +  0.00 sys =  9.94 CPU) @ 10.06/s (n=100)
    /$qr/o            :  9 wallclock secs ( 8.34 usr +  0.01 sys =  8.35 CPU) @ 11.98/s (n=100)

    The reason I used /$qr/ rather than =~ $qr was not because I didn't know how to use it, but because I was using it in an if statement and $qr being a reference would simply evaluate to true without even attempting a match. The results above appear to show that plain $qr is slightly slower than /$qr/ but that is almost certainly due to the fact that I had to spell it out as $_ =~ $qr and so the difference should be disregarded.
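The pitfall described above can be shown in a few lines. This is a small sketch with a made-up pattern and input, not code from the benchmark:

```perl
use strict;
use warnings;

my $qr   = qr/^\d+$/;    # hypothetical pattern: digits only
my $word = 'hello';      # hypothetical input that should NOT match

# A bare $qr in boolean context is just a (true) reference; no match is attempted.
my $bare_truth = $qr ? 1 : 0;

# Binding explicitly performs the match.
my $bound = ( $word =~ $qr ) ? 1 : 0;

# Inside a match operator, $qr is applied to $_.
my $interp = do { local $_ = $word; /$qr/ ? 1 : 0 };

print "bare: $bare_truth, bound: $bound, interpolated: $interp\n";
```

So `if ($qr)` silently succeeds for every input, which is why the test has to be written as either `if (/$qr/)` or `if ($_ =~ $qr)`.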

      There is also the problem with using the =~ $qr form that you cannot use it if you need to apply the /g option.

      Nor if you have a regex that you sometimes want to use with capturing and sometimes without.

      Nor can you use it in substitutions.
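A short sketch of the three cases just listed, where the compiled regex has to be interpolated into an operator rather than bound directly. The pattern and text are invented for illustration:

```perl
use strict;
use warnings;

my $num  = qr/\d+/;          # hypothetical pattern
my $text = 'a1 b22 c333';    # hypothetical input

# /g needs a match operator to hang the flag on.
my @all = $text =~ /$num/g;

# Capturing on demand: wrap the same regex in parentheses when needed.
my ($first) = $text =~ /($num)/;

# Substitution: the regex goes inside s///.
( my $masked = $text ) =~ s/$num/N/g;

print "all: @all\n";
print "first: $first\n";
print "masked: $masked\n";
```

None of these can be written as a plain `$text =~ $num`, because that form has nowhere to put flags, extra parentheses, or a replacement.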

      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
