PerlMonks  

Re: Never

by grantm (Parson)
on May 13, 2003 at 00:17 UTC (#257609)


in reply to Never-to-use Perl features?

Including the /o-modifier, the list now has 3 language features that seemed a nice idea at first, but aren't really usable now.

I've read the thread you quoted and I'm obviously missing something because in my experience /o works exactly as it should:

    use Benchmark;

    my @words = map { chomp; $_ } (<DATA>);

    my $alpha = '[a-zA-Z]';
    my $alnum = '[a-zA-Z0-9]';

    timethese(2000, {
        'Without /o' => \&testsub,
        'With /o'    => \&testsubo,
    });

    sub testsub {
        my $count = 0;
        foreach (@words) {
            $count++ if(/^$alpha$alnum+$/);
        }
        return $count;
    }

    sub testsubo {
        my $count = 0;
        foreach (@words) {
            $count++ if(/^$alpha$alnum+$/o);
        }
        return $count;
    }

    __DATA__
    1500 words one per line

On my system this shows the version with /o running three times faster than the version without.

Using variables to give meaningful names to chunks of a regex is very useful for improving the readability, maintainability and reusability of the code. Without /o it would be inefficient. What is it about /o that makes it "not really usable"?

Update: I added this to the test script:

    my $qr = qr/^$alpha$alnum+$/;

    [snip]

    sub testsubqr {
        my $count = 0;
        foreach (@words) {
            $count++ if(/$qr/);
        }
        return $count;
    }

    sub testsubqro {
        my $count = 0;
        foreach (@words) {
            $count++ if(/$qr/o);
        }
        return $count;
    }

The qr// approach seems to be about 20% slower than /o and qr// + /o seems to be about the same as /o alone.

Replies are listed 'Best First'.
Re: Re: Never
by BrowserUk (Pope) on May 13, 2003 at 00:56 UTC

    At last, someone else sees the benefits of /o.

    Then, the counter argument is: Use qr// which works and removes the need (most of the time) for /o...

    until you combine a couple of chunks pre-compiled with qr// into another chunk with qr//. Then the /o seems (sometimes at least) to show benefits again.

    I wish I could truly tie down when and why qr/.../o produces these benefits and when not.

    Or is it all just a figment of my imagination.

    The counter-argument that you shouldn't use /o because you might forget you'd used it sometime doesn't cut much ice with me.
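For readers who have not combined qr// chunks before, here is a minimal sketch of what that composition looks like. It says nothing about speed. The helper is_ident() and the sample words are assumptions made for illustration and do not come from the thread's benchmarks.

```perl
use strict;
use warnings;

# Named building blocks, each compiled once with qr//.
my $alpha = qr/[a-zA-Z]/;
my $alnum = qr/[a-zA-Z0-9]/;

# Interpolating qr// objects into another qr// builds one combined pattern.
my $ident = qr/^$alpha$alnum+$/;

# is_ident() is a hypothetical helper, not something from the thread.
sub is_ident {
    my ($word) = @_;
    return $word =~ $ident ? 1 : 0;
}

for my $word (qw(x1 count 9lives a)) {
    printf "%-6s %s\n", $word, is_ident($word) ? 'matches' : 'no match';
}
```

Note that the pattern needs one letter followed by at least one more letter or digit, so a single-character word such as "a" does not match.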


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

      hmm.. yup. after much fiddling with the tests, I couldn't get qr to match /o for speed. My conclusion is:
      if you've got a regex containing a variable that will not change, use /o.
      if you need to loop through multiple regexes, then you can't use /o, so compile them with qr.
      the alternative to that is to have multiple tests using /o, which I think would still be faster.. ok off to test..

      again, hmmm..

          use Benchmark;

          my @words = map { chomp; $_ } (<DATA>);

          my $alpha = '[a-zA-Z]';
          my $alnum = '[a-zA-Z0-9]';
          my @qr = ( qr/^$alpha/, qr/$alnum+$/ );

          timethese(500000, {
              'With /o' => \&testsub,
              'qr'      => \&testsubqr,
          });

          sub testsub{
              my $count = 0;
              foreach(@words){
                  $count += testsubb($_);
              }
          }

          sub testsubb {
              my $word = $_[0];
              return unless $word =~ /^$alpha/o;
              return unless $word =~ /$alnum+$/o;
              return 1;
          }

          sub testsubqr{
              my $count = 0;
              foreach(@words){
                  $count += testsubbqr($_);
              }
          }

          sub testsubbqr {
              my $word = $_[0];
              foreach(@qr){
                  return unless $word =~ $_;
              }
              return 1;
          }

          Benchmark: timing 500000 iterations of With /o, qr...
             With /o: 12 wallclock secs (13.13 usr +  0.01 sys = 13.14 CPU) @ 38051.75/s (n=500000)
                  qr: 18 wallclock secs (18.90 usr +  0.01 sys = 18.91 CPU) @ 26441.04/s (n=500000)

          Benchmark: timing 100000 iterations of With /o, qr...
             With /o:  3 wallclock secs ( 2.64 usr +  0.00 sys =  2.64 CPU) @ 37878.79/s (n=100000)
                  qr:  4 wallclock secs ( 3.78 usr +  0.00 sys =  3.78 CPU) @ 26455.03/s (n=100000)

      I think I'm done defending qr..

      ok, back to work.. nothing to see here...

      update: adding these tests shows that the loop is adding more time than qr saves, but /o is still quicker.. so where does that leave us?

          sub testsubqr2{
              my $count = 0;
              foreach(@words){
                  $count += testsubbqr2($_);
              }
          }

          sub testsubbqr2 {
              my $word = $_[0];
              return unless $word =~ $qr[0];
              return unless $word =~ $qr[1];
              return 1;
          }

          Benchmark: timing 100000 iterations of With /o, qr, qr2...
             With /o:  3 wallclock secs ( 2.63 usr +  0.00 sys =  2.63 CPU) @ 38022.81/s (n=100000)
                  qr:  4 wallclock secs ( 3.76 usr +  0.00 sys =  3.76 CPU) @ 26595.74/s (n=100000)
                 qr2:  3 wallclock secs ( 2.98 usr +  0.00 sys =  2.98 CPU) @ 33557.05/s (n=100000)

      cheers,

      J

Re: Re: Never
by perrin (Chancellor) on May 13, 2003 at 07:53 UTC
    The speed differences are small enough to be irrelevant for almost all real world situations. The important difference is that qr// is a clear and fairly obvious operator, whereas /o does something totally bizarre, i.e. the first time a regex with /o runs it will do one thing and every other time it will do something different. This frequently wreaks havoc in persistent environments like mod_perl where people forget that /o regexes will not get reset after their script finishes.
      the first time a regex with /o runs it will do one thing and every other time it will do something different

      Actually every other time it will do the same thing despite many people expecting it to do something different :-)

      All flippancy aside though, your point re persistent environments is a good one that I hadn't considered. Mind you, persistent environments wreak all sorts of havoc with file scoped lexicals too but that doesn't mean they're inherently a bad idea - it just means you need to use them with caution.
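The behaviour under discussion is easy to reproduce outside mod_perl. The sketch below assumes two hypothetical helpers, match_with_o() and match_with_qr(), written only for illustration. It shows that a /o pattern keeps whatever was interpolated on the first call, while a qr// built inside the sub follows the variable.

```perl
use strict;
use warnings;

# Hypothetical helper: the pattern is interpolated and compiled on the
# first call only; later calls reuse that compiled pattern.
sub match_with_o {
    my ($text, $pattern) = @_;
    return $text =~ /$pattern/o ? 1 : 0;
}

# Hypothetical helper: a fresh qr// is built from $pattern on every call.
sub match_with_qr {
    my ($text, $pattern) = @_;
    my $re = qr/$pattern/;
    return $text =~ $re ? 1 : 0;
}

print match_with_o('apple', 'a'),  "\n";   # match: 'apple' contains 'a'
print match_with_o('apple', 'z'),  "\n";   # still a match: pattern is frozen at 'a'
print match_with_qr('apple', 'a'), "\n";   # match
print match_with_qr('apple', 'z'), "\n";   # no match: 'z' is really used
```

In a persistent interpreter the first request to reach match_with_o() would fix the pattern for every later request, which is the surprise described above.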

Re^2: Never (qr//)
by tye (Sage) on May 13, 2003 at 14:25 UTC

    Um, I guess it helps if you know how to use qr// properly. You don't write /$qr/ if you want it fast, you write $qr!

    I get qr// faster than even /o. Though, your benchmark is testing such micro operations that the results can be rather unstable. The most "likely looking" result I got (early on) was:

                      Rate Without /o With /o With qr
        Without /o 42725/s         --    -26%    -39%
        With /o    57636/s        35%      --    -18%
        With qr    70185/s        64%     22%      --

    But a more typical result was:

              Rate  2//  2/o  1//  1/o  1qr  2qr
        2// 31.4/s   --  -0%  -0%  -1% -25% -26%
        2/o 31.5/s   0%   --  -0%  -0% -25% -25%
        1// 31.6/s   0%   0%   --  -0% -25% -25%
        1/o 31.6/s   1%   0%   0%   -- -25% -25%
        1qr 41.9/s  33%  33%  33%  33%   --  -1%
        2qr 42.2/s  34%  34%  34%  34%   1%   --

    Yes, that's right, /o was so close that it even ran slower than // on occasion.

    Note that I didn't change any of the code in the subroutines being benchmarked between these two runs (I did change the data used several times, but even other runs with the same data never gave me results very similar to that first result above). It is just that Benchmark has to do some interesting work to try to measure such micro operations and so can easily show differences of around 20% between successive runs of identical code. That is why I usually make sure I have the benchmarking code run each case twice (otherwise you are rather likely to give a 20% disadvantage to the case that gets run first, for example).

    Also, always verify that all of your benchmarked cases are doing the same thing:

        Without /o:2200
        With /o:2200
        With qr:2200
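Both habits, running every case twice and checking that all cases agree, can be combined in one script. This is a rough sketch and not the code that produced the figures above: the word list is placeholder data, and the sub names count_plain(), count_o() and count_qr() are made up for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Placeholder data; the word lists used in the thread were not published.
my @words = (qw(alpha b2 3three delta_4 e5 x ok 42 Perl5)) x 200;

my $alpha = '[a-zA-Z]';
my $alnum = '[a-zA-Z0-9]';
my $qr    = qr/^$alpha$alnum+$/;

sub count_plain { my $n = 0; for (@words) { $n++ if /^$alpha$alnum+$/  } return $n }
sub count_o     { my $n = 0; for (@words) { $n++ if /^$alpha$alnum+$/o } return $n }
sub count_qr    { my $n = 0; for (@words) { $n++ if $_ =~ $qr          } return $n }

# First confirm that every case finds the same number of matches.
printf "Without /o:%d\nWith /o:%d\nWith qr:%d\n",
    count_plain(), count_o(), count_qr();

# Each case is listed twice, so a gap between the "1" and "2" rows of
# identical code shows how much of a difference is just noise.
cmpthese(-1, {
    '1//' => \&count_plain, '2//' => \&count_plain,
    '1/o' => \&count_o,     '2/o' => \&count_o,
    '1qr' => \&count_qr,    '2qr' => \&count_qr,
});
```

A negative count tells Benchmark to run each case for at least that many CPU seconds instead of a fixed number of iterations.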

    So I stand by my assertion that you should never use /o!

                    - tye

      And I recall from looking at the generated optree that =~ /$qr/ and =~ $qr are 100% identical. I don't think anyone here is actually measuring any real difference.

        I'd hoped that. I'd heard enough people claiming otherwise that I guessed that perhaps Perl wasn't that smart.

        I also noticed that none of the benchmarking code in this thread made a straight substitution of /.../o with $qr between the cases. So I picked the code that had the most similar test cases and added one. The results were even better than I expected, so I quadruple-checked things, because "better than expected benchmarks" almost always means "mistake made".

        Adding another case for /$qr/, I get it being nearly identical to my $_ =~ $qr case (which is faster than /.../o, perhaps just because the qr/.../ part is done outside the scope of the benchmarking; but "fixing" that would be more work than I care to invest at this point).

        So I suspect that diotalevi is correct in both that =~ /$qr/ and =~ $qr produce identical code and that the benchmark results showing qr// to be slower than //o have to do with other code differences between the cases and/or the order that operations get run (or chance).

        Thanks, diotalevi.

                        - tye

      You're right, publishing a benchmark without the test data is pretty meaningless. Here's a revised version that uses the individual words output from 'perldoc -t perlfunc' as the test data.

          #!/usr/local/bin/perl -w

          use Benchmark;

          my (@words, $count);

          open(TESTDATA, "perldoc -t perlfunc|") || die $!;
          while(<TESTDATA>) {
              chomp;
              push @words, /(\S+)/g
          }
          print @words . " words\n";

          my $alpha = '[a-zA-Z]';
          my $alnum = '[a-zA-Z0-9]';
          my $qr = qr/^$alpha$alnum+$/;

          timethese(100, {
              '/^$alpha$alnum+$/ ' => \&testsub,
              '/^$alpha$alnum+$/o' => \&testsubo,
              '/$qr/             ' => \&testsubqr1,
              '$qr               ' => \&testsubqr2,
              '/$qr/o            ' => \&testsubqro,
          });

          sub testsub    { foreach (@words) { $count++ if(/^$alpha$alnum+$/); } }
          sub testsubo   { foreach (@words) { $count++ if(/^$alpha$alnum+$/o); } }
          sub testsubqr1 { foreach (@words) { $count++ if(/$qr/); } }
          sub testsubqr2 { foreach (@words) { $count++ if($_ =~ $qr); } }
          sub testsubqro { foreach (@words) { $count++ if(/$qr/o); } }

      This is probably a fairer test than the original (fewer iterations over more data) and the output looks like this:

          /^$alpha$alnum+$/ : 20 wallclock secs (20.41 usr +  0.00 sys = 20.41 CPU) @  4.90/s (n=100)
          /^$alpha$alnum+$/o:  9 wallclock secs ( 8.34 usr +  0.00 sys =  8.34 CPU) @ 11.99/s (n=100)
          /$qr/             :  9 wallclock secs ( 9.59 usr +  0.00 sys =  9.59 CPU) @ 10.43/s (n=100)
          $qr               : 10 wallclock secs ( 9.94 usr +  0.00 sys =  9.94 CPU) @ 10.06/s (n=100)
          /$qr/o            :  9 wallclock secs ( 8.34 usr +  0.01 sys =  8.35 CPU) @ 11.98/s (n=100)

      The reason I used /$qr/ rather than =~ $qr was not because I didn't know how to use it, but because I was using it in an if statement and $qr being a reference would simply evaluate to true without even attempting a match. The results above appear to show that plain $qr is slightly slower than /$qr/ but that is almost certainly due to the fact that I had to spell it out as $_ =~ $qr and so the difference should be disregarded.
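The pitfall with a bare $qr in a condition can be seen in a few lines. The three helpers below are hypothetical names chosen for this sketch. The first tests only the truth of the reference, while the other two actually attempt a match.

```perl
use strict;
use warnings;

my $qr = qr/^[a-zA-Z][a-zA-Z0-9]+$/;

# Tests the reference itself: always true, no match is attempted.
sub truthy_only {
    my ($word) = @_;
    local $_ = $word;
    return $qr ? 1 : 0;
}

# Explicit binding: matches $word against the compiled pattern.
sub explicit_bind {
    my ($word) = @_;
    return $word =~ $qr ? 1 : 0;
}

# Interpolated form: matches against $_ implicitly.
sub implicit_match {
    my ($word) = @_;
    local $_ = $word;
    return /$qr/ ? 1 : 0;
}

for my $word (qw(abc 123)) {
    printf "%-4s bare=%d bind=%d interpolated=%d\n",
        $word, truthy_only($word), explicit_bind($word), implicit_match($word);
}
```

For the word "123" only the bare test reports success, because a compiled regex is a true value whatever $_ holds.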

        There is also the problem with using the =~ $qr form that you cannot use it if you need to apply the /g option.

        Nor if you have a regex that you sometimes want to use with capturing and sometimes without.

        Nor can you use it in substitutions.
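In those situations the compiled pattern is usually interpolated into an ordinary m// or s///, which gives the modifiers and the capturing parentheses somewhere to go. This is a small sketch with made-up sample text, and it makes no claim about the relative speed of the forms.

```perl
use strict;
use warnings;

my $word = qr/[a-zA-Z][a-zA-Z0-9]+/;
my $text = 'foo1 2bar baz9';

# /g needs the m// form, so the compiled pattern is interpolated.
my @found = $text =~ /\b$word\b/g;

# Substitution likewise takes the pattern by interpolation.
(my $masked = $text) =~ s/\b$word\b/X/g;

# Capturing can be added around the same pattern only where it is wanted.
my ($first) = $text =~ /\b($word)\b/;

print "@found\n";    # foo1 baz9
print "$masked\n";   # X 2bar X
print "$first\n";    # foo1
```

The token "2bar" is skipped because it starts with a digit and there is no word boundary between "2" and "b".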


