
Re: Common Regex Gotchas

by Desdinova (Friar)
on Mar 14, 2001 at 22:41 UTC (#64448)


in reply to Common Regex Gotchas

Regarding the simple substitutions section: just to prove your point about not going overkill, I benchmarked the two ways you mentioned (tr/// and s///), as well as plain uc, with this code:

#!/usr/local/bin/perl -w
use strict;
use Benchmark;

my $count = 500000;

## Method number one
sub One {
    my $data = 'for bar baz';
    $data = uc $data;
}

## Method number two
sub Two {
    my $data = 'for bar baz';
    $data =~ tr/a-z/A-Z/;
}

## Method number Three
sub Three {
    my $data = 'for bar baz';
    $data =~ s/([A-Za-z]+)/uc($1)/ge;
}

## We'll test each one, with simple labels
timethese(
    $count,
    {
        'Method One UC'  => '&One',
        'Method Two TR'  => '&Two',
        'Method Three s' => '&Three',
    }
);

exit;
And got these results:
Benchmark: timing 500000 iterations of Method One UC, Method Three s, Method Two TR...
Method One UC: 1 wallclock secs ( 1.42 usr + 0.00 sys = 1.42 CPU) @ 352112.68/s (n=500000)
Method Three s: 16 wallclock secs (17.03 usr + 0.00 sys = 17.03 CPU) @ 29359.95/s (n=500000)
Method Two TR: 1 wallclock secs ( 2.04 usr + 0.00 sys = 2.04 CPU) @ 245098.04/s (n=500000)
I know this is not new information, but I figured I'd post it here to highlight what you are saying.
PS: The benchmark method was stolen from Benchmarking your code

UPDATE: Xxaxx pointed out to me in This Node that I am not making a fair comparison above. The eval of uc($1) in the s/// regex was eating up a lot of the cycles. The gap is smaller than the 17:1 shown above...
For a fairer test I compared a single-character substitution with tr/// and s///:
# s/// version
my $data = 'for-bar-baz';
$data =~ s/-/_/g;
print $data;

# tr/// version (a separate snippet)
my $data = 'for-bar-baz';
$data =~ tr/-/_/;
print $data;
Using the benchmarking approach above, I got these results:
Benchmark: timing 500000 iterations of Method One TR, Method Two s...
Method One TR: 2 wallclock secs ( 1.87 usr + 0.00 sys = 1.87 CPU) @ 267379.68/s (n=500000)
Method Two s: 5 wallclock secs ( 4.84 usr + 0.00 sys = 4.84 CPU) @ 103305.79/s (n=500000)
Still, there is an advantage to tr/// over s///, which can be more noticeable depending on your data.
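For anyone who wants to repeat the single-character comparison, here is a minimal sketch. It uses cmpthese from the Benchmark module with code references rather than the string form passed to timethese above; the sub names with_tr and with_s are made up for illustration, and the numbers will differ from those posted here depending on your perl and hardware.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper: replace every '-' with '_' using transliteration.
sub with_tr {
    my $data = 'for-bar-baz';
    $data =~ tr/-/_/;
    return $data;
}

# Hypothetical helper: the same replacement using a global substitution.
sub with_s {
    my $data = 'for-bar-baz';
    $data =~ s/-/_/g;
    return $data;
}

# A negative count asks Benchmark to run each sub for at least
# that many CPU seconds, then print a comparison table.
cmpthese(-1, {
    'tr' => \&with_tr,
    's'  => \&with_s,
});
```

Passing code references avoids evaluating a string on each run, so the timings reflect the subs themselves a little more closely.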

Update 2: petral asked me a question in the CB about the way I call uc in method one, which made me realize that it won't actually do anything, because I didn't assign the return value back to the variable. I updated the code to do that.
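A slip like the one in Update 2 is easy to catch by checking what each method returns before timing it. The following is only a sketch: the subs via_uc, via_tr and via_s are illustrative stand-ins that take the string as an argument, not the exact subs benchmarked above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative versions of the three methods, each returning its result.
sub via_uc { my $data = shift; $data = uc $data;                 return $data; }
sub via_tr { my $data = shift; $data =~ tr/a-z/A-Z/;             return $data; }
sub via_s  { my $data = shift; $data =~ s/([A-Za-z]+)/uc($1)/ge; return $data; }

my $input   = 'for bar baz';
my %methods = (
    'uc method' => \&via_uc,
    'tr method' => \&via_tr,
    's method'  => \&via_s,
);

# Die loudly if any method fails to upper-case the input.
for my $name (sort keys %methods) {
    my $got = $methods{$name}->($input);
    die "$name produced '$got'\n" unless $got eq 'FOR BAR BAZ';
    print "$name ok\n";
}
```

Had method one called uc without assigning the result, the first check would have died instead of printing ok.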


Re: Re: Common Regex Gotchas
by Anonymous Monk on May 08, 2001 at 23:26 UTC
    If I am not mistaken, the Benchmark module is plagued by "$& and friends". That means it makes regexes slow by default, so the benchmarks you take are disproportionate and useless, since a single inefficient instance of $& ruins any optimizations perl can make on the substitution.
      Happily, that doesn't appear to be the case. I don't see any occurrence of the $& et al. variables in the code for Benchmark.pm

      The real problem here is the use of /e on the substitution, when this would work just as well and be much more efficient: s/(\w+)/\U$1/g;
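To illustrate the point, here is a small sketch showing that the \U form gives the same result as the /e form for this input. The sub names upcase_eval and upcase_escape are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Upper-case each word by evaluating uc($1) as Perl code (/e modifier).
sub upcase_eval {
    my ($text) = @_;
    $text =~ s/(\w+)/uc($1)/ge;
    return $text;
}

# Upper-case each word with the \U escape in the replacement string,
# which needs no code evaluation per match.
sub upcase_escape {
    my ($text) = @_;
    $text =~ s/(\w+)/\U$1/g;
    return $text;
}

print upcase_eval('for bar baz'),   "\n";   # FOR BAR BAZ
print upcase_escape('for bar baz'), "\n";   # FOR BAR BAZ
```

Note that \w also matches digits and underscores, so this pattern is slightly broader than the [A-Za-z]+ used in the original benchmark.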

        I don't see any occurrence of the $& et al. variables in the code for Benchmark.pm

        I think what our Anonymous friend means is that if one of the routines being benchmarked uses $&, then all routines suffer (unfairly) from the overhead.
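If that overhead is a concern on your perl, one way to sidestep it is to avoid $& altogether. The sketch below shows two alternatives that return the matched text without mentioning $&: an explicit capture group, and the match offsets in @- and @+. The sub names are hypothetical, and whether the penalty applies at all depends on the perl version in use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first run of word characters using a capture group,
# so the program never mentions $&, $`, or $'.
sub first_word_capture {
    my ($text) = @_;
    return $text =~ /(\w+)/ ? $1 : undef;
}

# Return the same text using the match start and end offsets
# that perl stores in @- and @+ after a successful match.
sub first_word_offsets {
    my ($text) = @_;
    return undef unless $text =~ /\w+/;
    return substr($text, $-[0], $+[0] - $-[0]);
}

print first_word_capture('for bar baz'), "\n";   # for
print first_word_offsets('for bar baz'), "\n";   # for
```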
