PerlMonks  

Re: Re: string manipulation

by Desdinova (Friar)
on Mar 29, 2001 at 23:09 UTC ( #68166=note )


in reply to Re: string manipulation
in thread string manipulation

If you are looking to just replace a specific char with a different char, then tr works and is more efficient. This node shows a benchmark comparison of the two methods.

UPDATE: As Xxaxx points out below, my benchmark was a bit off. I re-ran it using the actual functions needed for this problem and got these results:

Benchmark: timing 500000 iterations of Method One TR, Method Two s...
Method One TR:  2 wallclock secs ( 1.87 usr +  0.00 sys =  1.87 CPU) @ 267379.68/s (n=500000)
 Method Two s:  5 wallclock secs ( 4.84 usr +  0.00 sys =  4.84 CPU) @ 103305.79/s (n=500000)

For more details click these nodes and see the discussion.
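For the single-character case discussed in this thread, the two approaches can be sketched as follows. This is only a minimal illustration, not anyone's benchmark code; the sample string and variable names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample input; any string containing '-' would do.
my $input = 'for-bar-baz';

# tr/// maps characters one-for-one and returns how many it translated.
(my $via_tr = $input) =~ tr/-/_/;

# s///g goes through the regex engine to do the same replacement.
(my $via_s = $input) =~ s/-/_/g;

# With an empty replacement list (and no /d), tr/// only counts matches
# and leaves the string unchanged.
my $dashes = ($input =~ tr/-//);

print "$via_tr\n";   # for_bar_baz
print "$via_s\n";    # for_bar_baz
print "$dashes\n";   # 2
```

Both forms give the same result here; the difference the thread is measuring is only in how much machinery each one uses to get there.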


Re: Re: Re: string manipulation
by Xxaxx (Monk) on Mar 30, 2001 at 01:15 UTC
    I recommend running the benchmark again using the actual functions.

    In your referenced benchmark you're comparing:

    $data =~ tr/a-z/A-Z/;
    and
    $data =~ s/([A-Za-z]+)/uc($1)/ge;

    The uc($1) is a little different than:
    $outstring =~ s/-/_/g;

    When I ran the actual benchmark I got the following results:

    Run 1: (n=500000)
    Method One TR: 3 wallclock secs ( 3.00 usr + 0.00 sys = 3.00 CPU) @ 166666.67
    Method Two S: 3 wallclock secs ( 3.05 usr + 0.00 sys = 3.05 CPU) @ 163934.43

    Run 2: (n=500000)
    Method One TR: 2 wallclock secs ( 2.96 usr + 0.00 sys = 2.96 CPU) @ 168918.92
    Method Two S: 3 wallclock secs ( 3.08 usr + 0.00 sys = 3.08 CPU) @ 162337.66

    Run 3: (n=500000)
    Method One TR: 4 wallclock secs ( 2.97 usr + 0.00 sys = 2.97 CPU) @ 168350.17
    Method Two S: 3 wallclock secs ( 3.08 usr + 0.00 sys = 3.08 CPU) @ 162337.66

    Seems like they are pretty much equal. But, alas, I'm real new to this benchmark stuff and I could have some weird caching issue.

    Even so without the eval of uc($1) this is certainly no 17 to 1 ratio as the referenced benchmark shows.

    Hope this helps
    Claude
    p.s. Thanks to Desdinova for introducing me to the world of benchmarks.

      Post your code for this Benchmark and I'll show you where it went wrong. I'd bet on the variable you tested against not being in scope inside the benchmark sub/evals.

      My results, Linux on an IBM Netfinity (Intel)

      Benchmark: running regexp, transl, each for at least 10 CPU seconds...
          regexp: 10 wallclock secs (10.59 usr +  0.00 sys = 10.59 CPU) @ 37274.69/s (n=394739)
          transl: 13 wallclock secs (10.46 usr +  0.00 sys = 10.46 CPU) @ 313981.07/s (n=3284242)
                 Rate regexp transl
      regexp  37275/s     --   -88%
      transl 313981/s   742%     --

      That is a significant differential there for this simple task. A full regexp engine is a big thing to throw at a lightweight string scan. My benchmark code follows:

      use strict;
      use Benchmark qw(cmpthese);
      use vars qw( $x );

      $x = 'This-is-a-test-string-I-just-typed-in-for-fun';

      cmpthese (-10, {
          'transl' => '$x =~ tr/-/_/; $x =~ tr/_/-/;',
          'regexp' => '$x =~ s/-/_/g; $x =~ s/_/-/g;',
      } );

      Oh yeah, I sure am happy Benchmark exists too. =)

      Doh! Update: that assignment was:
      $x = 'This_is_a_test_string_I_just_typed_in_for_fun';
      It wasn't result impacting, just stupid since it no-ops half my test. Interestingly, if I change the string to one with spaces rather than the '-' or '_' I wind up with regexp being 50-60% faster at doing nothing but scanning with no changes...
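The scoping concern raised at the top of this reply can be illustrated with a small sketch. It assumes a reasonably modern Benchmark module; the variable names are hypothetical, and the first call is wrapped in eval {} because some Benchmark versions may refuse to compile string code that mentions an undeclared variable. The point is only that code passed as a string is compiled inside Benchmark.pm and so cannot see a my variable from the calling file, while a package variable or a code reference does work.

```perl
use strict;
use warnings;
use Benchmark qw(timeit);
use vars qw($pkg);            # package variable, as in the cmpthese code above

$pkg = 'a-b-c';
my $lex = 'a-b-c';            # file-scoped lexical (hypothetical name)

# String code is compiled inside Benchmark.pm, so it cannot see $lex.
# The eval {} guards against Benchmark versions that croak on it.
eval { timeit(1, '$lex =~ tr/-/_/') };
my $lex_after_string = $lex;  # snapshot: the lexical was never touched

# The same string form does reach a package variable.
timeit(1, '$pkg =~ tr/-/_/');

# A code reference is a closure, so it does see the lexical.
timeit(1, sub { $lex =~ tr/-/_/ });

print "lexical after string code:     $lex_after_string\n";
print "package var after string code: $pkg\n";
print "lexical after code ref:        $lex\n";
```

If the string form is timed against a lexical, it ends up measuring an operation on an unrelated (empty) variable, which can make both methods look equally fast.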

      --
      $you = new YOU;
      honk() if $you->love(perl)

        Hey Extremely, Thanks for taking a look at the code and letting me know where the benchmark may have messed up:

        #!/usr/local/bin/perl -w
        use strict;
        use Benchmark;

        my $count = 500000;

        ## Method number one
        sub One {
            my $data = 'for bar baz';
            my($outstring);
            ($outstring = $data) =~ tr/-/_/;
        }

        ## Method number two
        sub Two {
            my $data = 'for bar baz';
            my($outstring) = $data;
            $outstring =~ s/-/_/g;
        }

        ## We'll test each one, with simple labels
        timethese (
            $count,
            {
                'Method One TR' => '&One',
                'Method Two S'  => '&Two',
            }
        );
        exit;

        Claude
      You have a valid point: in that node I was kind of comparing apples to oranges (of course, that node was about uppercasing input, which is a bit different). I must have been having a brain-dead kind of day. As for your numbers, I changed my benchmark code for this specific case to this:
      #!/usr/local/bin/perl -w
      use strict;
      use Benchmark;

      my $count = 900000;

      ## Method number one
      sub One {
          my $data = 'for-bar-baz';
          $data =~ tr/-/_/;
      }

      ## Method number two
      sub Two {
          my $data = 'for-bar-baz';
          $data =~ s/-/_/g;
      }

      ## We'll test each one, with simple labels
      timethese (
          $count,
          {
              'Method One TR' => '&One',
              'Method Two s'  => '&Two',
          }
      );
      exit;
      Which results in the following
      Benchmark: timing 500000 iterations of Method One TR, Method Two s...
      Method One TR:  2 wallclock secs ( 1.87 usr +  0.00 sys =  1.87 CPU) @ 267379.68/s (n=500000)
       Method Two s:  5 wallclock secs ( 4.84 usr +  0.00 sys =  4.84 CPU) @ 103305.79/s (n=500000)
      FYI: I got these numbers using Perl 5.6.0 on Win32. You are right that this is not as big a difference as 17:1 (which seemed odd to me when I first ran it...), but it is still a decent gap.
      I got the suggestion about not using s/// when tr will do from Effective Perl Programming (co-authored by our own merlyn). It is a great book with lots of info on tweaking your code.

      off to update that other node now...
      PS: your benchmark doesn't do anything in either case. The line my $data='for bar baz'; should be changed to contain the char being looked for, like so: my $data='for-bar-baz';
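One way to catch this kind of no-op before timing anything is to check how many characters each method actually changes on the test data. The sketch below is only an illustration; the helper name changes_made is made up for the example and is not part of Benchmark.

```perl
use strict;
use warnings;

# Hypothetical helper: report how many characters each method changes.
sub changes_made {
    my ($data) = @_;
    my $tr_copy = $data;
    my $s_copy  = $data;
    my $tr_hits = ($tr_copy =~ tr/-/_/);        # tr/// returns the count translated
    my $s_hits  = ($s_copy  =~ s/-/_/g) || 0;   # s///g returns false when nothing matched
    return ($tr_hits, $s_hits);
}

for my $data ('for bar baz', 'for-bar-baz') {
    my ($tr_hits, $s_hits) = changes_made($data);
    print "'$data': tr changed $tr_hits, s///g changed $s_hits\n";
}
# 'for bar baz': tr changed 0, s///g changed 0
# 'for-bar-baz': tr changed 2, s///g changed 2
```

A count of zero for both methods means the benchmark is only timing a scan over the string, not the replacement itself.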
