http://www.perlmonks.org?node_id=79807

Well, here it is: proof that you really can perform a task in multiple ways. I needed an efficient way to cut off trailing whitespace. I was pulling values from a database and the scalars all had trailing whitespace, because I had mistakenly made the column type CHAR(10) when it should have been a VARCHAR. Let me just say that the details of the job make it all right to cut the whitespace instead of fixing the underlying problem.

So I had three ideas (none original) to perform this task. Here is my benchmark script.

use Benchmark;

timethese (1000000, {
    'unpack' => q{
        my $foo = "test    ";
        $foo = unpack("A8",$foo);
    },
    'regex' => q{
        my $foo = "test    ";
        $foo =~ s/\s+$//;
    },
    'xeger' => q{
        my $foo = reverse "test    ";
        $foo =~ s/^\s+//;
        $foo = reverse $foo;
    }
} );

The standard regex, the unpack function, and the reverse regex. I figured the reverse one maybe wouldn't apply because there is no .* in the match. But it is a cool enough idea to try out.
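As a rough sketch of the same three ideas outside the benchmark harness, the snippet below wraps each one in a small sub and checks that they agree on one input. The sub names (rtrim_regex, rtrim_unpack, rtrim_reverse) are made up for illustration, and the unpack version uses "A*" rather than the fixed "A8" so it does not depend on the string length.

```perl
use strict;
use warnings;

# Hypothetical helper names; each returns a copy of its argument
# with trailing whitespace removed.

# Plain substitution anchored at the end of the string.
sub rtrim_regex {
    my ($s) = @_;
    $s =~ s/\s+$//;
    return $s;
}

# The "A" format strips trailing spaces when unpacking;
# "A*" takes the whole string instead of a fixed width.
sub rtrim_unpack {
    my ($s) = @_;
    my ($out) = unpack("A*", $s);
    return $out;
}

# Reverse the string, strip what is now leading whitespace, reverse back.
sub rtrim_reverse {
    my ($s) = @_;
    $s = reverse $s;        # scalar assignment, so this reverses the string
    $s =~ s/^\s+//;
    return scalar reverse $s;
}

for my $trim (\&rtrim_regex, \&rtrim_unpack, \&rtrim_reverse) {
    my $out = $trim->("test    ");
    print "[$out] length ", length($out), "\n";   # [test] length 4
}
```

All three print the same line for this input; they may still differ on strings that end in other characters (NUL bytes, for example, are also stripped by the "A" format).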

Here are my results:

Benchmark: timing 1000000 iterations of regex, unpack, xeger...
     regex:  6 wallclock secs ( 5.71 usr + 0.00 sys = 5.71 CPU) @ 175131.35/s (n=1000000)
    unpack:  7 wallclock secs ( 5.91 usr + 0.00 sys = 5.91 CPU) @ 169204.74/s (n=1000000)
     xeger:  9 wallclock secs ( 9.26 usr + 0.00 sys = 9.26 CPU) @ 107991.36/s (n=1000000)

So it seems that the regex and the unpack take very nearly the same time, with the reverse regex noticeably behind. Notice that this is for 1 million iterations, so all of the methods are fast. You really can use various methods for this task!

Thank you, goodnight.

Replies are listed 'Best First'.
Re: Removing Trailing Whitespace: Multiple ways.
by no_slogan (Deacon) on May 11, 2001 at 22:43 UTC
    Hey, good idea. The unpack solution assumes the length of the string is 8. It'll get a bit slower if it's generalized.

    Update: Strike that. unpack "A*" runs faster than unpack "A8" on my machine. Anyone have an idea why that might be?

    Benchmark: timing 1000000 iterations of unpack*, unpack8...
       unpack*:  7 wallclock secs ( 6.65 usr + 0.00 sys = 6.65 CPU)
       unpack8:  8 wallclock secs ( 7.40 usr + 0.00 sys = 7.40 CPU)
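To illustrate why the fixed-width template matters for correctness and not only for speed, here is a small sketch comparing "A8" and "A*" on a value longer than eight characters. The sample strings are invented for the example.

```perl
use strict;
use warnings;

my $short = "test    ";            # 8 characters, fits "A8" exactly
my $long  = "a longer value   ";   # more than 8 characters

# "A8" reads only the first 8 characters, then strips trailing spaces.
my ($fixed_short) = unpack("A8", $short);
my ($fixed_long)  = unpack("A8", $long);

# "A*" reads the whole string, then strips trailing spaces.
my ($any_long) = unpack("A*", $long);

print "[$fixed_short]\n";   # [test]
print "[$fixed_long]\n";    # [a longer]
print "[$any_long]\n";      # [a longer value]
```

With "A8" the longer value is silently truncated, so "A*" is the safer choice whenever the column width is not known in advance.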
Re: Removing Trailing Whitespace: Multiple ways.
by perlmonkey (Hermit) on May 12, 2001 at 02:53 UTC
    If it is speed you are after, you can always put on your C hat:
    use Benchmark;
    use Inline C;

    timethese (1000000, {
        'regex' => q{
            my $foo = "test    ";
            $foo =~ s/\s+$//;
        },
        'inline' => q{
            my $foo = "test    ";
            rmsp($foo);
        }
    });

    __END__
    __C__
    void rmsp(char * str) {
        int i = strlen(str) - 1;
        while( i >= 0 && str[i] == ' ' ) {
            str[i--] = '\0';
        }
    }


    Results:
    Benchmark: timing 1000000 iterations of inline, regex...
        inline:  4 wallclock secs ( 4.39 usr + 0.03 sys = 4.42 CPU)
         regex:  8 wallclock secs ( 6.84 usr + 0.03 sys = 6.87 CPU)

      That would be really cool, but rmsp() leaves the string null-padded instead of removing the extra blanks.
        Yah, so I was lazy and I cheated a bit. $foo will look good when you print it though. What can I say, there are reasons I am a perl programmer.

        But to be all official like and actually update the scalar value to be a proper string of length 4 you can use this code for rmsp:
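        void rmsp(SV * sv) {
            char * end;
            STRLEN length;
            if( !SvPOK(sv) ) return;
            length = SvCUR(sv);
            end = SvEND(sv);   /* points at the trailing \0 */
            /* check length first so we never read before the start of the buffer */
            while( length > 0 && *(end - 1) == ' ' ) {
                end--;
                length--;
            }
            SvCUR_set(sv, length);
            *end = '\0';       /* keep the shortened string NUL-terminated */
        }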
        And the new results I got are:
        Benchmark: timing 1000000 iterations of inline, regex...
            inline:  6 wallclock secs ( 4.19 usr + 0.01 sys = 4.20 CPU)
             regex:  9 wallclock secs ( 7.71 usr + 0.02 sys = 7.73 CPU)
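To make the null-padding objection in this subthread concrete, the sketch below imitates in plain Perl what the first rmsp() appears to leave behind: the trailing blanks overwritten with NUL bytes while the scalar's length stays the same. It does not call Inline C; the padded string is built by hand as an assumption about that function's effect.

```perl
use strict;
use warnings;

# Simulated result of the first rmsp(): "test" followed by four NUL bytes.
my $padded  = "test" . ("\0" x 4);
# What a real trim should produce.
my $trimmed = "test";

my $same = ($padded eq $trimmed) ? "equal" : "different";

print length($padded),  "\n";   # 8
print length($trimmed), "\n";   # 4
print "$same\n";                # different
```

The padded value may look fine when printed to a terminal, but it still fails string comparisons and reports the old length, which is why the second version adjusts the scalar's length with SvCUR_set.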
Re: Removing Trailing Whitespace: Multiple ways.
by tune (Curate) on May 11, 2001 at 23:07 UTC
    Just for statistics (on a K7-700) perl 5.6.0:
    Benchmark: timing 1000000 iterations of regex, unpack, xeger...
         regex:  4 wallclock secs ( 3.74 usr + 0.00 sys = 3.74 CPU) @ 267379.68/s (n=1000000)
        unpack:  8 wallclock secs ( 6.34 usr + 0.01 sys = 6.35 CPU) @ 157480.31/s (n=1000000)
         xeger:  8 wallclock secs ( 6.55 usr + 0.01 sys = 6.56 CPU) @ 152439.02/s (n=1000000)

    --
    tune