Removing Trailing Whitespace: Multiple ways.

Well, here it is: proof that you really can perform a task in multiple ways. I needed an efficient way to cut off trailing whitespace. I was pulling values from a database, and the scalars all had trailing whitespace. I had mistakenly made the column type CHAR(10) when it should have been a VARCHAR. Let me just say that the details of the job make it all right to cut the whitespace instead of fixing the underlying problem.

So I had three ideas (none original) to perform this task. Here is my benchmark script.


use Benchmark;
timethese (1000000, {
        'unpack' => q{  my $foo = "test    ";
                        $foo = unpack("A8",$foo); },
        'regex'  => q{  my $foo = "test    ";
                        $foo =~ s/\s+$//;},
        'xeger'  => q{  my $foo = reverse "test    ";
                        $foo =~ s/^\s+//;
                        $foo = reverse $foo;}
} );

The standard regex, the unpack function, and the reverse regex. I figured the reverse one maybe wouldn't apply because there is no .* in the match, but it is a cool enough idea to try out.
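To see the three approaches outside the benchmark harness, here is a minimal sketch that wraps each one in a small sub and prints what it returns. The sub names (trim_unpack, trim_regex, trim_xeger) are made up for illustration and are not from the original script, and the unpack version assumes "A*" instead of "A8" so that it does not depend on the value being exactly 8 characters long.

```perl
use strict;
use warnings;

# Hypothetical helper names, invented for this sketch.

# unpack's "A" format strips trailing spaces and NULs;
# "A*" takes the whole string instead of a fixed width.
sub trim_unpack {
    my ($s) = @_;
    return unpack("A*", $s);
}

# Plain regex: remove one or more whitespace characters at the end.
sub trim_regex {
    my ($s) = @_;
    $s =~ s/\s+$//;
    return $s;
}

# "xeger": reverse the string, strip leading whitespace, reverse back.
sub trim_xeger {
    my ($s) = @_;
    $s = reverse $s;            # scalar assignment, so the string is reversed
    $s =~ s/^\s+//;
    return scalar reverse $s;   # force scalar context for the final reverse
}

for my $trim (\&trim_unpack, \&trim_regex, \&trim_xeger) {
    my $out = $trim->("test    ");
    print "[$out] length ", length($out), "\n";   # prints "[test] length 4"
}
```

All three return the same 4-character string for this input, so the choice between them comes down to speed and readability.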

Here are my results:

Benchmark: timing 1000000 iterations of regex, unpack, xeger...
     regex:  6 wallclock secs ( 5.71 usr +  0.00 sys =  5.71 CPU) @ 175131.35/s (n=1000000)
    unpack:  7 wallclock secs ( 5.91 usr +  0.00 sys =  5.91 CPU) @ 169204.74/s (n=1000000)
     xeger:  9 wallclock secs ( 9.26 usr +  0.00 sys =  9.26 CPU) @ 107991.36/s (n=1000000)

So it seems that the regex and the unpack are very close to the same time (5.71 vs. 5.91 CPU seconds), while the reverse regex takes roughly 60% longer. Notice that this is for 1 million iterations, so all three methods are fast. You can use various methods for this task!

Thank you, goodnight.

Re: Removing Trailing Whitespace: Multiple ways. by no_slogan (Deacon) on May 11, 2001 at 22:43 UTC
Hey, good idea. The unpack solution assumes the length of the string is 8. It'll get a bit slower if it's generalized.

Update: Strike that. unpack "A*" runs faster than unpack "A8" on my machine. Anyone have an idea why that might be?

Benchmark: timing 1000000 iterations of unpack*, unpack8...
   unpack*:  7 wallclock secs ( 6.65 usr +  0.00 sys =  6.65 CPU)
   unpack8:  8 wallclock secs ( 7.40 usr +  0.00 sys =  7.40 CPU)
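As a small illustration of why the fixed width matters, the sketch below runs both templates on a value longer than 8 characters. The sample string is made up; the point is only that "A8" reads exactly 8 characters before stripping, while "A*" reads the whole string.

```perl
use strict;
use warnings;

my $value = "a longer value   ";   # made-up sample, longer than 8 characters

# "A8" takes the first 8 characters, then strips trailing spaces from them.
my $fixed = unpack("A8", $value);

# "A*" takes the whole string, then strips trailing spaces.
my $whole = unpack("A*", $value);

print "[$fixed]\n";   # prints "[a longer]"
print "[$whole]\n";   # prints "[a longer value]"
```

So for a CHAR(10) column, a hard-coded "A8" would silently drop the last two characters of a full-width value.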
Re: Removing Trailing Whitespace: Multiple ways. by perlmonkey (Hermit) on May 12, 2001 at 02:53 UTC
If it is speed you are after, you can always put on your C hat:

use Benchmark;
use Inline C;
timethese (1000000, {
        'regex'  => q{  my $foo = "test    ";
                        $foo =~ s/\s+$//; },
        'inline' => q{  my $foo = "test    ";
                        rmsp($foo); }
});
__END__
__C__
void rmsp(char * str) {
    int i = strlen(str) - 1;
    while( i >= 0 && str[i] == ' ' ) {
        str[i--] = '\0';
    }
}

Results:

Benchmark: timing 1000000 iterations of inline, regex...
    inline:  4 wallclock secs ( 4.39 usr +  0.03 sys =  4.42 CPU)
     regex:  8 wallclock secs ( 6.84 usr +  0.03 sys =  6.87 CPU)
Re: Re: Removing Trailing Whitespace: Multiple ways. by no_slogan (Deacon) on May 12, 2001 at 03:15 UTC
That would be really cool, but rmsp() leaves the string null-padded instead of removing the extra blanks.
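The difference is easy to miss because a null-padded string usually prints the same as a trimmed one. The pure-Perl sketch below only emulates the situation described here (it does not call rmsp()): it builds a string by hand whose trailing blanks have been replaced with NULs and compares it with a properly trimmed value.

```perl
use strict;
use warnings;

# Emulation only: "test" followed by four NUL bytes, as if the trailing
# spaces had been overwritten in place without updating the length.
my $padded = "test" . ("\0" x 4);
my $clean  = "test";

print length($padded), "\n";   # prints 8
print length($clean),  "\n";   # prints 4
print(($padded eq $clean) ? "equal\n" : "not equal\n");   # prints "not equal"
```

Anything that depends on the length or on string equality (hash keys, eq comparisons, database writes) would still see the 8-character value.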
Re: Re: Re: Removing Trailing Whitespace: Multiple ways. by perlmonkey (Hermit) on May 12, 2001 at 04:18 UTC
Yah, so I was lazy and I cheated a bit. $foo will look good when you print it though. What can I say, there are reasons I am a perl programmer. But to be all official like and actually update the scalar value to be a proper string of length 4 you can use this code for rmsp:

void rmsp(SV * sv) {
    char * end;
    int length;
    if( !SvPOK(sv) ) return;
    end = SvEND(sv);
    length = SvCUR(sv);
    end--; /* skip \0 */
    /* check length first so an all-blank string never reads before the buffer */
    while( length > 0 && *end == ' ' ) {
        end--;
        length--;
    }
    SvCUR_set(sv, length);
}

And the new results I got are:

Benchmark: timing 1000000 iterations of inline, regex...
    inline:  6 wallclock secs ( 4.19 usr +  0.01 sys =  4.20 CPU)
     regex:  9 wallclock secs ( 7.71 usr +  0.02 sys =  7.73 CPU)
Re: Removing Trailing Whitespace: Multiple ways. by tune (Curate) on May 11, 2001 at 23:07 UTC
Just for statistics (on a K7-700) perl 5.6.0:

Benchmark: timing 1000000 iterations of regex, unpack, xeger...
     regex:  4 wallclock secs ( 3.74 usr +  0.00 sys =  3.74 CPU) @ 267379.68/s (n=1000000)
    unpack:  8 wallclock secs ( 6.34 usr +  0.01 sys =  6.35 CPU) @ 157480.31/s (n=1000000)
     xeger:  8 wallclock secs ( 6.55 usr +  0.01 sys =  6.56 CPU) @ 152439.02/s (n=1000000)

-- tune
