Removing Trailing Whitespace: Multiple ways.

by DeaconBlues (Monk)
on May 11, 2001 at 22:35 UTC ( [id://79807]=perlmeditation )

Well, here it is: proof that you really can perform a task in multiple ways. I needed an efficient way to cut off trailing whitespace. I was pulling values from a database, and the scalars all had trailing whitespace because I had mistakenly made the column type CHAR(10) when it should have been VARCHAR. Let me just say that the details of the job make it all right to cut the whitespace instead of fixing the underlying problem.

So I had three ideas (none original) to perform this task. Here is my benchmark script.

use Benchmark;

timethese (1000000, {
    'unpack' => q{ my $foo = "test "; $foo = unpack("A8",$foo); },
    'regex'  => q{ my $foo = "test "; $foo =~ s/\s+$//; },
    'xeger'  => q{ my $foo = reverse "test "; $foo =~ s/^\s+//; $foo = reverse $foo; }
} );

The standard regex, the unpack function, and the reverse regex. I figured the reverse one maybe wouldn't apply because there is no .* in the match, but it is a cool enough idea to try out.
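As a quick sanity check that the three approaches agree, here is a minimal sketch that wraps each one in a sub and runs them on a few sample strings. The sub names and the sample inputs are made up for illustration, and the unpack version uses "A*" rather than "A8" so that it works for strings of any length.

```perl
use strict;
use warnings;

# Hypothetical helper names; each returns a copy with trailing whitespace removed.
sub trim_regex  { my ($s) = @_; $s =~ s/\s+$//; return $s; }
sub trim_unpack { my ($s) = @_; return unpack("A*", $s); }
sub trim_xeger  {
    my ($s) = @_;
    $s = reverse $s;          # scalar assignment, so this reverses the string
    $s =~ s/^\s+//;           # the trailing whitespace is now leading
    return scalar reverse $s;
}

# Sample inputs are assumptions, not values from the original database.
for my $input ("test    ", "test", "two words  ") {
    my @out = map { $_->($input) } \&trim_regex, \&trim_unpack, \&trim_xeger;
    print join(" ", map { "[$_]" } @out), "\n";
}
```

Each line of output should show the same trimmed string three times, with interior spaces left alone.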

Here are my results:

Benchmark: timing 1000000 iterations of regex, unpack, xeger...
     regex:  6 wallclock secs ( 5.71 usr +  0.00 sys =  5.71 CPU) @ 175131.35/s (n=1000000)
    unpack:  7 wallclock secs ( 5.91 usr +  0.00 sys =  5.91 CPU) @ 169204.74/s (n=1000000)
     xeger:  9 wallclock secs ( 9.26 usr +  0.00 sys =  9.26 CPU) @ 107991.36/s (n=1000000)

So it seems that the regex and the unpack are very close to the same time. Notice that this is for 1 million iterations. All methods are fast. So you can use various methods for this task!
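For comparing relative speed directly, the Benchmark module also exports cmpthese, which prints a table of rates and percentage differences. The following is only a sketch: the iteration count, the sample string, and the use of code refs instead of strings are my choices, not part of the original script, and the numbers will differ from machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $padded = "test    ";    # assumed sample: "test" plus four spaces

# cmpthese prints a comparison chart and also returns its rows as an array ref.
my $rows = cmpthese(500_000, {
    regex  => sub { my $foo = $padded; $foo =~ s/\s+$//; },
    unpack => sub { my $foo = $padded; $foo = unpack("A*", $foo); },
    xeger  => sub { my $foo = reverse $padded; $foo =~ s/^\s+//; $foo = reverse $foo; },
});
```

On a fast machine Benchmark may warn that there were too few iterations for a reliable count; raising the count should quiet that.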

Thank you, goodnight.

Replies are listed 'Best First'.
Re: Removing Trailing Whitespace: Multiple ways.
by no_slogan (Deacon) on May 11, 2001 at 22:43 UTC
    Hey, good idea. The unpack solution assumes the length of the string is 8. It'll get a bit slower if it's generalized.

    Update: Strike that. unpack "A*" runs faster than unpack "A8" on my machine. Anyone have an idea why that might be?

    Benchmark: timing 1000000 iterations of unpack*, unpack8...
       unpack*:  7 wallclock secs ( 6.65 usr +  0.00 sys =  6.65 CPU)
       unpack8:  8 wallclock secs ( 7.40 usr +  0.00 sys =  7.40 CPU)
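To illustrate the fixed-length assumption mentioned in this reply, here is a small sketch of how "A8" and "A*" behave on a string longer than eight characters. The sample strings are made up for the demonstration.

```perl
use strict;
use warnings;

my $short = "test    ";        # 8 characters: fits "A8" exactly
my $long  = "testing123   ";   # 13 characters: longer than 8

my $short_a8   = unpack("A8", $short);   # "test"
my $long_a8    = unpack("A8", $long);    # "testing1" (only the first 8 characters are read)
my $long_astar = unpack("A*", $long);    # "testing123"

printf "[%s] [%s] [%s]\n", $short_a8, $long_a8, $long_astar;
# prints: [test] [testing1] [testing123]
```

So "A8" is only safe when no value can exceed eight characters, while "A*" consumes the whole string before stripping.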
Re: Removing Trailing Whitespace: Multiple ways.
by perlmonkey (Hermit) on May 12, 2001 at 02:53 UTC
    If it is speed you are after, you can always put on your C hat:
    use Benchmark;
    use Inline C;

    timethese (1000000, {
        'regex'  => q{ my $foo = "test "; $foo =~ s/\s+$//; },
        'inline' => q{ my $foo = "test "; rmsp($foo); }
    });

    __END__
    __C__
    void rmsp(char * str) {
        int i = strlen(str) - 1;
        while( i >= 0 && str[i] == ' ' ) {
            str[i--] = '\0';
        }
    }


    Results:
    Benchmark: timing 1000000 iterations of inline, regex...
        inline:  4 wallclock secs ( 4.39 usr +  0.03 sys =  4.42 CPU)
         regex:  8 wallclock secs ( 6.84 usr +  0.03 sys =  6.87 CPU)

      That would be really cool, but rmsp() leaves the string null-padded instead of removing the extra blanks.
        Yah, so I was lazy and I cheated a bit. $foo will look good when you print it though. What can I say, there are reasons I am a perl programmer.
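To see why null-padding matters even though the string prints fine, here is a pure-Perl sketch that simulates the result rather than calling the Inline C function. It assumes the original value was "test" followed by four spaces, so four bytes end up as "\0".

```perl
use strict;
use warnings;

# Simulate what the first rmsp() leaves behind: trailing spaces overwritten with "\0".
my $padded = "test" . ("\0" x 4);

print length($padded), "\n";                             # 8, not 4
print $padded eq "test" ? "equal" : "different", "\n";   # different

# One way to drop the NUL bytes afterwards, in plain Perl.
(my $clean = $padded) =~ s/\0+\z//;
print length($clean), "\n";                              # 4
```

The string looks right on a terminal because the NUL bytes are usually invisible, but comparisons, hash keys, and length checks still see all eight bytes.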

        But to be all official like and actually update the scalar value to be a proper string of length 4 you can use this code for rmsp:
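        void rmsp(SV * sv) {
            char * end;
            int length;
            if( !SvPOK(sv) ) return;
            end = SvEND(sv);
            length = SvCUR(sv);
            end--; /* step back from the trailing \0 to the last character */
            /* test length first so we never read before the start of the buffer */
            while( length > 0 && *end == ' ' ) {
                end--;
                length--;
            }
            SvCUR_set(sv, length);
            *SvEND(sv) = '\0'; /* keep the buffer NUL-terminated at its new length */
        }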
        And the new results I got are:
        Benchmark: timing 1000000 iterations of inline, regex...
            inline:  6 wallclock secs ( 4.19 usr +  0.01 sys =  4.20 CPU)
             regex:  9 wallclock secs ( 7.71 usr +  0.02 sys =  7.73 CPU)
Re: Removing Trailing Whitespace: Multiple ways.
by tune (Curate) on May 11, 2001 at 23:07 UTC
    Just for statistics (on a K7-700) perl 5.6.0:
    Benchmark: timing 1000000 iterations of regex, unpack, xeger...
         regex:  4 wallclock secs ( 3.74 usr +  0.00 sys =  3.74 CPU) @ 267379.68/s (n=1000000)
        unpack:  8 wallclock secs ( 6.34 usr +  0.01 sys =  6.35 CPU) @ 157480.31/s (n=1000000)
         xeger:  8 wallclock secs ( 6.55 usr +  0.01 sys =  6.56 CPU) @ 152439.02/s (n=1000000)

    --
    tune
