PerlMonks  

Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules

by jwkrahn (Abbot)
on Aug 14, 2020 at 03:58 UTC ( [id://11120707] )


in reply to How to trim a line from leading and trailing blanks without using regex or non-standard modules

(IMHO) the most common solution is:

s/^\s+//, s/\s+$// for $line;
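As a minimal sketch of how that one-liner behaves, it can be wrapped in a small helper. The `trim` name and the sample string are assumptions for illustration, not part of the post:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post): returns a trimmed copy of its argument.
sub trim {
    my ($line) = @_;
    # "for" aliases $_ to $line, so both substitutions edit $line in place.
    s/^\s+//, s/\s+$// for $line;
    return $line;
}

my $trimmed = trim("  \thello world \n");
print "[$trimmed]\n";    # prints "[hello world]"
```

Because `\s` also matches a newline, the second substitution removes a trailing line ending along with the blanks.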

Replies are listed 'Best First'.
Re^2: How to trim a line from leading and trailing blanks without using regex or non-standard modules
by Marshall (Canon) on Aug 14, 2020 at 04:33 UTC
    That used to be the "standard way" to do this; it was recommended in the Perl docs, and I thought it was just fine.
    s/^\s+|\s+$//g has been benchmarked, and I now think it is faster and "better" than two statements. One post, Re^3: script optmization, shows some benchmarks.

    This is certainly not an "abuse" of regex. This is what regex is for! The Perl regex engine keeps getting better, and usually faster, between releases.

      That benchmark shows s/^\s+|\s+$//g as always being slower than two regexes. It proposes a third option, extracting the inner portion with a match, which it shows as faster in the benchmark given. But that is a very limited benchmark. A more complete benchmark shows that doing two regexes is almost always the fastest.

      use strict;
      use warnings;
      use Benchmark::Dumb qw(cmpthese);

      my @strings = (
        no_trim_short        => 'asd',
        no_trim_mid          => 'asdasdasdasdasdasdasd',
        no_trim_long         => 'asd' x 500,
        no_trim_mid_with_ws  => 'asd asd asd asd asd asd asd',
        no_trim_long_with_ws => (join ' ', ('asd') x 500),
        short                => ' asd ',
        mid                  => ' asdasdasdasdasdasdasd ',
        long                 => ' '.('asd' x 500).' ',
        mid_with_ws          => ' asd asd asd asd asd asd asd ',
        long_with_ws         => ' '.(join ' ', ('asd') x 500).' ',
      );

      while (my ($name, $string) = splice @strings, 0, 2) {
        print "$name:\n";
        cmpthese(0.0005, {
          global => sub {
            my $s = $string;
            $s =~ s/\A\s+|\s+\z//g;
          },
          startend => sub {
            my $s = $string;
            s/\A\s+//, s/\s+\z// for $s;
          },
          match => sub {
            my $s = $string;
            ($s) = $s =~ /\A\s*(.*?)\s*\z/s;
          },
        });
        print "\n";
      }
      __END__
      no_trim_short:
                                Rate            match         global  startend
      match     1.42373e+06+-420/s               --         -29.2%    -52.9%
      global    2.01075e+06+-840/s            41.2%             --    -33.4%
      startend    3.02e+06+-1800/s   112.12+-0.14%    50.2+-0.11%        --

      no_trim_mid:
                                Rate   global    match  startend
      global         519190+-150/s       --   -12.3%    -81.1%
      match          591890+-290/s    14.0%       --    -78.5%
      startend  2.7543e+06+-1400/s   430.5%   365.3%        --

      no_trim_long:
                                Rate    global     match  startend
      global   8420.17+-0.00081/s        --    -31.7%    -97.8%
      match          12324.1+-4/s     46.4%        --    -96.8%
      startend      384590+-0.95/s   4467.5%   3020.6%        --

      no_trim_mid_with_ws:
                              Rate   global    match  startend
      global         388912+-98/s       --   -19.5%    -70.2%
      match           482948+-4/s    24.2%       --    -63.0%
      startend  1.30366e+06+-19/s   235.2%   169.9%        --

      no_trim_long_with_ws:
                              Rate   global    match  startend
      global        5750.5+-2.8/s       --   -33.6%    -81.3%
      match         8663.4+-3.4/s    50.7%       --    -71.9%
      startend  30807.4+-0.011/s   435.7%   255.6%        --

      short:
                                Rate   global  startend    match
      global         968450+-390/s       --    -12.8%   -32.0%
      startend  1.11124e+06+-460/s    14.7%        --   -22.0%
      match     1.42383e+06+-490/s    47.0%     28.1%       --

      mid:
                          Rate           global    match  startend
      global    387160+-190/s               --   -35.3%    -56.4%
      match     598710+-260/s     54.64+-0.1%       --    -32.5%
      startend  887420+-380/s   129.21+-0.15%    48.2%        --

      long:
                               Rate    global     match  startend
      global   8410.31+-0.0012/s        --    -31.8%    -97.2%
      match         12323.2+-4/s     46.5%        --    -95.9%
      startend     298990+-140/s   3455.0%   2326.2%        --

      mid_with_ws:
                          Rate          global    match  startend
      global    303500+-130/s              --   -37.2%    -48.5%
      match       482925+-4/s           59.1%       --    -18.0%
      startend  589220+-300/s   94.14+-0.13%    22.0%        --

      long_with_ws:
                          Rate   global    match  startend
      global    5691.4+-2.6/s       --   -34.4%    -81.1%
      match       8672.9+-4/s    52.4%       --    -71.1%
      startend   30035.1+-0/s   427.7%   246.3%        --
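      Before comparing speeds, it may help to confirm that the three variants return the same result. The following is only a sketch: the `%trim` table and the sample strings are assumptions for illustration and are not taken from the benchmark above.

```perl
use strict;
use warnings;

# The three approaches from the benchmark, each returning a trimmed copy.
my %trim = (
    global   => sub { my $s = shift; $s =~ s/\A\s+|\s+\z//g; $s },
    startend => sub { my $s = shift; s/\A\s+//, s/\s+\z// for $s; $s },
    match    => sub { my $s = shift; ($s) = $s =~ /\A\s*(.*?)\s*\z/s; $s },
);

# Hypothetical sample inputs, including all-blank and empty strings.
my @samples = ('asd', ' asd ', "\t asd asd \n", '   ', '');

my $mismatches = 0;
for my $sample (@samples) {
    my %seen;
    # Collect the distinct results produced by the three variants.
    $seen{ $trim{$_}->($sample) }++ for sort keys %trim;
    $mismatches++ if keys(%seen) != 1;
}
print $mismatches ? "variants disagree\n" : "all variants agree\n";
```

      For these inputs the three subs produce identical strings, so the benchmark is comparing equivalent work.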
        I don't know where I got the idea in my brain that the global substitute expression was faster. It could very well be that I figured that in some normal case, the difference didn't matter based upon some other benchmark.

        I rewrote your benchmark and yes, it does show that doing this with two substitutions is faster!
        My code is below.

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);
        use Data::Dumper;
        use List::Util qw(pairmap);

        my @strings = (
          no_trim_short        => 'asd',
          no_trim_mid          => 'asdasdasdasdasdasdasd',
          no_trim_long         => 'asd' x 50,
          no_trim_mid_with_ws  => 'asd asd asd asd asd asd asd',
          no_trim_long_with_ws => (join ' ', ('asd') x 20),
          short                => ' asd ',
          mid                  => ' asdasdasdasdasdasdasd ',
          long                 => ' '.('asd' x 20).' ',
          mid_with_ws          => ' asd asd asd asd asd asd asd ',
          long_with_ws         => ' '.(join ' ', ('asd') x 20).' ',
        );

        my @test_line_array = pairmap{$b}@strings;
        #print "$_\n" for @test_line_array; #for debug

        cmpthese (100000, {
          global => sub{
            my @copy = @test_line_array;
            $_ =~ s/\A\s+|\s+\z//g foreach @copy;
          },
          global2 => sub{
            my @copy = @test_line_array;
            $_ =~ s/^\s+|\s+$//g foreach @copy;
          },
          twoLines => sub {
            my @copy = @test_line_array;
            s/\A\s+//, s/\s+\z// foreach @copy;
          },
          twoLines2 => sub {
            my @copy = @test_line_array;
            s/^\s+//, s/\s+$// foreach @copy;
          },
          dollar1 => sub {
            my @copy = @test_line_array;
            s/\A\s*(.*?)\s*\z/$1/ foreach @copy;
          }
        });
        print "\n";
        __END__
                      Rate  global global2 dollar1 twoLines twoLines2
        global     33681/s      --      0%    -21%     -68%      -68%
        global2    33681/s      0%      --    -21%     -68%      -68%
        dollar1    42662/s     27%     27%      --     -60%      -60%
        twoLines  106610/s    217%    217%    150%       --       -0%
        twoLines2 106724/s    217%    217%    150%       0%        --
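        One detail worth noting about the capture-based variant: without the /s modifier, `.` does not match a newline, so the pattern cannot match a string with an embedded newline and leaves it untouched. The benchmark strings contain no newlines, so the timings are unaffected. A small sketch, using a made-up two-line string as the assumed input:

```perl
use strict;
use warnings;

# Hypothetical input with an embedded newline (not from the benchmark data).
my $text = " first line\nsecond line ";

# Same substitution as the capture-based variant, without and with /s.
(my $without_s = $text) =~ s/\A\s*(.*?)\s*\z/$1/;
(my $with_s    = $text) =~ s/\A\s*(.*?)\s*\z/$1/s;

print $without_s eq $text
    ? "without /s: unchanged\n"
    : "without /s: trimmed\n";
print $with_s eq "first line\nsecond line"
    ? "with /s: trimmed\n"
    : "with /s: unchanged\n";
```

        The two-substitution form has no such dependency, since it never needs to match across the middle of the string.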
