PerlMonks  

How to trim a line from leading and trailing blanks without using regex or non-standard modules

by likbez (Sexton)
on Aug 14, 2020 at 02:24 UTC ( #11120704=perlquestion )

likbez has asked for the wisdom of the Perl Monks concerning the following question:

Is there any way to trim both leading and trailing blanks in a text line (one of the most common operations in text processing; often implemented as a trim function which BTW was present in Perl 6) without resorting to regular expressions (which are definitely an overkill for this particular purpose)? This is clearly an important special case.

So far the most common solution is to use something like $line =~ s/^\s+|\s+$//g which clearly is an abuse of regex.

See, for example, https://perlmaven.com/trim

Or install String::Util, which is not a standard module and as such creates difficulties in enterprise environments.


Replies are listed 'Best First'.
Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules
by hippo (Chancellor) on Aug 14, 2020 at 06:46 UTC
    without resorting to regular expressions (which are definitely an overkill for this particular purpose)?

    Sure, just write your own function to do it. Having written that you will then come to the conclusion that regular expressions are definitely not an overkill for this particular purpose.

    This is clearly an important special case. ... which clearly is an abuse of regex.

    You keep using that word. I don't think it means what you think it means.


    🦛

Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules
by LanX (Cardinal) on Aug 14, 2020 at 03:28 UTC
      So if you want the exact same semantic, it'll become far more complicated than this regex.

      I agree. That's a good point. Thank you !

      In other words, it is not easy to design a good trim function without regex, but it is possible to design one that uses regex while treating a single-quoted string as a special case.

      For example

      trim(' ',$line)
      vs
      trim(/\s/,$line)
      BTW, this is impossible in Python, which implements regex via a library, unless you add a new lexical type to the language (a regex string instead of the raw string that is used now).
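      The two call styles above can be sketched in plain Perl roughly as follows. This is only an illustration under assumptions: the `trim($what, $line)` signature is hypothetical, and the pattern is passed as `qr/\s/` rather than a bare `/\s/`, because a bare `/\s/` in an argument list would be executed as a match against `$_` before the function is called. The plain-string branch uses only `index` and `substr`; the `qr//` branch falls back to the regex engine.

```perl
use strict;
use warnings;

# Hypothetical trim($what, $line): $what is either a plain string listing the
# characters to strip, or a compiled pattern (qr//) that matches ONE character.
sub trim {
    my ($what, $line) = @_;

    if (ref $what eq 'Regexp') {
        # Pattern case: let the regex engine strip both ends.
        $line =~ s/\A(?:$what)+//;
        $line =~ s/(?:$what)+\z//;
        return $line;
    }

    # Plain-string case: no regex, just scan inwards from both ends.
    my ($start, $end) = (0, length $line);
    $start++ while $start < $end  && index($what, substr($line, $start, 1)) >= 0;
    $end--   while $end > $start && index($what, substr($line, $end - 1, 1)) >= 0;
    return substr $line, $start, $end - $start;
}

print '[', trim(' ',     "  a b c  "),    "]\n";   # spaces only
print '[', trim(qr/\s/,  " \t a b c \n"), "]\n";   # any whitespace
```

      Dispatching on `ref $what` keeps the cheap case free of regex while leaving the general case available.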
Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules
by kcott (Bishop) on Aug 14, 2020 at 09:35 UTC

    G'day likbez,

    I will usually reach for one of Perl's string handling functions (e.g. index, rindex, substr, and so on) in preference to a regex when that is appropriate; however, in this case, I would say that the regex makes for much cleaner code.

    You could implement a trim() function using the guts of this code (which uses neither a regex nor any modules, standard or otherwise):

    $ perl -E '
        my @x = (" a b c ", "d e f ", " g h i", "j k l", " ", "");
        say "*** Initial strings ***";
        say "|$_|" for @x;

        for my $i (0 .. $#x) {
            my $str = $x[$i];
            while (0 == index $str, " ") {
                $str = substr $str, 1;
            }
            my $str_end = length($str) - 1;
            while ($str_end == rindex $str, " ") {
                $str = substr $str, 0, $str_end;
                --$str_end;
            }
            $x[$i] = $str;
        }

        say "*** Final strings ***";
        say "|$_|" for @x;
    '
    *** Initial strings ***
    | a b c |
    |d e f |
    | g h i|
    |j k l|
    | |
    ||
    *** Final strings ***
    |a b c|
    |d e f|
    |g h i|
    |j k l|
    ||
    ||

    If your question was genuinely serious, please Benchmark a trim() function using something like I've provided against another trim() function using a regex. You could obviously do the same for ltrim() and rtrim() functions.

    [As others have either asked or alluded to, please explain phrases such as "definitely an overkill", "important special case" and "abuse of regex". Unfortunately, use of such language makes your post come across as some sort of trollish rant — I'm not saying that was your intent, just how it presents itself.]

    — Ken
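    The benchmark Ken asks for could be set up along these lines with the core Benchmark module. This is a sketch, not a definitive measurement: the names `trim_index` and `trim_regex` are made up for illustration, both handle literal spaces only so that they are comparable, and the extra `$str_end >= 0` guard is an addition that is not in the code above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The index/rindex/substr loops from the post, wrapped in a function.
sub trim_index {
    my ($str) = @_;
    while (0 == index $str, " ") {
        $str = substr $str, 1;
    }
    my $str_end = length($str) - 1;
    # Guard added here so an empty string never enters the loop.
    while ($str_end >= 0 && $str_end == rindex $str, " ") {
        $str = substr $str, 0, $str_end;
        --$str_end;
    }
    return $str;
}

# Regex counterpart, restricted to literal spaces as well.
sub trim_regex {
    my ($str) = @_;
    $str =~ s/\A +//;
    $str =~ s/ +\z//;
    return $str;
}

my $line = "   a b c   ";

# Sanity check before timing anything.
die "results differ" unless trim_index($line) eq trim_regex($line);

# A negative count means: run each sub for at least that many CPU seconds.
cmpthese(-1, {
    index_substr => sub { trim_index($line) },
    regex        => sub { trim_regex($line) },
});
```

    The numbers will depend on the Perl version and on the input, so it is worth trying several strings of different lengths, with and without blanks to trim.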

        G'day Rolf,

        That's a valid point. My main intent with that code was really to show the complexity of the solution when a regex or module were not used. Anyway, adding a little more complexity, you can trim whatever blanks you want:

        $ perl -E '
            my @blanks = (" ", "\n", "\r", "\t");
            my @x = (
                " a b c ", "d e f \r ", " \t g h i", "j k l",
                " ", "\n", "\n\nXYZ\n\n", ""
            );
            say "*** Initial strings ***";
            say "|$_|" for @x;

            for my $i (0 .. $#x) {
                my $str = $x[$i];
                while (grep { 0 == index $str, $_ } @blanks) {
                    $str = substr $str, 1;
                }
                my $str_end = length($str) - 1;
                while (grep { $str_end == rindex $str, $_ } @blanks) {
                    $str = substr $str, 0, $str_end;
                    --$str_end;
                }
                $x[$i] = $str;
            }

            say "*** Final strings ***";
            say "|$_|" for @x;
        '
        *** Initial strings ***
        | a b c |
         | e f 
        |        g h i|
        |j k l|
        | |
        |
        |
        |

        XYZ

        |
        ||
        *** Final strings ***
        |a b c|
        |d e f|
        |g h i|
        |j k l|
        ||
        ||
        |XYZ|
        ||

        You're quite correct about "The OP should be clearer ...". The word 'blank' is often used to mean various things: a single space, multiple consecutive spaces, a whitespace character, multiple consecutive whitespace characters, and I have also seen it used to refer to a zero-length string. Similarly, the word 'space' can mean a single space, any gap between visible characters, and so on. So, as with many posts, we're left with guessing the most likely meaning from the context.

        My belief, that a regex is a better option, strengthens as the complexity of the non-regex and non-module code increases. :-)

        — Ken

Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules
by jwkrahn (Monsignor) on Aug 14, 2020 at 03:58 UTC

    (IMHO) the most common solution is:

    s/^\s+//, s/\s+$// for $line;
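    For readers unfamiliar with the idiom, here is a small sketch of what that statement does, next to a copying variant. The variable names and sample string are illustrative only, and the `/r` form assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $line = "  \t hello world \n";

# In-place form: `for` aliases $_ to the variable, so both substitutions
# modify that variable directly.
my $in_place = $line;
s/^\s+//, s/\s+$// for $in_place;

# Copying form (Perl 5.14+): /r returns the modified string and leaves
# $line untouched, so the two substitutions can be chained.
my $copy = $line =~ s/^\s+//r =~ s/\s+$//r;

print "[$in_place]\n";   # [hello world]
print "[$copy]\n";       # [hello world]
```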
      That used to be the "standard way" to do this and was recommended in the Perl Docs and I thought it was just fine.
      s/^\s+|\s+$//g has been benchmarked. And I now think this is faster and "better" than 2 statements. There is one post at Re^3: script optmization that shows some benchmarks.

      This is certainly not an "abuse" of regex. This is what regex is for! The Perl regex engine continually becomes better and usually faster between releases.

        That benchmark shows s/^\s+|\s+$//g as always being slower than two regexes. It proposes a third option, extracting the inner portion with a match, which it shows as faster in the benchmark given. But that is a very limited benchmark. A more complete benchmark shows that doing two regexes is almost always the fastest.

        use strict;
        use warnings;
        use Benchmark::Dumb qw(cmpthese);

        my @strings = (
          no_trim_short        => 'asd',
          no_trim_mid          => 'asdasdasdasdasdasdasd',
          no_trim_long         => 'asd' x 500,
          no_trim_mid_with_ws  => 'asd asd asd asd asd asd asd',
          no_trim_long_with_ws => (join ' ', ('asd') x 500),
          short                => ' asd ',
          mid                  => ' asdasdasdasdasdasdasd ',
          long                 => ' '.('asd' x 500).' ',
          mid_with_ws          => ' asd asd asd asd asd asd asd ',
          long_with_ws         => ' '.(join ' ', ('asd') x 500).' ',
        );

        while (my ($name, $string) = splice @strings, 0, 2) {
          print "$name:\n";
          cmpthese(0.0005, {
            global => sub {
              my $s = $string;
              $s =~ s/\A\s+|\s+\z//g;
            },
            startend => sub {
              my $s = $string;
              s/\A\s+//, s/\s+\z// for $s;
            },
            match => sub {
              my $s = $string;
              ($s) = $s =~ /\A\s*(.*?)\s*\z/s;
            },
          });
          print "\n";
        }
        __END__
        no_trim_short:
                               Rate         match      global startend
        match    1.42373e+06+-420/s            --      -29.2%   -52.9%
        global   2.01075e+06+-840/s         41.2%          --   -33.4%
        startend   3.02e+06+-1800/s 112.12+-0.14% 50.2+-0.11%       --

        no_trim_mid:
                               Rate global  match startend
        global        519190+-150/s     -- -12.3%   -81.1%
        match         591890+-290/s  14.0%     --   -78.5%
        startend 2.7543e+06+-1400/s 430.5% 365.3%       --

        no_trim_long:
                               Rate  global   match startend
        global   8420.17+-0.00081/s      --  -31.7%   -97.8%
        match          12324.1+-4/s   46.4%      --   -96.8%
        startend     384590+-0.95/s 4467.5% 3020.6%       --

        no_trim_mid_with_ws:
                              Rate global  match startend
        global        388912+-98/s     -- -19.5%   -70.2%
        match          482948+-4/s  24.2%     --   -63.0%
        startend 1.30366e+06+-19/s 235.2% 169.9%       --

        no_trim_long_with_ws:
                             Rate global  match startend
        global      5750.5+-2.8/s     -- -33.6%   -81.3%
        match       8663.4+-3.4/s  50.7%     --   -71.9%
        startend 30807.4+-0.011/s 435.7% 255.6%       --

        short:
                               Rate global startend  match
        global        968450+-390/s     --   -12.8% -32.0%
        startend 1.11124e+06+-460/s  14.7%       -- -22.0%
        match    1.42383e+06+-490/s  47.0%    28.1%     --

        mid:
                          Rate        global  match startend
        global   387160+-190/s            -- -35.3%   -56.4%
        match    598710+-260/s   54.64+-0.1%     --   -32.5%
        startend 887420+-380/s 129.21+-0.15%  48.2%       --

        long:
                              Rate  global   match startend
        global   8410.31+-0.0012/s      --  -31.8%   -97.2%
        match         12323.2+-4/s   46.5%      --   -95.9%
        startend     298990+-140/s 3455.0% 2326.2%       --

        mid_with_ws:
                          Rate       global  match startend
        global   303500+-130/s           -- -37.2%   -48.5%
        match      482925+-4/s        59.1%     --   -18.0%
        startend 589220+-300/s 94.14+-0.13%  22.0%       --

        long_with_ws:
                          Rate global  match startend
        global   5691.4+-2.6/s     -- -34.4%   -81.1%
        match      8672.9+-4/s  52.4%     --   -71.1%
        startend  30035.1+-0/s 427.7% 246.3%       --
Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules
by perlfan (Priest) on Aug 14, 2020 at 12:23 UTC
    >$line =~ s/^\s+|\s+$//g which clearly is an abuse of regex.

    Why do you say that?

    >trim function which BTW was present in Perl 6

    You say this like it's a good thing. I bet there is also one in PHP.

      You won

      «The Crux of the Biscuit is the Apostrophe»

      perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

        easily re-implemented in Perl. It seems ...

        DB<33> sub trim { $_[1] //= qr/\s/; $_[0] =~ s/^[$_[1]]+|[$_[1]]+$//g }
        DB<34> $a = $b = " \n . aaa . \n "
        DB<35> trim $a
        DB<36> trim $b, " "
        DB<37> x $a,$b
        0  '. aaa .'
        1  '
         . aaa . 
        '
        DB<38>

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery
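
        One detail of the debugger one-liner above is worth a hedged note: a `qr//` object interpolated inside `[...]` contributes its stringified wrapper (something like `(?^:\s)` on recent Perls), so parentheses, `?`, `^` and `:` end up in the character class too. A possible variant, sketched below under the assumption that the caller passes a plain string of characters, escapes that string with `quotemeta` instead. The name `trim_chars` and the default set are illustrative, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: trims, in place, any of the characters listed in the
# optional second argument (default: space, tab, CR, LF) from both ends of
# the first argument.
sub trim_chars {
    my $chars = defined $_[1] ? $_[1] : " \t\r\n";
    return if $chars eq '';          # an empty class [] would not compile
    my $class = quotemeta $chars;    # neutralise ], ^, - and \ inside the class
    $_[0] =~ s/\A[$class]+|[$class]+\z//g;
    return;
}

my ($x, $y, $z) = (" \n . aaa . \n ", " \n . aaa . \n ", "(x)");
trim_chars($x);          # default set: all four whitespace characters
trim_chars($y, " ");     # spaces only, so the newlines stay
trim_chars($z);          # parentheses are not in the set and are kept
```

        With this variant, `$x` ends up as `. aaa .`, `$y` keeps its leading and trailing newline, and `$z` is unchanged.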
