PerlMonks  

Any caveats in using unpack to right-trim? Why isn't it advertised more?

by Anonymous Monk
on Feb 08, 2025 at 11:52 UTC ( [id://11163952]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Suppose I'm not targeting the latest Perl versions with their 'trim' built-in, and I'm reluctant to depend on other modules. Then practically every FAQ, cookbook, or tutorial recommends what is the "usual" method in the test below. Now, I have a fair number of medium-size "records" (variable length and containing white-space somewhere in the middle, NOT as in the test below), which may end in unpredictable WS sequences that I want to trim. Being a performance freak, I was unsatisfied with the speed and came up with the "better" version. Then I accidentally discovered the "unpack" method. Now I'm puzzled whether this nice and fast way is being kept secret, and why :-)

use strict;
use warnings;
use 5.014; # s///r
use Benchmark 'cmpthese';

my @a = map {
    ( 'a' x 123 ) . ( ' ' x rand 2 ) . ( "\n" x rand 2 )
} 0 .. 999;

cmpthese -1, {
    usual  => sub { my @ret = map { s/\s*$//r } @a; \@ret },
    better => sub { my @ret = map { s/.*\S\K\s*$//sr } @a; \@ret },
    unpack => sub { my @ret = map { unpack 'A*', $_ } @a; \@ret },
};

__END__
         Rate  usual better unpack
usual   159/s     --   -88%   -95%
better 1284/s   707%     --   -56%
unpack 2899/s  1722%   126%     --

Replies are listed 'Best First'.
Re: Any caveats in using unpack to right-trim? Why isn't it advertised more?
by choroba (Cardinal) on Feb 08, 2025 at 21:33 UTC
    Not sure why you mention trim, when what you show here is rtrim (which is not a builtin). When I added trim, it didn't perform that badly, especially considering that it has to attempt a left trim, too:
             Rate  usual better   trim unpack
    usual   290/s     --   -87%   -92%   -95%
    better 2221/s   666%     --   -42%   -64%
    trim   3829/s  1220%    72%     --   -38%
    unpack 6164/s  2026%   178%    61%     --

    Also, the "better" substitution returns a different output for strings containing only whitespace.
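The whitespace-only difference can be checked with a small sketch like the one below. The variable names are made up for illustration; the three expressions are the ones from the benchmark in the question.

```perl
use strict;
use warnings;
use 5.014;    # for the /r modifier

# A string with nothing but whitespace in it (illustrative input).
my $blank = " \n";

my $usual  = $blank =~ s/\s*$//r;           # \s* can match the whole string
my $better = $blank =~ s/.*\S\K\s*$//sr;    # needs a \S, so nothing matches
my $unpack = unpack 'A*', $blank;           # strips all trailing whitespace

say "usual:  ", length $usual;     # 0
say "better: ", length $better;    # 2 (string returned unchanged)
say "unpack: ", length $unpack;    # 0
```

So "better" only agrees with the other two when the record holds at least one non-whitespace character.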

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: Any caveats in using unpack to right-trim? Why isn't it advertised more?
by Fletch (Bishop) on Feb 08, 2025 at 22:21 UTC

    Just as a note trim was only fairly recently added in 5.36 so anything older than 2.5-ish years at this point that hasn't been updated wouldn't even have considered it an option. But I'd say that (IMHO at least) unpack is one of those darker perl4-ish corners that do often get overlooked.

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: Any caveats in using unpack to right-trim? Why isn't it advertised more?
by roho (Bishop) on Feb 09, 2025 at 06:25 UTC
    I have mixed feelings about this. On one hand, I find it fascinating that unpack does this and does it so much faster. On the other hand, unless my application is processing huge volumes of text, for the sake of clarity I would opt for the "usual" way.

    If I were in a situation where this feature of unpack provided a noticeable performance boost (based on the application and the data being processed) I would add a generous dose of comments explaining what was being accomplished using unpack this way, and why, for the sake (and sanity) of future maintainers. :)
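A commented wrapper along those lines might look like the sketch below. The name rtrim_unpack and the sample records are hypothetical, not something from the thread.

```perl
use strict;
use warnings;

# rtrim_unpack: hypothetical helper name, chosen for this sketch.
#
# Why unpack? When unpacking, the 'A' template strips trailing whitespace
# and NUL bytes, and 'A*' covers the whole string, so this acts as a
# right-trim. It was measured in this thread as faster than s/\s+$//.
# Embedded and leading whitespace are left alone.
sub rtrim_unpack {
    my ($str) = @_;
    return scalar unpack 'A*', $str;
}

# Illustrative records with whitespace in the middle and at the end.
my @records = ( "first record \n", "second\trecord\t\n\n" );
my @trimmed = map { rtrim_unpack($_) } @records;

print "[$_]\n" for @trimmed;
```

Keeping the trick behind one named sub means the explanation lives in a single place, and the implementation can be swapped for a regex later without touching callers.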

    "It's not how hard you work, it's how much you get done."

      I have mixed feelings about this.

      I'm a bit puzzled by it.
      The "A*" format apparently removes trailing whitespace, but not embedded whitespace:
      D:\>perl -wle "print '#' . unpack('A*', 'abc ') . '#';"
      #abc#
      D:\>perl -wle "print '#' . unpack('A*', 'abc d') . '#';"
      #abc d#
      How is all of that explained as being "expected behaviour" ?
      Duh - it's explicitly stated in the documentation - "When unpacking, "A" strips trailing whitespace and nulls".
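A short sketch of that documented behaviour, using made-up sample strings, shows what 'A*' removes and what it keeps:

```perl
use strict;
use warnings;

# Illustrative inputs; the keys are just labels for this sketch.
my %case = (
    trailing_mixed => "abc \t\r\n",    # space, tab, CR, LF at the end
    embedded       => "abc d",         # whitespace in the middle only
    leading        => "  abc  ",       # whitespace on both sides
    trailing_nul   => "abc\0\0",       # NUL bytes at the end
);

my %got;
$got{$_} = unpack 'A*', $case{$_} for keys %case;

# Print with delimiters so the remaining whitespace is visible.
printf "%-15s #%s#\n", $_, $got{$_} for sort keys %got;
```

Only the tail of the string is touched: the leading spaces and the embedded space survive, while trailing whitespace and NULs are dropped.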

      Cheers,
      Rob
Re: Any caveats in using unpack to right-trim? Why isn't it advertised more?
by Tux (Canon) on Feb 09, 2025 at 09:23 UTC

    Today I learned ...

    Using unpack since perl-4.017, I never before realized that unpack "A*" also removed trailing newlines!


    Enjoy, Have FUN! H.Merijn
Re: Any caveats in using unpack to right-trim? Why isn't it advertised more?
by Anonymous Monk on Feb 08, 2025 at 13:59 UTC

    (Semi-)answering self: with s/\s+$// for "usual", it's now slightly faster than "better", -- my bad, so then no need to invent "better", I guess some optimizations kick in. But "unpack" is still twice as fast, and questions in the title remain.

Re: Any caveats in using unpack to right-trim? Why isn't it advertised more?
by ikegami (Patriarch) on Feb 09, 2025 at 18:24 UTC

    Any caveats in using unpack to right-trim?

    There are two:

    • It doesn't remove non-ASCII whitespace characters.
      $ perl -Mv5.14 -e'say length "a\xA0" =~ s/\s+\z//r'
      1
      $ perl -Mv5.14 -e'say length unpack "A*", "a\xA0"'
      2
    • It removes NUL.
      $ perl -Mv5.14 -e'say length "a\x00" =~ s/\s+\z//r'
      2
      $ perl -Mv5.14 -e'say length unpack "A*", "a\x00"'
      1

    I've been unsatisfied with the speed and came with "better" version.

    Why doesn't builtin offer this???

Re: Any caveats in using unpack to right-trim? Why isn't it advertised more?
by Anonymous Monk on Feb 09, 2025 at 18:55 UTC
    This was even faster than unpack:
    chomp => sub {
        my @ret = @a;
        local $/ = ' ';
        1 while chomp @ret;
        local $/ = "\n";
        1 while chomp @ret;
        \@ret;
    },

      Not nearly equivalent. \s matches much more than two characters. And this doesn't even handle the two characters it attempts to handle properly, since it doesn't trim <SP><LF><SP><LF>.

      But it does show that an XS solution that starts at the tail and works in-place would be fastest.

      This brings up my earlier comment, why doesn't builtin offer this???
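The <SP><LF><SP><LF> case mentioned above can be checked with a sketch like this one. The sample record is made up; the chomp sequence is the one from the parent post, applied to a single-element array.

```perl
use strict;
use warnings;
use 5.014;    # for say and the /r modifier

# Payload "a" followed by <SP><LF><SP><LF> (illustrative input).
my $record = "a \n \n";

my @ret = ($record);
{
    local $/ = ' ';
    1 while chomp @ret;    # string ends in LF, so nothing is removed
    local $/ = "\n";
    1 while chomp @ret;    # removes only the final LF, then stops at SP
}
my $chomped = $ret[0];
my $regex   = $record =~ s/\s+\z//r;

say "chomp: ", length $chomped;    # 4, i.e. "a \n " is left
say "regex: ", length $regex;      # 1
```

Because each chomp pass handles only one separator, an interleaved tail needs the two passes to be repeated until neither removes anything.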
