Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Re^8: schwartzian transform problem

by ikegami (Patriarch)
on Mar 25, 2025 at 23:02 UTC ( [id://11164431]=note: print w/replies, xml ) Need Help??


in reply to Re^7: schwartzian transform problem
in thread schwartzian transform problem - Solved

You forgot to include the solution I provided! And let's include a basic use of sort to see what that looks like and how the others compare to that.

Rate basic GRT ST keysort basic 1.66/s -- -88% -89% -91% GRT 13.3/s 701% -- -12% -25% ST 15.1/s 810% 14% -- -15% keysort 17.7/s 972% 34% 18% -- Rate basic GRT ST keysort basic 1.79/s -- -87% -89% -91% GRT 13.5/s 654% -- -15% -30% ST 15.8/s 786% 18% -- -18% keysort 19.3/s 980% 43% 22% -- Rate basic GRT ST keysort basic 1.70/s -- -87% -89% -91% GRT 13.3/s 681% -- -15% -27% ST 15.6/s 817% 17% -- -14% keysort 18.3/s 972% 37% 17% --

Sort::Key isn't only the cleanest and simplest of all the solutions (including the builtin sort), it's the fastest! It's 17-22% faster than the next fastest.

#!/usr/bin/perl use strict; use warnings; use Benchmark qw( cmpthese ); use File::Slurper qw( read_text ); use Sort::Key qw( rikeysort ); my @unsorted = split /^(?=>>> )/m, read_text( "try3.txt" ); @unsorted = ( @unsorted ) x 10_000; # 90_000 lines (30_000 records) sub basic { my @sorted = sort { my ( $an ) = $a =~ /(\d+)%/; my ( $bn ) = $b =~ /(\d+)%/; $bn <=> $an } @unsorted; } sub ST { my @sorted = map $_->[0], sort { $b->[1] <=> $a->[1] } map [ $_, /(\d+)%/ ], @unsorted; } sub GRT { my @sorted = map substr( $_, 4 ), sort map { /(\d+)%/ ? ( ~ pack( "N", $1 ) . $_ ) : () } @unsorted; } sub keysort { my @sorted = rikeysort { ( /(\d+)%/ )[0] } @unsorted; } cmpthese( -3, { basic => \&basic, ST => \&ST, GRT => \&GRT, keysort => \&keysort, } );

Note: I found substr( $_, 4 ) to be slightly faster than unpack( "xa4", $_ ), thus the change in GRT.

Replies are listed 'Best First'.
Re^9: schwartzian transform problem
by Anonymous Monk on Mar 26, 2025 at 22:21 UTC

    Regexes ARE slow (of course everyone here knows). To the extent I can trust my Strawberries and/or my crippled Kaby Lake to benchmark anything, the next is twice as fast (+ I guess rnkeysort should have been used (and it's slower)):

    sub keysort1 { my @sorted = rnkeysort { substr $_, rindex( $_, '%' ) - 3, 3 } @uns +orted; }

    And assuming $s keeps the (un|pre)split input, and if each 2nd '%' is guaranteed to be an anchor, even if records have different length/layout, then the next unclean ugly one is still faster than keysort for me:

    sub test1 { my $i = my $j = 0; my @nums; $j ^= 1 or push @nums, substr $s, $i - 3, 3 while -1 != ( $i = index $s, '%', $i + 1 ); my @sorted = @unsorted [ sort { $nums[$b] <=> $nums[$a] } 0 .. $#nums ] } __END__ Rate keysort test1 keysort 21.6/s -- -30% test1 30.6/s 42% --

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11164431]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having a coffee break in the Monastery: (2)
As of 2025-06-21 18:08 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?
    erzuuliAnonymous Monks are no longer allowed to use Super Search, due to an excessive use of this resource by robots.