Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Sorting path names with Sort::Key

by Anonymous Monk
on Jan 10, 2025 at 02:46 UTC ( [id://11163653]=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to sort a bunch of path names and wanted to use Sort::Key because I'm already using it several other places in my code. A proper path sorting algorithm requires splitting the path into its components for comparison, and the number of components can be arbitrary. All of the documentation for Sort::Key and its family of other modules assume the number of comparison keys is known. Is it possible to use Sort::Key for this use case and if so, how? Thanks.

Below is the script I am using for testing other sort routines:

use List::Util qw(min); my @path = qw( /abc /abc/1/2/a/b/c /abc/2 /abc/def/1/2/a/b/c /abc!def/1/2/a/b/c /abc-def/1/2/a/b /abcdef/1/2/a/b/c /usr/a ); say 'Wanted path order:'; say ' 'x4, $_ for @path; my @sorted = sort @path; say 'Lexical order (default sort):'; say ' 'x4, $_ for @sorted; say 'Ordered by dirname, then basename:'; @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] or $a->[2] cmp $b->[2] } map { [$_, m{^(.*/)?(.*)}s ] } @path; say ' 'x4, $_ for @sorted; say 'Ordered by path components:'; @sorted = sort bypath @path; say ' 'x4, $_ for @sorted; say 'Ordered by path components (with caching):'; @sorted = map { $_->[0] } sort { for my $i (0 .. min $#{$a->[1]}, $#{$b->[1]}) { return $a->[1][$i] cmp $b->[1][$i] if $a->[1][$i] ne $b->[ +1][$i]; } return $#{$a->[1]} <=> $#{$b->[1]}; } map { [$_, [split '/'] ] } @path; say ' 'x4, $_ for @sorted; exit; # https://stackoverflow.com/questions/5124994/sorting-directory-filena +mes-with-perl/19506331#19506331 sub bypath { my @a = split m'/', $a; my @b = split m'/', $b; for (my $i = 0; $i <= $#a; $i++) { last if $i > $#b; return $a[$i] cmp $b[$i] if $a[$i] cmp $b[$i]; } return $#a <=> $#b; }

Replies are listed 'Best First'.
Re: Sorting path names with Sort::Key
by salva (Canon) on Jan 10, 2025 at 09:05 UTC
    Just replace the slashes in the paths by character 0x00.
    # untested @sorted = keysort { s|[\\\/]|\x00|gr } @paths
    You may also need to normalize the paths so that cases like ./foo and foo are sorted properly.

      A couple of tips:

      • Paths often have numbers, so they're a prime candidate for natural sorting. The same distro provides the means to do this.
      • tr/// is faster than s///.
      use Sort::Key::Natural qw( natkeysort ); my @sorted = natkeysort { tr|\\/|\0|r } @paths;

      Of course, this still ignores case-insensitive volumes, as well as paths containing «.» and «..». But what can you do? (Well, «.» can be addressed, but there's no obvious solution if any exist for the rest.)

        I would agree if this produced output like keysort but with better number handling, but because the null characters are treated equivalently to any other non-word characters, and natsort splits the words on these, the output differs more than just on numbers. Note that 'abc-def' now sorts before '/abc/def' in the output:
        Ordered with keysort: /abc /abc/1/2/a/b/c /abc/2 /abc/22 /abc/3 /abc/def/1/2/a/b/c /abc!def/1/2/a/b/c /abc-def/1/2/a/b /abcdef/1/2/a/b/c /usr/a Ordered with natkeysort: /abc /abc/1/2/a/b/c /abc/2 /abc/3 /abc/22 /abc-def/1/2/a/b /abc/def/1/2/a/b/c /abc!def/1/2/a/b/c /abcdef/1/2/a/b/c /usr/a
        Of course, this still ignores case-insensitive volumes, as well as paths containing «.» and «..». But what can you do? (Well, «.» can be addressed, but there's no obvious solution if any exist for the rest.)
        There's also portability issues like what File::Spec handles. Volumes and different path separators, etc. You can also be processing a list of paths generated on a different system (e.g. processing windows paths on unix).
      Thank you, works like a charm!
Re: Sorting path names with Sort::Key
by harangzsolt33 (Deacon) on Jan 10, 2025 at 09:36 UTC
    What if we would encode the path depth as a character and place it in front of each path and also convert every '/' character to "\xFF" temporarily while sorting? What do you think?

    #!/usr/bin/perl use strict; use warnings; my @path = qw( /abc/def/1/2/3/4/5/6/7/8 /abc /abc/1/2/a/b/c /abc/2 /tmp/1/2/a/../b/c /abc/def/1/2/a/b/c abc/def/../../john/Desktop/work abc/../john/Desktop/work ../john/Desktop/work /abc!def/1/2/a/b/c/. /abc/def /abc-def/1/2/a/b /./home ./abcdef/1/2/a/b/c /tmp /usr/a ); foreach (@path) { # Let's get rid of . and .. $_ =~ s!^\./!!; $_ =~ s!^/\./!/!; $_ =~ s!/\.$!/!; $_ =~ s!/\./$!/!; $_ =~ s!/\./!/!g; while ($_ =~ s![^/.]+/[.]{2}/!!) {} # Next, we change every "/" to "\xFF" # and insert the path depth encoded as a # character in front of the path. my $test = substr($_, 1); my $depth = $test =~ tr|/|/|; $_ =~ tr|/|\xff|; $_ = chr($depth) . $_; } @path = sort @path; # Sort list foreach (@path) { $_ = substr($_, 1); $_ =~ tr|\xff|/|; # Undo changes print "\n $_"; # Print sorted list }

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11163653]
Approved by LanX
Front-paged by LanX
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (5)
As of 2025-01-24 17:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Which URL do you most often use to access this site?












    Results (68 votes). Check out past polls.