
Excel-like sort for multiple fields unix 'ls -l' output

by cmv (Chaplain)
on Jul 08, 2010 at 16:25 UTC (node #848721)
cmv has asked for the wisdom of the Perl Monks concerning the following question:


I need to let a GUI user choose among various ways to sort the multi-line text output of a (SunOS-based) 'ls -l' command (think of Excel's auto-sort).

Given the ls format as follows:

2 drwxr-x--- 2 myuser mygroup 512 Jun 14 09:20 filename
In my opinion, users may sort on any field except the file permissions (but then again...). Here is a list of the fields from right to left (as I understand them) and how each should be sorted:

  • filename - alphabetic
  • timestamp - oldest to newest
  • #ofBytes - numeric
  • group - alphabetic
  • user - alphabetic
  • #ofLinks - numeric
  • filePermissions - ??
  • #ofBlocks - numeric
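One way to pull those fields apart, sketched under the assumption that every line follows the sample layout above (block count first, as `ls -ls` prints it) and that device files and symlinks need no special treatment. `parse_ls_line` and its hash keys are hypothetical names. Giving `split` a limit keeps spaces inside the filename intact.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line of SunOS `ls -ls` output (block count
# first, as in the sample line) into named fields. Assumes the usual 10-column
# layout; device files ("major, minor") and symlinks ("name -> target") are
# not handled specially, and leading spaces in a filename would be lost.
sub parse_ls_line {
    my ($line) = @_;
    chomp $line;
    # A limit of 10 stops split after the 9th field, so the remainder
    # (the filename) keeps any embedded spaces.
    my @f = split ' ', $line, 10;
    return {
        blocks => $f[0],
        perms  => $f[1],
        links  => $f[2],
        user   => $f[3],
        group  => $f[4],
        bytes  => $f[5],
        # Month, day and time-or-year are kept as text; they need converting
        # before they can be sorted chronologically.
        stamp  => "$f[6] $f[7] $f[8]",
        name   => $f[9],
    };
}

my $rec = parse_ls_line("2 drwxr-x--- 2 myuser mygroup 512 Jun 14 09:20 my file\n");
print "$rec->{name}|$rec->{bytes}|$rec->{stamp}\n";   # prints: my file|512|Jun 14 09:20
```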

    I have all the data in a list (it comes from a remote host, so I can't access the file information directly from Perl).

    My current plan is to create a sub that is called like this:

    @sorted = fileSort($field, @data);
    This sub should be smart enough to do the right thing when sorting the various fields. I'm looking for advice on how to design it (or pointers to an existing sub that does this, or CPAN modules that may be helpful).
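A minimal sketch of that `fileSort($field, @data)` interface, assuming the `ls -ls` layout of the sample line and hypothetical field names (`bytes`, `links`, and so on, rather than `#ofBytes`). It uses a Schwartzian transform so each line is split only once, and it returns the original lines untouched. The timestamp is deliberately left out because it has to be converted first.

```perl
use strict;
use warnings;

# Column index (0-based, after splitting on whitespace) and comparison kind
# for each sortable field of the sample `ls -ls` line. Names are illustrative;
# the timestamp is omitted because it needs converting before it can be sorted.
my %FIELD = (
    blocks   => [0, 'num'],
    perms    => [1, 'str'],
    links    => [2, 'num'],
    user     => [3, 'str'],
    group    => [4, 'str'],
    bytes    => [5, 'num'],
    filename => [9, 'str'],
);

sub fileSort {
    my ($field, @data) = @_;
    my $spec = $FIELD{$field} or die "Unknown sort field '$field'\n";
    my ($col, $kind) = @$spec;

    # Schwartzian transform: extract the key once per line, sort on it,
    # then map back to the original lines.
    return map  { $_->[1] }
           sort { $kind eq 'num' ? $a->[0] <=> $b->[0] : $a->[0] cmp $b->[0] }
           map  { [ (split ' ', $_, 10)[$col], $_ ] } @data;
}

my @data = (
    "4 -rw-r--r-- 1 bob staff 2048 Jun 14 09:20 zeta",
    "2 -rw-r--r-- 1 amy staff 512 Jun 13 10:00 alpha",
    "8 drwxr-x--- 3 cal wheel 4096 Nov 6 2008 beta dir",
);
my @sorted = fileSort('bytes', @data);
print "$_\n" for @sorted;
```

Note that `cmp` compares raw character codes, so uppercase names sort before lowercase ones; wrap the keys in `lc` if the GUI should ignore case.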

    So far, my investigation has led me to Sort::Fields, which gives me a nice framework for field sorting on a list of text lines, but it would break down on the timestamp (I could pre-process the timestamp to shoehorn it into working with this module).

    Am I on the right track here, or is there something better I should be considering? I have some fears about the following:

  • timestamp - sorting this will be the biggest headache. I first need to collect all the "fields" that make up the timestamp, then convert the various ways ls prints timestamps back into epoch time (using Date::Calc?) for numerical sorting, and finally put the original text back in the sorted list.
  • filename - the typical filename fears: what about goofy characters (spaces, non-ASCII, etc.)?
  • group/user - Do I have the same problem here with goofy characters?
  • filepermissions - Ok, if I talk myself into allowing the user to sort on this, what way should it be sorted?
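For the timestamp worry, here is a sketch that turns the three ls date tokens into epoch seconds. It swaps in the core Time::Local module instead of Date::Calc, and it assumes the usual ls rule that recent files show "Mon DD hh:mm" (no year) while older ones show "Mon DD YYYY" (no time). `ls_date_to_epoch` is a hypothetical name, and `$now` is a parameter so you can pass the remote host's clock rather than the local one.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

my %MON;
@MON{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Turn the three ls date tokens ("Jun 14 09:20" or "Nov 6 2008") into epoch
# seconds. $now is passed in so the caller can use the remote host's clock.
sub ls_date_to_epoch {
    my ($mon, $day, $time_or_year, $now) = @_;
    my $m = $MON{$mon};
    die "Unknown month '$mon'\n" unless defined $m;

    if ($time_or_year =~ /^(\d+):(\d\d)$/) {
        my ($h, $min) = ($1, $2);
        # Recent file: no year shown, so assume the current year, then step
        # back one year if that lands in the future (a "Dec" file seen in Jan).
        # The one-day slack allows for clock/timezone differences.
        my $year = (localtime $now)[5] + 1900;
        my $t = timelocal(0, $min, $h, $day, $m, $year);
        $t = timelocal(0, $min, $h, $day, $m, $year - 1) if $t > $now + 86_400;
        return $t;
    }
    # Older (or future-dated) file: the year is shown, the time of day is not,
    # so midnight is assumed.
    return timelocal(0, 0, 0, $day, $m, $time_or_year);
}
```

Sorting numerically on the returned values (`<=>`) gives oldest to newest while the display keeps the original ls text.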

    Any thoughts or suggestions are appreciated!



    Re: Excel-like sort for multiple fields unix 'ls -l' output
    by sierpinski (Chaplain) on Jul 08, 2010 at 17:25 UTC
      Ok, based on what I've read here, the first thing that comes to mind for handling the date is to parse out the 3 fields that make it up. One factor you may be missing is that files older than about six months don't show the time; they show the year instead, like this (from my Solaris workstation):
      drwxr-x--- 4 root root 6 Jan 29 11:03 data
      drwxr-xr-x 2 root root 4 Nov 6 2008 Desktop
      One option, at least on Solaris (I don't know whether all flavors of Unix have this option), is -e, which lists the time in one standard format regardless of the file's age:
      drwxr-x--- 4 root root 6 Jan 29 11:03:30 2010 data
      drwxr-xr-x 2 root root 4 Nov 6 11:54:50 2008 Desktop
      Once you know which columns make up the date, you can capture them (use split to separate on whitespace, then take columns 6, 7, 8 and 9, counting from 1, in this case) and assemble the date into a format that Date::Calc (or whatever date function you prefer) can read.
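A sketch of that step for the `-e` lines shown above. It uses the core Time::Local module in place of Date::Calc, and `ls_e_epoch` is a hypothetical name. Columns 6 to 9 counting from 1 are indexes 5 to 8 in Perl; if your output also has the leading block count (`ls -s`), shift every index by one.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

my %MON;
@MON{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Parse one line of Solaris `ls -e` output ("... Jan 29 11:03:30 2010 name"),
# where fields 6-9 (1-based) are month, day, hh:mm:ss and year.
sub ls_e_epoch {
    my ($line) = @_;
    my ($mon, $day, $hms, $year) = (split ' ', $line, 10)[5 .. 8];
    my ($h, $mi, $s) = split /:/, $hms;
    return timelocal($s, $mi, $h, $day, $MON{$mon}, $year);
}

my @lines = (
    "drwxr-x--- 4 root root 6 Jan 29 11:03:30 2010 data",
    "drwxr-xr-x 2 root root 4 Nov 6 11:54:50 2008 Desktop",
);
# Newest first. For long listings, compute each key once instead
# (e.g. with a Schwartzian transform) rather than in every comparison.
my @newest_first = sort { ls_e_epoch($b) <=> ls_e_epoch($a) } @lines;
```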

      Once you get to that point, you should have all the information you need to perform whatever sorts you like. I've never used Sort::Fields, but if it's an easy framework to use, it shouldn't be too difficult to set up to sort on whichever column you click.

      For the data structure, again the first thing that comes to mind is an array of hashes: each hash represents one line of the listing (it could be an array of arrays, but I like having non-numeric keys), and the array represents the directory's contents.
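A sketch of that array of hashes, assuming plain `ls -l` lines like the Solaris sample above (permissions first, no block count). `parse_row`, `sort_rows` and the hash keys are hypothetical names; the `$descending` flag is one way to support the Excel-like "click the same column again to reverse" behavior.

```perl
use strict;
use warnings;

# Fields compared numerically; everything else is compared as a string.
my %NUMERIC = map { $_ => 1 } qw(links size);

# Turn one `ls -l` line into a hash. A split limit of 9 keeps spaces
# inside the filename.
sub parse_row {
    my ($line) = @_;
    my @f = split ' ', $line, 9;
    my %row;
    @row{qw(perms links owner group size)} = @f[0 .. 4];
    $row{date} = "@f[5 .. 7]";    # left as text here
    $row{name} = $f[8];
    return \%row;
}

# Sort an array ref of row hashes on $key; $descending flips the order
# (e.g. a second click on the same GUI column header).
sub sort_rows {
    my ($rows, $key, $descending) = @_;
    my @sorted = $NUMERIC{$key}
        ? sort { $a->{$key} <=> $b->{$key} } @$rows
        : sort { $a->{$key} cmp $b->{$key} } @$rows;
    return $descending ? reverse @sorted : @sorted;
}

my @rows = map { parse_row($_) } (
    "drwxr-x--- 4 root root 6 Jan 29 11:03 data",
    "drwxr-xr-x 2 root root 4 Nov 6 2008 Desktop",
);
my @by_size_desc = sort_rows(\@rows, 'size', 1);
print "$_->{name}\n" for @by_size_desc;   # data, then Desktop
```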

      I can't imagine a good reason for sorting on the permissions, but I suppose it could be done. They are plain characters, and in ASCII they sort in the same order as they are shown (r < w < x). Queries to see whether a file is executable, writable, etc. could easily be done with a regex on field 2 (note that in my listing on Solaris 10 the first field is the permissions and the second is the link count; I'm not sure why yours differs, haven't looked into it), but whether you want to do that is entirely up to you, of course.
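A small sketch of those regex queries and the plain ASCII sort, working directly on permission strings (extracting them from each line is assumed to have happened already). The variable names are illustrative.

```perl
use strict;
use warnings;

my @perms = qw(drwxr-x--- -rw-r--r-- -rwxr-xr-x -rw-------);

# Character 0 is the file type; 1-3 are owner, 4-6 group, 7-9 other.
# Owner-executable: the 4th character is x (or s, setuid with execute).
my @owner_exec = grep { /^.{3}[xs]/ } @perms;

# World-readable: the 8th character is r.
my @world_read = grep { /^.{7}r/ } @perms;

# Plain string sort: '-' (0x2D) sorts before 'd', and r < w < x.
my @by_perm = sort @perms;
print "@by_perm\n";
```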

      Hope that helps!
    Re: Excel-like sort for multiple fields unix 'ls -l' output
    by Utilitarian (Vicar) on Jul 08, 2010 at 17:32 UTC
      Hi Craig, you need to use different sorts depending on whether the data is numeric, alphanumeric, or a timestamp.

      I would suggest you use a dispatch table to pick your sort block, i.e.:

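      sub sortFiles {
          my ($field, @data) = @_;
          # Each comparison sub (dateSort, alphasort, permSort) must compare $a and $b.
          my $sortFunction = {
              timestamp   => \&dateSort,
              filename    => \&alphasort,
              group       => \&alphasort,
              owner       => \&alphasort,
              permissions => \&permSort,
          };
          # "sort SUBNAME LIST" needs a plain (unsubscripted) scalar holding the code ref.
          my $cmp = $sortFunction->{$field} or die "Can't sort on '$field'\n";
          return sort $cmp @data;
      }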
      I'd sort the permissions by their numeric (octal) equivalent.
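A sketch of sorting permissions by their numeric equivalent. `perm_to_mode` is a hypothetical helper, and this `permSort` body is only an assumption about what the comparator in the dispatch table above might look like: it assumes the original poster's `ls -ls` layout, where the permission string is the second whitespace-separated field.

```perl
use strict;
use warnings;

# Convert an ls permission string ("drwxr-x---") to its numeric mode,
# including setuid/setgid/sticky. The leading type character is ignored.
sub perm_to_mode {
    my ($perm) = @_;
    my @c = split //, substr($perm, 1, 9);
    my $mode = 0;
    for my $i (0 .. 8) {
        my $bit = 1 << (8 - $i);               # 0400, 0200, 0100, 040, ... 01
        $mode |= $bit if $c[$i] =~ /[rwxst]/;  # lowercase s/t also mean execute is set
    }
    $mode |= 04000 if $c[2] =~ /[sS]/;         # setuid
    $mode |= 02000 if $c[5] =~ /[sSl]/;        # setgid (l: Solaris mandatory locking)
    $mode |= 01000 if $c[8] =~ /[tT]/;         # sticky
    return $mode;
}

# Comparator for sort: $a and $b are raw `ls -ls` lines whose second
# whitespace-separated field is the permission string (assumed layout).
sub permSort {
    perm_to_mode((split ' ', $a)[1]) <=> perm_to_mode((split ' ', $b)[1]);
}

printf "%04o\n", perm_to_mode('drwxr-x---');   # prints 0750
```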

    Re: Excel-like sort for multiple fields unix 'ls -l' output
    by graff (Chancellor) on Jul 08, 2010 at 21:18 UTC
      While you are constrained in the kind of input you get, you don't necessarily need to stick rigorously to that constraint when presenting the info to the user. I would seriously consider editing the input list to normalize the date field to a consistent form, so that sorting on the date field becomes trivial rather than complicated.

      As suggested in an earlier reply, you should (if at all possible) use the SunOS 'ls' option that yields a consistent date format in its output, then use one of the many Date::whatever modules (or a regex) to convert that to YYYY-MM-DD hh:mm:ss, which needs no special treatment when sorting (ascii-betic sort == chronological sort). People ought to find this display format easy enough to read, and you can use the edited form of the data for both sorting and display.
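A minimal sketch of the regex route, assuming input in the Solaris `ls -e` format shown in the earlier reply; `normalize_ls_e_line` and `stamp_of` are hypothetical names. Only the timestamp is rewritten in place, so the rest of the line (including the filename) stays exactly as ls printed it.

```perl
use strict;
use warnings;

my %MON;
@MON{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (1 .. 12);

# Replace the "Mon DD hh:mm:ss YYYY" timestamp from `ls -e` with
# "YYYY-MM-DD hh:mm:ss", so a plain string sort is also a chronological sort.
sub normalize_ls_e_line {
    my ($line) = @_;
    $line =~ s{\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(\d{1,2})\s+(\d\d:\d\d:\d\d)\s+(\d{4})\b}
              {sprintf '%04d-%02d-%02d %s', $4, $MON{$1}, $2, $3}e;
    return $line;
}

# Pull the normalized timestamp back out of a line for sorting.
sub stamp_of { ($_[0] =~ /(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)/)[0] }

my @lines = (
    "drwxr-x--- 4 root root 6 Jan 29 11:03:30 2010 data",
    "drwxr-xr-x 2 root root 4 Nov 6 11:54:50 2008 Desktop",
);
my @normalized = map { normalize_ls_e_line($_) } @lines;

# Oldest first, using nothing but string comparison on the timestamp.
my @oldest_first = sort { stamp_of($a) cmp stamp_of($b) } @normalized;
print "$_\n" for @oldest_first;
```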

      Don't hesitate to allow sorting on the permission field -- an (ascending or descending) ascii-betic sort on that can be surprisingly handy in some situations (e.g. to understand why some people aren't able to see the contents of some files...)

      Meanwhile, the link-count and block-count fields are relatively worthless -- if you're going to be editing the 'ls' output anyway (to make the date field manageable), I would take those two out completely; don't even show them to a GUI user.

    Node Type: perlquestion [id://848721]
    Approved by moritz