PerlMonks  

Sorting dates with the Schwartzian Transform

by Wobbel (Acolyte)
on Aug 03, 2011 at 13:28 UTC (#918248=perlquestion)
Wobbel has asked for the wisdom of the Perl Monks concerning the following question:

Sorting on 4 columns of a 40-column text file with the Schwartzian Transform works fine, except for the date column.

It's in DD-MM-YYYY format and alfabetically/numerically handled as a string.

Do I have to convert this column before the ST to a number, sort it during the ST as a number, and after the ST convert it back to the DD-MM-YYYY format?

Or is there an elegant solution within the ST possible?

I think the first option will hurt performance...

Any expert advice is very welcome.

Thanks.

Re: Sorting dates with the Schwartzian Transform
by moritz (Cardinal) on Aug 03, 2011 at 13:37 UTC
Re: Sorting dates with the Schwartzian Transform
by Anonymous Monk on Aug 03, 2011 at 13:45 UTC

    Do I have to convert this column before the ST to a number, sort it during the ST as a number, and after the ST convert it back to the DD-MM-YYYY format?

    Um no, just do a schwartzian-transform ;)

    @dates = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { my $f = $_; s/\D//g; [ $f, $_ ] } @dates;

      Schwartzian, no doubt, but not enough transformation. The dates are originally in DD-MM-YYYY format and must be rearranged for effective sorting.

      @dates = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { my $f = $_; /(\d\d)-(\d\d)-(\d{4})/; [ $f, "$3$2$1" ] } @dates;

      Also, incomplete. There are 3 other columns that should figure in the sort. I'd like to see the OP's code.

      On a side note, I had never heard of the ST before. I looked briefly in perlsyn to see how I would know that the map {} sort {} map {} would be executed in reverse order, but I didn't find it. Any clues?

        ... how I would know that the map {} sort {} map {} would be executed in reverse order, but I didn't find it. Any clues?

        Both map and sort take a list (which is on the RHS), do some transformation of it then return the transformed list (to the LHS).

        my @mapped = map {
            # some transform code
        } @unmapped;

        my @sorted = sort {
            # sorting code
        } @unsorted;

        The ST code is just an extension of this right-to-left pattern:

        • the first map extracts the sorted dates in the original DD-MM-YYYY format and assigns to the @dates array on the LHS of the assignment operator (=);

        • but it can't do that before the sort has evaluated, sorting the items;

        • which in turn can't do any sorting before the bottom (or rightmost) map has transformed some dates into something sort can work with, taking its raw material from the rightmost part of the expression which is the original, unsorted @dates array.

        I hope this makes things a bit clearer.

        Cheers,

        JohnGG
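
The right-to-left evaluation described above may be easier to see with each stage assigned to its own array. This is only an illustrative sketch: the sample dates and the names @decorated and @ordered are assumptions, not part of the original posts.

```perl
use strict;
use warnings;

# Sample data (an assumption for illustration), in DD-MM-YYYY format.
my @dates = qw(02-02-2007 01-01-2006 03-03-2009 02-02-2009);

# Step 1 (the rightmost map): pair each date with a sortable YYYYMMDD key.
my @decorated = map {
    my ($d, $m, $y) = /^(\d\d)-(\d\d)-(\d{4})$/;
    [ $_, "$y$m$d" ];
} @dates;

# Step 2 (the middle sort): order the pairs by their keys.
my @ordered = sort { $a->[1] <=> $b->[1] } @decorated;

# Step 3 (the leftmost map): drop the keys, keeping the original strings.
my @sorted = map { $_->[0] } @ordered;

print "$_\n" for @sorted;
```

Chaining the three statements without the intermediate arrays gives the usual map/sort/map form, which is why it reads from the bottom up.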

Re: Sorting dates with the Schwartzian Transform
by osbosb (Monk) on Aug 03, 2011 at 14:02 UTC
    alfabetically? Really? Come on.
Re: Sorting dates with the Schwartzian Transform
by FunkyMonk (Canon) on Aug 03, 2011 at 16:20 UTC
    Do I have to convert this column before the ST to a number,
    A number or a string, it makes little difference
    and after the ST convert it back to the DD-MM-YYYY format?
    No, you just throw the sortable number/string away. Something like this:
    print for
        map  { $_->[0] }                  # extract original date
        sort { $a->[1] cmp $b->[1] }      # sort, using sortable date
        map  {
            m/(\d\d)-(\d\d)-(\d{4})/;
            [ $_, "$3$2$1" ]              # [original date, sortable date]
        } <DATA>;

    __DATA__
    02-02-2007
    01-01-2006
    03-03-2009
    02-02-2009

    Output:

    01-01-2006
    02-02-2007
    02-02-2009
    03-03-2009


    Unless I state otherwise, all my code runs with strict and warnings
Re: Sorting dates with the Schwartzian Transform
by ikegami (Pope) on Aug 03, 2011 at 18:21 UTC

    A Schwartzian Transform when all you have to do is parse dates? That will make things *slower*. Creating all those arrays and references adds up.

    This could speed up the sorting (because it creates few extra variables and it uses the specially optimised $a cmp $b callback):

    my @sorted = map substr($_, 8), sort map join('', (/(..)-(..)-(....)/)[2,1,0], $_), @dates; # DD-MM-YYYY

    Naïve:

    my @sorted = sort { join('', ($a =~ /(..)-(..)-(....)/)[2,1,0]) cmp join('', ($b =~ /(..)-(..)-(....)/)[2,1,0]) } @dates; # DD-MM-YYYY

    Schwartzian Transform:

    my @sorted = map $_->[0], sort { $a->[1] cmp $b->[1] } map [ $_, join('', (/(..)-(..)-(....)/)[2,1,0]) ], @dates; # DD-MM-YYYY
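
One way to check the speed claim on your own data is the core Benchmark module. The sketch below first confirms that the three approaches return the same order, then times them with cmpthese. The generated test data, the hash %approach, and its key names are assumptions made for illustration, and the timings will vary with machine, Perl version, and input size.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 5,000 pseudo-random DD-MM-YYYY strings.
srand(42);
my @dates;
push @dates, sprintf('%02d-%02d-%04d',
    1 + int(rand(28)), 1 + int(rand(12)), 1990 + int(rand(30)))
    for 1 .. 5000;

my %approach = (
    # Prepend a YYYYMMDD key, sort as plain strings, strip the key again.
    prefix_key => sub {
        my @s = map substr($_, 8),
                sort
                map join('', (/(..)-(..)-(....)/)[2,1,0], $_), @dates;
        return \@s;
    },
    # Recompute both keys on every comparison.
    naive => sub {
        my @s = sort {
            join('', ($a =~ /(..)-(..)-(....)/)[2,1,0])
                cmp
            join('', ($b =~ /(..)-(..)-(....)/)[2,1,0])
        } @dates;
        return \@s;
    },
    # Schwartzian Transform: compute each key once, carry it in an array ref.
    schwartzian => sub {
        my @s = map $_->[0],
                sort { $a->[1] cmp $b->[1] }
                map [ $_, join('', (/(..)-(..)-(....)/)[2,1,0]) ], @dates;
        return \@s;
    },
);

# Sanity check: all approaches must produce the same order.
my $expected = join ',', @{ $approach{naive}->() };
for my $name (sort keys %approach) {
    my $got = join ',', @{ $approach{$name}->() };
    die "$name disagrees with naive\n" unless $got eq $expected;
}
print "all three approaches agree\n";

# Run each approach for at least one CPU second and print a comparison table.
cmpthese(-1, \%approach);
```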

      Dear Perl experts, thanks for all the useful replies and different approaches! The PerlMonks website is a really good place to learn new things and to see why you would choose a certain solution. Great.

      The sorting is not only about dates. Patient_ID, Course_ID, Session_number, Session_date, Imaging_type, and a lot of measurements. The sorting question is the last part of a bigger project. "The doctor" needs the data in an Excel-friendly format :-( .

      I can't wait to fine tune my code, but I'll have to wait till tomorrow.

      I'm so close to the last step, but....

      What if you sort on two or three special columns? In my case, date (column 11) and time (column 12). Is your original code limited to one column, or is it possible to "map" on more than one time/date format?

      I've Googled and tried a lot last week, but I'm stuck (on the syntaxis).

      my @sorted =
          map $_->[0],
          sort {
              $a->[11] cmp $b->[11] ||      # Date, original
              # $a->[12] cmp $b->[12]       # Time, to do list
          }
          map [ $_, join('', (/(..)-(..)-(....)/)[2,1,0]) ],
          # map [ $_, join('', (/(..):(..):(..)/)[2,1,0]) ],   # Is it possible to map two columns date and time?
          @dates;   # DD-MM-YYYY
                    # HOURS:MIN:SEC

        I've Googled and tried a lot last week, but I'm stuck (on the syntaxis).

        I hope you understand my message despite the wording

        Actually it seems more like you're stuck on syntax and arrays.

        You need to read perlintro, "Basic debugging checklist", "How do I post a question effectively?", and "References quick reference".

        Also, when you have a program with real, named variables, talking about columns can get confusing; talk about your variables instead ;)

        The code you pasted will never have a 12 element array, nor do you want one.

        I would go back to

        my @sorted = map substr($_, 8), sort map join('', (/(..)-(..)-(....)/)[2,1,0], $_), @dates; # DD-MM-YYYY
        Don't get it? To understand, you would write a program like this
        #!/usr/bin/perl --
        use strict;
        use warnings;
        use Data::Dumper;

        my @dates = qw[ 08-15-2011 08-10-2011 08-05-2011 ];
        print "\ndates ", Dumper( \@dates );

        #~ my @firstTransform = map join('', (/(..)-(..)-(....)/)[2,1,0], $_), @dates; # DD-MM-YYYY
        my @firstTransform = map join('', ReorderForCmp($_), $_), @dates; # DD-MM-YYYY
        print "\nfirstTransform ", Dumper( \@firstTransform );

        my @firstSorted = sort @firstTransform ;
        print "\nfirsSorted ", Dumper( \@firstSorted );

        my @finalTransform = map substr($_, 8), @firstSorted ;
        print "\nfinalTransform ", Dumper( \@finalTransform );

        sub ReorderForCmp {
            my( $one ) = @_;
            my @date = $one =~ /(..)-(..)-(....)/;
            #~ return @date[2,1,0];
            return $date[2], $date[1], $date[0];
        }
        __END__
        which produces this output
        dates $VAR1 = [
                  '08-15-2011',
                  '08-10-2011',
                  '08-05-2011'
                ];

        firstTransform $VAR1 = [
                  '2011150808-15-2011',
                  '2011100808-10-2011',
                  '2011050808-05-2011'
                ];

        firsSorted $VAR1 = [
                  '2011050808-05-2011',
                  '2011100808-10-2011',
                  '2011150808-15-2011'
                ];

        finalTransform $VAR1 = [
                  '08-05-2011',
                  '08-10-2011',
                  '08-15-2011'
                ];
        So yes, it is possible to "map two columns date and time": just adjust sub ReorderForCmp to return an ISO 8601-style datetime (YYYYMMDDHHMMSS).
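
For whole records rather than bare dates, a Schwartzian Transform can carry several keys per line. The sketch below assumes a made-up tab-separated layout (Patient_ID, Session_date, Session_time, one measurement) and a hypothetical helper sortable_datetime; the real file has 40 columns, so the column indices would need adjusting. Note that each anonymous array holds only the original line plus the keys built for it, so the sort block indexes those keys (1 and 2 here) and not the file's column numbers. The time fields are already ordered most significant first, so only the date parts need rearranging.

```perl
use strict;
use warnings;

# Hypothetical layout for illustration: tab-separated
# Patient_ID, Session_date (DD-MM-YYYY), Session_time (HH:MM:SS), measurement.
my @lines = (
    "7\t03-08-2011\t09:15:00\t1.2",
    "3\t03-08-2011\t14:30:00\t0.7",
    "3\t02-08-2011\t16:45:10\t0.9",
    "3\t03-08-2011\t08:05:30\t1.1",
);

# Build a YYYYMMDDHHMMSS string, which orders correctly under plain cmp.
sub sortable_datetime {
    my ($date, $time) = @_;
    my ($d, $m, $y) = $date =~ /^(\d\d)-(\d\d)-(\d{4})$/
        or die "bad date: $date\n";
    my ($hh, $mm, $ss) = $time =~ /^(\d\d):(\d\d):(\d\d)$/
        or die "bad time: $time\n";
    return "$y$m$d$hh$mm$ss";
}

my @sorted =
    map  { $_->[0] }                      # keep only the original line
    sort {
        $a->[1] <=> $b->[1]               # Patient_ID, numeric
            or
        $a->[2] cmp $b->[2]               # then date and time together
    }
    map  {
        my @col = split /\t/;
        [ $_, $col[0], sortable_datetime(@col[1, 2]) ];
    } @lines;

print "$_\n" for @sorted;
```

Further sort columns can be added as extra elements of the anonymous array and extra `or` clauses in the sort block.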
Re: Sorting dates with the Schwartzian Transform
by salva (Monsignor) on Aug 04, 2011 at 09:06 UTC

Node Type: perlquestion [id://918248]
Approved by Marshall