http://www.perlmonks.org?node_id=318176

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a script to organize images in an online gallery, and I can't seem to wrap my mind around how to sort these images by date. I've got a text file with entries like this:

FILE:WIDTH:HEIGHT:TIME

where FILE is the alphanumeric filename, WIDTH and HEIGHT are numeric dimensions, and TIME is a Unix timestamp (seconds since the epoch) such as you'd get from time().

Now, all these entries are in a text file, separated by newlines, like so:

image1.jpg:100:250:1062352538
image2.jpg:650:175:1062340359

and so on. What I need to do is rearrange this text file, sorting the entries by that timestamp so that the newest ones are at the beginning. Any ideas on the best way to do this?

Re: sorting entries by date
by Aristotle (Chancellor) on Jan 01, 2004 at 23:04 UTC
    Basically you want a Schwartzian Transform. You parse the lines into a list of lists, sort it by the desired field (descending here, so the newest entries come first), then extract the original data.
    my @sorted = map $_->[0], sort { $b->[4] <=> $a->[4] } map [ $_, split /:/ ], <>;
    Of course, if you want to keep the records around in parsed form for later steps, you can leave out the copy of the line and the extraction step (the timestamp then sits at index 3):
    my @sorted = sort { $b->[3] <=> $a->[3] } map [ split /:/ ], <>;

    I am too lazy to explain the Schwartzian Transform all over again :), so I Super Searched for an explanation. Surprisingly, I came up empty for the time being.

    Probably the most important parts to understand are references and how to use them to build complex data structures in Perl. If you understand those, the Transform itself should be almost self-explanatory. Check out perldoc perlreftut, perldoc perldsc, and perldoc perllol.
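The three stages are easiest to follow one at a time. Below is a minimal sketch of the same transform with each stage stored in its own variable. It uses the two sample lines from the question and sorts newest first, as the question asks. The variable names (`@decorated`, `@sorted_refs`) are only illustrative.

```perl
use strict;
use warnings;

# The two sample records from the question (FILE:WIDTH:HEIGHT:TIME).
my @lines = (
    "image1.jpg:100:250:1062352538\n",
    "image2.jpg:650:175:1062340359\n",
);

# Stage 1 (map): decorate each line with its parsed fields:
#   [ $line, $file, $width, $height, $time ]
my @decorated = map { chomp(my $copy = $_); [ $_, split /:/, $copy ] } @lines;

# Stage 2 (sort): compare the numeric timestamps (index 4), newest first.
my @sorted_refs = sort { $b->[4] <=> $a->[4] } @decorated;

# Stage 3 (map): undecorate, keeping only the original line.
my @sorted = map { $_->[0] } @sorted_refs;

print @sorted;
```

Collapsing the three statements into one expression, read right to left, gives the one-liner from the reply.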

    Makeshifts last the longest.

Re: sorting entries by date
by Coruscate (Sexton) on Jan 01, 2004 at 23:06 UTC

    How about something like this:

    use Fcntl ':flock';

    open my $fh, '+<', 'images.dat' or die "open failed: $!";
    flock $fh, LOCK_EX or die "flock failed: $!";
    seek $fh, 0, 0;

    my @images;
    while (<$fh>) {
        chomp;
        # Store [ timestamp, line ]; the timestamp is the last field.
        push @images, [ (split /:/)[-1], $_ ];
    }

    # Newest first, then keep only the original lines.
    my @ordered = map { $_->[1] } sort { $b->[0] <=> $a->[0] } @images;

    seek $fh, 0, 0;
    truncate $fh, 0;
    print $fh join("\n", @ordered), "\n";
    close $fh;

    Update: My brain isn't taking shortcuts today. I like Aristotle's approach of passing the file read directly to map(). Didn't think of it for some reason. So a slightly simplified version:

    use Fcntl ':flock';

    open my $fh, '+<', 'images.dat' or die "open failed: $!";
    flock $fh, LOCK_EX or die "flock failed: $!";
    seek $fh, 0, 0;

    my @ordered = map  { $_->[1] }
                  sort { $b->[0] <=> $a->[0] }
                  map  { chomp; [ (split /:/)[-1], $_ ] } <$fh>;

    seek $fh, 0, 0;
    truncate $fh, 0;
    print $fh join("\n", @ordered), "\n";
    close $fh;

      I would advise writing to a temp file and then renaming it over the original, since a rename is much closer to atomic than rewriting the file in place. If someone sent your program a signal in the middle of the rewrite, you could lose data.

      Play that funky music white boy..
        Good point, and to be truly lazy you can hide this behind IO::AtomicFile. :)

        Makeshifts last the longest.
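IO::AtomicFile packages up the write-to-temp-then-rename idea suggested above. The sketch below does the same with core modules only (File::Temp, File::Basename). It assumes a POSIX-style filesystem where `rename` replaces the target in one step. The helper name `sort_file_newest_first` is hypothetical, and unlike Coruscate's version the sketch takes no lock.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical helper: sort a FILE:WIDTH:HEIGHT:TIME file newest first,
# writing the result to a temp file and renaming it over the original.
sub sort_file_newest_first {
    my ($path) = @_;

    open my $in, '<', $path or die "open $path failed: $!";
    my @sorted = map  { $_->[1] }
                 sort { $b->[0] <=> $a->[0] }
                 map  { chomp; [ (split /:/)[-1], $_ ] } <$in>;
    close $in;

    # Create the temp file in the same directory so rename() does not
    # have to cross filesystems.
    my ($out, $tmp) = tempfile(DIR => dirname($path));
    print {$out} join("\n", @sorted), "\n";
    close $out or die "close $tmp failed: $!";

    # tempfile() creates the file with restrictive permissions;
    # copy the original file's mode before replacing it.
    my $mode = (stat $path)[2] & 07777;
    chmod $mode, $tmp or die "chmod $tmp failed: $!";

    # On POSIX systems rename() swaps in the new file in a single step,
    # so an interrupted run leaves either the old file or the new one.
    rename $tmp, $path or die "rename $tmp -> $path failed: $!";
}
```

Calling `sort_file_newest_first('images.dat')` would then replace images.dat with its sorted version. If other processes may write the file at the same time, they still need to coordinate, for example with a separate lock file.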

Re: sorting entries by date
by jweed (Chaplain) on Jan 01, 2004 at 23:15 UTC
    If you want to sort a file this way, it's best to read the file into an array (with the angle operator or Tie::File) and then use the Schwartzian Transform (read about it here), like this:
    @file = map { $$_[0] } sort { $$b[1] <=> $$a[1] } map { [$_, (split /:/)[3] ] } @file;
    Hope that helps!


    Update
    Thought I'd make the ST clearer. Basically, you start with a map statement that takes every line in your file and puts it in an anonymous array together with its timestamp (extracted via split). The resulting list of anonymous arrays is passed to sort, which orders them by the timestamp stored in each one. Finally, the last map to run (the leftmost one in the code) transforms this list of anonymous arrays back into the lines from the file, properly sorted. Tada!

    Update*2
    You could also try a GRT (Guttman-Rosler Transform), which might look something like this:
    @file = map { join ':', (split /:/)[1,2,3,0] } reverse sort map { join ':', (split /:/)[3,0,1,2] } @file;
    This has the clear limitation that it sorts asciibetically rather than numerically (doing it numerically might cancel out the benefits of the GRT over the ST, I'm not sure). That shouldn't really be a problem with time() values, which have all had ten digits since September 2001, but it is something to watch out for. It also assumes the lines have already been chomped (Tie::File does that for you). Also, I'm sure this should have been done with pack or some such nonsense, but I don't know how. And finally, it may not even be faster. But, TMTOWTDI.
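As for how pack would fit in: one common GRT variant packs the timestamp into a 4-byte big-endian unsigned integer with `pack('N', ...)`. With that byte order, string order matches numeric order for any value that fits in 32 bits, so the default sort needs no comparison block at all. A minimal sketch follows. It assumes non-negative timestamps below 2**32 (that holds until 2106), and the function name `sort_newest_first_grt` is hypothetical.

```perl
use strict;
use warnings;

# GRT sketch: prefix each line with a fixed-width binary key so that a
# plain string sort orders the lines by timestamp.
sub sort_newest_first_grt {
    chomp(my @lines = @_);
    return map  { substr $_, 4 }                       # 3. strip the 4-byte key
           reverse sort                                # 2. default sort, reversed: newest first
           map  { pack('N', (split /:/)[-1]) . $_ }    # 1. key = timestamp, 32-bit big-endian
           @lines;
}

my @sorted = sort_newest_first_grt(
    "image2.jpg:650:175:1062340359\n",
    "image1.jpg:100:250:1062352538\n",
);
print "$_\n" for @sorted;
```

Because the key has a fixed width, no separator is needed and the original line comes back unchanged after `substr`.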


    Who is Kayser Söze?
    Code is (almost) always untested.
Re: sorting entries by date
by exussum0 (Vicar) on Jan 01, 2004 at 23:09 UTC
    #!/usr/bin/perl
    use strict;
    use warnings;

    my %file = ();
    while (<>) {
        chomp;    # chomp, not chop: chop would eat a digit from a last line without "\n"
        my ( $file, $width, $height, $time ) = split /:/;
        $file{$time} = [ $file, $width, $height ];
    }

    # Numeric sort on the timestamps, newest first.
    foreach ( sort { $b <=> $a } keys %file ) {
        my $data = $file{$_};
        print join( ":", @$data, $_ ) . "\n";
    }
    Note that it doesn't work with file names that contain colons. I'm a big fan of letting the command line do stuff, so you'd run it like this:
    script.pl < inFile > outFile
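One way around the colon problem is to match each record with a regex anchored at the end of the line instead of splitting on every colon. The greedy `(.*)` then backtracks until the last three numeric fields are found. This is a minimal sketch, assuming only the file name can contain colons and WIDTH, HEIGHT and TIME are always plain integers. `parse_record` is a hypothetical helper, and the `a:b.jpg` file name is invented for illustration.

```perl
use strict;
use warnings;

# Parse one FILE:WIDTH:HEIGHT:TIME record, taking the three numeric fields
# from the right so that colons inside the file name are left alone.
# Returns a hash ref, or an empty list if the line doesn't match.
sub parse_record {
    my ($line) = @_;
    my ($file, $width, $height, $time) =
        $line =~ /^(.*):(\d+):(\d+):(\d+)\s*$/
        or return;
    return { file => $file, width => $width, height => $height, time => $time };
}

my @records = map { parse_record($_) } (
    "a:b.jpg:100:250:1062340359\n",    # hypothetical file name containing a colon
    "c.jpg:650:175:1062352538\n",
);

# Newest first, then reassemble each record in its original field order.
my @sorted = sort { $b->{time} <=> $a->{time} } @records;
print join(':', @{$_}{qw(file width height time)}), "\n" for @sorted;
```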

    Play that funky music white boy..
      $file{$time} = [ $file, $width, $height ];

      Note that this method breaks if two rows in the file have the same timestamp: you'd only keep the last entry with that timestamp, because the earlier ones would be overwritten.

        Yup, I'm assuming unique timestamps. :) I could use the file names as keys instead and sort by the timestamp.

        Play that funky music white boy..
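The fix suggested in that last reply keys the hash on file names and sorts by the stored timestamp. It could look roughly like the sketch below, assuming file names are unique (a repeated file name would still overwrite the earlier entry). Timestamp ties are broken by file name so the output is deterministic. The third sample line (`image3.jpg`) is invented to show a tie, and `%rec` is just an illustrative name.

```perl
use strict;
use warnings;

# Sample data; image3.jpg is invented to share image1.jpg's timestamp.
my @lines = (
    "image1.jpg:100:250:1062352538\n",
    "image2.jpg:650:175:1062340359\n",
    "image3.jpg:320:240:1062352538\n",
);

# Key on the file name; keep [ timestamp, original line ] as the value.
my %rec;
for my $line (@lines) {
    chomp(my $copy = $line);
    my ($file, undef, undef, $time) = split /:/, $copy;
    $rec{$file} = [ $time, $copy ];
}

# Sort file names by stored timestamp, newest first; break ties by name.
my @sorted = map  { $rec{$_}[1] }
             sort { $rec{$b}[0] <=> $rec{$a}[0] or $a cmp $b }
             keys %rec;

print "$_\n" for @sorted;
```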
Re: sorting entries by date
by pg (Canon) on Jan 01, 2004 at 23:18 UTC

    First get your file into an array through Tie::File, then:

    use Data::Dumper;
    my @a = ("image1.jpg:100:250:1062352538", "image2.jpg:650:175:1062340359");
    # Compare on the last colon-separated field, newest first.
    @a = sort { (split(":", $b))[-1] <=> (split(":", $a))[-1] } @a;
    print Dumper(\@a);
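The Tie::File step (Tie::File ships with Perl) might look like the sketch below, reusing the split-based comparator from the reply above. Tie::File maps each line of the file, without its newline, to an array element, and assigning to the tied array writes the change back to the file. The sorted list is copied into a plain array before it is assigned back, to keep reading and rewriting the tied array as separate steps. The helper name `sort_in_place_newest_first` is hypothetical, and like the code above it does no locking.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper: sort the records in $path newest first, in place.
sub sort_in_place_newest_first {
    my ($path) = @_;
    tie my @lines, 'Tie::File', $path or die "tie $path failed: $!";

    # Same comparator as above: compare the last colon-separated field,
    # with $b before $a so that larger (newer) timestamps come first.
    my @sorted = sort { (split /:/, $b)[-1] <=> (split /:/, $a)[-1] } @lines;

    @lines = @sorted;    # rewrites the file through the tie
    untie @lines;
}
```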
Re: sorting entries by date
by BrowserUk (Patriarch) on Jan 02, 2004 at 00:17 UTC

    If, as your sample data indicates, your lines have a consistent fixed format, then a simpler sort using substr would suffice.

    my @sorted = sort { substr( $b, 19 ) cmp substr( $a, 19 ) } <FILE>;

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!

      He has filenames in there, so assuming a fixed format is probably a mistake. But since it's the last field we're interested in, the following will work:
      my @sorted = sort { substr( $b, 1 + rindex( $b, ':' ) ) cmp substr( $a, 1 + rindex( $a, ':' ) ) } <>;

      Makeshifts last the longest.

        If you have dates both before and after time 1000000000 (Sept 2001), you need to use <=> instead of cmp (assuming the traditional Unix epoch), because cmp compares character by character and so sorts "999999999" after "1000000000".

        Being able to do a numeric comparison on two substrings is one of those things that makes Perl Perl.
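A quick way to see the difference: take two made-up timestamps on either side of 1000000000 and sort them both ways, newest first, using the rindex/substr extraction from the reply above. The `ts()` helper and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# One timestamp just before 1000000000 (Sept 2001) and one just after.
my @lines = (
    "old.jpg:100:100:999999999",
    "new.jpg:100:100:1000000000",
);

# Hypothetical helper: everything after the last colon.
sub ts { substr $_[0], 1 + rindex($_[0], ':') }

# String comparison: "999999999" gt "1000000000" because '9' gt '1',
# so the older entry wrongly comes first.
my @by_string = sort { ts($b) cmp ts($a) } @lines;

# Numeric comparison on the same substrings gives the intended order.
my @by_number = sort { ts($b) <=> ts($a) } @lines;

print "cmp: @by_string\n";
print "<=>: @by_number\n";
```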

        The problem with this is that you have to call substr (and rindex) twice for every comparison. When the file gets large, that is substantially more time-consuming than the ST or GRT, which extract the key only once per line. He explains that in this Unix Review Column.
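The cost difference becomes concrete if you count how often the key is extracted. The sketch below uses 1000 made-up records with distinct timestamps and a hypothetical `key_of()` helper that increments a counter on every call. A direct sort calls it twice per comparison, while the ST calls it exactly once per line. The exact direct-sort count depends on the data and on Perl's sort implementation, so none is claimed here.

```perl
use strict;
use warnings;

# 1000 made-up records with distinct, scrambled timestamps.
my @lines = map { sprintf 'img%04d.jpg:100:100:%d', $_, 1_000_000_000 + ($_ * 7919) % 100_003 } 1 .. 1000;

# Key extractor that counts how often it is called.
my $calls = 0;
sub key_of {
    $calls++;
    return substr $_[0], 1 + rindex($_[0], ':');
}

# Direct sort: the key is extracted twice per comparison.
$calls = 0;
my @direct = sort { key_of($b) <=> key_of($a) } @lines;
my $direct_calls = $calls;

# Schwartzian Transform: the key is extracted once per line.
$calls = 0;
my @st = map  { $_->[1] }
         sort { $b->[0] <=> $a->[0] }
         map  { [ key_of($_), $_ ] } @lines;
my $st_calls = $calls;

printf "direct sort: %d key extractions; ST: %d\n", $direct_calls, $st_calls;
```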


        Who is Kayser Söze?
        Code is (almost) always untested.
Re: sorting entries by date
by duff (Parson) on Jan 02, 2004 at 14:23 UTC

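    If you are on a unixish platform, there's the good ol' sort command. Here -t: splits fields on colons, -k4,4 sorts on the fourth field only, -n compares numerically, and -r puts the newest entries first:

    sort -t: -k4,4 -nr filename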

    Remember, TMTOWTDI includes not using perl :-)