http://www.perlmonks.org?node_id=264384

campbell has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I am new to Perl and I am trying to sort a list of documents according to the date they were issued, and display the sorted list as a table. (At the moment they are sorted according to title.)

At the moment, what I am doing is this:

I have the documents listed in a subdirectory, and I create an array of the documents using:

@array = readpipe 'ls directory/subdirectory/*.html';

The list is then sorted using:

@array = sort @array;
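Note that Perl's sort does not sort an array in place. It returns a new, sorted list and leaves its argument unchanged, so a bare "sort @array;" on its own line has no effect. Also, the default sort compares strings, so a list of file names ends up in file-name order. A small illustration with hypothetical file names:

```perl
use strict;
use warnings;

# Hypothetical file names, for illustration only.
my @array = ('c.html', 'a.html', 'b.html');

# sort returns a new list and leaves @array untouched ...
my @sorted = sort @array;
print "@array\n";    # c.html a.html b.html

# ... so assign the result back to sort the array "in place".
@array = sort @array;
print "@array\n";    # a.html b.html c.html
```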

The issue date of each document has already been stored in the document header as 'description'. The titles and issue dates are therefore extracted from the documents and turned into a table, as follows:

use CGI qw/:standard/;   # provides th(), td(), Tr() and table()

@headings = ('Title', 'Issue Date');
@rows = th(\@headings);
foreach $n (@array) {
    chomp $n;            # readpipe leaves a newline on each file name
    open(HTMLFILE, $n) or die "Cannot open $n: $!";
    $issue_date = "";
    while (<HTMLFILE>) {
        if (m/<TITLE>(.+)<\/TITLE>/i) { $title = $1 }
        if (m/<o:Description>(.+)<\/o:Description>/i) { $issue_date = $1 }
    }
    close HTMLFILE;
    push(@rows, td([$title, $issue_date]));
}
print table({width=>"100%", border=>"1", bordercolor=>"#d9dae6",
             cellspacing=>"1", cellpadding=>"1"},
            Tr({bgcolor=>"#97a5f0"}, \@rows));
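The two pattern matches in the loop can be checked in isolation. This is a minimal sketch that assumes a header shaped like the hypothetical sample below; the real documents' markup and date format are not shown in the question. The "/" in the closing </o:Description> tag has to be escaped inside m/.../, otherwise it ends the pattern early.

```perl
use strict;
use warnings;

# Hypothetical document header, for illustration only.
my $html = <<'END';
<html><head>
<TITLE>Quarterly Report</TITLE>
<o:Description>09.06.2003</o:Description>
</head></html>
END

my ($title, $issue_date) = ('', '');
for (split /\n/, $html) {
    # Same patterns as in the loop, with the closing-tag slash escaped.
    $title      = $1 if m/<TITLE>(.+)<\/TITLE>/i;
    $issue_date = $1 if m/<o:Description>(.+)<\/o:Description>/i;
}

print "$title | $issue_date\n";   # Quarterly Report | 09.06.2003
```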

I would like to modify this whole program so that the documents are sorted according to issue date, rather than title.

There's probably some really simple way of doing this. Can anybody tell me what it is?

Thanks,

Campbell Reid

Re: Sorting according to date
by hipe (Sexton) on Jun 09, 2003 at 19:18 UTC
    Use Date::Calc, here is an example:
    #!/usr/bin/perl -w
    use CGI qw/:standard/;
    # first install module Date::Calc
    use Date::Calc qw(Date_to_Days Add_Delta_Days check_date);

    @array = readpipe 'ls ./tmp/*html';
    # we will be sorting later, so no "sort @array" here

    @headings = ('Title', 'Issue Date');
    @rows = th(\@headings);
    foreach $n (@array) {
        chomp $n;
        open(HTMLFILE, $n) or die "Cannot open $n: $!";
        $issue_date = "";
        while (<HTMLFILE>) {
            if (m/<TITLE>(.+)<\/TITLE>/i) { $title = $1 }
            # the slash in the closing tag must be escaped inside m/.../
            if (m/<o:Description>(.+)<\/o:Description>/i) { $issue_date = $1 }
        }
        close HTMLFILE;
        # instead of pushing a td() row now, keep [title, date, day number];
        # this assumes dates look like "DD.MM.YYYY"
        push(@temp_rows, [ $title, $issue_date,
            Date_to_Days((split(/\./, $issue_date, 3))[2, 1, 0]) ]);
    }
    # sort by day number, then build the table rows
    for my $i (sort { $a->[2] <=> $b->[2] } @temp_rows) {
        push(@rows, td([ @{$i}[0, 1] ]));
    }
    print table({width=>"100%", border=>"1", bordercolor=>"#d9dae6",
                 cellspacing=>"1", cellpadding=>"1"},
                Tr({bgcolor=>"#97a5f0"}, \@rows));

      First you assume the date format he is using and then you go on to suggest a byzantine method of sorting dates in that format...

      If his dates really are in a format like "DD.MM.YYYY" then Date::Calc is hardly necessary:

      my @sorted = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, join "", reverse (split /\./) ] } @dates;
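      The one-liner is a Schwartzian transform: each date is paired with a sortable key, the pairs are sorted by key, and the original strings are pulled back out. For "DD.MM.YYYY", reversing the three fields and joining them gives a YYYYMMDD number, whose numeric order matches chronological order. A self-contained run with made-up dates:

```perl
use strict;
use warnings;

# Made-up dates in the assumed "DD.MM.YYYY" format.
my @dates = ('09.06.2003', '17.12.2002', '01.06.2003');

my @sorted =
    map  { $_->[0] }                              # 3. keep the original string
    sort { $a->[1] <=> $b->[1] }                  # 2. compare YYYYMMDD numbers
    map  { [ $_, join "", reverse(split /\./) ] } # 1. "09.06.2003" -> 20030609
    @dates;

print "$_\n" for @sorted;
# 17.12.2002
# 01.06.2003
# 09.06.2003
```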

      -sauoq
      "My two cents aren't worth a dime.";
      
        I like your solution; it is elegant but a bit slower. Converting the date strings to integers is expensive, but it pays off in the sorting phase. Dates that are close together share their first 4 to 6 characters, and the for loop is also faster than map. I ran a couple of benchmarks to verify that. The difference in speed is not significant.
        I had to assume some date format to be able to produce a working solution. It is easy to modify for a different format.
Re: Sorting according to date
by sauoq (Abbot) on Jun 09, 2003 at 17:28 UTC

    We can't really answer that without knowing the format that the dates are in. The basic process, however, will probably consist of putting the dates in a hash keyed by the filenames and then sorting the keys of the hash based on the values. The sort will look something like:

    my @files = sort { $hash{$a} <=> $hash{$b} } keys %hash;
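    Filled out, the idea might look like the sketch below: store each file's issue date, converted to a sortable number, in a hash keyed by file name, then sort the keys by their values. It assumes "DD.MM.YYYY" dates and uses hypothetical file names and dates in place of the values read from the documents.

```perl
use strict;
use warnings;

# Hypothetical file => issue date pairs; in the real program these would
# come from the <o:Description> header of each document.
my %issue = (
    'a.html' => '09.06.2003',
    'b.html' => '17.12.2002',
    'c.html' => '01.06.2003',
);

# Hash keyed by file name; value is a YYYYMMDD number for "DD.MM.YYYY".
my %hash;
for my $file (keys %issue) {
    my ($d, $m, $y) = split /\./, $issue{$file};
    $hash{$file} = sprintf "%04d%02d%02d", $y, $m, $d;
}

# Sort the file names by their numeric date values, oldest first.
my @files = sort { $hash{$a} <=> $hash{$b} } keys %hash;

print "$_\n" for @files;   # b.html, c.html, a.html (one per line)
```

    Swapping $a and $b in the comparison lists the newest documents first.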

    -sauoq
    
Re: Sorting according to date
by OM_Zen (Scribe) on Jun 09, 2003 at 17:13 UTC
    Hi,

    Just try ls -t directory/sub_directory/*.html, so that the output is already sorted by date.
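    For comparison, ls -t orders files by their last-modification time, newest first. That is a property of the file on disk, not the issue date written inside the document. A rough Perl equivalent uses the -M file test (age in days since last modification, measured from when the script started); the helper name by_mtime and the path are placeholders for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the given file names ordered like `ls -t`,
# most recently modified first. -M is the file's age in days relative to
# script start, so a smaller value means a newer file.
sub by_mtime {
    return map  { $_->[0] }
           sort { $a->[1] <=> $b->[1] }
           map  { [ $_, -M $_ ] }          # stat each file only once
           @_;
}

# Placeholder path from the thread; substitute the real directory.
print "$_\n" for by_mtime(glob 'directory/sub_directory/*.html');
```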

      You didn't read the question. The dates he wants to sort by are in the documents. He doesn't want to sort by the last modified time associated with the file.

      -sauoq
      
        Hi sauoq,

        In his posting, the array from ls is what gets sorted, and nowhere else does he sort by title. That is why I suggested producing a list that is already sorted by date. Reading it again, I see that he gets the issue date from the header. I am just curious where he sorts by the title obtained from the document header. Please let me know where my understanding has gotten blurred. I am sure my answer is not the apt one either, but I am trying to find where the present implementation (sorting by the title obtained from the header of the HTML data) is justified.