PerlMonks  

Sorting dates and times

by Anonymous Monk
on Mar 30, 2002 at 20:47 UTC (#155484=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I've been searching online for two hours, to no avail, for a Perl snippet that does what I think is a very simple thing.

I simply need to sort a standard Web log file by date and time. I have achronological Web log files that I need to place in chronological order. The first part of each line in the file looks like:

21.70.5.206 - - [30/Mar/2002:00:03:44 -0500] "GET ......

I would think this is pretty simple, yet I haven't been able to get anything I've tried to work. Thanks for your help in advance!

Re: Sorting dates and times
by Ryszard (Priest) on Mar 30, 2002 at 21:24 UTC
    Use Date::Calc qw(Delta_Days).

    I guess a pretty inefficient method would be to grab the first date and put it in an array, then for each subsequent date iterate through the array, using Delta_Days to decide whether it belongs before or after (depending on your orientation) each date you're comparing it to.

    I'm not an expert on sorting algorithms, but brute force would certainly work.

    Having had a quick think about it, you could instead convert each date to a Julian format and then use Perl's regular sort function, and voilà, there you have it: sorted dates.

    Of course, if "Julian" means day-of-year, that wouldn't work across year boundaries, and a day number alone doesn't order entries within the same day; in that case you could convert the full timestamp to seconds instead (with thanks to merlyn).
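    A minimal sketch of the Julian-number idea, assuming standard Apache timestamps. Instead of Date::Calc it computes a true Julian Day Number with the usual integer formula for Gregorian dates; because that number counts days continuously, it has no year-boundary problem, and adding the time of day in seconds orders entries within a day as well. The helper names `jdn` and `log_key` are hypothetical, and the `-0500` offset is ignored, so all lines are assumed to share one time zone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Month abbreviations as they appear in Apache logs (hypothetical helper table).
my %MON;
@MON{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = 1 .. 12;

# Julian Day Number for a Gregorian date (standard integer formula).
sub jdn {
    my ($y, $m, $d) = @_;
    my $adj = int((14 - $m) / 12);
    my $yy  = $y + 4800 - $adj;
    my $mm  = $m + 12 * $adj - 3;
    return $d + int((153 * $mm + 2) / 5) + 365 * $yy
         + int($yy / 4) - int($yy / 100) + int($yy / 400) - 32045;
}

# Sort key: Julian day scaled to seconds, plus seconds since midnight.
# Returns undef for lines without a recognisable timestamp.
sub log_key {
    my ($line) = @_;
    my ($d, $mon, $y, $h, $mi, $s) =
        $line =~ m{\[(\d+)/(\w{3})/(\d{4}):(\d\d):(\d\d):(\d\d)} or return;
    return jdn($y, $MON{$mon}, $d) * 86400 + $h * 3600 + $mi * 60 + $s;
}

my @lines = (
    qq{1.2.3.4 - - [01/Jan/2003:00:00:01 -0500] "GET /b"\n},
    qq{1.2.3.4 - - [31/Dec/2002:23:59:59 -0500] "GET /a"\n},
);
print sort { log_key($a) <=> log_key($b) } @lines;   # the 31/Dec/2002 line prints first
```

    Because `log_key` runs inside the comparator, each line is parsed many times; for large files, caching the key first (a Schwartzian Transform) avoids that.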

    If you're using a database somewhere in the mix, then you could just pump all your data into it, then extract it, sorted by date.

Re: Sorting dates and times
by Kanji (Parson) on Mar 30, 2002 at 22:34 UTC

    The non-trivial part of sorting log files is efficiency, especially when you get into logs that are of any significant size.

    If you have small(ish) log files or memory to burn, then a Schwartzian or Guttman-Rosler Transform can make for a short and simple (YMMV :)) script...

    #!/usr/bin/perl -w
    use strict;
    use Time::Piece;

    die "$0 [input] [output]\n" unless @ARGV == 2;
    die "Input/output can't match\n" if $ARGV[0] eq $ARGV[1];

    open my $unsorted, '<', $ARGV[0]
        or die "Can't open $ARGV[0] for reading: $!\n";
    open my $sorted, '>', $ARGV[1]
        or die "Can't open $ARGV[1] for writing: $!\n";

    print {$sorted}
        map  { $_->[1] }                  # restore data to original form
        sort { $a->[0] <=> $b->[0] }      # sort by time/date
        map  { [ epoch_date($_), $_ ] }   # prepend time/date as secs
        <$unsorted>;

    sub epoch_date {
        # Convert Apache-style dates to epoch seconds. The zone offset
        # (e.g. -0500) is ignored, so all lines should share one zone.
        my ($line) = @_;
        # Return 0 (not an empty list) so the [key, line] pair stays intact;
        # unparsable lines simply sort first.
        return 0 unless $line =~ /^\S+ \S+ \S+ (\S+)/;
        return Time::Piece->strptime( $1, "[%d/%b/%Y:%T" )->epoch;
    }

        --k.
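    For comparison with the Schwartzian Transform above, a hedged sketch of the Guttman-Rosler variant, assuming Apache-style timestamps all logged in one time zone. Each line gets a fixed-width `YYYYMMDDhhmmss` prefix whose plain string order is chronological, so `sort` can run with no comparator block at all, and the prefix is stripped afterwards. `grt_key` and `grt_sort` are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %MON;
@MON{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = 1 .. 12;

# Build a 14-character key "YYYYMMDDhhmmss" whose string order matches
# chronological order. The zone offset is ignored (single-zone assumption).
sub grt_key {
    my ($line) = @_;
    my ($d, $mon, $y, $h, $mi, $s) =
        $line =~ m{\[(\d\d)/(\w{3})/(\d{4}):(\d\d):(\d\d):(\d\d)}
        or return '0' x 14;              # unparsable lines sort first
    return sprintf '%04d%02d%02d%02d%02d%02d',
        $y, $MON{$mon} // 0, $d, $h, $mi, $s;
}

# Guttman-Rosler Transform: prepend the key, sort with Perl's default
# string comparison (no Perl-level comparator), then strip the key off.
sub grt_sort {
    return map  { substr($_, 14) }
           sort
           map  { grt_key($_) . $_ }
           @_;
}

my @lines = (
    qq{1.2.3.4 - - [01/Apr/2002:00:00:00 -0500] "GET /later"\n},
    qq{21.70.5.206 - - [30/Mar/2002:00:03:44 -0500] "GET /earlier"\n},
);
print grt_sort(@lines);   # the "/earlier" line prints first
```

    The design choice here is that all parsing happens once, up front, and the sort itself only compares strings, which is what makes the GRT attractive for larger logs.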


Re: Sorting dates and times
by ejf (Hermit) on Mar 31, 2002 at 01:42 UTC
    This is a problem I had myself, and I solved it here ... That code has been in production for some time and it just works ;) It does expect the different files to be in chronological order, though ... Even if that's not the case on your end, the script might get you on your way ...
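    ejf's script isn't reproduced in this copy of the thread, but if "in chronological order" means each input file is already sorted internally, the files only need merging, not a full sort. The following is a hedged sketch of such a merge, not ejf's code: it keeps one pending line per file in memory and repeatedly prints the earliest. `merge_logs` and `line_epoch` are hypothetical names, and zone offsets are ignored.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

# Epoch seconds for an Apache-style line; the zone offset is ignored,
# so all inputs are assumed to share one time zone.
sub line_epoch {
    my ($line) = @_;
    return 0 unless $line =~ m{\[(\d+/\w{3}/\d{4}:\d\d:\d\d:\d\d)};
    return Time::Piece->strptime($1, '%d/%b/%Y:%H:%M:%S')->epoch;
}

# Merge several filehandles, each already in chronological order, into $out.
sub merge_logs {
    my ($out, @in) = @_;
    my @heads;                          # [epoch, line, handle] per live input
    for my $fh (@in) {
        my $line = <$fh>;
        push @heads, [ line_epoch($line), $line, $fh ] if defined $line;
    }
    while (@heads) {
        # Linear scan for the earliest pending line (fine for a few files).
        my $min = 0;
        for my $i (1 .. $#heads) {
            $min = $i if $heads[$i][0] < $heads[$min][0];
        }
        print {$out} $heads[$min][1];
        my $next = readline($heads[$min][2]);
        if (defined $next) {
            $heads[$min] = [ line_epoch($next), $next, $heads[$min][2] ];
        } else {
            splice @heads, $min, 1;     # this input is exhausted
        }
    }
}

# Usage with in-memory files standing in for real logs.
my $log_a = qq{1.1.1.1 - - [30/Mar/2002:00:01:00 -0500] "GET /1"\n}
          . qq{1.1.1.1 - - [30/Mar/2002:00:05:00 -0500] "GET /3"\n};
my $log_b = qq{2.2.2.2 - - [30/Mar/2002:00:03:00 -0500] "GET /2"\n};
open my $fa, '<', \$log_a or die $!;
open my $fb, '<', \$log_b or die $!;
my $merged = '';
open my $out, '>', \$merged or die $!;
merge_logs($out, $fa, $fb);
close $out;
print $merged;
```

    Memory use stays proportional to the number of files rather than their size, which is the main reason to prefer a merge when the inputs are already sorted.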
Re: Sorting dates and times
by pizza_milkshake (Monk) on Mar 31, 2002 at 04:51 UTC
      whoops
      perl -MDate::Parse -nle 'next unless /\[(.*?)\]/; (my $d = $1) =~ s/:/ /; $d =~ s{/}{ }g; push @{ $h{ str2time($d) } }, $_; END { for my $t (sort { $a <=> $b } keys %h) { print for @{ $h{$t} } } }' dates.log
      perl -MLWP::Simple -e'getprint "http://parseerror.com/p"' |less
Re: Sorting dates and times
by zakzebrowski (Curate) on Mar 31, 2002 at 16:45 UTC
    If you're looking to do web statistics, analog is pretty good.

    ----
    Zak
Re: Sorting dates and times
by Anonymous Monk on Apr 01, 2002 at 01:10 UTC
    Parse the log, convert the date to an epoch using Time::Local, and sort the epochs.
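    A minimal sketch of that recipe using the core Time::Local module's `timegm`, assuming the standard Apache `[dd/Mon/yyyy:hh:mm:ss +hhmm]` stamp. It also subtracts the logged offset, so lines from servers in different zones interleave correctly, and it parses each line only once by sorting [epoch, line] pairs (a Schwartzian Transform). `log_epoch` and `sort_log` are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MON;
@MON{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = 0 .. 11;

# Convert an Apache timestamp to epoch seconds in UTC, applying the
# +hhmm/-hhmm offset. Returns undef if the line has no usable stamp.
sub log_epoch {
    my ($line) = @_;
    my ($d, $mon, $y, $h, $mi, $s, $sign, $oh, $om) = $line =~
        m{\[(\d\d)/(\w{3})/(\d{4}):(\d\d):(\d\d):(\d\d) ([-+])(\d\d)(\d\d)\]}
        or return;
    return unless exists $MON{$mon};
    my $local  = timegm($s, $mi, $h, $d, $MON{$mon}, $y);
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    return $local - $offset;    # e.g. 00:03:44 -0500 is 05:03:44 UTC
}

# Parse once per line, sort the epochs numerically, return the lines.
sub sort_log {
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           map  { [ log_epoch($_) // 0, $_ ] }   # unparsable lines sort first
           @_;
}

my @lines = (
    qq{21.70.5.206 - - [30/Mar/2002:00:03:44 -0500] "GET /a"\n},
    qq{10.0.0.1 - - [30/Mar/2002:06:00:00 +0200] "GET /b"\n},
);
print sort_log(@lines);   # "/b" (04:00 UTC) prints before "/a" (05:03:44 UTC)
```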
