PerlMonks  

Extracting Log File data for a given date range

by vishi (Sexton)
on Dec 13, 2011 at 12:37 UTC (#943322=perlquestion)
vishi has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I have a really interesting situation here. I have a log file (basically a CSV) where each line relates to a user's session. Each of these lines has a timestamp. A sample is given below:

ABC01, 91XYZ889=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 02-Dec-2011_00.34.51, bigFatLog_02-Dec-2011_00.34.06.log
ABC03, 93XYZ272=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 07-Dec-2011_09.21.58, bigFatLog_07-Dec-2011_09.20.57.log
ABC02, 93XYZ807=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 08-Dec-2011_23.00.15, bigFatLog_08-Dec-2011_22.59.34.log
ABC05, 91XYZ525=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 10-Dec-2011_10.01.36, bigFatLog_10-Dec-2011_10.01.00.log
ABC01, 93XYZ252=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 12-Dec-2011_11.58.23, bigFatLog_12-Dec-2011_11.57.20.log
ABC03, 93XYZ543=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 12-Dec-2011_23.34.07, bigFatLog_12-Dec-2011_23.33.23.log
ABC04, 92XYZ066=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 13-Dec-2011_01.00.31, bigFatLog_13-Dec-2011_00.59.29.log
ABC05, 93XYZ184=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 13-Dec-2011_01.54.41, bigFatLog_13-Dec-2011_01.54.04.log

Now, here's what I am trying to do: my program accepts two dates as command-line arguments. The first is the start date and the second is the end date. Something like this:

perl getMyStats.pl 01-Dec-2011 11-Dec-2011

Now, I need to extract only those lines from the above CSV file which fall in this date range. I'm confused how to go about this one. I have thought of the following approach:

  • Expand the date range - get each date in the range and do a pattern matching for each line in the file that matches this date, write the line to another temp file, loop continues.

I need to know whether this is the right approach, or whether Perl has better tricks up its sleeve to make my job easier. I am familiar with Date::Calc, but I don't know how it would be useful here. If what I have thought of is correct, how do I go about "expanding" the date range? That is, in my example, how do I convert 01-Dec-2011 11-Dec-2011 to a list with the individual dates as its elements?
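For the literal question of expanding a range into a list, a minimal sketch using only core modules might look like the following. The names `parse_dmy` and `expand_range` are illustrative and not from the original post, and the sketch assumes English three-letter month names and works in UTC so that stepping by 86,400 seconds never lands on a daylight-saving boundary.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my @MON     = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my %MON_IDX = map { $MON[$_] => $_ } 0 .. $#MON;

# Parse 'DD-Mon-YYYY' into epoch seconds at midnight UTC.
sub parse_dmy {
    my ($str) = @_;
    my ($d, $mon, $y) = $str =~ /^(\d{1,2})-([A-Za-z]{3})-(\d{4})$/
        or die "Bad date '$str'\n";
    my $m = $MON_IDX{ ucfirst lc $mon };
    die "Bad month in '$str'\n" unless defined $m;
    return timegm(0, 0, 0, $d, $m, $y);
}

# Return every date from $from to $to (inclusive) as 'DD-Mon-YYYY' strings.
sub expand_range {
    my ($from, $to) = map { parse_dmy($_) } @_;
    my @dates;
    for (my $t = $from; $t <= $to; $t += 86_400) {
        my ($d, $m, $y) = (gmtime $t)[3, 4, 5];
        push @dates, sprintf '%02d-%s-%04d', $d, $MON[$m], $y + 1900;
    }
    return @dates;
}

print "$_\n" for expand_range('01-Dec-2011', '11-Dec-2011');
```

Each element could then be used as a literal pattern against the log lines, although that means one pass over the file per date unless the dates are first stored in a hash for lookup.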

Suggestions please?!?!

Re: Extracting Log File data for a given date range
by keszler (Priest) on Dec 13, 2011 at 12:48 UTC

    Date::Parse understands a variety of date formats - if it recognizes the format in your log files it could help.

Re: Extracting Log File data for a given date range
by choroba (Abbot) on Dec 13, 2011 at 12:50 UTC
    I'd go the opposite way: go through the lines, for each line check whether it falls within the range, report if yes. Using a simpler timestamp (2011-12-24_23.11.22) would make your problem even easier - simple string comparison would work.
      Agreed, but how do I get hold of all the dates in the range I have?

        If you convert all dates to the format yyyy-mm-dd, you can simply use ge and lt to compare whether a date lies in your range or not.
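A small sketch of why this works: zero-padded yyyy-mm-dd strings sort the same way as the dates they represent, so plain string operators are enough. The helper name `in_range` is made up for illustration. It uses `le` for the upper bound so that the end date itself is included, which seems to be what the original question wants; using `lt` instead would exclude lines stamped with the end date.

```perl
use strict;
use warnings;

# True if $date lies within [$from, $to]; all three are 'yyyy-mm-dd' strings.
sub in_range {
    my ($date, $from, $to) = @_;
    return $date ge $from && $date le $to;
}

for my $d (qw(2011-11-30 2011-12-01 2011-12-11 2011-12-12)) {
    printf "%s %s\n", $d,
        in_range($d, '2011-12-01', '2011-12-11') ? 'in range' : 'out of range';
}
```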

Re: Extracting Log File data for a given date range
by moritz (Cardinal) on Dec 13, 2011 at 12:51 UTC

    I'd convert the two dates that you get from the command line into numbers (for example days since 1900-01-01 or so), and do the same for each date that you parse. Then you only need to do two comparisons per line.

    I personally like Date::Simple for such tasks (the ones that only involve dates, not times, and it seems you can ignore times here).
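As a rough illustration of the day-number idea without any CPAN dependency, the sketch below uses the core Time::Local module and counts days since 1970-01-01 rather than 1900-01-01. The function name `day_number` and the three sample lines are assumptions for the example. Date::Simple would replace the hand-written parsing.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MON = (jan => 0, feb => 1, mar => 2, apr => 3, may => 4,  jun => 5,
           jul => 6, aug => 7, sep => 8, oct => 9, nov => 10, dec => 11);

# Convert the first 'DD-Mon-YYYY' found in $str to whole days since
# 1970-01-01 (UTC). Returns undef (in scalar context) if nothing parses.
sub day_number {
    my ($str) = @_;
    my ($d, $mon, $y) = $str =~ /(\d{1,2})-([A-Za-z]{3})-(\d{4})/
        or return;
    my $m = $MON{ lc $mon };
    return unless defined $m;
    return timegm(0, 0, 0, $d, $m, $y) / 86_400;
}

my $from = day_number('01-Dec-2011');
my $to   = day_number('11-Dec-2011');

my @lines = (
    'ABC01, 91XYZ889=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 02-Dec-2011_00.34.51, bigFatLog_02-Dec-2011_00.34.06.log',
    'ABC05, 91XYZ525=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 10-Dec-2011_10.01.36, bigFatLog_10-Dec-2011_10.01.00.log',
    'ABC01, 93XYZ252=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 12-Dec-2011_11.58.23, bigFatLog_12-Dec-2011_11.57.20.log',
);

for my $line (@lines) {
    my $n = day_number($line);
    next unless defined $n;
    # Two numeric comparisons per line, as suggested above.
    print "$line\n" if $from <= $n && $n <= $to;
}
```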

      But I need to get the dates in the first place. Any idea how I can get them into an array? Oh, yes, I will be ignoring the time for this. My requirement is only dates, so I will be pattern matching on the dates only.
Re: Extracting Log File data for a given date range
by TJPride (Pilgrim) on Dec 13, 2011 at 14:50 UTC
    Not really that hard. Convert your from and to dates to something you can sort on (YYYY-MM-DD), then pattern match each line, convert the date there as well, and compare. When testing this, make sure to call the Perl script with the from and to dates as arguments and in the format specified by the OP.

    use strict;
    use warnings;

    {
        my %mon = (
            'JAN' => 1, 'FEB' => 2, 'MAR' => 3, 'APR' => 4,
            'MAY' => 5, 'JUN' => 6, 'JUL' => 7, 'AUG' => 8,
            'SEP' => 9, 'OCT' => 10, 'NOV' => 11, 'DEC' => 12
        );

        sub convDate {
            my $d = [split /-/, $_[0]];
            return sprintf('%04d-%02d-%02d', $d->[2], $mon{uc $d->[1]}, $d->[0]);
        }
    }

    my $from = convDate($ARGV[0]);
    my $to = convDate($ARGV[1]);

    my @d = <DATA>;
    for (@d) {
        if (m/(\d+-\w+-\d+)_\d+\.\d+\.\d+/g) {
            print if convDate($1) ge $from && convDate($1) le $to;
        }
    }

    __DATA__
    ABC01, 91XYZ889=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 02-Dec-2011_00.34.51, bigFatLog_02-Dec-2011_00.34.06.log
    ABC03, 93XYZ272=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 07-Dec-2011_09.21.58, bigFatLog_07-Dec-2011_09.20.57.log
    ABC02, 93XYZ807=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 08-Dec-2011_23.00.15, bigFatLog_08-Dec-2011_22.59.34.log
    ABC05, 91XYZ525=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 10-Dec-2011_10.01.36, bigFatLog_10-Dec-2011_10.01.00.log
    ABC01, 93XYZ252=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 12-Dec-2011_11.58.23, bigFatLog_12-Dec-2011_11.57.20.log
    ABC03, 93XYZ543=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 12-Dec-2011_23.34.07, bigFatLog_12-Dec-2011_23.33.23.log
    ABC04, 92XYZ066=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 13-Dec-2011_01.00.31, bigFatLog_13-Dec-2011_00.59.29.log
    ABC05, 93XYZ184=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 13-Dec-2011_01.54.41, bigFatLog_13-Dec-2011_01.54.04.log
Re: Extracting Log File data for a given date range
by vlashua (Initiate) on Dec 13, 2011 at 16:54 UTC
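    # [set up input file, output file FILE]
    # The file setup was elided in the post; this version reads the log on STDIN
    # and writes to STDOUT instead.
    use strict;
    use warnings;
    use Date::Manip::DM5;    # ParseDate and Date_Cmp

    my ($start, $end) = map { ParseDate($_) } @ARGV[0, 1];

    while (my $line = <STDIN>) {
        # Take the first DD-Mon-YYYY stamp on the line (format assumed from the sample).
        my ($stamp) = $line =~ /(\d{2}-\w{3}-\d{4})_/ or next;
        my $date = ParseDate($stamp) or next;
        # Keep the line when its date lies between the two argument dates.
        print $line if Date_Cmp($date, $start) >= 0 && Date_Cmp($date, $end) <= 0;
    }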

    Date::Manip will take virtually any date format as input.

Re: Extracting Log File data for a given date range
by Cristoforo (Deacon) on Dec 14, 2011 at 01:19 UTC
    A solution using Date::Parse.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Date::Parse qw/ str2time /;

    #my ($start, $end) = map str2time($_), qw/ 01-Dec-2011 11-Dec-2011 /;
    my ($start, $end) = map str2time($_), @ARGV;

    while (<DATA>) {
        (my $date = (split /,/)[3]) =~ s/_.+//;
        my $time = str2time( $date );
        print if $start <= $time && $time <= $end;
    }

    __DATA__
    ABC01, 91XYZ889=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 02-Dec-2011_00.34.51, bigFatLog_02-Dec-2011_00.34.06.log
    ABC03, 93XYZ272=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 07-Dec-2011_09.21.58, bigFatLog_07-Dec-2011_09.20.57.log
    ABC02, 93XYZ807=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 08-Dec-2011_23.00.15, bigFatLog_08-Dec-2011_22.59.34.log
    ABC05, 91XYZ525=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 10-Dec-2011_10.01.36, bigFatLog_10-Dec-2011_10.01.00.log
    ABC01, 93XYZ252=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 12-Dec-2011_11.58.23, bigFatLog_12-Dec-2011_11.57.20.log
    ABC03, 93XYZ543=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 12-Dec-2011_23.34.07, bigFatLog_12-Dec-2011_23.33.23.log
    ABC04, 92XYZ066=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 13-Dec-2011_01.00.31, bigFatLog_13-Dec-2011_00.59.29.log
    ABC05, 93XYZ184=_=SOMEBODY.NAME@DOMAIN.COM, HighPriority, 13-Dec-2011_01.54.41, bigFatLog_13-Dec-2011_01.54.04.log

    This assumes there are no embedded commas in the data, so it is safe to just split on the comma. Otherwise, a module for parsing comma separated values like Text::CSV_XS would be needed to parse the log.

    And this is how I called it from the command line: C:\Old_Data\perlp>perl t33.pl 20111201 20111211

      Thanks all for the overwhelming response!! I cracked this one with the CPAN modules Date::Simple and Date::Range. I was able to get the dates in the format I wanted and put all of them in an array.

      Maybe this implementation is a bit "noobish" or elaborate... but it worked! I will definitely make note of all your suggestions, and perhaps get to use them when I face a similar problem in the future.

      Okay! Here's what I did :D .....

      my $date1 = $ARGV[0];
      my $date2 = $ARGV[1];
      my ( $start, $end ) = ( date($date1), date($date2) );
      my $range = Date::Range->new( $start, $end );
      my @all_dates = $range->dates;

      my %hash=("01"=>"Jan","02"=>"Feb","03"=>"Mar","04"=>"Apr","05"=>"May","06"=>"Jun","07"=>"Jul","08"=>"Aug","09"=>"Sep","10"=>"Oct","11"=>"Nov","12"=>"Dec");

      foreach my $numericDate (@all_dates) {
          while ( my ($key, $value) = each(%hash) ) {
              $numericDate =~ s/\-$key-/\-$value\-/g;
          }
      }

      my @tempArray;
      foreach my $reverseDate (@all_dates) {
          split (/-/,$reverseDate);
          my $correctFormat = $_[2]."-".$_[1]."-".$_[0];
          push (@tempArray, $correctFormat);
      }
      print "\n@tempArray\n";

      So, the output looks something like this:

      $ ./test.pl 2011-12-10 2011-12-13
      ==============================
      10-Nov-2011 11-Nov-2011 12-Nov-2011 13-Nov-2011 14-Nov-2011 15-Nov-2011 16-Nov-2011 17-Nov-2011 18-Nov-2011 19-Nov-2011 20-Nov-2011 21-Nov-2011 22-Nov-2011 23-Nov-2011 24-Nov-2011 25-Nov-2011 26-Nov-2011 27-Nov-2011 28-Nov-2011 29-Nov-2011 30-Nov-2011 01-Dec-2011 02-Dec-2011 03-Dec-2011 04-Dec-2011 05-Dec-2011 06-Dec-2011 07-Dec-2011 08-Dec-2011 09-Dec-2011 10-Dec-2011 11-Dec-2011 12-Dec-2011 13-Dec-2011
      ==============================
      Thanks a ton for all your suggestions!
        I had this problem before. I ended up writing a utility script I call 'grange' for 'grep a range'. It depends on the fact that I'm usually parsing logfiles so the dates are sequential. So I used the range operator .. to match lines between the start and end regexes. If that matches your situation, then here's the whole utility:
        #!/usr/bin/perl -n
        BEGIN {
            print "Usage: $0 <start pattern> <end_pattern>\n" and exit unless @ARGV == 2;
            $start = shift @ARGV;
            $end = shift @ARGV;
        }
        next if 1 .. /$start/;
        last if /$end/;
        print
        HTH!
        Hey, here is the error I am getting when I run your code:

        Useless use of split in void context at C:\Prady\perl files\dates.pl line 31.
        Undefined subroutine &main::date called at C:\Prady\perl files\dates.pl line 12.

        Line 31 is : split (/-/,$reverseDate);
        Line 12 is : my ( $start, $end ) = ( date($date1), date($date2) );

        Please help me find out what I am missing.
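Both messages are consistent with the snippet as posted. Nothing in it loads Date::Simple (which exports `date` only on request) or Date::Range, and the result of `split` is never assigned. On Perl 5.12 and later, `split` in void context no longer fills `@_`, so `$_[2]`, `$_[1]` and `$_[0]` stay empty. A hedged sketch of both fixes follows. The module lines are shown as comments so the block runs on core Perl alone, `@all_dates` is a stand-in for what `$range->dates` would stringify to, and `to_dmy` is an illustrative name, not part of the original code.

```perl
use strict;
use warnings;

# Fix 1 (assumed to be missing from the top of the script):
#   use Date::Simple qw(date);   # 'date' is exported only on request
#   use Date::Range;

my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Fix 2: capture what split returns instead of relying on @_.
# Turns 'yyyy-mm-dd' into 'dd-Mon-yyyy'.
sub to_dmy {
    my ($ymd) = @_;
    my ($y, $m, $d) = split /-/, $ymd;
    return join '-', $d, $MON[ $m - 1 ], $y;
}

# Stand-in for the stringified Date::Simple objects from $range->dates.
my @all_dates = qw(2011-12-10 2011-12-11 2011-12-12 2011-12-13);
my @tempArray = map { to_dmy($_) } @all_dates;
print "@tempArray\n";
```

Doing the month lookup and the reordering in one step also removes the need for the separate substitution loop over the month hash.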

Node Type: perlquestion [id://943322]
Approved by Corion
Front-paged by Corion