PerlMonks  

How to calculate earliest date from log file

by jaffinito34 (Acolyte)
on Nov 10, 2012 at 21:46 UTC

jaffinito34 has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a script where I'm given a log file and need to print some of its info to an HTML file. One problem I'm having is that at the top of the HTML file I need to print the start date of the log, which is the earliest date found in the log. I believe I need to implement this in my while loop, but I'm not sure how it will work. Here's my code:

#!/usr/bin/perl
# CSC 310 Project # 4
use strict;
use warnings;

# opening log and html file
open(my $log,  '<',  'IN-access.log')   or die "Cannot open IN-access.log: $!";
open(my $html, '>>', 'OUT-access.html') or die "Cannot open OUT-access.html: $!";

# printing header/title/format of html
print $html "<HTML><TITLE>Visitors Log</TITLE><BODY>\n";
print $html "The log file start date is: date() <BR>\n";   # <---- HERE!!!
print $html "There were 11 unique visitors in the logfile.<BR>\n";
print $html "There were 2 visits yesterday<BR>\n";
print $html "<TABLE border=1><TR><TD>IP</TD><TD>LOGFILE</TD></TR>\n";

# reading log file and assigning values
while (my $line = <$log>) {
    my ($remoteIP, $rfc, $userID, $dateTime, $timeZone, $requestType,
        $fileRequested, $requestProtocol, $statusCode, $sizeOfFile) = split ' ', $line;
    print $html "<TR><TD>$remoteIP</TD><TD>$remoteIP $rfc $userID $dateTime $timeZone "
              . "$requestType $fileRequested $requestProtocol $statusCode $sizeOfFile</TD></TR>\n";
}
print $html "</TABLE></BODY></HTML>\n";
close($log);
close($html);

Just for reference, here is an example of what is found in the log file

66.249.65.107 - - 08/Oct/2007:04:54:20 -0400 "GET /support.html HTTP/1.1" 200 11179

111.111.111.111 - - 05/Oct/2004:15:17:55 -0400 "GET / HTTP/1.1" 200 10801

111.111.111.111 - - 06/Oct/2007:11:17:55 -0400 "GET /style.css HTTP/1.1" 200 3225

Do I need to convert the date somehow so I can compare them? Or can I tell perl to just check certain parts of the variable? All suggestions are welcome.

Re: How to calculate earliest date from log file
by mbethke (Hermit) on Nov 11, 2012 at 04:27 UTC
    The easiest, if not the fastest, way would be Date::Parse:
    $ perl -MDate::Parse -E'say scalar gmtime str2time("08/Oct/2007:04:54:20 -0400");'
    Mon Oct  8 08:54:20 2007

    str2time() returns the seconds since the epoch, which you can easily compare numerically.

    Because Date::Parse has to do its format guessing every time you call the parsing function, you'd be much faster using a hand-crafted regexp with a little hash like

    %months = ( Jan => '01', Feb => '02', ... );
    s!(..)/(...)/(....):(..):(..):(..)!$3.$months{$2}.$1.$4.$5.$6!e;
    or something along these lines, perhaps correcting for the time zone if it is allowed to differ between your log lines. However, if you're doing anything significant with your data afterwards, the speedup may not even be noticeable. A program I wrote in 2008 uses Date::Parse and parses millions and millions of lines a day; it was just never worth writing and testing a fix for such a small speed gain.
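    A runnable sketch of the hand-crafted approach, with the month table filled in. The helper name sortable is hypothetical. It rewrites each dateTime field into a YYYYMMDDhhmmss key, so plain string comparison (lt) gives chronological order. It ignores the timeZone column, so it is only correct when every line carries the same offset (all -0400 in the sample).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Month abbreviation -> two-digit month number.
my %months = (Jan => '01', Feb => '02', Mar => '03', Apr => '04',
              May => '05', Jun => '06', Jul => '07', Aug => '08',
              Sep => '09', Oct => '10', Nov => '11', Dec => '12');

# Hypothetical helper: rewrite "08/Oct/2007:04:54:20" as "20071008045420"
# (YYYYMMDDhhmmss), a key whose string order is also chronological order.
sub sortable {
    my ($dateTime) = @_;
    (my $key = $dateTime) =~ s!(..)/(...)/(....):(..):(..):(..)!$3.$months{$2}.$1.$4.$5.$6!e;
    return $key;
}

# dateTime fields taken from the sample lines in the question.
my @stamps = ('08/Oct/2007:04:54:20', '05/Oct/2004:15:17:55', '06/Oct/2007:11:17:55');

my $earliest;
for my $stamp (@stamps) {
    # Keep the first stamp, then any stamp whose key sorts before the current one.
    $earliest = $stamp
        if !defined $earliest || sortable($stamp) lt sortable($earliest);
}
print "Earliest: $earliest\n";
```

    With the sample stamps this prints "Earliest: 05/Oct/2004:15:17:55".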

Re: How to calculate earliest date from log file
by CountZero (Bishop) on Nov 10, 2012 at 22:24 UTC
    If it is a real log file, then the first entry should be the oldest, and you can easily extract the date from the first line with a regex such as /(\d{2}\/.{3}\/\d{4})/
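    A minimal sketch of this idea, assuming the log really is written in chronological order (not true of the sample in the question). An in-memory string stands in for IN-access.log here; only the first line is read.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory stand-in for IN-access.log (first sample line from the question).
my $sample = <<'END';
66.249.65.107 - - 08/Oct/2007:04:54:20 -0400 "GET /support.html HTTP/1.1" 200 11179
END
open(my $log, '<', \$sample) or die "Cannot open sample log: $!";

# Read just the first line and pull out the dd/Mon/yyyy part.
my $first = <$log> // '';
close($log);
my ($start_date) = $first =~ m{(\d{2}/.{3}/\d{4})};

print "The log file start date is: $start_date\n" if defined $start_date;
```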

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics

      It's not, though; it's a sample file where all the data is mixed and matched. What I'm thinking is to store the date somewhere, have the while loop compare it with the date on each line, and replace it if that one is earlier. I just have no idea how to compare the dates given the format they're in.
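    A minimal sketch of that plan using only core Perl: each dateTime/timeZone pair is converted to epoch seconds with Time::Local's timegm (swapped in here instead of Date::Parse), the UTC offset is applied, and the smallest value seen so far is kept. The helper log_epoch and the in-memory sample are illustrative; replace the in-memory handle with your real open of IN-access.log. Because the start date goes at the top of the HTML, the loop has to finish (or the table rows have to be buffered in a variable) before the header is printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviation -> 0-based month number, as timegm() expects.
my %mon = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
           Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: turn "08/Oct/2007:04:54:20" and "-0400" into epoch
# seconds (UTC). Returns undef if either field doesn't have that shape.
sub log_epoch {
    my ($dateTime, $timeZone) = @_;
    my ($d, $m, $y, $H, $M, $S) =
        $dateTime =~ m{^(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})$}
        or return;
    return unless exists $mon{$m};
    my ($sign, $tzh, $tzm) = $timeZone =~ /^([+-])(\d{2})(\d{2})$/
        or return;
    my $offset = ($tzh * 3600 + $tzm * 60) * ($sign eq '-' ? -1 : 1);
    # The stamp is local wall-clock time; subtracting its UTC offset gives UTC.
    return timegm($S, $M, $H, $d, $mon{$m}, $y) - $offset;
}

# In-memory stand-in for IN-access.log, using the sample lines from the question.
my $sample = <<'END';
66.249.65.107 - - 08/Oct/2007:04:54:20 -0400 "GET /support.html HTTP/1.1" 200 11179
111.111.111.111 - - 05/Oct/2004:15:17:55 -0400 "GET / HTTP/1.1" 200 10801
111.111.111.111 - - 06/Oct/2007:11:17:55 -0400 "GET /style.css HTTP/1.1" 200 3225
END
open(my $log, '<', \$sample) or die "Cannot open sample log: $!";

my ($earliest_epoch, $earliest_text);
while (my $line = <$log>) {
    my (undef, undef, undef, $dateTime, $timeZone) = split ' ', $line;
    next unless defined $timeZone;
    my $t = log_epoch($dateTime, $timeZone);
    next unless defined $t;
    # Keep this line's date if it is the first one seen or earlier than the current minimum.
    if (!defined $earliest_epoch || $t < $earliest_epoch) {
        ($earliest_epoch, $earliest_text) = ($t, "$dateTime $timeZone");
    }
}
close($log);

print "The log file start date is: $earliest_text\n" if defined $earliest_text;
```

    With the sample lines this prints "The log file start date is: 05/Oct/2004:15:17:55 -0400". Comparing epoch seconds after applying the offset keeps working if lines carry different time zones; if every line shares one offset, a string key is enough.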

Node Type: perlquestion [id://1003282]
Approved by graff