http://www.perlmonks.org?node_id=339863

mvam has asked for the wisdom of the Perl Monks concerning the following question:

i had a post a few days ago asking the best way to parse apache log files. i wasn't able to get Apache::ParseLog to work so i wrote a small script using sed and awk. it produces output like so:
[24/Mar/2004:12:26:52 /manual/misc/perf-tuning.html 0_seconds
[24/Mar/2004:12:27:33 /manual/mod/mod_status.html 0_seconds
[24/Mar/2004:12:27:39 /manual/mod/module-dict.html 0_seconds
[24/Mar/2004:12:27:46 /manual/misc/rewriteguide.html 0_seconds
[24/Mar/2004:12:27:53 /manual/mod/mod_rewrite.html 0_seconds
[24/Mar/2004:12:27:53 /manual/images/mod_rewrite_fig1.gif 0_seconds
[24/Mar/2004:12:27:53 /manual/images/mod_rewrite_fig2.gif 0_seconds
[24/Mar/2004:12:28:05 /manual/new_features_1_3.html 0_seconds
these results are sent to a file for viewing. obviously this isn't the best way to view stats for each file. is there a simple way with perl to parse the file and have the ability to pass a file name as an argument?

i know that i'd have to call the awk script each time to generate a current file, which i can do, but i don't have any ideas how to get perl involved. is this easy or am i in over my head here?

Edit by tye, escape [

20040327 Edit by BazB: Changed title from 'how do i sort thee?'

Replies are listed 'Best First'.
Re: Obtaining Apache logfile stats?
by DamnDirtyApe (Curate) on Mar 25, 2004 at 20:05 UTC

    mvam, I'm unclear about what you want to accomplish, but I suspect you can come up with a better solution that skips the sed/awk portion of your process. Please post:

    • The actual problem (maybe with examples) you're trying to solve here
    • Your code so far
    • Your output
    • Your desired output

    _______________
    DamnDirtyApe
    Those who know that they are profound strive for clarity. Those who
    would like to seem profound to the crowd strive for obscurity.
                --Friedrich Nietzsche
Re: Obtaining Apache logfile stats?
by sauoq (Abbot) on Mar 25, 2004 at 20:53 UTC
    i wasn't able to get Apache::ParseLog to work

    From this comment and your data sample, I have to guess that you aren't using a standard log format, right? Can you show us a sample of your log data and/or the CustomLog directive you use in your Apache configuration? Without that, we can't help you slice and dice it in Perl.

    is there a simple way with perl to parse the file and have the ability to pass a file name as an argument?

    Yes. Something like the following might work well enough for you depending, of course, on what you haven't told us yet...

    #!perl -lan
    BEGIN { $SUM = $N = 0; $file = shift; }
    if ($F[1] eq $file) {
        my ($secs) = ($F[2] =~ /^(\d+)/);
        $SUM += $secs;
        $N++
    }
    END { print "Average: " , $SUM/ $N; }
    Put that in a file and run it with two arguments, the full pathname of the file you want stats on and the pathname of the file your parsed log data (i.e. the sample you provided) is in. Something like:
    perl get_stats /manual/misc/perf-tuning.html log.data

    -sauoq
    "My two cents aren't worth a dime.";
    
      a sample log line:

      x.x.x.x - 24/Mar/2004:12:26:52 -0800 "GET /manual/misc/perf-tuning.html HTTP/1.1" 200 0 48296 "http://localhost/manual/" "Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.6) Gecko/20040211 Firefox/0.8"

      and this is my logformat line:

      LogFormat "%v %{x-up-subno}i %t \"%r\" %>s %T %b \"%{Referer}i\" \"%{User-Agent}i\"" wap

        The quick and dirty approach would be to just carve it up on whitespace like you are doing with awk anyway. The conversion is straightforward. Use split or perl's -a option (as in my example above).

        Regardless of how you parse the input, you'll probably find it worthwhile to compute the statistics for every file accessed on one pass through your log. That's a lot more efficient than reading your whole log once for each file you want stats on. That's easy enough; just use a hash to maintain data for each filename as you traverse the log.
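A minimal sketch of that one-pass idea, assuming the log is carved up on whitespace the same way the awk command does, so that the request path is awk's field 6 and the %T seconds value is field 9 (indexes 5 and 8 in Perl). The sub name `tally` is hypothetical, and the sample lines are illustrative only: the first mirrors the sample log line quoted earlier in the thread (with the referer and user agent shortened to "-"), the other two are made up so that one path has more than one hit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Accumulate [total_seconds, hits] per request path.
# Assumption: whitespace-split fields line up with awk's $6 (path)
# and $9 (%T seconds), as in the sample log line from this thread.
sub tally {
    my ($stats, $line) = @_;
    my @f = split ' ', $line;
    return unless @f >= 9 && $f[8] =~ /^\d+$/;    # skip malformed lines
    $stats->{ $f[5] }[0] += $f[8];
    $stats->{ $f[5] }[1]++;
    return 1;
}

# Illustrative lines only; a real script would loop over the log
# with while (<>) instead of a hard-coded list.
my @sample = (
    'x.x.x.x - 24/Mar/2004:12:26:52 -0800 "GET /manual/misc/perf-tuning.html HTTP/1.1" 200 0 48296 "-" "-"',
    'x.x.x.x - 24/Mar/2004:12:27:53 -0800 "GET /manual/mod/mod_rewrite.html HTTP/1.1" 200 5 1000 "-" "-"',
    'x.x.x.x - 24/Mar/2004:12:29:53 -0800 "GET /manual/mod/mod_rewrite.html HTTP/1.1" 200 7 1000 "-" "-"',
);

my %stats;
tally(\%stats, $_) for @sample;

# One report line per path: hit count and average seconds
for my $path (sort keys %stats) {
    my ($total, $hits) = @{ $stats{$path} };
    printf "%s hits=%d avg=%.2f\n", $path, $hits, $total / $hits;
}
```

Because the hash holds only one small array per distinct path, memory use should depend on the number of different files served rather than on the size of the log.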

        -sauoq
        "My two cents aren't worth a dime.";
        

        How are you calculating the "0_seconds" portion of your sample data?

        -sauoq
        "My two cents aren't worth a dime.";
        
Re: Obtaining Apache logfile stats?
by mvam (Acolyte) on Mar 25, 2004 at 20:18 UTC
    ok here we go: the problem i need to solve is taking an apache log file that's rotated daily and getting the time taken to serve each page in the log. i was able to get this data in a basic format using
    awk '{print $3, $6, $9}' > /tmp/resultsfile
    this does a nice job of outputting the relevant fields in the log file. the next step would be to process this output in such a way that i could type say 'mod_rewrite.html' and find out how many times it was served and what the average of those times is.

      Alright, perhaps try something along these lines:

      #! /usr/bin/perl
      use strict;
      use warnings;

      my $file = shift @ARGV;
      my @times = map { /(\d+)_seconds/; $1 } grep { /$file/ } <DATA>;

      my $totaltime;
      $totaltime += $_ for @times;
      my $avgtime = $totaltime / @times;
      print "Average time: $avgtime\n\n";

      __DATA__
      [24/Mar/2004:12:26:52 /manual/misc/perf-tuning.html 0_seconds
      [24/Mar/2004:12:27:33 /manual/mod/mod_status.html 0_seconds
      [24/Mar/2004:12:27:39 /manual/mod/module-dict.html 0_seconds
      [24/Mar/2004:12:27:46 /manual/misc/rewriteguide.html 0_seconds
      [24/Mar/2004:12:27:53 /manual/mod/mod_rewrite.html 5_seconds
      [24/Mar/2004:12:27:53 /manual/images/mod_rewrite_fig1.gif 0_seconds
      [24/Mar/2004:12:27:53 /manual/images/mod_rewrite_fig2.gif 0_seconds
      [24/Mar/2004:12:28:05 /manual/new_features_1_3.html 0_seconds
      [24/Mar/2004:12:29:53 /manual/mod/mod_rewrite.html 6_seconds
      [24/Mar/2004:12:29:54 /manual/mod/mod_rewrite.html 7_seconds
      [24/Mar/2004:12:29:55 /manual/mod/mod_rewrite.html 8_seconds
      [24/Mar/2004:12:29:56 /manual/mod/mod_rewrite.html 9_seconds

      I still think you should try the format manipulation in Perl, though; it's easy to do, and you'll only have one script to maintain.


      _______________
      DamnDirtyApe
      Those who know that they are profound strive for clarity. Those who
      would like to seem profound to the crowd strive for obscurity.
                  --Friedrich Nietzsche
        this did produce the average time, but ended up with

        Use of uninitialized value in regexp compilation at ./avgtime.pl line 6, <DATA> line 12.

        this repeated for each line in DATA. i'm a perl moron as you can see, but i'm trying. the downside to these log files is that they can reach 2GB in a matter of hours, so creating the temp result file can get somewhat expensive. i'm thinking about grepping out anything with a zero value since really we only want to see when the server is under load.
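That warning most likely means the script was run without a filename argument: `shift @ARGV` then returns undef, and interpolating an undefined `$file` into `/$file/` warns once for every line read. The average printed in that case is presumably taken over all lines rather than one file. A sketch of a guarded variant, assuming the same `path N_seconds` input; the sub name `average_for` is hypothetical, and the fallback filename exists only so the sketch runs without arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Average the N_seconds values of lines whose path contains $file.
# Returns an empty list when nothing matches, so the caller never
# divides by zero.
sub average_for {
    my ($file, @lines) = @_;
    my ($total, $hits) = (0, 0);
    for my $line (@lines) {
        # \Q...\E quotes regex metacharacters such as the dot in ".html"
        next unless $line =~ /\Q$file\E\s+(\d+)_seconds/;
        $total += $1;
        $hits++;
    }
    return $hits ? ($total / $hits, $hits) : ();
}

# Lines copied from the sample data in this thread
my @lines = (
    '[24/Mar/2004:12:27:53 /manual/mod/mod_rewrite.html 5_seconds',
    '[24/Mar/2004:12:29:53 /manual/mod/mod_rewrite.html 6_seconds',
    '[24/Mar/2004:12:29:54 /manual/mod/mod_rewrite.html 7_seconds',
    '[24/Mar/2004:12:28:05 /manual/new_features_1_3.html 0_seconds',
);

# Check the argument before it reaches the regex. A real script would
# probably die with the usage message instead of picking a default.
my $file = shift @ARGV;
unless (defined $file) {
    warn "usage: $0 filename (using mod_rewrite.html for this demo)\n";
    $file = 'mod_rewrite.html';
}

my ($avg, $hits) = average_for($file, @lines);
if (defined $avg) {
    print "$file: $hits hits, average $avg seconds\n";
}
else {
    print "No entries found for $file\n";
}
```

Run as, for example, `perl avgtime.pl mod_rewrite.html`, the script name being the one shown in the warning above.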
Re: Obtaining Apache logfile stats?
by Not_a_Number (Prior) on Mar 25, 2004 at 23:29 UTC
    use strict;
    use warnings;

    my %HoA;

    while ( <DATA> ) {
        next unless (split '/')[-1] =~ /(.*)\s+(\d+)_seconds?$/;
        $HoA{$1}[0] += $2;  # Total seconds per 'user'
        $HoA{$1}[1] ++;     # Total times accessed per 'user'
    }

    # Print average access time for a given 'user':
    my $user = 'mod_rewrite.html';
    print "Unknown user: $user" and exit unless $HoA{$user};
    print "User: $user\n";
    print "Total seconds: $HoA{$user}[0]\n";
    print "Total accesses: $HoA{$user}[1]\n";
    print "Av. access time: ", $HoA{$user}[0] / $HoA{$user}[1];
    print "\n\n";

    # Print the whole HoA:
    print "$_: @{ $HoA{$_} }\n" for keys %HoA;

    __DATA__
    [24/Mar/2004:12:26:52 /manual/misc/perf-tuning.html 0_seconds
    [24/Mar/2004:12:27:33 /manual/mod/mod_status.html 0_seconds
    [24/Mar/2004:12:27:33 /manual/mod/mod_status.html 33_seconds
    [24/Mar/2004:12:27:39 /manual/mod/module-dict.html 0_seconds
    [24/Mar/2004:12:27:46 /manual/misc/rewriteguide.html 0_seconds
    [24/Mar/2004:12:27:53 /manual/mod/mod_rewrite.html 5_seconds
    [24/Mar/2004:12:27:53 /manual/images/mod_rewrite_fig1.gif 0_seconds
    rabbit!!!
    [24/Mar/2004:12:27:53 /manual/images/mod_rewrite_fig2.gif 0_seconds
    [24/Mar/2004:12:28:05 /manual/new_features_1_3.html 0_seconds
    [24/Mar/2004:12:29:53 /manual/mod/mod_rewrite.html 6_seconds
    [24/Mar/2004:12:29:54 /manual/mod/mod_rewrite.html 7_seconds
    [24/Mar/2004:12:29:55 /manual/mod/mod_rewrite.html 8_seconds
    [24/Mar/2004:12:29:56 /manual/mod/mod_rewrite.html 9_seconds

    dave