Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Statistical Graphs with gnuplot and mod_perl

by projekt21 (Friar)
on Sep 11, 2003 at 09:19 UTC ( [id://290624]=sourcecode: print w/replies, xml ) Need Help??
Category: Web Stuff
Author/Contact Info Alex Pleiner <alex@zeitform.de>
Description:

When my company's sysadmin emailed me some log files from sysstat (http://perso.wanadoo.fr/sebastien.godard/) with the note "Can you make some graphs out of those?", I started to:

  • write a converter from sysstat logs into a gnuplot data file (basically a row to col converter)
  • write a form based web-frame to select files, columns and options, generate queries for a gnuplot backend and display the resulting graphs
  • finally write a gnuplot backend that outputs corresponding graphs

The gnuplot backend is a module currently called Apache::Gnuplot that can be used for nearly any statistical data files in gnuplot format.

The converter and web-frame are not published yet, but you may contact me if you like to have a look at it. See http://alex.zeitform.de/unsorted/saplot.png for a screenshot.

Fellow monks, I would like to read your comments on possible errors or improvements. And please let me know if this is of any interest to put it on CPAN. Thanks.

Update: code cleanup for readability

Update2: added input validation thanks to Abigail-II

# Apache::Gnuplot
# (c) 2003 zeitform Internet Dienste - zeitform@zeitform.de
# This program is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.
#
# Plots data via gnuplot.
#
# [Note: we currently use it as plotting backend in a system for displ
+aying
# sysstat logs (see: http://perso.wanadoo.fr/sebastien.godard/) after 
+piping
# those logs though a simple row to col converter. It can be used to p
+lot nearly
# any preprocessed data files.]
#
# httpd.conf:
#-----------------------------
# <Location /gnuplot>
#    SetHandler  perl-script
#    PerlHandler Apache::Gnuplot
#    Options +ExecCGI
#    PerlAddVar DataDir       /usr/local/apache/gnuplot/data
#    PerlAddVar Gnuplot       "/usr/local/bin/gnuplot - 2>/dev/null"
#    PerlAddVar IgnoreInvalid 1
#    PerlSendHeader On
# </Location>
#
# Path_Info:
#-----------------------------
# - directory below "DataDir" that contains the data files
#
#
# GET-Parameters:
#----------------------------
# Parameters marked as "multiple" can be given multiple times.
#
# - f (multiple): filename containing the data in the following format
#
#       # format col1-name col2-name ...
#       10:30    0.45      0.56      ...
#       10:40    0.63      0.23      ...
#       ...
#
#       where "format" can be "time" or "date" followed by an optional
+ strftime
#       compliant format string
#       Examples: "time[%H:%M]", time[%y-%m-%d], date[%H%M], etc
#
# - c (multiple) : column name to plot
#
# - ymin  : start of y range
# - ymax  : end of y range
# - scale : scale factor (a scale of 1 generates images of size 640x48
+0)
# - sum   : include calculated sum of multiple columns into the plot
#
# Example URLs
#-----------------------------
#
# http://host.com/gnuplot/subdir?f=datafile1&c=4&c=5&c=6
#   prints the columns 4 to 6 from the data file "datafile1"
#
# http://host.com/gnuplot/subdir?f=datafile1&f=datafile2&c=4&c=5&c=6
#   prints the columns 4 to 6 from the data files "datafile1" and "dat
+afile2"
#
# http://host.com/gnuplot/subdir?f=datafile1&c=4&c=5&c=6&scale=1.5
#   same as example 1 but scales the image by 1.5
#
# http://host.com/gnuplot/subdir?f=datafile1&c=4&c=5&c=6&sum=1
#   same as example 1 but add the sum of the colums into the plot
#

package Apache::Gnuplot;

use strict;
use Apache::Constants qw(:common :response);
use CGI;

use vars qw($VERSION);

$VERSION = "0.4";

## variables (edit if you need to -- and understand them :-)
my $combined_title = "combined"; # title for plot if more than 1 file 
+is plotted
my $sum_title      = "sum";      # title for column sums if plotted
my $n_a_title      = "n/a";      # title for non-existent columns if p
+lotted

############################################################
sub handler
############################################################
  {

    my $r = shift;

    ## get gnuplot path gnuplot
    my $gnuplot   = $r->dir_config('Gnuplot') 
        || "/usr/local/bin/gnuplot - 2>/dev/null";

    ## get data file path and add path_info
    my $data_dir  = $r->dir_config('DataDir') || return SERVER_ERROR;
    my $plot_path = join("/", $data_dir, $r->path_info());
    return NOT_FOUND unless -d $plot_path;

    ## get setting to ignore invalid values
    my $ignore_invalid = $r->dir_config('IgnoreInvalid') || 0;

    ## cgi object
    use CGI;
    my $q = CGI->new;

    ## params (columns and files)
    ## instead of "return BAD_REQUEST" you may "print_empty_gif($r)"
    ## I added input validation thanks to a note from Abigail-II
    ## I guess only @columns and @files were vulnerable.
    my @columns = sort map { $_ =~ s/[^\w.%\/-]//g } grep { $_ } $q->p
+aram("c") or return BAD_REQUEST;
    my @files   = sort map { $_ =~ s/[^\w.-]//g } grep { $_ } $q->para
+m("f") or return BAD_REQUEST;

    ## other options
    my $ymin   = $q->param("ymin");  $ymin += 0;
    my $ymax   = $q->param("ymax");  $ymax += 0;
    my $scale  = $q->param("scale"); $scale += 0;
    my $sum    = $q->param("sum");

    ## read data file for field names from newest data file
    my @plotrule;
    my $time_format = ""; ## this is not very good (see below)

    foreach my $file (@files)
      {
        open DAT, "$plot_path/$file" or return NOT_FOUND;

        ## @col_name is an array containing the column names listed in
+ the 
        ## first row.
        my @col_name = split(/\s+/, <DAT>);

        close DAT;

        ## if the first row does not contain a comment sign, there see
+ms
        ## to be no line with column names, so we create one with dumm
+y
        ## values. The first "column name" can be "#" or "#format". 
        ## We check this and correct it if necessary.
        if ($col_name[0] !~ /^#/)       { $col_name[$_] = "column_$_" 
+foreach (0 .. @col_name); }
        if ($col_name[0] =~ /^#+[^#]+/) { $col_name[0] =~ s/^#+//; uns
+hift @col_name, "#"; }

        ## we can use @col_name to get a column name for the column nu
+mber, to 
        ## get a  column number for a given name we need a hash (rever
+se lookup)
        my %col_number = map { $col_name[$_] => $_ } (0 .. @col_name);

        ## check for time format - this should be done once for all fi
+les, but how?
        ## I mean, what if the formats are mixed in different files?
        if    ($col_name[1] =~ /(time|date)\[(.+)\]/) {  $time_format 
+= $2; }
        elsif ($col_name[1] =~ /(time|date)/)         {  $time_format 
+= "%H:%M"; } ## default

        ## create a plot rule for this files columns
        ## plot all columns but ignore invalid
        if ($ignore_invalid)
          {
            push @plotrule,
              join(",",
                map {
                  sprintf("\"%s\" using 1:%d title \"%s%s\" with lines
+",
                     "$plot_path/$file",              ## path to data 
+file
                     $col_number{$_},                 ## column number
                     (@files > 1 ? "$file-" : ""),    ## filename in l
+egend?
                     $_                               ## column name f
+or legend
                  )
                }
                grep { $col_number{$_} } @columns     ## only existing
+ columns
              );
      }
        ## plot all columns and mark non-existing with "n/a"
        else
          {
             push @plotrule,
               join(",",
                 map {
                   ($col_number{$_}               ## do we have that c
+olumn?
                      ? sprintf("\"%s\" using 1:%d title \"%s%s\" with
+ lines",
                          "$plot_path/$file",           ## path to dat
+a file
                          $col_number{$_},              ## column numb
+er
                          (@files > 1 ? "$file-" : ""), ## filename in
+ legend?
                          $_                            ## column name
                        )
                      : sprintf("0 title \"%s%s\"",
                          (@files > 1 ? "$file-" : ""), ## filename in
+ legend?
                          $n_a_title                    ## column name
+ ("n/a")
                        )
                    )
                  } @columns                            ## all columns
                );

          }

     ## do we plot a sum?
     if ($sum && @columns > 1)
       {
          push @plotrule,
            sprintf("\"%s\" using 1:%s title \"%s%s\" with lines",
                      "$plot_path/$file",              ## path to data
+ file
                        sprintf("(%s)",                  ## sum ($1+$2
++$3...)
                          join("+",
                            map { '$' . $col_number{$_} } #'keep syn-h
+ilight
                            grep { $col_number{$_} } @columns
                          )
                        ),
                        (@files > 1 ? "$file-" : ""),    ## filename i
+n legend 
                        $sum_title                       ## column nam
+e ("sum")
                );
        }
      }

    ## do we have something to plot
    if (@plotrule)
      {
        ## gnuplot config file
        ## see the gnuplot manual if you need to change things here.
        my $plot = "";
        $plot .= "set xdata time\n"                if $time_format;
        $plot .= "set timefmt \"$time_format\"\n"  if $time_format;
        $plot .= "set format x \"$time_format\"\n" if $time_format;
        $plot .= "set key below\n";
        $plot .= "set output\n";
        $plot .= "set terminal png\n";
    
        $plot .= "set size $scale $scale\n"        if $scale;
        $plot .= "set title \"" . 
          (@files > 1 ? $combined_title : join(" ", @files)) . "\"\n";
        $plot .= "plot [:] " . (($ymax - $ymin) ? "[$ymin:$ymax]" : ""
+) . 
          join(",", @plotrule);

        ## output image
        $r->content_type("image/png");
        $r->send_http_header;

        open PLOT, "echo \'$plot\' | $gnuplot |" or return SERVER_ERRO
+R;
        print while sysread(PLOT, $_, 1024);
        close PLOT;
      }
    else
      {
        ## no image
        print_empty_gif($r);
      }

    return OK;

  }

############################################################
sub print_empty_gif
############################################################
  {
    my $r = shift;
    ## print empty 1x1 pixel gif
    $r->content_type("image/gif");
    $r->send_http_header;

    print pack("H*", $_)
      foreach (qw/ 47 49 46 38  39 61 01 00  01 00 80 00  00 ff ff ff
               00 00 00 21  f9 04 01 14  00 00 00 2c  00 00 00 00
               01 00 01 00  00 02 02 44  01 00 3b                  /);
  }

############################################################
1;
Replies are listed 'Best First'.
Re: Statistical Graphs with gnuplot and mod_perl
by Abigail-II (Bishop) on Sep 11, 2003 at 09:31 UTC
    open PLOT, "echo \'$plot\' | $gnuplot |" or return SERVER_ERROR;

    Considering that $plot partially consists of unfiltered user input, this looks like a very, very bad idea to me. Not to mention that you do the same with open().

    Abigail

      Yes, you are totally right. I will add the necessary input checking.

      Thanks++ for reminding.

      As a partial defense: We run this in a private network, with high restrictions. So only our admin has access to that.

      alex pleiner <alex@zeitform.de>
      zeitform Internet Dienste

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: sourcecode [id://290624]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (3)
As of 2024-03-29 02:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found