Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Code review, good 'ol CGI.

by meonkeys (Chaplain)
on Jul 29, 2001 at 08:55 UTC ( [id://100645]=sourcecode: print w/replies, xml ) Need Help??
Category: CGI Programming
Author/Contact Info Adam Monsen <meonkeys@hotmail.com>
Description: The attached code is a CGI that allows someone to maintain the CSV used to generate a Web site for the University of Washington Men's Water Polo team. I seek help, comments, and criticism on: 1)security, 2)speed, 3)the general approach of using a CSV-driven Web site, 4)anything blatanly wrong or stupid, 5)other cool ways to do what I'm doing. My main goal was maintainability. It was easy for me to teach my Polo coach how to update CSV files, and CSV is relatively easy to deal with for the CGI.

I would REALLY appreciate feedback on the entire Web site (two CGIs and a few packages), as well. The entire codebase is viewable: every file has a ".txt" added to the end. The code_review/ dir is analagous to the document root for the actual working Web site.

The main CGI is in "code_review/", Perl modules are in "code_review/myperl/", the CSV updating tool is in "code_review/admin/", and the CSV datafiles are in "code_review/data/". codebase
#!/usr/local/bin/perl -Tw

# Please see entire codebase for this Web site at
# http://students.washington.edu/dawgpolo/code_review/

# Pre-live checklist:
# [X] add some file locking for Godsakes!!!
# [X] link to this thing and do auth ONLY through SSL!!
# [X] protect /admin dir with .htaccess
# [X] use $ENV{ REMOTE_USER } once .htaccess is in place
# [X] make sure to untaint $ENV{ REMOTE_USER }
# [X] write a file-storing routine ... just did it inline instead
# [X] bonus points: use XML :)
# [X] app tempfiles?
# [X] move to "live" data dir automatically?
# [X] add CSV validation to _validate_upload
# [X] test CSV validation to _validate_upload
# [X] count fields in a cool way for $num_fields

use lib '/dw02/d37/dawgpolo/myperl';
use lib '/dw02/d37/dawgpolo/myperl/lib';

use CGI;
use CGI::Carp qw( fatalsToBrowser );  # I think this slows us down
use Data::Dumper;
use Date::Format;
use Dawg qw( %CONFIG %CSV );
use Dawg::DB;
use Dawg::Draw;
use Fcntl qw( :flock );
use strict;

$ENV{ HTTPS } or die "Insecure! Change URL to https://$ENV{ HTTP_HOST 
+}".
                     "$ENV{ SCRIPT_NAME }\n";
$ENV{ HTTPS } =~ tr/A-Za-z//cd;
$ENV{ SERVER_ADMIN } = $CONFIG{ EMAIL };
$ENV{ PATH }         = '/usr/local/bin:/usr/bin:/bin';

my $cgi    = CGI->new;
my $draw   = Dawg::Draw->new;
my $title  = 'CSV Update Tool';
my $self   = $ENV{ SCRIPT_NAME };
my $user   = $ENV{ REMOTE_USER };
$user =~ tr/A-Za-z//cd;
my $userIP = $ENV{ REMOTE_ADDR };

sub main {
  my %dispatch = (
    upload  => \&upload,
    debug   => \&debug,
    DEFAULT => \&upload,
  );

  $draw->headers();
  $draw->html_start( { title => $title } );

  my $action = $cgi->param('action') || 'DEFAULT';
  my $do_sub = $dispatch{ $action };

  if ( $do_sub ) {
    &$do_sub();
  } else {
    warn "[warn] Invalid action: [$action]";
    die "[die] You can't make me do that!\n";
  }

  $draw->html_end();
}

sub upload {
  my @errors = ();  # array of errors found during execution
  my @lines  = ();  # input lines of uploaded file
  my $page   = '';  # page that uploaded file corresponds to
  my $ucpage = '';  # uppercased version of the same thang

  my $mode    = $cgi->param(    'mode'        );
  $mode       =~ tr#a-z_##cd;                        # untaint that pa
+ram!
  my $in      = $cgi->upload( 'datafile_in' );       # returns a fileh
+andle
  my $logfile = "$CONFIG{ ADMIN }/transactions.log"; # log entire tran
+saction
  my $qwiklog = "$CONFIG{ ADMIN }/quicklog.csv";     # abbreviated log

  open( LOG, ">> $logfile" ) or die "Couldn't open $logfile";
  open( QWK, ">> $qwiklog" ) or die "Couldn't open $qwiklog";

  if ( $mode ) {
    # They're trying to upload a file

    push @errors, "Invalid filename" unless ( ref $in eq 'Fh' );

    # retrieve the uploaded file and check for data integrity
    unless ( @errors ) {

      my $rv = Dawg::DB->slurp_csv_safe( $in );
      if ( $rv ) {
        @lines = @$rv;
      } else {
        die "No lines returned while slurping CSV";
      }

      if ( $mode =~ /^([a-z_]+)_go$/ ) {
        $page   = $1 or die "no page given!";
        $ucpage = uc($page);
        die "unknown page [$ucpage]" unless ( exists $CSV{$ucpage} );
        my $result = Dawg::DB->validate_csv( $ucpage, \@lines );
        @errors = @$result unless $result == 1;
      }

    }

    if ( @errors ) {
      print qq[ <FONT COLOR="RED">ERRORS OCCURRED DURING FILE UPLOAD:\
+n<BR> ];
      print qq[ <UL>\n ];
  
      foreach my $error ( @errors ) {
        print "<LI>$error\n";
      }
  
      print qq[ </UL></FONT>\n ];

    } else {
      # upload was good.
      my $time    = time2str( "%C", time() );
      my $outfile = "$CONFIG{DATA}/$page.csv"
        or die "cannot draft datafile filename for [$page]";

      open( CSV, "> $outfile" ) or die "Couldn't open $outfile";

      print qq[ <FONT COLOR="GREEN">$ucpage UPLOAD SUCCESSFUL</FONT><B
+R>\n ];

      # - 1 - lock all files to be written to
      flock( CSV, LOCK_EX|LOCK_NB ) or die "Can't lock CSV";
      flock( LOG, LOCK_EX|LOCK_NB ) or die "Can't lock LOG";
      flock( QWK, LOCK_EX|LOCK_NB ) or die "Can't lock QWK";

      # - 2 - write the datafile itself
      for (@lines) { print CSV "$_\n" }  # write csv to datafile

      # - 3 - write the entire transaction
      print LOG "[ $time ] $ucpage UPDATED by $user [$userIP]\n";
      for (@lines) { print LOG "$_\n" }  # log the transaction

      # - 4 - write a "qwiklog" entry
      print QWK "[ $time ] $ucpage UPDATED by $user [$userIP]\n";

      # - 5 - close all filehandles, drop locks
      close( CSV ) or die "Couldn't close file $outfile";
      close( LOG ) or die "Couldn't close file $logfile";
      close( QWK ) or die "Couldn't close file $qwiklog";
      flock( CSV, LOCK_UN ) or warn "Can't lock CSV";
      flock( LOG, LOCK_UN ) or warn "Can't lock LOG";
      flock( QWK, LOCK_UN ) or warn "Can't lock QWK";
    }
  } else {
    # They're not trying to upload a file YET, so give them tips

  print <<End_HTML;
  
Hints:<BR>
<UL>
<LI>Make sure the uploaded file is TRUE CSV, eg. blah,1,2,"blow, joe"
  <LI>Click the respective Upload button once, then <EM>wait</EM>
<LI>You can only upload one file at a time
</UL>

End_HTML
  }

  foreach my $table ( sort keys %CSV ) {
    my $name = lc($table);

  print <<UploadForms;
  
<BR>
Download <A HREF="$CONFIG{ WEBDATA }/$name.csv">$name</A> datafile
<FORM ACTION="$self" METHOD="POST" ENCTYPE="multipart/form-data">
<INPUT TYPE="FILE" NAME="datafile_in" SIZE="50" MAXLENGTH="80">
<BR>
<INPUT TYPE="SUBMIT" VALUE="Upload $name">
<INPUT TYPE="HIDDEN" NAME="mode" VALUE="${name}_go">
</FORM>
<BR>

UploadForms
  }
}

sub debug {
  print "<PRE>\n";
  print Data::Dumper->Dump( [\%ENV], ['ENV'] );
  print "</PRE>\n";
}

main();
Replies are listed 'Best First'.
Re: Code review, good 'ol CGI.
by tachyon (Chancellor) on Jul 29, 2001 at 11:26 UTC

    A few points. When locking a file I use this little snippet to allow a script to wait for its lock:

    my $flock = 1; # set to true to flock my $timeout = 10; # set to number of seconds before flock timeout .... if ($flock) { my $count = 0; until (flock FILE, LOCK_SH) { sleep 1; DieNice("Can't lock file '$file': $!\n") if ++$count >= $timeout +; } }

    Also closing a filehandle removes the underlying file lock so you only need to close them, Perl will unlock them for you thus all the LOCK_UN lines are redundant.

    This line does nothing useful, it does not untaint $mode

    $mode =~ tr#a-z_##cd; # untaint that param!

    This bit uses $mode in a regex and the value in $1 assigned to $page is untainted.

    if ( $mode =~ /^([a-z_]+)_go$/ ) { $page = $1 or die "no page given!";

    The or die makes no logical sense as the assignment $page = $1 will always succeed thus this can never execute.

    You have a number of or die "blah" statements. You should include the $! special var as this stores Perl's explanation of the error that has triggered the die ie "File does not exist". The standard synatax goes like:

    open FILE, ">>$file" or die "Can't append to $file, Perl says $!\n";

    This code is a very obtuse way to do something quite simple, namely go to one of two subs depending on the user input.

    sub main { my %dispatch = ( upload => \&upload, debug => \&debug, DEFAULT => \&upload, ); $draw->headers(); $draw->html_start( { title => $title } ); my $action = $cgi->param('action') || 'DEFAULT'; my $do_sub = $dispatch{ $action }; if ( $do_sub ) { &$do_sub(); } else { warn "[warn] Invalid action: [$action]"; die "[die] You can't make me do that!\n"; } $draw->html_end(); }

    You create a hash of sub references and then eventually call them. This is much easier to follow:

    sub main { $draw->headers(); $draw->html_start( { title => $title } ); if ($cgi->param('action') eq 'debug'){ &debug; } else { &upload; } $draw->html_end(); }

    The logic is the same with the exception of not checking for 'incorrect' action params. This is the typical logic I use as it is simple - either you ask for something with the correct name or you get the default page. The use of warn and die as you do is also redundant. Both warn and die print to STDERR. Warn just does this whereas die throws an expception afterwards. Logically you are just doing a die here.

    Your open files in two different places in the script then lock them all in the one place. This makes no logical sense to me and makes it hard to follow your logic train.

    Finally in CGI die is oftem suboptimal. You should not run a production script with use CGI::Carp qw( fatalsToBrowser ) active as it makes it easier to hack a script as the exact errors are reported in the browser. Without carp die will give the user a 500 error when it gets called. For these reasons amongst others most developers write a die_nice routine. Here is a die_nice routine:

    sub DieNice { my $message = shift; my ($package, $file, $line) = caller(); my $user_message = "Sorry, the system is currently unable to proce +ss your request<br>\n"; $user_message .= "due to routine maintenance. Please try again lat +er. Our Apologies\n"; $message = Unindent <<" MESSAGE"; A fatal die was trapped by the $scriptname die nice routine, detai +ls: Time: $datetime List name: $list_name Script name: $scriptname Package: $package File: $file Line Number: $line Error Message: $message MESSAGE &TellUser($user_message); # the bullshit message! &WarnAdmin($message); # the real facts! exit; } sub WarnAdmin { my $message = shift; my $name = "Administrator"; my $email = $our_email; &EmailUser ($name, $email, $message); return; } sub Unindent { my $unindent = shift; $unindent =~ s/^[ \t]+//gm; return $unindent; }

    The unindent sub just lets me indent the herepage stuff with the sub and drop this indentation off the final output. Hope this helps, all up looks good with -Tw, use strict, setting a secure path etc.

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      First of all, I just want to say thank you for the excellent comments you offered. I've coded Perl for about a year now, but within Text::Forge I'm shielded from most of the issues involved in traditional CGI programming, hence my immature code. I ignored some of your suggestions because I'm lazy.

      This line does nothing useful, it does not untaint $mode
      tr/// can't be used to untaint? I didn't know that.

      This code is a very obtuse way to do something quite simple, namely go to one of two subs depending on the user input.
      Quite true. This is just my "cookie cutter" dispatch hash... it makes more sense in a CGI that has many functions; although a CGI with many functions probably doesn't make much sense...

      The main thing I've changed below is file locking. Could you take a look at this and tell me what you think? I'm attempting to use semaphores as advised by KM in a few of his/her posts.

      I like the following and will definitely use them in my next CGI:
      1. waiting for lock sleeper 2. your DieNice, WarnAdmin, and Unindent routines
      #!/usr/local/bin/perl -Tw # Pre-live checklist: # [X] add some file locking for Godsakes!!! # [X] link to this thing and do auth ONLY through SSL!! # [X] protect /admin dir with .htaccess # [X] use $ENV{ REMOTE_USER } once .htaccess is in place # [X] make sure to untaint $ENV{ REMOTE_USER } # [X] write a file-storing routine ... just did it inline instead # [X] bonus points: use XML :) # [X] app tempfiles? # [X] move to "live" data dir automatically? # [X] add CSV validation to _validate_upload # [X] test CSV validation to _validate_upload # [X] count fields in a cool way for $num_fields use lib '/dw02/d37/dawgpolo/myperl'; use lib '/dw02/d37/dawgpolo/myperl/lib'; use CGI; use CGI::Carp qw( fatalsToBrowser ); # I think this slows us down use Data::Dumper; use Date::Format; use Dawg qw( %CONFIG %CSV ); use Dawg::DB; use Dawg::Draw; use Fcntl qw( :flock ); use strict; $ENV{ HTTPS } or die "Insecure! Change URL to https://$ENV{ HTTP_HOST +}". "$ENV{ SCRIPT_NAME }\n"; $ENV{ HTTPS } =~ tr/A-Za-z//cd; $ENV{ SERVER_ADMIN } = $CONFIG{ EMAIL }; $ENV{ PATH } = '/usr/local/bin:/usr/bin:/bin'; my $cgi = CGI->new; my $draw = Dawg::Draw->new; my $title = 'CSV Update Tool'; my $self = $ENV{ SCRIPT_NAME }; my $user = $ENV{ REMOTE_USER }; $user =~ tr/A-Za-z//cd; my $userIP = $ENV{ REMOTE_ADDR }; sub main { my %dispatch = ( upload => \&upload, debug => \&debug, DEFAULT => \&upload, ); $draw->headers(); $draw->html_start( { title => $title } ); my $action = $cgi->param('action') || 'DEFAULT'; my $do_sub = $dispatch{ $action }; if ( $do_sub ) { &$do_sub(); } else { warn "[warn] Invalid action: [$action]"; die "[die] You can't make me do that!\n"; } $draw->html_end(); } sub upload { my @errors = (); # array of errors found during execution my @lines = (); # input lines of uploaded file my $page = ''; # page that uploaded file corresponds to my $ucpage = ''; # uppercased version of the same thang my $mode = $cgi->param( 'mode' ); $mode =~ tr#a-z_##cd; # untaint that pa +ram! my $in = $cgi->upload( 'datafile_in' ); # returns a fileh +andle my $logfile = "$CONFIG{ ADMIN }/transactions.log"; # log entire tran +saction my $qwiklog = "$CONFIG{ ADMIN }/quicklog.csv"; # abbreviated log open( LOG, ">> $logfile" ) or die "Couldn't open $logfile"; open( QWK, ">> $qwiklog" ) or die "Couldn't open $qwiklog"; if ( $mode ) { # They're trying to upload a file push @errors, "Invalid filename" unless ( ref $in eq 'Fh' ); # retrieve the uploaded file and check for data integrity unless ( @errors ) { my $rv = Dawg::DB->slurp_csv_safe( $in ); if ( $rv ) { @lines = @$rv; } else { die "No lines returned while slurping CSV"; } if ( $mode =~ /^([a-z_]+)_go$/ ) { $page = $1; $ucpage = uc($page); die "unknown page [$ucpage]" unless ( exists $CSV{$ucpage} ); my $result = Dawg::DB->validate_csv( $ucpage, \@lines ); @errors = @$result unless $result == 1; } } if ( @errors ) { print qq[ <FONT COLOR="RED">ERRORS OCCURRED DURING FILE UPLOAD:\ +n<BR> ]; print qq[ <UL>\n ]; foreach my $error ( @errors ) { print "<LI>$error\n"; } print qq[ </UL></FONT>\n ]; } else { # upload was good. my $time = time2str( "%C", time() ); my $outfile = "$CONFIG{DATA}/$page.csv" or die "cannot draft datafile filename for [$page]"; open( CSV, "> $outfile" ) or die "Couldn't open $outfile"; print qq[ <FONT COLOR="GREEN">$ucpage UPLOAD SUCCESSFUL</FONT><B +R>\n ]; # - 1 - open and lock semaphores my $csv_sem = "$CONFIG{ADMIN}/.csv.lock"; my $log_sem = "$CONFIG{ADMIN}/.transactions.lock"; my $qwk_sem = "$CONFIG{ADMIN}/.quicklog.lock"; open( CSV_SEM, "> $csv_sem" ) or die "$csv_sem: $!"; open( LOG_SEM, "> $log_sem" ) or die "$log_sem: $!"; open( QWK_SEM, "> $qwk_sem" ) or die "$qwk_sem: $!"; flock( CSV_SEM, LOCK_EX ) or die "Can't lock CSV"; flock( LOG_SEM, LOCK_EX ) or die "Can't lock LOG"; flock( QWK_SEM, LOCK_EX ) or die "Can't lock QWK"; # - 2 - write the datafile itself for (@lines) { print CSV "$_\n" } # write csv to datafile # - 3 - write the entire transaction print LOG "[ $time ] $ucpage UPDATED by $user [$userIP]\n"; for (@lines) { print LOG "$_\n" } # log the transaction # - 4 - write a "qwiklog" entry print QWK "[ $time ] $ucpage UPDATED by $user [$userIP]\n"; # - 5 - close all filehandles, drop locks close( CSV ) or die "Couldn't close file $outfile"; # file lock +s close( LOG ) or die "Couldn't close file $logfile"; # are remov +ed close( QWK ) or die "Couldn't close file $qwiklog"; # automatic +ally close( CSV_SEM ) or warn "close error on $csv_sem: $!"; close( LOG_SEM ) or warn "close error on $log_sem: $!"; close( QWK_SEM ) or warn "close error on $qwk_sem: $!"; } } else { # They're not trying to upload a file YET, so give them tips print <<End_HTML; Hints:<BR> <UL> <LI>Make sure the uploaded file is TRUE CSV, eg. blah,1,2,"blow, joe +" <LI>Click the respective Upload button once, then <EM>wait</EM> <LI>You can only upload one file at a time </UL> End_HTML } foreach my $table ( sort keys %CSV ) { my $name = lc($table); print <<UploadForms; <BR> Download <A HREF="$CONFIG{ WEBDATA }/$name.csv">$name</A> datafile <FORM ACTION="$self" METHOD="POST" ENCTYPE="multipart/form-data"> <INPUT TYPE="FILE" NAME="datafile_in" SIZE="50" MAXLENGTH="80"> <BR> <INPUT TYPE="SUBMIT" VALUE="Upload $name"> <INPUT TYPE="HIDDEN" NAME="mode" VALUE="${name}_go"> </FORM> <BR> UploadForms } } sub debug { print "<PRE>\n"; print Data::Dumper->Dump( [\%ENV], ['ENV'] ); print "</PRE>\n"; } main();
Re: Code review, good 'ol CGI.
by seanbo (Chaplain) on Jul 29, 2001 at 19:59 UTC
    <snip> use CGI::Carp qw( fatalsToBrowser ); # I think this slows us down </snip>

    In regards to the above code, when going live, I would drop the errors to the log, not the browser. You may end up giving away too much information to the person viewing the webpage and that could be a security issue.


    seanbo
    Ahh..the odd dog is a strange beast indeed, nobody wants him, but he always seems to be there.
Re: Code review, good 'ol CGI.
by DBX (Pilgrim) on Jul 31, 2001 at 10:12 UTC
    I may have missed it so forgive me, but I recommend you check out DBD::CSV. I use it for simple sites that require a csv database for several reasons.

    First, it's scalable to a real DB incredibly easily because your code, except for one line in the connection routine is compatible with any DBD::*. So if you upgrade to mysql for example, you just move the data to the db, change how you connect and your code works.

    Also, all your DB calls are in SQL, which is fairly standard especially for simple stuff and allows you not to worry about the order of the file or the fields, just make your query.

    Finally, the DB can still be edited, uploaded or created by hand and work with your cgi scripts.

    Again, I took a fairly quick glance at your code, so forgive me if I missed something, I just thought this might interest you.

Re: Code review, good 'ol CGI.
by kschwab (Vicar) on Jul 29, 2001 at 09:26 UTC
    Minor nit, these three lines should end in "Can't unlock..." :

    flock( CSV, LOCK_UN ) or warn "Can't lock CSV"; flock( LOG, LOCK_UN ) or warn "Can't lock LOG"; flock( QWK, LOCK_UN ) or warn "Can't lock QWK";

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: sourcecode [id://100645]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (6)
As of 2024-03-28 16:14 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found