Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Automatically generate form parsing code

by Anonymous Monk
on Jun 15, 2001 at 02:03 UTC ( #88647=sourcecode: print w/ replies, xml ) Need Help??

Category: CGI Programming
Author/Contact Info

This is Ovid posting anonymously. No need for XP for what is essentially a 'double posting'.

Description:

Run this code with an HTML file as the argument. It will create a series of files with one file created for each form in the HTML document(try it against a perlmonks page, you might be surprised). Each document will contain complete form-handling code to read in the data and even has generic untainting routines.

Read the POD for full documentation on its use. Very easy to use and moderately customizable.

#!/usr/bin/perl -w
require 5.004;
use strict;
use CGI;
use File::Basename;
use HTML::TokeParser;
    
my $doc      = shift or &usage;
my $basename = basename( $doc );
   $basename =~ s/\..*$//;
my $p        = HTML::TokeParser->new($doc) || die "Can't open: $!";
my $formnum  = 1;

######--------------------------------------######
#       Begin user config section                #
######--------------------------------------######

# The following variables should be set by the user to control        
+                
# the output of the code generator
#
# The only thing you *need* to set it the shebang line.  The other opt
+ions
# are just for configuration.
#
# The following variable MUST only containt letters, numbers, or under
+scores:
#   qw/ $taint_pfx $cgi_obj $err_var $err_sub $log_sub /
# If you slip up, the script will die rather than produce bad code.

# This is the prefix of all variables that need to be untainted.
# should be letters, numbers, or underscores.
my $taint_pfx = 'tainted_'; 

# set this to false to have OO cgi code written out.
my $cgi_std   = 0;

# if $cgi_std is set to false, use this to specify the variable
# name of the CGI object (e.g. 'q' becomes 'my $q = CGI->new;')
my $cgi_obj   = 'q';

# This is the shebang line that will be used.   
# If left blank, it will be skipped.
my $shebang   = '#!/usr/bin/perl -wT';

# use this for the name of the hash that contains the errors 
my $err_var   = 'errors';

# set this to true to have the program print the &error stub
my $print_err = 1;

# set this to the name of your error handling routine
my $err_sub   = 'error'; 

# if $err_sub is true, this will be the stub of your security
# log routine.  If the form has been tampered with (i.e. data
# in @safe_types does not untaint), the use this to log the info.
my $log_sub   = 'sec_log';

# set this to true for lower case variable names
# If your forms elements that have the same name except
# for case, this could cause problems.
my $lc_names  = 1;

######--------------------------------------######
#       End user config section                  #
######--------------------------------------######

# These are the form elements for which we can *safely* create regular
+ expressions
# for untainting.  Do not change this array unless you know what you a
+re doing.
my @safe_types = qw/ hidden checkbox radio select submit /;

my %element;       # holds all of the form element types, names, and v
+alues
my %select;        # holds select form elements so we know if we've se
+en them
my @element_order; # order that elements appear in the form.  It's not
+ really
                   # needed, but we do this so that variables in gener
+ated code
                   # appear roughly in the same order as the form.
my $select_token;  # holds the select token when parsing <select> valu
+es


# walk through document and get each tag
# we're building a list of form elements here
while (my $token = $p->get_tag) {
    my $tag = $token->[0];
    if ( my $form_pos = ( $tag eq 'form' .. $tag eq '/form' ) ) {

        # Oh!  We're in a form.  Start looking for stuff.
        if ( $form_pos != 1 and substr( $form_pos, -2 ) ne 'E0' ) {
            add_input_element( $token )  if $tag eq 'input';

            # <select> is a pain, so we need to handle it differently
            if ( my $select_pos = ( $tag eq 'select' .. $tag eq '/sele
+ct' ) ) {
                $select_token = $token if $tag eq 'select';
                if ( $select_pos != 1 and substr( $select_pos, -2 ) ne
+ 'E0' ) {
                    add_select_element( $token, $p, $select_token ) if
+ $tag eq 'option';
                } elsif ( substr( $select_pos, -2 ) eq 'E0' ) {
                    # we've finished the <select>, so add it to
                    # %select so we knows we've seen it.
                    $select{ $select_token->[1]->{ 'name' } } = '';
                }
            } # end if (select)

            foreach ( qw/ textarea button / ) {
                add_generic_element( $token ) if $tag eq $_;
            }
        } elsif ( substr( $form_pos, -2 ) eq 'E0' ) {
            # we've finished the form, so let's write the document, cl
+ear the vars,
            # and start looking for more forms.
            &write_template;
            %element       = ();
            @element_order = ();
            $formnum++;
        }
    } # end if (form)
}

# This is to extract necessary data from form element
# that will later be added to %element
sub add_generic_element {
    my $token = shift;
    my $tag   = $token->[0];
    my $name  = $token->[1]->{ 'name' }  || '';
    my $value = $token->[1]->{ 'value' } || '';

    # I don't want to pass 'text' defaults as they can often be huge
    if ( $tag eq 'textarea' ) {
        $value = '';
    }
    update_element_hash( $name, $tag, $value, 1 );    
}

# <select> needs to be handled different.
sub add_select_element {
    my ( $token, $p, $select_token ) = @_;

    my $name  = $select_token->[1]->{ 'name' } || '';
    my $value = $token->[1]->{ 'value' }       || '';
    
    # The following is because an <option> with a select will
    # assume the value of the text if no value is specified
    if ( ! $value ) {
        $value = $p->get_trimmed_text;
    }
    
    # If <select> has the 'multiple' attribute or we've previously
    # encountered a <select> with the same name, set multiple to true
    my $multiple = exists $select_token->[1]->{ 'multiple' } ? 1 :
                   exists $select{ $name }                   ? 1 :
                                                               0 ;
    update_element_hash( $name, 'select', $value, $multiple );
}

# Need to format the data from the input element
sub add_input_element {
    my $token = shift;
    my $type  = $token->[1]->{ 'type' }  || '';
    my $name  = $token->[1]->{ 'name' }  || '';
    my $value = $token->[1]->{ 'value' } || '';
    my $multiple = $type eq 'radio' ? 0 : 1;
    update_element_hash( $name, $type, $value, $multiple );
}

# here's where the formatted form element data is actually added
# to the element hash.
sub update_element_hash {
    my ( $name, $type, $value, $multiple ) = @_;
    return if $type eq 'reset'; # This only affects the form, not the 
+script
    $value =~ s/\n/\\n/g;
    if ( $name ) {
        if ( ! exists $element{ $name } ) {
            $element{ $name }{ 'multiple' } = 0;
            $element{ $name }{ 'type' }     = $type;
            $element{ $name }{ 'value' }    = $value ? [ $value ] : []
+;
            push @element_order, $name;
        } else {
            $element{ $name }{ 'multiple' } += $multiple;
            push @{ $element{ $name }{ 'value' } }, $value if $value;
        }
    }
}
    
# Duh!
sub usage {
    print <<"    END_HERE";
    Usage:  formparse.pl some.html
    END_HERE
    exit;
}

# Oh, goody!  We actually get to start writing the code :)
sub write_template {
    my $filename = "${basename}_form_${formnum}.cgi";
    my $cgi_var = $cgi_std ? '' : "\$${cgi_obj}->";
    my $cgi_line = 'use CGI';
    $cgi_line   .= $cgi_std ? " qw/:standard/;\n" : ";\n";

    my $max_var_length = 1;
    foreach ( keys %element ) {
        $max_var_length = length if length > $max_var_length;
    }
    
    open OUT, "> $filename" or die "Can't open $filename for writing: 
+$!";

    print OUT $shebang."\n" if $shebang;
    print OUT &template;
    print OUT "use strict;\n";
    print OUT $cgi_line;

    if ( ! $cgi_std ) {
        print OUT "my \$$cgi_obj = CGI->new;\n";
    }
    print OUT "my \%$err_var;\n";
    print OUT "\n# Grab all data\n";

    # Here's where we print param() calls
    foreach my $element ( @element_order ) {
        my $var_name  = get_var_name( $element );
        my $data_type = $element{ $element }{ 'multiple' } ? '@'  : '$
+';
        my $default   = $element{ $element }{ 'multiple' } ? '()' : "'
+'";
        print OUT qq/my ${data_type}${taint_pfx}${var_name} / .
                  ' ' x ( $max_var_length - length $var_name ) .
                  qq/= ${cgi_var}param( '$element' )/ .
                  ' ' x ( $max_var_length - length $var_name ) .
                  qq/ || $default; # $element{ $element }{ 'type' }\n/
+;
    }
    print OUT <<"    END_HERE";
    
# The following is just a rough "fill in" template for untainting your
+ data.
# It will need to be customized to suit your particular needs.  You'll
+ need
# to create regular expressions to untaint your data and if you skimp 
+on this,
# it's at your peril!!!

    END_HERE
    # here's where we print the untainting template
    foreach my $element ( @element_order ) {
        my $var_name  = get_var_name( $element );
        my $type      = $element{ $element }{ 'type' };
        my $multiple  = $element{ $element }{ 'multiple' };

        if ( $multiple ) {
            # Ooh, multiple values.  Need to untaint an array.
            print OUT qq!my \@${var_name}; # $type values: ! . join( "
+,", @{ $element{ $element }{ 'value' } } ) . "\n";
            print OUT "foreach ( 0 .. \$#${taint_pfx}${var_name} ) {\n
+".
                  qq!    ( \$${var_name}\[\$_] ) ! .
                  untainting_code( $element, $var_name, $multiple, $ty
+pe, '[$_]' ) .
                  "}\n\n";
        } else {
            # Untainting a scalar.
            print OUT qq!# $type values: ! . join( ",", @{ $element{ $
+element }{ 'value' } } ) . "\n";
            print OUT qq!my ( \$${var_name} ) ! .
                      untainting_code( $element, $var_name, $multiple,
+ $type, '' ) . "\n";
        }
    }
    
    print OUT err_stub() if $print_err;
    close OUT or die "Can't close $filename: $!";
}

# return the code that will actually go in the untainting routine
sub untainting_code {
    my ( $element, $var_name, $multiple, $type, $index ) = @_;
    my $var_prefix = $multiple ? '@' : '$';
    return qq!= ( ! . untainting_regexes( $element, $var_name, $multip
+le ) . " )\n" .
           qq!        or \$${err_var}\{ '$var_name' } = [ "$type", \\$
+{var_prefix}${taint_pfx}${var_name}, "You must supply a proper value 
+for '$var_name'.  Allowed characters are letters, numbers, or punctua
+tion." ];\n!;    
}

# need to create the regexes for untainting.
sub untainting_regexes {
    my ( $element, $var_name, $multiple ) = @_;
    my $type    = $element{ $element }{ 'type' };
    my $code;

    # Can't create a safe regex, so we insert code to kill the script 
+unless
    # the programmer creates his or her own regex
    $code =  '"" ), die  # could not auto-create regex # ';
    return $code if ! grep { /$type/ } @safe_types;

    my @values  = @{ $element{ $element }{ 'value' } };
    @values = return_unique_items( @values ) if @values;
    if ( scalar @values > 1 ) {
        my ( $all_length_of_one, $formatted_values ) = escape_values( 
+\@values );

        my $array_index = $multiple ? '[$_]' : '';
        
        if ( $all_length_of_one ) {
            # we're returning a character class
            $code = qq!\$${taint_pfx}${var_name}$array_index =~ /^(! .
+ '['. join( '', @$formatted_values ) .'])$/';
        } else {
            # we have multiple values that a character class is not su
+itable for,
            # so we return separate regex tests for each value
            #my $offset = $multiple ? length( $taint_pfx ) + 17 : leng
+th( $taint_pfx ) + 12;
            my $offset = $multiple ? 18 : 13;
            $code = qq!\$${taint_pfx}${var_name}$array_index =~ /^(! .
+ $$formatted_values[0] . ")\$/ or \n";
            for ( 1 .. scalar @$formatted_values - 1 ) {
                $code .= ' ' x ( $offset + length $var_name );
                $code .= qq!\$${taint_pfx}${var_name}$array_index =~ /
+^(! . $values[$_] . ")\$/";
                $code .= " or \n" if $_ < scalar @$formatted_values - 
+1;
            }
        }
    } else {
        $code = qq!\$${taint_pfx}${var_name} =~ /^(! . quotemeta( $val
+ues[0] ) . ')$/';
    }
    return $code;
}

###############################################################
# if you want to maintain this, please note that this appears #
# to have subs within subs.  Look closer:  HERE docs.  Rather #
# tricky.                                                     #
###############################################################

sub err_stub {
    my $code =<<END_CODE;
$err_sub( \\\%$err_var ) if \%$err_var;

sub $err_sub {
    my \$err_hash = shift;
    # \$err_hash is a reference to an anoymous hash.  Keys are form at
+tribute names and values
    # are an anonymous array: [ element type, value, error_message ]
    # Example: 
    # \$err_hash = { 
    #       'username' => [ 'text', '????', "You must supply a proper 
+value for 'username'..." ]
    # }
    # Might be generated from: <input type="text" name="username">
END_CODE
    $code .= "    my \@safe_elements = qw/ ";

    foreach ( @safe_types ) {
        $code .= "$_ " if $_ ne 'checkbox'; # skip checkbox as it's po
+ssible for no data to be sent
    }    
    $code .= "/;\n";
    $code .=<<'END_CODE';
    foreach my $key ( keys %$err_hash ) {
        if ( grep { /$$err_hash{ $key }->[0]/ } @safe_elements ) {
            # The value of an element whose value shouldn't change app
+ears to be wrong
            sec_log( $$err_hash{ $key }, $key );
        } else {
            # insert error handling code here
        }
    }
}
END_CODE
    $code .= "sub $log_sub {\n";
    $code .=<<'END_CODE';
    my ( $bad_data, $element_name ) = @_;
    # $bad_data is a reference to an anoymous array.
    # [ element type, value, error_message ]
    # Example: [ 'text', '%#$!#$', "You must supply a proper value for
+ 'username'..." ]
    # Might be generated from: <input type="text" name="username">
}
END_CODE
}
##########################
# End 'HERE' doc section #
##########################

# return a safe variable name based upon the element name
sub get_var_name {
    my $var_name  = shift;
    $var_name     = lc $var_name if $lc_names;
    $var_name     =~ s/\W/_/g; # nice and safe variable names
    return $var_name;
}

# escape data for created regexes
sub escape_values {
    # if all values are only one character in length, we'll escape cha
+racters
    # with special meaning in a character class.  Otherwise, we'll esc
+ape all
    # non-word characters.
    my $values = shift;
    my $all_length_of_one = 1;
    foreach ( @$values ) {
        $all_length_of_one = 0, last if length > 1;
    }
    if ( $all_length_of_one ) {
        # We don't want a negated character class!
        if ( $$values[0] eq '^' ) {
            $$values[0] = '\^';
        }
        for ( 1 .. scalar @$values - 1 ) {
            # quote ']', '\', and '-' in character class
            if ( $$values[ $_ ] =~ /[\]\\\-]/ ) {
                $$values[ $_ ] = '\\' . $$values[ $_ ];
            }
        }
    } else {
        @$values = map { quotemeta( $_ ) } @$values if @$values;
    }
    return ( $all_length_of_one, $values );
}

# remove duplicates from array
sub return_unique_items {
    my %seen;
    grep { ! $seen{ $_ }++ } @_;
}

# This is the 'comment' template written at the top of the code
sub template {
    my ( $day, $month, $year ) = (localtime)[3..5];
    $year += 1900;
    my @months = qw/January February March     April   May      June
                    July    August   September October November Decemb
+er/;
    return <<END_HERE;
#
# Program:
# Author:
# Date Created:
# Purpose:
#
# Inputs:
#
# Outputs:
#
# --------------- Maintenance log ------------------------
# Date:        $months[ $month ] $day, $year
# Programmer:  parse_form.pl
# Action:      automatic code generation template
# --------------------------------------------------------
# Date:
# Programmer:
# Action:
# --------------------------------------------------------

END_HERE
}

# If the user config information is bad, kill the script rather than w
+rite bad code
sub validate {
    unless( $taint_pfx =~ /^\w+$/ ) { bad_config( '$taint_pfx' ) };
    unless( $cgi_obj   =~ /^\w+$/ ) { bad_config( '$cgi_obj' ) };
    unless( $err_var   =~ /^\w+$/ ) { bad_config( '$err_var' ) };
    unless( $err_sub   =~ /^\w+$/ ) { bad_config( '$err_sub' ) };
    unless( $log_sub   =~ /^\w+$/ ) { bad_config( '$log_sub' ) };

    if ( $HTML::Parser::VERSION < 3.25 ) {
        warn "\n    You have HTML::Parser version $HTML::Parser::VERSI
+ON, which is out of date.\n".
             "    XHTML may not parse correctly\n".
             "    If you cannot upgrade \$HTML::Parser, you will need 
+to carefully double-check\n".
             "    that all form elements have been included.";
    }
}

# This is where the script is actually killed
sub bad_config {
    my $config_var = shift;
    die "$config_var must only containt letters, numbers, or underscor
+es:\n\t".
        "$config_var =~ /^\\w+\$/ or die;";
}

__END__

=head1 NAME

parse_form.pl Quick and easy form handling code

=head1 USAGE

 parse_form.pl some.html

=head1 DESCRIPTION

One of the most tedious tasks of writing CGI scripts is creating the f
+orm handling routines.
The larger the form, the more tedious this can be.  Typically, this is
+ a series of 'param'
calls followed by a bunch of rather boring untainting expressions.  Th
+is program will read
an HTML document and produce a series of files, one per form in the sc
+ript.  

Each file will be the beginnings of the CGI script used to process eac
+h form in the HTML
document.  Each file has several logical sections:

=over 4

=item Shebang Line

The programmer sets the C<$shebang> variable to point to the Perl inte
+rpreter and add 
necessary switches:

 my $shebang = '/usr/bin/perl -wT';

If C<$shebang> is set to a false value (e.g., the empty string), then 
+the shebang line will
not be included.

=item Header Template

The C<&template> subroutine should be altered to fit the programmer's 
+shop standards.  This 
template is a series of comments added to the beginning of the output 
+files, after the
shebang line, which identify the programmer, author, purpose of the pr
+ogram, etc.

=item C<use> Statements

After the template, it is assumed the the programmer will C<use strict
+> and C<use CGI>.
If the programmer sets the C<$cgi_std> variable to false, then this se
+ction will be:

 use strict;
 use CGI;
 my $q = CGI->new;

The variable name for the CGI object is set with the C<$cgi_obj> varia
+ble.

=item Grabbing Form Data

The next section will be a series of C<CGI::param> calls.  These will 
+grab in all data from
the CGI form the variable name set to whatever value is assigned to C<
+$taint_pfx> followed 
by the C<name> attribute in each CGI element.  To ensure that safe var
+iable names are created,
the C<name> attribute will be parsed through the following regex:

 $var_name = s/\W/_/g;

Here is an example of a text field and the resulting C<param> call (as
+sumes using the
function-oriented version of CGI.pm:

 <input type="text" name="first name">
 
 my $_first_name = param( 'first name' ); # text

Note the the type of form element generating the data is appended as a
+ comment.  If
C<parse_form.pl> determines that more than one value could be for a pa
+rticular name,
it will use an array instead of a scalar:

 <select name="sports" multiple>
   <option>Football
   <option>Basketball</option>
   <option>Other
 </select>

 my @_sports = param( 'sports' ); # select

Note that the HTML is sloppy.  "Basketball" has a closing E<lt>/option
+E<gt>, while the
other options do not.  C<parse_form.pl> does not care.

=item Untainting the data

Untainting the form data is perhaps one of the most onerous tasks and 
+it's an easy one
to skip (whether through false laziness or genuine oversight).  C<pars
+e_for.pl> creates
a series of stub untainting methods.  These B<are not fully functional
+>!!!  Blindly
untainting data is foolish.  Instead, we create generic routines that 
+the programmer will
need to complete.  However, for certain types of form elements, the fo
+rm data should be
relatively simple to untaint.  These are elements for which the data i
+s already specified
and should not change:


=over 4

=item *

hidden

=item *

checkbox

=item *

radio

=item *

select

=item *

submit

=back

For these five items, the program will read the default values and cre
+ate a regular expression
to match this data.  If the default values are all single character, i
+t will use a character class
to match the data.  Otherwise, it will use alternation.  For the C<@_s
+ports> example above, the
following untainting code will be created:

 my @sports; # select values: Football,Basketball,Other
 foreach ( 0 .. $#_sports ) {
     ( $sports[$_] ) = ( $_sports[$_] =~ /^(Football)$/ or 
                         $_sports[$_] =~ /^(Basketball)$/ or 
                         $_sports[$_] =~ /^(Other)$/ )
         or push @{ $errors{ 'sports' } }, "You must supply a proper v
+alue for 'sports'.  Allowed characters are letters, numbers, or punct
+uation.";
 }

The first line is a comment listing the type of form element and any d
+efault values that were
provided by the form and a declaration of the variable to hold the unt
+ainted data.  The subsequent
lines represent the untainting code that's generated..

If you prefer, you could use alternation for the regex:

     ( $sports[$_] ) = ( $_sports[$_] =~ /^(Football|Basketball|Other)
+$/ )

I have elected not to do this as alternation is much less efficient.  
+If you have many values,
particularly with larger forms, this could have a a significant impact
+ on your script's performance.
For simple alternation, however, you I<probably> wouldn't notice much 
+of a speed difference.

It is recommended, however, that you go through the regular expression
+s manually and optimize them.
The untainting code for days of the month might be something like:

 my ( $user_datesuse2_day ) = ( $tainted_user_datesuse_day =~ /^(01)$/
+ or 
                                $tainted_user_datesuse_day =~ /^(02)$/
+ or 
                                $tainted_user_datesuse_day =~ /^(03)$/
+ or 
                                ...
                                $tainted_user_datesuse_day =~ /^(31)$/
+ )

Needless to say, that's a waste!  A quick optimization produces someth
+ing like:

 my ( $user_datesuse_day ) = ( $tainted_user_datesuse_day =~ /^([0-2]\
+d|3[01])$/ )

For form elements other than the five listed above, the code will be s
+lightly different.  This is 
to force the programmer to actually go through and add all of these re
+gexes.  In the C<$_first_name>
example above, the following untainting code will be created:

 # text values:
 my ( $first_name ) = ( "" ), die  # could not auto-create regex #  )

What this does is set C<$first_name> to the empty string and kill the 
+script.  This is to force the
programmer to create the regex manually.  Also, with large forms, it e
+nsures that you are less likely
to overlook the untainting.

=back

=head1 User Configuration

No point in having auto-generated code unless you have some control of
+ the output.  At the beginning
of the program is a 'user config' section.  The following variables ca
+n be changed by the user to
affect the output of the script.

=over 4

=item C<$taint_pfx>

This is the prefix of all variables that need to be untainted.  Must m
+atch /^\w+$/

    my $taint_pfx = 'tainted_'; 

=item C<$cgi_std>

Set this to false to have OO cgi code written out.

    my $cgi_std   = 0;

=item C<$cgi_obj>

If C<$cgi_std> is set to false, use this to specify the variable name 
+of the CGI object 
(e.g. 'q' becomes C<my $q = CGI->new;>).  Must match /^\w+$/

    my $cgi_obj   = 'q';

=item C<$shebang>

This is the shebang line that will be used.   If left blank, it will b
+e skipped.

    my $shebang   = '#!/usr/bin/perl -wT';

=item C<$err_var>

Use this for the name of the hash that will contain the errors generat
+ed by scalars that
do not untaint.  Must match /^\w+$/

    my $err_var   = 'errors';

=item C<$print_err>

Set this to true to have the program print the C<&error> stub.

    my $print_err = 1;

=item C<$err_sub>

Set this to the name of your error handling routine.  Must match /^\w+
+$/

    my $err_sub   = 'error'; 

=item C<$log_sub>

If C<$err_sub> is true, this will be the stub of your security log rou
+tine.  If the form has
been tampered with (i.e. data in C<@safe_types> does not untaint), the
+ use this to log the info.
Must match /^\w+$/

    my $log_sub   = 'sec_log';

=item C<$lc_names>

Set this to true for lower case variable names.  If your forms element
+s that have the same name
except for case, this could cause problems.

    my $lc_names  = 1;

=back

=head1 Sample Input and Output

=head2 Input:  HTML document named test.html

Note the the following document has very poorly-formed HTML.  Not all 
+element attributes are quoted.
The case varies on input elements and whether or not a closing E<lt>/o
+ptionE<gt> tag has been used
varies.

 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
 <html>
     <head>
         <title>This is for a form parsing test</title>
     </head>
     <body>
         <p>I don't think anyone will even <em>look</em> at this docum
+ent.</p>
         This is a line with no tags.
         <form action='C:\Inetpub\wwwroot\garban\cgi-bin\cookie.cgi' m
+ethod=post enctype='multipart/form-data'>
             <input type='hidden' name=somename value="asdf">
             <input type=text name=name value=Ovid size="30" maxsize="
+30">
             <br />
             <br>
             <input type="checkbox" name="group1" value="1" checked />
+ box 1 group 1
             <br>
             <input type="checkbox" NAME="group1" value="2"> box 2 gro
+up 1
             <br>
             <input type="password" name="pass"> Password
             <br />
             <select name="sports">
                 <option>Tiddly winks
                 <option>Mud wrestling</option>
             </select>
             <br>
             <textarea name="test">Some text</textarea>
             <br>
             <input type="radio" name="radio1" value="^" checked /> ra
+dio 1 group 1
             <input type="radio" name="radio1" value="^" checked /> ra
+dio 1 group 1
             <input type="radio" name="radio1" value="^" checked /> ra
+dio 1 group 1
             <input type="radio" name="radio1" value="\" checked /> ra
+dio 1 group 1
             <input type="radio" name="radio1" value="]" checked /> ra
+dio 1 group 1
             <input type="radio" name="radio1" value="1" checked /> ra
+dio 1 group 1
             <br>
             <input type="radio" NAME="radio1" value="2"> radio 2 grou
+p 1
             <!-- a comment -->
             <br />
             <SELEct name="religions" multiple>
                 <option>Democrat
                 <option>Republican</option>
             </select>
             <textarea name="test">Some text</textarea>
             <br />
             <select name="asdf">
                 <option>Dasdfsdf
                 <option>asfdasdfasdf</option>
             </select>
             <br>
             <select name="sports">
                 <option>Twister
                 <option>Jello wrestling</option>
             </select>
             <br>
             <input type="submit" name=".submit" value="why bother?">
             <br>
             <input type="radio" name="radio1" value="3" checked /> ra
+dio 3 group 1
             <br>
              <input type="radio" NAME="radio1" value="4"> radio 4 gro
+up 1
         </form>
     </body>
 </html>

=head2 Output:  document named test_form_1.cgi

Note that from a 60 line HTML document, we have instantly generated a 
+124 line Perl script.

 #!/usr/bin/perl -wT
 #
 # Program:
 # Author:
 # Date Created:
 # Purpose:
 #
 # Inputs:
 #
 # Outputs:
 #
 # --------------- Maintenance log ------------------------
 # Date:        June 14, 2001
 # Programmer:  parse_form.pl
 # Action:      automatic code generation template
 # --------------------------------------------------------
 # Date:
 # Programmer:
 # Action:
 # --------------------------------------------------------
 
 use strict;
 use CGI;
 my $q = CGI->new;
 my %errors;
 
 # Grab all data
 my $tainted_somename  = $q->param( 'somename' )  || ''; # hidden
 my $tainted_name      = $q->param( 'name' )      || ''; # text
 my @tainted_group1    = $q->param( 'group1' )    || (); # checkbox
 my $tainted_pass      = $q->param( 'pass' )      || ''; # password
 my @tainted_sports    = $q->param( 'sports' )    || (); # select
 my @tainted_test      = $q->param( 'test' )      || (); # textarea
 my $tainted_radio1    = $q->param( 'radio1' )    || ''; # radio
 my @tainted_religions = $q->param( 'religions' ) || (); # select
 my $tainted_asdf      = $q->param( 'asdf' )      || ''; # select
 my $tainted__submit   = $q->param( '.submit' )   || ''; # submit
     
 # The following is just a rough "fill in" template for untainting you
+r data.
 # It will need to be customized to suit your particular needs.  You'l
+l need
 # to create regular expressions to untaint your data and if you skimp
+ on this,
 # it's at your peril!!!
 
 # hidden values: asdf
 my ( $somename ) = ( $tainted_somename =~ /^(asdf)$/ )
         or $errors{ 'somename' } = [ "hidden", \$tainted_somename, "Y
+ou must supply a proper value for 'somename'.  Allowed characters are
+ letters, numbers, or punctuation." ];
 
 # text values: Ovid
 my ( $name ) = ( "" ), die  # could not auto-create regex #  )
         or $errors{ 'name' } = [ "text", \$tainted_name, "You must su
+pply a proper value for 'name'.  Allowed characters are letters, numb
+ers, or punctuation." ];
 
 my @group1; # checkbox values: 1,2
 foreach ( 0 .. $#tainted_group1 ) {
     ( $group1[$_] ) = ( $tainted_group1[$_] =~ /^([12])$/ )
         or $errors{ 'group1' } = [ "checkbox", \@tainted_group1, "You
+ must supply a proper value for 'group1'.  Allowed characters are let
+ters, numbers, or punctuation." ];
 }
 
 # password values: 
 my ( $pass ) = ( "" ), die  # could not auto-create regex #  )
         or $errors{ 'pass' } = [ "password", \$tainted_pass, "You mus
+t supply a proper value for 'pass'.  Allowed characters are letters, 
+numbers, or punctuation." ];
 
 my @sports; # select values: Tiddly winks,Mud wrestling,Twister,Jello
+ wrestling
 foreach ( 0 .. $#tainted_sports ) {
     ( $sports[$_] ) = ( $tainted_sports[$_] =~ /^(Tiddly\ winks)$/ or
+ 
                         $tainted_sports[$_] =~ /^(Mud\ wrestling)$/ o
+r 
                         $tainted_sports[$_] =~ /^(Twister)$/ or 
                         $tainted_sports[$_] =~ /^(Jello\ wrestling)$/
+ )
         or $errors{ 'sports' } = [ "select", \@tainted_sports, "You m
+ust supply a proper value for 'sports'.  Allowed characters are lette
+rs, numbers, or punctuation." ];
 }
 
 my @test; # textarea values: 
 foreach ( 0 .. $#tainted_test ) {
     ( $test[$_] ) = ( "" ), die  # could not auto-create regex #  )
         or $errors{ 'test' } = [ "textarea", \@tainted_test, "You mus
+t supply a proper value for 'test'.  Allowed characters are letters, 
+numbers, or punctuation." ];
 }
 
 # radio values: ^,^,^,\,],1,2,3,4
 my ( $radio1 ) = ( $tainted_radio1 =~ /^([\^\\\]1234])$/ )
         or $errors{ 'radio1' } = [ "radio", \$tainted_radio1, "You mu
+st supply a proper value for 'radio1'.  Allowed characters are letter
+s, numbers, or punctuation." ];
 
 my @religions; # select values: Democrat,Republican
 foreach ( 0 .. $#tainted_religions ) {
     ( $religions[$_] ) = ( $tainted_religions[$_] =~ /^(Democrat)$/ o
+r 
                            $tainted_religions[$_] =~ /^(Republican)$/
+ )
         or $errors{ 'religions' } = [ "select", \@tainted_religions, 
+"You must supply a proper value for 'religions'.  Allowed characters 
+are letters, numbers, or punctuation." ];
 }
 
 # select values: Dasdfsdf,asfdasdfasdf
 my ( $asdf ) = ( $tainted_asdf =~ /^(Dasdfsdf)$/ or 
                  $tainted_asdf =~ /^(asfdasdfasdf)$/ )
         or $errors{ 'asdf' } = [ "select", \$tainted_asdf, "You must 
+supply a proper value for 'asdf'.  Allowed characters are letters, nu
+mbers, or punctuation." ];
 
 # submit values: why bother?
 my ( $_submit ) = ( $tainted__submit =~ /^(why\ bother\?)$/ )
         or $errors{ '_submit' } = [ "submit", \$tainted__submit, "You
+ must supply a proper value for '_submit'.  Allowed characters are le
+tters, numbers, or punctuation." ];
 
 error( \%errors ) if %errors;
 
 sub error {
     my $err_hash = shift;
     # $err_hash is a reference to an anoymous hash.  Keys are form at
+tribute names and values
     # are an anonymous array: [ element type, value, error_message ]
     # Example: 
     # $err_hash = { 
     #       'username' => [ 'text', '????', "You must supply a proper
+ value for 'username'..." ]
     # }
     # Might be generated from: <input type="text" name="username">
     my @safe_elements = qw/ hidden radio select submit /;
     foreach my $key ( keys %$err_hash ) {
         if ( grep { /$$err_hash{ $key }->[0]/ } @safe_elements ) {
             # The value of an element whose value shouldn't change ap
+pears to be wrong
             sec_log( $$err_hash{ $key }, $key );
         } else {
             # insert error handling code here
         }
     }
 }
 sub sec_log {
     my ( $bad_data, $element_name ) = @_;
     # $bad_data is a reference to an anoymous array.
     # [ element type, value, error_message ]
     # Example: [ 'text', '%#$!#$', "You must supply a proper value fo
+r 'username'..." ]
     # Might be generated from: <input type="text" name="username">
 }

=head1 COPYRIGHT

Copyright (c) 2001 Curtis A. Poe.  All rights reserved.
This program is free software; you may redistribute it and/or modify i
+t under 
the same terms as Perl itself.  The author does not warranty this code
+ for any
particular purpose and strongly recommends that the programmer using t
+his code
have a thorough understanding of Perl security issues.  The author acc
+epts
absolutely no responsibility for problems arising from this code.  If 
+you do
not agree to these terms, please do not use this code.

=head1 AUTHOR

Curtis A. Poe <poec@yahoo.com>
Address bug reports and comments to: poec@yahoo.com.  When sending bug
+ reports,
please provide the HTML the program was run against.

=head1 MISCELLANEOUS

This program does not produce production-ready code.  It is merely int
+ended to simplify the
process of creating code to read in form data.  The untainting section
+ will need to be reviewed
carefully and modified to suit the programmer's needs.  Further, no I<
+actual> error handling is
provided in the event that a regex fails to match.  As the needs can v
+ary widely, I have
not tried to implement this.  You have the error stubs that will have 
+an error hash passed.  The
rest is up to you.

Note that code is generated with spaces for indenting instead of tabs.
+  As different editors render
tabs at different widths, I felt it prudent to go for safety, as a spa
+ce is a space is a space.

Use of C<$HTML::Parser::VERSION> which is less than 3.25 may result in
+ incomplete output as older
versions do not always handle XHTML correctly.  It is the programmer's
+ responsibility to verify
that the output of this code matches the programmer's needs.  Naturall
+y, bugs may exist and it's
possible that some form elements may not be properly dealt with by thi
+s code.

The author strongly recommends that you use a validator such as Tidy (
+L<http://www.w3.org/People/Raggett/tidy/>)
to validate your HTML.  C<HTML::Parser> does an excellent job of deali
+ng with 'dirty' HTML, but
improperly formatted documents with overlapping E<lt>formE<gt> element
+s (amongst other things),
will cause unpredictable output.

=cut

Comment on Automatically generate form parsing code
Download Code

Back to Code Catacombs

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: sourcecode [id://88647]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (6)
As of 2014-09-21 18:36 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (174 votes), past polls