Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Automatically generate form parsing code

by Anonymous Monk
on Jun 15, 2001 at 02:03 UTC ( #88647=sourcecode: print w/ replies, xml ) Need Help??

Category: CGI Programming
Author/Contact Info

This is Ovid posting anonymously. No need for XP for what is essentially a 'double posting'.

Description:

Run this code with an HTML file as the argument. It will create a series of files with one file created for each form in the HTML document(try it against a perlmonks page, you might be surprised). Each document will contain complete form-handling code to read in the data and even has generic untainting routines.

Read the POD for full documentation on its use. Very easy to use and moderately customizable.

#!/usr/bin/perl -w
require 5.004;
use strict;
use CGI;
use File::Basename;
use HTML::TokeParser;
    
my $doc      = shift or &usage;
my $basename = basename( $doc );
   $basename =~ s/\..*$//;
my $p        = HTML::TokeParser->new($doc) || die "Can't open: $!";
my $formnum  = 1;

######--------------------------------------######
#       Begin user config section                #
######--------------------------------------######

# The following variables should be set by the user to control        
+                
# the output of the code generator
#
# The only thing you *need* to set it the shebang line.  The other opt
+ions
# are just for configuration.
#
# The following variable MUST only containt letters, numbers, or under
+scores:
#   qw/ $taint_pfx $cgi_obj $err_var $err_sub $log_sub /
# If you slip up, the script will die rather than produce bad code.

# This is the prefix of all variables that need to be untainted.
# should be letters, numbers, or underscores.
my $taint_pfx = 'tainted_'; 

# set this to false to have OO cgi code written out.
my $cgi_std   = 0;

# if $cgi_std is set to false, use this to specify the variable
# name of the CGI object (e.g. 'q' becomes 'my $q = CGI->new;')
my $cgi_obj   = 'q';

# This is the shebang line that will be used.   
# If left blank, it will be skipped.
my $shebang   = '#!/usr/bin/perl -wT';

# use this for the name of the hash that contains the errors 
my $err_var   = 'errors';

# set this to true to have the program print the &error stub
my $print_err = 1;

# set this to the name of your error handling routine
my $err_sub   = 'error'; 

# if $err_sub is true, this will be the stub of your security
# log routine.  If the form has been tampered with (i.e. data
# in @safe_types does not untaint), the use this to log the info.
my $log_sub   = 'sec_log';

# set this to true for lower case variable names
# If your forms elements that have the same name except
# for case, this could cause problems.
my $lc_names  = 1;

######--------------------------------------######
#       End user config section                  #
######--------------------------------------######

# These are the form elements for which we can *safely* create regular
+ expressions
# for untainting.  Do not change this array unless you know what you a
+re doing.
my @safe_types = qw/ hidden checkbox radio select submit /;

my %element;       # holds all of the form element types, names, and v
+alues
my %select;        # holds select form elements so we know if we've se
+en them
my @element_order; # order that elements appear in the form.  It's not
+ really
                   # needed, but we do this so that variables in gener
+ated code
                   # appear roughly in the same order as the form.
my $select_token;  # holds the select token when parsing <select> valu
+es


# walk through document and get each tag
# we're building a list of form elements here
while (my $token = $p->get_tag) {
    my $tag = $token->[0];
    if ( my $form_pos = ( $tag eq 'form' .. $tag eq '/form' ) ) {

        # Oh!  We're in a form.  Start looking for stuff.
        if ( $form_pos != 1 and substr( $form_pos, -2 ) ne 'E0' ) {
            add_input_element( $token )  if $tag eq 'input';

            # <select> is a pain, so we need to handle it differently
            if ( my $select_pos = ( $tag eq 'select' .. $tag eq '/sele
+ct' ) ) {
                $select_token = $token if $tag eq 'select';
                if ( $select_pos != 1 and substr( $select_pos, -2 ) ne
+ 'E0' ) {
                    add_select_element( $token, $p, $select_token ) if
+ $tag eq 'option';
                } elsif ( substr( $select_pos, -2 ) eq 'E0' ) {
                    # we've finished the <select>, so add it to
                    # %select so we knows we've seen it.
                    $select{ $select_token->[1]->{ 'name' } } = '';
                }
            } # end if (select)

            foreach ( qw/ textarea button / ) {
                add_generic_element( $token ) if $tag eq $_;
            }
        } elsif ( substr( $form_pos, -2 ) eq 'E0' ) {
            # we've finished the form, so let's write the document, cl
+ear the vars,
            # and start looking for more forms.
            &write_template;
            %element       = ();
            @element_order = ();
            $formnum++;
        }
    } # end if (form)
}

# This is to extract necessary data from form element
# that will later be added to %element
sub add_generic_element {
    my $token = shift;
    my $tag   = $token->[0];
    my $name  = $token->[1]->{ 'name' }  || '';
    my $value = $token->[1]->{ 'value' } || '';

    # I don't want to pass 'text' defaults as they can often be huge
    if ( $tag eq 'textarea' ) {
        $value = '';
    }
    update_element_hash( $name, $tag, $value, 1 );    
}

# <select> needs to be handled different.
sub add_select_element {
    my ( $token, $p, $select_token ) = @_;

    my $name  = $select_token->[1]->{ 'name' } || '';
    my $value = $token->[1]->{ 'value' }       || '';
    
    # The following is because an <option> with a select will
    # assume the value of the text if no value is specified
    if ( ! $value ) {
        $value = $p->get_trimmed_text;
    }
    
    # If <select> has the 'multiple' attribute or we've previously
    # encountered a <select> with the same name, set multiple to true
    my $multiple = exists $select_token->[1]->{ 'multiple' } ? 1 :
                   exists $select{ $name }                   ? 1 :
                                                               0 ;
    update_element_hash( $name, 'select', $value, $multiple );
}

# Need to format the data from the input element
sub add_input_element {
    my $token = shift;
    my $type  = $token->[1]->{ 'type' }  || '';
    my $name  = $token->[1]->{ 'name' }  || '';
    my $value = $token->[1]->{ 'value' } || '';
    my $multiple = $type eq 'radio' ? 0 : 1;
    update_element_hash( $name, $type, $value, $multiple );
}

# here's where the formatted form element data is actually added
# to the element hash.
sub update_element_hash {
    my ( $name, $type, $value, $multiple ) = @_;
    return if $type eq 'reset'; # This only affects the form, not the 
+script
    $value =~ s/\n/\\n/g;
    if ( $name ) {
        if ( ! exists $element{ $name } ) {
            $element{ $name }{ 'multiple' } = 0;
            $element{ $name }{ 'type' }     = $type;
            $element{ $name }{ 'value' }    = $value ? [ $value ] : []
+;
            push @element_order, $name;
        } else {
            $element{ $name }{ 'multiple' } += $multiple;
            push @{ $element{ $name }{ 'value' } }, $value if $value;
        }
    }
}
    
# Duh!
sub usage {
    print <<"    END_HERE";
    Usage:  formparse.pl some.html
    END_HERE
    exit;
}

# Oh, goody!  We actually get to start writing the code :)
sub write_template {
    my $filename = "${basename}_form_${formnum}.cgi";
    my $cgi_var = $cgi_std ? '' : "\$${cgi_obj}->";
    my $cgi_line = 'use CGI';
    $cgi_line   .= $cgi_std ? " qw/:standard/;\n" : ";\n";

    my $max_var_length = 1;
    foreach ( keys %element ) {
        $max_var_length = length if length > $max_var_length;
    }
    
    open OUT, "> $filename" or die "Can't open $filename for writing: 
+$!";

    print OUT $shebang."\n" if $shebang;
    print OUT &template;
    print OUT "use strict;\n";
    print OUT $cgi_line;

    if ( ! $cgi_std ) {
        print OUT "my \$$cgi_obj = CGI->new;\n";
    }
    print OUT "my \%$err_var;\n";
    print OUT "\n# Grab all data\n";

    # Here's where we print param() calls
    foreach my $element ( @element_order ) {
        my $var_name  = get_var_name( $element );
        my $data_type = $element{ $element }{ 'multiple' } ? '@'  : '$
+';
        my $default   = $element{ $element }{ 'multiple' } ? '()' : "'
+'";
        print OUT qq/my ${data_type}${taint_pfx}${var_name} / .
                  ' ' x ( $max_var_length - length $var_name ) .
                  qq/= ${cgi_var}param( '$element' )/ .
                  ' ' x ( $max_var_length - length $var_name ) .
                  qq/ || $default; # $element{ $element }{ 'type' }\n/
+;
    }
    print OUT <<"    END_HERE";
    
# The following is just a rough "fill in" template for untainting your
+ data.
# It will need to be customized to suit your particular needs.  You'll
+ need
# to create regular expressions to untaint your data and if you skimp 
+on this,
# it's at your peril!!!

    END_HERE
    # here's where we print the untainting template
    foreach my $element ( @element_order ) {
        my $var_name  = get_var_name( $element );
        my $type      = $element{ $element }{ 'type' };
        my $multiple  = $element{ $element }{ 'multiple' };

        if ( $multiple ) {
            # Ooh, multiple values.  Need to untaint an array.
            print OUT qq!my \@${var_name}; # $type values: ! . join( "
+,", @{ $element{ $element }{ 'value' } } ) . "\n";
            print OUT "foreach ( 0 .. \$#${taint_pfx}${var_name} ) {\n
+".
                  qq!    ( \$${var_name}\[\$_] ) ! .
                  untainting_code( $element, $var_name, $multiple, $ty
+pe, '[$_]' ) .
                  "}\n\n";
        } else {
            # Untainting a scalar.
            print OUT qq!# $type values: ! . join( ",", @{ $element{ $
+element }{ 'value' } } ) . "\n";
            print OUT qq!my ( \$${var_name} ) ! .
                      untainting_code( $element, $var_name, $multiple,
+ $type, '' ) . "\n";
        }
    }
    
    print OUT err_stub() if $print_err;
    close OUT or die "Can't close $filename: $!";
}

# return the code that will actually go in the untainting routine
sub untainting_code {
    my ( $element, $var_name, $multiple, $type, $index ) = @_;
    my $var_prefix = $multiple ? '@' : '$';
    return qq!= ( ! . untainting_regexes( $element, $var_name, $multip
+le ) . " )\n" .
           qq!        or \$${err_var}\{ '$var_name' } = [ "$type", \\$
+{var_prefix}${taint_pfx}${var_name}, "You must supply a proper value 
+for '$var_name'.  Allowed characters are letters, numbers, or punctua
+tion." ];\n!;    
}

# need to create the regexes for untainting.
sub untainting_regexes {
    my ( $element, $var_name, $multiple ) = @_;
    my $type    = $element{ $element }{ 'type' };
    my $code;

    # Can't create a safe regex, so we insert code to kill the script 
+unless
    # the programmer creates his or her own regex
    $code =  '"" ), die  # could not auto-create regex # ';
    return $code if ! grep { /$type/ } @safe_types;

    my @values  = @{ $element{ $element }{ 'value' } };
    @values = return_unique_items( @values ) if @values;
    if ( scalar @values > 1 ) {
        my ( $all_length_of_one, $formatted_values ) = escape_values( 
+\@values );

        my $array_index = $multiple ? '[$_]' : '';
        
        if ( $all_length_of_one ) {
            # we're returning a character class
            $code = qq!\$${taint_pfx}${var_name}$array_index =~ /^(! .
+ '['. join( '', @$formatted_values ) .'])$/';
        } else {
            # we have multiple values that a character class is not su
+itable for,
            # so we return separate regex tests for each value
            #my $offset = $multiple ? length( $taint_pfx ) + 17 : leng
+th( $taint_pfx ) + 12;
            my $offset = $multiple ? 18 : 13;
            $code = qq!\$${taint_pfx}${var_name}$array_index =~ /^(! .
+ $$formatted_values[0] . ")\$/ or \n";
            for ( 1 .. scalar @$formatted_values - 1 ) {
                $code .= ' ' x ( $offset + length $var_name );
                $code .= qq!\$${taint_pfx}${var_name}$array_index =~ /
+^(! . $values[$_] . ")\$/";
                $code .= " or \n" if $_ < scalar @$formatted_values - 
+1;
            }
        }
    } else {
        $code = qq!\$${taint_pfx}${var_name} =~ /^(! . quotemeta( $val
+ues[0] ) . ')$/';
    }
    return $code;
}

###############################################################
# if you want to maintain this, please note that this appears #
# to have subs within subs.  Look closer:  HERE docs.  Rather #
# tricky.                                                     #
###############################################################

sub err_stub {
    my $code =<<END_CODE;
$err_sub( \\\%$err_var ) if \%$err_var;

sub $err_sub {
    my \$err_hash = shift;
    # \$err_hash is a reference to an anoymous hash.  Keys are form at
+tribute names and values
    # are an anonymous array: [ element type, value, error_message ]
    # Example: 
    # \$err_hash = { 
    #       'username' => [ 'text', '????', "You must supply a proper 
+value for 'username'..." ]
    # }
    # Might be generated from: <input type="text" name="username">
END_CODE
    $code .= "    my \@safe_elements = qw/ ";

    foreach ( @safe_types ) {
        $code .= "$_ " if $_ ne 'checkbox'; # skip checkbox as it's po
+ssible for no data to be sent
    }    
    $code .= "/;\n";
    $code .=<<'END_CODE';
    foreach my $key ( keys %$err_hash ) {
        if ( grep { /$$err_hash{ $key }->[0]/ } @safe_elements ) {
            # The value of an element whose value shouldn't change app
+ears to be wrong
            sec_log( $$err_hash{ $key }, $key );
        } else {
            # insert error handling code here
        }
    }
}
END_CODE
    $code .= "sub $log_sub {\n";
    $code .=<<'END_CODE';
    my ( $bad_data, $element_name ) = @_;
    # $bad_data is a reference to an anoymous array.
    # [ element type, value, error_message ]
    # Example: [ 'text', '%#$!#$', "You must supply a proper value for
+ 'username'..." ]
    # Might be generated from: <input type="text" name="username">
}
END_CODE
}
##########################
# End 'HERE' doc section #
##########################

# return a safe variable name based upon the element name
sub get_var_name {
    my $var_name  = shift;
    $var_name     = lc $var_name if $lc_names;
    $var_name     =~ s/\W/_/g; # nice and safe variable names
    return $var_name;
}

# escape data for created regexes
sub escape_values {
    # if all values are only one character in length, we'll escape cha
+racters
    # with special meaning in a character class.  Otherwise, we'll esc
+ape all
    # non-word characters.
    my $values = shift;
    my $all_length_of_one = 1;
    foreach ( @$values ) {
        $all_length_of_one = 0, last if length > 1;
    }
    if ( $all_length_of_one ) {
        # We don't want a negated character class!
        if ( $$values[0] eq '^' ) {
            $$values[0] = '\^';
        }
        for ( 1 .. scalar @$values - 1 ) {
            # quote ']', '\', and '-' in character class
            if ( $$values[ $_ ] =~ /[\]\\\-]/ ) {
                $$values[ $_ ] = '\\' . $$values[ $_ ];
            }
        }
    } else {
        @$values = map { quotemeta( $_ ) } @$values if @$values;
    }
    return ( $all_length_of_one, $values );
}

# remove duplicates from array
sub return_unique_items {
    my %seen;
    grep { ! $seen{ $_ }++ } @_;
}

# This is the 'comment' template written at the top of the code
sub template {
    my ( $day, $month, $year ) = (localtime)[3..5];
    $year += 1900;
    my @months = qw/January February March     April   May      June
                    July    August   September October November Decemb
+er/;
    return <<END_HERE;
#
# Program:
# Author:
# Date Created:
# Purpose:
#
# Inputs:
#
# Outputs:
#
# --------------- Maintenance log ------------------------
# Date:        $months[ $month ] $day, $year
# Programmer:  parse_form.pl
# Action:      automatic code generation template
# --------------------------------------------------------
# Date:
# Programmer:
# Action:
# --------------------------------------------------------

END_HERE
}

# If the user config information is bad, kill the script rather than w
+rite bad code
sub validate {
    unless( $taint_pfx =~ /^\w+$/ ) { bad_config( '$taint_pfx' ) };
    unless( $cgi_obj   =~ /^\w+$/ ) { bad_config( '$cgi_obj' ) };
    unless( $err_var   =~ /^\w+$/ ) { bad_config( '$err_var' ) };
    unless( $err_sub   =~ /^\w+$/ ) { bad_config( '$err_sub' ) };
    unless( $log_sub   =~ /^\w+$/ ) { bad_config( '$log_sub' ) };

    if ( $HTML::Parser::VERSION < 3.25 ) {
        warn "\n    You have HTML::Parser version $HTML::Parser::VERSI
+ON, which is out of date.\n".
             "    XHTML may not parse correctly\n".
             "    If you cannot upgrade \$HTML::Parser, you will need 
+to carefully double-check\n".
             "    that all form elements have been included.";
    }
}

# This is where the script is actually killed
sub bad_config {
    my $config_var = shift;
    die "$config_var must only containt letters, numbers, or underscor
+es:\n\t".
        "$config_var =~ /^\\w+\$/ or die;";
}

__END__

=head1 NAME

parse_form.pl Quick and easy form handling code

=head1 USAGE

 parse_form.pl some.html

=head1 DESCRIPTION

One of the most tedious tasks of writing CGI scripts is creating the f
+orm handling routines.
The larger the form, the more tedious this can be.  Typically, this is
+ a series of 'param'
calls followed by a bunch of rather boring untainting expressions.  Th
+is program will read
an HTML document and produce a series of files, one per form in the sc
+ript.  

Each file will be the beginnings of the CGI script used to process eac
+h form in the HTML
document.  Each file has several logical sections:

=over 4

=item Shebang Line

The programmer sets the C<$shebang> variable to point to the Perl inte
+rpreter and add 
necessary switches:

 my $shebang = '/usr/bin/perl -wT';

If C<$shebang> is set to a false value (e.g., the empty string), then 
+the shebang line will
not be included.

=item Header Template

The C<&template> subroutine should be altered to fit the programmer's 
+shop standards.  This 
template is a series of comments added to the beginning of the output 
+files, after the
shebang line, which identify the programmer, author, purpose of the pr
+ogram, etc.

=item C<use> Statements

After the template, it is assumed the the programmer will C<use strict
+> and C<use CGI>.
If the programmer sets the C<$cgi_std> variable to false, then this se
+ction will be:

 use strict;
 use CGI;
 my $q = CGI->new;

The variable name for the CGI object is set with the C<$cgi_obj> varia
+ble.

=item Grabbing Form Data

The next section will be a series of C<CGI::param> calls.  These will 
+grab in all data from
the CGI form the variable name set to whatever value is assigned to C<
+$taint_pfx> followed 
by the C<name> attribute in each CGI element.  To ensure that safe var
+iable names are created,
the C<name> attribute will be parsed through the following regex:

 $var_name = s/\W/_/g;

Here is an example of a text field and the resulting C<param> call (as
+sumes using the
function-oriented version of CGI.pm:

 <input type="text" name="first name">
 
 my $_first_name = param( 'first name' ); # text

Note the the type of form element generating the data is appended as a
+ comment.  If
C<parse_form.pl> determines that more than one value could be for a pa
+rticular name,
it will use an array instead of a scalar:

 <select name="sports" multiple>
   <option>Football
   <option>Basketball</option>
   <option>Other
 </select>

 my @_sports = param( 'sports' ); # select

Note that the HTML is sloppy.  "Basketball" has a closing E<lt>/option
+E<gt>, while the
other options do not.  C<parse_form.pl> does not care.

=item Untainting the data

Untainting the form data is perhaps one of the most onerous tasks and 
+it's an easy one
to skip (whether through false laziness or genuine oversight).  C<pars
+e_for.pl> creates
a series of stub untainting methods.  These B<are not fully functional
+>!!!  Blindly
untainting data is foolish.  Instead, we create generic routines that 
+the programmer will
need to complete.  However, for certain types of form elements, the fo
+rm data should be
relatively simple to untaint.  These are elements for which the data i
+s already specified
and should not change:


=over 4

=item *

hidden

=item *

checkbox

=item *

radio

=item *

select

=item *

submit

=back

For these five items, the program will read the default values and cre
+ate a regular expression
to match this data.  If the default values are all single character, i
+t will use a character class
to match the data.  Otherwise, it will use alternation.  For the C<@_s
+ports> example above, the
following untainting code will be created:

 my @sports; # select values: Football,Basketball,Other
 foreach ( 0 .. $#_sports ) {
     ( $sports[$_] ) = ( $_sports[$_] =~ /^(Football)$/ or 
                         $_sports[$_] =~ /^(Basketball)$/ or 
                         $_sports[$_] =~ /^(Other)$/ )
         or push @{ $errors{ 'sports' } }, "You must supply a proper v
+alue for 'sports'.  Allowed characters are letters, numbers, or punct
+uation.";
 }

The first line is a comment listing the type of form element and any d
+efault values that were
provided by the form and a declaration of the variable to hold the unt
+ainted data.  The subsequent
lines represent the untainting code that's generated..

If you prefer, you could use alternation for the regex:

     ( $sports[$_] ) = ( $_sports[$_] =~ /^(Football|Basketball|Other)
+$/ )

I have elected not to do this as alternation is much less efficient.  
+If you have many values,
particularly with larger forms, this could have a a significant impact
+ on your script's performance.
For simple alternation, however, you I<probably> wouldn't notice much 
+of a speed difference.

It is recommended, however, that you go through the regular expression
+s manually and optimize them.
The untainting code for days of the month might be something like:

 my ( $user_datesuse2_day ) = ( $tainted_user_datesuse_day =~ /^(01)$/
+ or 
                                $tainted_user_datesuse_day =~ /^(02)$/
+ or 
                                $tainted_user_datesuse_day =~ /^(03)$/
+ or 
                                ...
                                $tainted_user_datesuse_day =~ /^(31)$/
+ )

Needless to say, that's a waste!  A quick optimization produces someth
+ing like:

 my ( $user_datesuse_day ) = ( $tainted_user_datesuse_day =~ /^([0-2]\
+d|3[01])$/ )

For form elements other than the five listed above, the code will be s
+lightly different.  This is 
to force the programmer to actually go through and add all of these re
+gexes.  In the C<$_first_name>
example above, the following untainting code will be created:

 # text values:
 my ( $first_name ) = ( "" ), die  # could not auto-create regex #  )

What this does is set C<$first_name> to the empty string and kill the 
+script.  This is to force the
programmer to create the regex manually.  Also, with large forms, it e
+nsures that you are less likely
to overlook the untainting.

=back

=head1 User Configuration

No point in having auto-generated code unless you have some control of
+ the output.  At the beginning
of the program is a 'user config' section.  The following variables ca
+n be changed by the user to
affect the output of the script.

=over 4

=item C<$taint_pfx>

This is the prefix of all variables that need to be untainted.  Must m
+atch /^\w+$/

    my $taint_pfx = 'tainted_'; 

=item C<$cgi_std>

Set this to false to have OO cgi code written out.

    my $cgi_std   = 0;

=item C<$cgi_obj>

If C<$cgi_std> is set to false, use this to specify the variable name 
+of the CGI object 
(e.g. 'q' becomes C<my $q = CGI->new;>).  Must match /^\w+$/

    my $cgi_obj   = 'q';

=item C<$shebang>

This is the shebang line that will be used.   If left blank, it will b
+e skipped.

    my $shebang   = '#!/usr/bin/perl -wT';

=item C<$err_var>

Use this for the name of the hash that will contain the errors generat
+ed by scalars that
do not untaint.  Must match /^\w+$/

    my $err_var   = 'errors';

=item C<$print_err>

Set this to true to have the program print the C<&error> stub.

    my $print_err = 1;

=item C<$err_sub>

Set this to the name of your error handling routine.  Must match /^\w+
+$/

    my $err_sub   = 'error'; 

=item C<$log_sub>

If C<$err_sub> is true, this will be the stub of your security log rou
+tine.  If the form has
been tampered with (i.e. data in C<@safe_types> does not untaint), the
+ use this to log the info.
Must match /^\w+$/

    my $log_sub   = 'sec_log';

=item C<$lc_names>

Set this to true for lower case variable names.  If your forms element
+s that have the same name
except for case, this could cause problems.

    my $lc_names  = 1;

=back

=head1 Sample Input and Output

=head2 Input:  HTML document named test.html

Note the the following document has very poorly-formed HTML.  Not all 
+element attributes are quoted.
The case varies on input elements and whether or not a closing E<lt>/o
+ptionE<gt> tag has been used
varies.

 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
 <html>
     <head>
         <title>This is for a form parsing test</title>
     </head>
     <body>
         <p>I don't think anyone will even <em>look</em> at this docum
+ent.</p>
         This is a line with no tags.
         <form action='C:\Inetpub\wwwroot\garban\cgi-bin\cookie.cgi' m
+ethod=post enctype='multipart/form-data'>
             <input type='hidden' name=somename value="asdf">
             <input type=text name=name value=Ovid size="30" maxsize="
+30">
             <br />
             <br>
             <input type="checkbox" name="group1" value="1" checked />
+ box 1 group 1
             <br>
             <input type="checkbox" NAME="group1" value="2"> box 2 gro
+up 1
             <br>
             <input type="password" name="pass"> Password
             <br />
             <select name="sports">
                 <option>Tiddly winks
                 <option>Mud wrestling</option>
             </select>
             <br>
             <textarea name="test">Some text</textarea>
             <br>
             <input type="radio" name="radio1" value="^" checked /> ra
+dio 1 group 1
             <input type="radio" name="radio1" value="^" checked /> ra
+dio 1 group 1
             <input type="radio" name="radio1" value="^" checked /> ra
+dio 1 group 1
             <input type="radio" name="radio1" value="\" checked /> ra
+dio 1 group 1
             <input type="radio" name="radio1" value="]" checked /> ra
+dio 1 group 1
             <input type="radio" name="radio1" value="1" checked /> ra
+dio 1 group 1
             <br>
             <input type="radio" NAME="radio1" value="2"> radio 2 grou
+p 1
             <!-- a comment -->
             <br />
             <SELEct name="religions" multiple>
                 <option>Democrat
                 <option>Republican</option>
             </select>
             <textarea name="test">Some text</textarea>
             <br />
             <select name="asdf">
                 <option>Dasdfsdf
                 <option>asfdasdfasdf</option>
             </select>
             <br>
             <select name="sports">
                 <option>Twister
                 <option>Jello wrestling</option>
             </select>
             <br>
             <input type="submit" name=".submit" value="why bother?">
             <br>
             <input type="radio" name="radio1" value="3" checked /> ra
+dio 3 group 1
             <br>
              <input type="radio" NAME="radio1" value="4"> radio 4 gro
+up 1
         </form>
     </body>
 </html>

=head2 Output:  document named test_form_1.cgi

Note that from a 60 line HTML document, we have instantly generated a 
+124 line Perl script.

 #!/usr/bin/perl -wT
 #
 # Program:
 # Author:
 # Date Created:
 # Purpose:
 #
 # Inputs:
 #
 # Outputs:
 #
 # --------------- Maintenance log ------------------------
 # Date:        June 14, 2001
 # Programmer:  parse_form.pl
 # Action:      automatic code generation template
 # --------------------------------------------------------
 # Date:
 # Programmer:
 # Action:
 # --------------------------------------------------------
 
 use strict;
 use CGI;
 my $q = CGI->new;
 my %errors;
 
 # Grab all data
 my $tainted_somename  = $q->param( 'somename' )  || ''; # hidden
 my $tainted_name      = $q->param( 'name' )      || ''; # text
 my @tainted_group1    = $q->param( 'group1' )    || (); # checkbox
 my $tainted_pass      = $q->param( 'pass' )      || ''; # password
 my @tainted_sports    = $q->param( 'sports' )    || (); # select
 my @tainted_test      = $q->param( 'test' )      || (); # textarea
 my $tainted_radio1    = $q->param( 'radio1' )    || ''; # radio
 my @tainted_religions = $q->param( 'religions' ) || (); # select
 my $tainted_asdf      = $q->param( 'asdf' )      || ''; # select
 my $tainted__submit   = $q->param( '.submit' )   || ''; # submit
     
 # The following is just a rough "fill in" template for untainting you
+r data.
 # It will need to be customized to suit your particular needs.  You'l
+l need
 # to create regular expressions to untaint your data and if you skimp
+ on this,
 # it's at your peril!!!
 
 # hidden values: asdf
 my ( $somename ) = ( $tainted_somename =~ /^(asdf)$/ )
         or $errors{ 'somename' } = [ "hidden", \$tainted_somename, "Y
+ou must supply a proper value for 'somename'.  Allowed characters are
+ letters, numbers, or punctuation." ];
 
 # text values: Ovid
 my ( $name ) = ( "" ), die  # could not auto-create regex #  )
         or $errors{ 'name' } = [ "text", \$tainted_name, "You must su
+pply a proper value for 'name'.  Allowed characters are letters, numb
+ers, or punctuation." ];
 
 my @group1; # checkbox values: 1,2
 foreach ( 0 .. $#tainted_group1 ) {
     ( $group1[$_] ) = ( $tainted_group1[$_] =~ /^([12])$/ )
         or $errors{ 'group1' } = [ "checkbox", \@tainted_group1, "You
+ must supply a proper value for 'group1'.  Allowed characters are let
+ters, numbers, or punctuation." ];
 }
 
 # password values: 
 my ( $pass ) = ( "" ), die  # could not auto-create regex #  )
         or $errors{ 'pass' } = [ "password", \$tainted_pass, "You mus
+t supply a proper value for 'pass'.  Allowed characters are letters, 
+numbers, or punctuation." ];
 
 my @sports; # select values: Tiddly winks,Mud wrestling,Twister,Jello
+ wrestling
 foreach ( 0 .. $#tainted_sports ) {
     ( $sports[$_] ) = ( $tainted_sports[$_] =~ /^(Tiddly\ winks)$/ or
+ 
                         $tainted_sports[$_] =~ /^(Mud\ wrestling)$/ o
+r 
                         $tainted_sports[$_] =~ /^(Twister)$/ or 
                         $tainted_sports[$_] =~ /^(Jello\ wrestling)$/
+ )
         or $errors{ 'sports' } = [ "select", \@tainted_sports, "You m
+ust supply a proper value for 'sports'.  Allowed characters are lette
+rs, numbers, or punctuation." ];
 }
 
 my @test; # textarea values: 
 foreach ( 0 .. $#tainted_test ) {
     ( $test[$_] ) = ( "" ), die  # could not auto-create regex #  )
         or $errors{ 'test' } = [ "textarea", \@tainted_test, "You mus
+t supply a proper value for 'test'.  Allowed characters are letters, 
+numbers, or punctuation." ];
 }
 
 # radio values: ^,^,^,\,],1,2,3,4
 my ( $radio1 ) = ( $tainted_radio1 =~ /^([\^\\\]1234])$/ )
         or $errors{ 'radio1' } = [ "radio", \$tainted_radio1, "You mu
+st supply a proper value for 'radio1'.  Allowed characters are letter
+s, numbers, or punctuation." ];
 
 my @religions; # select values: Democrat,Republican
 foreach ( 0 .. $#tainted_religions ) {
     ( $religions[$_] ) = ( $tainted_religions[$_] =~ /^(Democrat)$/ o
+r 
                            $tainted_religions[$_] =~ /^(Republican)$/
+ )
         or $errors{ 'religions' } = [ "select", \@tainted_religions, 
+"You must supply a proper value for 'religions'.  Allowed characters 
+are letters, numbers, or punctuation." ];
 }
 
 # select values: Dasdfsdf,asfdasdfasdf
 my ( $asdf ) = ( $tainted_asdf =~ /^(Dasdfsdf)$/ or 
                  $tainted_asdf =~ /^(asfdasdfasdf)$/ )
         or $errors{ 'asdf' } = [ "select", \$tainted_asdf, "You must 
+supply a proper value for 'asdf'.  Allowed characters are letters, nu
+mbers, or punctuation." ];
 
 # submit values: why bother?
 my ( $_submit ) = ( $tainted__submit =~ /^(why\ bother\?)$/ )
         or $errors{ '_submit' } = [ "submit", \$tainted__submit, "You
+ must supply a proper value for '_submit'.  Allowed characters are le
+tters, numbers, or punctuation." ];
 
 error( \%errors ) if %errors;
 
 sub error {
     my $err_hash = shift;
     # $err_hash is a reference to an anoymous hash.  Keys are form at
+tribute names and values
     # are an anonymous array: [ element type, value, error_message ]
     # Example: 
     # $err_hash = { 
     #       'username' => [ 'text', '????', "You must supply a proper
+ value for 'username'..." ]
     # }
     # Might be generated from: <input type="text" name="username">
     my @safe_elements = qw/ hidden radio select submit /;
     foreach my $key ( keys %$err_hash ) {
         if ( grep { /$$err_hash{ $key }->[0]/ } @safe_elements ) {
             # The value of an element whose value shouldn't change ap
+pears to be wrong
             sec_log( $$err_hash{ $key }, $key );
         } else {
             # insert error handling code here
         }
     }
 }
 sub sec_log {
     my ( $bad_data, $element_name ) = @_;
     # $bad_data is a reference to an anoymous array.
     # [ element type, value, error_message ]
     # Example: [ 'text', '%#$!#$', "You must supply a proper value fo
+r 'username'..." ]
     # Might be generated from: <input type="text" name="username">
 }

=head1 COPYRIGHT

Copyright (c) 2001 Curtis A. Poe.  All rights reserved.
This program is free software; you may redistribute it and/or modify i
+t under 
the same terms as Perl itself.  The author does not warranty this code
+ for any
particular purpose and strongly recommends that the programmer using t
+his code
have a thorough understanding of Perl security issues.  The author acc
+epts
absolutely no responsibility for problems arising from this code.  If 
+you do
not agree to these terms, please do not use this code.

=head1 AUTHOR

Curtis A. Poe <poec@yahoo.com>
Address bug reports and comments to: poec@yahoo.com.  When sending bug
+ reports,
please provide the HTML the program was run against.

=head1 MISCELLANEOUS

This program does not produce production-ready code.  It is merely int
+ended to simplify the
process of creating code to read in form data.  The untainting section
+ will need to be reviewed
carefully and modified to suit the programmer's needs.  Further, no I<
+actual> error handling is
provided in the event that a regex fails to match.  As the needs can v
+ary widely, I have
not tried to implement this.  You have the error stubs that will have 
+an error hash passed.  The
rest is up to you.

Note that code is generated with spaces for indenting instead of tabs.
+  As different editors render
tabs at different widths, I felt it prudent to go for safety, as a spa
+ce is a space is a space.

Use of C<$HTML::Parser::VERSION> which is less than 3.25 may result in
+ incomplete output as older
versions do not always handle XHTML correctly.  It is the programmer's
+ responsibility to verify
that the output of this code matches the programmer's needs.  Naturall
+y, bugs may exist and it's
possible that some form elements may not be properly dealt with by thi
+s code.

The author strongly recommends that you use a validator such as Tidy (
+L<http://www.w3.org/People/Raggett/tidy/>)
to validate your HTML.  C<HTML::Parser> does an excellent job of deali
+ng with 'dirty' HTML, but
improperly formatted documents with overlapping E<lt>formE<gt> element
+s (amongst other things),
will cause unpredictable output.

=cut

Comment on Automatically generate form parsing code
Download Code

Back to Code Catacombs

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: sourcecode [id://88647]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (5)
As of 2014-07-13 21:42 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (252 votes), past polls