Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Code Generating Scripts - Long Post

by Ovid (Cardinal)
on Jun 08, 2001 at 05:21 UTC ( #86815=perlmeditation: print w/ replies, xml ) Need Help??

As some of you may recall from this node, I was looking for a program that automatically generated code to process form-data. Various good suggestions were made and I wound up "rolling my own." One monk and I had a lengthy, private, Chatterbox discussion about this and he expressed some concerns about the way I was approaching it, particularly the "untainting" section. As a result, I am hoping you can review my code.

What it does is analyze a given HTML document and write out script templates for the form handling. Every form will get written to a separate file. The first section is a series of param calls to get the data and the second section is a rough "untainting" section. It's not "plug-n-play". You'll have to add the untainting regexes yourself, but it should save plenty of time when coding large forms. I processed it against one of our larger forms and it produced a 17K Perl script instantly. Needless to say, coding that script by hand would have been tedious and error-prone.

Here's the program (beta!):

#!/usr/bin/perl -w require 5.004; use strict; use CGI; use File::Basename; use HTML::TokeParser; my $doc = shift or &usage; my $basename = basename( $doc ); $basename =~ s/\..*$//; my $p = HTML::TokeParser->new($doc) || die "Can't open: $!"; my $formnum = 1; ###################################################################### +################# # The following variables should be set by the user to control + # # the output of the code generator + # # + # my $cgi_std = 0; # set this to false to have OO cgi code written out. + # my $cgi_obj = 'q'; # if $cgi_std is set to false, use this to specify + the variable # # name of the CGI object (e.g. 'q' becomes 'my $q += CGI->new;') # my $shebang = '#!/usr/bin/perl -wT'; # This is the shebang line that +will be used. # # If left blank, it will be skippe +d. # ###################################################################### +################# my ( %element, %select, @element_order, $select_token ); if ( $HTML::Parser::VERSION < 3.25 ) { warn "\n\tYou have HTML::Parser version $HTML::Parser::VERSION, wh +ich is out of date.". "\n\tXHTML may not parse correctly\n\n"; } # walk through document and get each tag while (my $token = $p->get_tag) { my $tag = $token->[0]; if ( my $form_pos = ( $tag eq 'form' .. $tag eq '/form' ) ) { # Oh! We're in a form. Start looking for stuff. if ( $form_pos != 1 and substr( $form_pos, -2 ) ne 'E0' ) { add_input_element( $token ) if $tag eq 'input'; # <select> is a pain, so we need to handle it differently if ( my $select_pos = ( $tag eq 'select' .. $tag eq '/sele +ct' ) ) { $select_token = $token if $tag eq 'select'; if ( $select_pos != 1 and substr( $select_pos, -2 ) ne + 'E0' ) { add_select_element( $token, $p, $select_token ) if + $tag eq 'option'; } elsif ( substr( $select_pos, -2 ) eq 'E0' ) { # we've finished the <select>, so add it to # %select so we knows we've seen it. $select{ $select_token->[1]->{ 'name' } } = ''; } } # end if (select) foreach ( qw/ textarea button / ) { add_generic_element( $token ) if $tag eq $_; } } elsif ( substr( $form_pos, -2 ) eq 'E0' ) { # we've finished the form, so let's write the document, cl +ear the vars, # and start looking for more forms. &write_template; %element = (); @element_order = (); $formnum++; } } # end if (form) } sub add_generic_element { my $token = shift; my $tag = $token->[0]; my $name = $token->[1]->{ 'name' } || ''; my $value = $token->[1]->{ 'value' } || ''; # I don't want to pass 'text' defaults as they can often be huge if ( $tag eq 'textarea' ) { $value = ''; } update_element_hash( $name, $tag, $value, 1 ); } sub add_select_element { my ( $token, $p, $select_token ) = @_; my $name = $select_token->[1]->{ 'name' } || ''; my $value = $token->[1]->{ 'value' } || ''; # The following is because an <option> with a select will # assume the value of the text if no value is specified if ( ! $value ) { $value = $p->get_trimmed_text; } my $multiple = exists $select_token->[1]->{ 'multiple' } ? 1 : exists $select{ $name } ? 1 : 0 ; update_element_hash( $name, 'select', $value, $multiple ); } sub add_input_element { my $token = shift; my $type = $token->[1]->{ 'type' } || ''; my $name = $token->[1]->{ 'name' } || ''; my $value = $token->[1]->{ 'value' } || ''; my $multiple = $type eq 'radio' ? 0 : 1; update_element_hash( $name, $type, $value, $multiple ); } sub update_element_hash { my ( $name, $type, $value, $multiple ) = @_; return if $type eq 'reset'; # This only affects the form, not the +script $value =~ s/\n/\\n/g; if ( $name ) { if ( ! exists $element{ $name } ) { $element{ $name }{ 'multiple' } = 0; $element{ $name }{ 'type' } = $type; $element{ $name }{ 'value' } = $value ? [ $value ] : [] +; push @element_order, $name; } else { $element{ $name }{ 'multiple' } += $multiple; push @{ $element{ $name }{ 'value' } }, $value if $value; } } } sub usage { print <<" END_HERE"; Usage: form_parse.pl some.html END_HERE exit; } sub write_template { my $filename = "${basename}_form_${formnum}.cgi"; my $cgi_var = $cgi_std ? '' : "\$${cgi_obj}->"; my $cgi_line = 'use CGI'; $cgi_line .= $cgi_std ? " qw/:standard/;\n" : ";\n"; my $max_var_length = 1; foreach ( keys %element ) { $max_var_length = length if length > $max_var_length; } open OUT, "> $filename" or die "Can't open $filename for writing: +$!"; print OUT $shebang."\n" if $shebang; print OUT "use strict;\n"; print OUT $cgi_line; if ( ! $cgi_std ) { print OUT "my \$$cgi_obj = CGI->new;\n"; } print OUT "\n# Grab all data\n"; # Here's where we print param() calls foreach my $element ( @element_order ) { my $var_name = $element; $var_name =~ s/\W/_/g; # nice and safe variable names my $data_type = $element{ $element }{ 'multiple' } ? '@' : '$' +; print OUT qq/my ${data_type}_${var_name} / . ' ' x ( $max_var_length - length $var_name ) . qq/= ${cgi_var}param( '$element' );/ . ' ' x ( $max_var_length - length $var_name ) . qq/ # $element{ $element }{ 'type' }\n/; } print OUT <<" END_HERE"; # The following is just a rough "fill in" template for untainting +your data. # It will need to be customized to suit your particular needs. Yo +u'll need # to create regular expressions to untaint your data and if you sk +imp on this, # it's at your peril!!! # These are NOT efficient regexes. For large forms with many simi +lar data values # for the same name, this could be quite slow. I had considered m +aking these more # efficient by having separate regex tests for each value, but sin +ce the user still # needs to go in and tweak them, I didn't see the point. END_HERE # here's where we print the untainting template foreach my $element ( @element_order ) { my $var_name = $element; $var_name =~ s/\W/_/g; # nice and safe variable names my $type = $element{ $element }{ 'type' }; print OUT "# $type values: ". join( ",", @{ $element{ $element + }{ 'value' } } ) . "\n"; if ( $element{ $element }{ 'multiple' } ) { # Ooh, multiple values. Need to untaint an array. print OUT qq!my \@${var_name};\n!; print OUT qq!( \$${var_name}\[\$_] ) ! . ' ' x ( $max_var_length - length $var_name ) . qq!= ( \$_${var_name}\[\$_] =~ /^(! . create_regex( +$element ) . qq!)\$/ ) foreach ( 0 .. \$#_${var_name} );\n\n!; } else { # Untainting a scalar. print OUT qq!my ( \$${var_name} ) ! . ' ' x ( ( $max_var_length - length $var_name ) + + 1 ) . qq!= ( \$_${var_name} =~ /^(! . create_regex( $e +lement ) . qq!)\$/ );\n\n!; } } close OUT or die "Can't close $filename: $!"; } sub create_regex { my $element = shift; my $type = $element{ $element }{ 'type' }; my @safe_types = qw/ hidden checkbox radio select submit /; return ' # could not auto-create regex # ' if ! grep { /$type/ } @ +safe_types; my @values = @{ $element{ $element }{ 'value' } }; @values = return_unique_items( @values ) if @values; if ( scalar @values > 1 ) { my $all_length_of_one = 1; foreach ( @values ) { $all_length_of_one = 0, last if length > 1; } if ( $all_length_of_one ) { # We don't want a negated character class! if ( $values[0] eq '^' ) { $values[0] = '\^'; } for ( 1 .. $#values ) { # quote ']', '\', and '-' in character class if ( $values[ $_ ] =~ /[\]\\\-]/ ) { $values[ $_ ] = '\\' . $values[ $_ ]; } } } else { @values = map { quotemeta( $_ ) } @values; } my $regex = $all_length_of_one ? '['. join( '', @values ) .']' + : join( '|', @values ); return $regex; } else { if ( $values[0] ) { return quotemeta $values[0]; } } } sub return_unique_items { my %seen; grep { ! $seen{ $_ }++ } @_; }

Here's a sample script that it spit out:

#!/usr/bin/perl -wT use strict; use CGI; my $q = CGI->new; # Grab all data my $_somename = $q->param( 'somename' ); # hidden my $_name = $q->param( 'name' ); # text my @_group1 = $q->param( 'group1' ); # checkbox my $_pass = $q->param( 'pass' ); # password my @_sports = $q->param( 'sports' ); # select my @_test = $q->param( 'test' ); # textarea my $_radio1 = $q->param( 'radio1' ); # radio my @_religions = $q->param( 'religions' ); # select my $_asdf = $q->param( 'asdf' ); # select my $__submit = $q->param( '.submit' ); # submit # The following is just a rough "fill in" template for untainting +your data. # It will need to be customized to suit your particular needs. Yo +u'll need # to create regular expressions to untaint your data and if you sk +imp on this, # it's at your peril!!! # These are NOT efficient regexes. For large forms with many simi +lar data values # for the same name, this could be quite slow. I had considered m +aking these more # efficient by having separate regex tests for each value, but sin +ce the user still # needs to go in and tweak them, I didn't see the point. # hidden values: asdf my ( $somename ) = ( $_somename =~ /^(asdf)$/ ); # text values: Ovid my ( $name ) = ( $_name =~ /^( # could not auto-create regex # ) +$/ ); # checkbox values: 1,2 my @group1; ( $group1[$_] ) = ( $_group1[$_] =~ /^([12])$/ ) foreach ( 0 .. $#_ +group1 ); # password values: my ( $pass ) = ( $_pass =~ /^( # could not auto-create regex # ) +$/ ); # select values: Tiddly winks,Mud wrestling,Twister,Jello wrestling my @sports; ( $sports[$_] ) = ( $_sports[$_] =~ /^(Tiddly\ winks|Mud\ wrestling +|Twister|Jello\ wrestling)$/ ) foreach ( 0 .. $#_sports ); # textarea values: my @test; ( $test[$_] ) = ( $_test[$_] =~ /^( # could not auto-create regex + # )$/ ) foreach ( 0 .. $#_test ); # radio values: ^,^,^,\,],1,2,3,4 my ( $radio1 ) = ( $_radio1 =~ /^([\^\\\]1234])$/ ); # select values: Democrat,Republican my @religions; ( $religions[$_] ) = ( $_religions[$_] =~ /^(Democrat|Republican)$/ ) +foreach ( 0 .. $#_religions ); # select values: Dasdfsdf,asfdasdfasdf my ( $asdf ) = ( $_asdf =~ /^(Dasdfsdf|asfdasdfasdf)$/ ); # submit values: why bother? my ( $_submit ) = ( $__submit =~ /^(why\ bother\?)$/ );

The monk who was questioning this strategy is someone who I hold in high esteem, so I am concerned about whether or not this is a reasonable approach. If I have missed anything, or if anyone can think of a better approach...

Cheers,
Ovid

Update: After giving it some thought, I realized I could improve upon the above code. For certain form elements (qw/ hidden checkbox radio select submit/), auto-creating the untainting should be relatively simple. After all, those values should not change! Either they are sent or not. Period. As a result, I have updated the code to create the untainting code for those element types. I've also improved formatting of the output. See code above.

Needless to say, the generated code is not production ready. Tweaking will still be necessary. Since no one has *publicly* commented on the code, I'm assuming that there are no major problems with it.

Of course, the actual error-handling on the untainted items is another issue, but it varies enough from case-to-case that I felt trying to add something like that would be useless.

Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

Comment on Code Generating Scripts - Long Post
Select or Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://86815]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (11)
As of 2014-10-21 07:58 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (98 votes), past polls