Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

how to improve my script

by agn (Initiate)
on Jun 15, 2009 at 12:32 UTC ( #771634=perlquestion: print w/ replies, xml ) Need Help??
agn has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

Am new to perlmonks.org and to Perl. Currently going through 'Intermediate Perl'. To keep me motivated I started a project - a simple/basic web server. I know that re-inventing the wheel is not a good idea but I figured it could be a good learning process. So please give me suggestions on how to improve my code. It's not complete yet, but I want to know if am in the right direction.

Here is the code.

PS: Am posting a link as am not sure if its allowed to post around 100+ lines of code in a post.
#!/usr/local/bin/perl -w #Caution: Big mess ahead #TODO: # get .. working in dir listing # implement Getopt::Long # usage() function # log to syslog # daemonize # persistent connections use strict; use IO::Socket; my (@files, $req, $client); my $DOCROOT = '/home/arun/site/'; my %error_page = ( 403 => $DOCROOT.'403.html', # forbidden 404 => $DOCROOT.'404.html', # not found 406 => $DOCROOT.'406.html', # not acceptable 501 => $DOCROOT.'501.html' # not implemented ); $SIG{'INT'} = \&cleanup; my $socket = new IO::Socket::INET ( LocalAddr => '172.17.1.50', LocalPort => (shift || 4321), Proto => 'tcp', Listen => 5, ReuseAddr => 1 ) or die "$! \n"; $socket->listen(); &log("Listening on ".$socket->sockhost().":".$socket->sockport."\n"); while ($client = $socket->accept()) { &log("Connection from ".$client->peerhost().":".$client->peerport( +)."\n"); # get http request - first line $req = <$client>; &log($client->peerhost()." ".$req); &respond_to( &handle_req($req) ); close $client; } sub cleanup { close $socket; die "Interrupted. Exiting...\n"; } sub log { my $msg = shift; print scalar localtime," ", $msg; } sub getfiles { my $dir = shift; opendir DIR, $dir or die "open:$!\n"; # remove . and .. from list of files @files = grep { !/^\.(\.)?$/ } readdir DIR; closedir DIR; return \@files; } sub handle_req { my ($method, $uri) = split / +/, shift; if ($method !~ /^GET/) { &log("501 Not Implemented\nr"); return 501; } $uri =~ s/\/(.*)/$1/; # strip the f +irst slash if (-e $DOCROOT.$uri) { if (-f $DOCROOT.$uri) { &log("200 HTTP OK\n"); return 200; } elsif (-d $DOCROOT.$uri) { &log("200 HTTP OK\n"); return 200; } else { &log("406 Not Acceptable\n"); return 406; } } else { &log("404 Not Found\n"); return 404; } } sub respond_to { my $status_code = shift; unless ($status_code == 200) { &display($error_page{$status_code}) if (-f $error_page{$status +_code}); return; } my $uri = (split / +/, $req)[1]; $uri =~ s/\/(.*)/$1/; my $path = $DOCROOT.$uri; if (-f $path) { &display($path) ; return; } if (-d $path) { if (-f $path.'index.html') { &display($path.'index.html'); } else { &gen_dir_list($uri, &getfiles($path)); } } } sub display { my $file = shift; open RES, $file or die "open: $file: $!"; &log("Sending $file\n"); print $client $_ while (<RES>); close RES; } sub gen_dir_list { my ($uri, $files) = @_; &log("[info] dir listing request\n"); # print html header print $client <<HEADER; <html> <head><title>dir listing for: /$uri</title></head> <body> <table cellpadding=5> HEADER my ($count, $modification_time); foreach my $f (@$files) { # open $f to get its modification time if (-f $DOCROOT.$uri.'/'.$f) { open my $handle, $DOCROOT.$uri.'/'.$f or &log("open: $!\n" +); $modification_time = scalar localtime((stat $handle)[9]); } else { opendir my $handle, $DOCROOT.$uri.'/'.$f or &log("opendir: + $!\n"); $modification_time = scalar localtime((stat $handle)[9]); } printf $client "%s<td><a href=\"%s\">%s</a></td><td>%s</td></t +r>", # different colours for alternate rows (++$count % 2 ? '<tr bgcolor="#e0ffd6">' : '<tr bgcolo +r="#ffdcd6">'), # generate href links (-d $DOCROOT.$uri.'/'.$f ? '/'.$uri.$f.'/' : '/'.$ +uri.$f), # append a '/' to the end of dir names (-d $DOCROOT.$uri.'/'.$f ? $f.'/' : $f), $modification_time; } #print html footer print $client <<FOOTER; </table> <p>-- httpserv v0.1 --</p> </body> </html> FOOTER }

Comment on how to improve my script
Download Code
Re: how to improve my script
by rovf (Priest) on Jun 15, 2009 at 13:04 UTC

    I suggest that you also test your code against http://perlcritic.com/, maybe with critic level set to "stern" for a start.

    Also, I don't think it is a good idea to start your sub calls with &. Though it doesn't harm in your case, if you would call a sub having a prototype, you would bypass the conversion. Though I use & only if I really need to.

    -- 
    Ronald Fischer <ynnor@mm.st>
Re: how to improve my script
by betterworld (Deacon) on Jun 15, 2009 at 13:27 UTC

    From looking at your code, it appears that it is vulnerable to directory traversal attacks. If someone requests URLs like "/../../../../../../etc/passwd", you happily deliver the system's user database.

    Also I'd guess that you'll get a lot of warnings if a client terminates the connection without sending a request.

    However I may be wrong; I did not run your code, I only had a look at it.

    I realize that you are doing this only as an exercise, but I'd strongly recommend thinking about malicious input when writing networking software.

      I'm missing taint mode and any kind of input validation. I see code written in Perl4 style (&log(...)), I see repeated (hidden) stat calls (-e $DOCROOT.$uri, then -d $DOCROOT.$uri, then -f $DOCROOT.$uri), I see open where stat is sufficient (# open $f to get its modification time). And I'm sure perlcritic would find a lot more than this.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: how to improve my script
by agn (Initiate) on Jun 15, 2009 at 18:32 UTC
    Thanks for the suggestions. Directory traversals with .. don't work. I checked it when I wrote the code. Patched up the code while keeping the other suggestions in mind. Pasting the updated code:
    #!/usr/local/bin/perl -w #Caution: Big mess ahead #TODO: # get .. working in dir listing # implement Getopt::Long # usage() function # log to syslog # fork() children # daemonize # persistent connections use strict; use IO::Socket; my ($socket, @files, $req, $client); my $DOCROOT = '/home/arun/docs/'; my %error_page = ( 403 => $DOCROOT.'403.html', # forbidden 404 => $DOCROOT.'404.html', # not found 406 => $DOCROOT.'406.html', # not acceptable 501 => $DOCROOT.'501.html' # not implemented ); #$SIG{'INT'} = \&cleanup; $socket = new IO::Socket::INET ( LocalAddr => '127.0.0.1', LocalPort => (shift || 4321), Proto => 'tcp', Listen => 5, ReuseAddr => 1 ) or die "$! \n"; $socket->listen(); &log("Listening on ".$socket->sockhost().":".$socket->sockport."\n"); while ($client = $socket->accept()) { &log("Connection from ".$client->peerhost().":".$client->peerport( +)."\n"); # get http request - first line $req = <$client>; &log($client->peerhost()." ".$req); &respond_to( &handle_req($req) ); close $client; } sub cleanup { close $socket; die "Interrupted. Exiting...\n"; } sub log { my $msg = shift; print scalar localtime," ", $msg; } sub getfiles { my $dir = shift; opendir DIR, $dir or die "open:$!\n"; # remove . and .. from list of files @files = grep { !/^\.(\.)?$/ } readdir DIR; closedir DIR; return \@files; } sub handle_req { my ($method, $uri) = split / +/, shift; if ($method !~ /^GET/) { &log("501 Not Implemented\nr"); return 501; } $uri =~ s/\/(.*)/$1/; # strip the fi +rst slash if (-e $DOCROOT.$uri) { if (-f $DOCROOT.$uri) { &log("200 HTTP OK\n"); return 200; } elsif (-d $DOCROOT.$uri) { &log("200 HTTP OK\n"); return 200; } else { &log("406 Not Acceptable\n"); return 406; } } &log("404 Not Found\n"); return 404; } sub respond_to { my $status_code = shift; unless ($status_code == 200) { &display($error_page{$status_code}) if (-f $error_page{$status +_code}); return; } my $uri = (split / +/, $req)[1]; $uri =~ s/\/(.*)/$1/; chomp($uri); my $path = $DOCROOT.$uri; if (-f $path) { &display($path) ; return; } if (-d $path) { if (-f $path.'index.html') { &display($path.'index.html'); } else { &gen_dir_list($uri, &getfiles($path)); } } return; } sub display { my $file = shift; open RES, $file or die "open: $file: $!"; &log("Sending $file\n"); print $client $_ while (<RES>); close RES; } sub gen_dir_list { my ($uri, $files) = @_; &log("[info] dir listing request\n"); # print html header print $client <<HEADER; <html> <head><title>dir listing for: /$uri</title></head> <body> <table cellpadding=5> HEADER my $count; foreach my $f (@$files) { printf $client "%s<td><a href=\"%s\">%s</a></td><td>%s</td></t +r>", # different colours for alternate rows (++$count % 2 ? '<tr bgcolor="#e0ffd6">' : '<tr bgcolo +r="#ffdcd6">'), # genereate href links (-d $DOCROOT.$uri.'/'.$f ? '/'.$uri.$f.'/' : '/'.$uri +.$f), # append a '/' to the end of dirs (-d $DOCROOT.$uri.'/'.$f ? $f.'/' : $f), scalar localtime((stat $DOCROOT.$uri.'/'.$f)[9]); } #print html footer print $client <<FOOTER; </table> <p>-- httpserv v0.1 --</p> </body> </html> FOOTER }

    Thanks for the percritic link. It did find many things. Am still working on it. I will also remove & from function calls.

      Some notes:

      • Enable taint mode (perlsec). This will generate a lot of new errors, everywhere you pass unverified data to critical functions. Add data verification, don't disable the taint mode.
      • Get rid of the Perl4-style function calls. The ampersand is not only not needed, but it DISABLES prototype checks. You don't want that.
      • Escape all HTML and URI output. You are opening your server for cross-site scripting attacks.
      • Don't stat a file more than once. Learn about the _ (last stat result) argument for the file test operators (-d, -f, -s, -e, and so on).
      • handle_req does not handle requests, it just tests for file or directory and returns a HTTP status code. The real request handler is respond_to. Think about the function names, think about merging both functions.
      • display displays nothing, it sends a file to the client. Think about the function name. It lacks binmode, this will damage binary files on Win32, DOS, OS/2, and perhaps other systems.
      • gen_dir_list not only generates the directory list, but also sends it to the client. Think about the function name.
      • $req and $client are globals, so your code will not be able to handle more than one request at a time. Pass them as parameters to the request handler and your code can handle more than one request when you later change the server part of your code.
      • Your code aborts reading the request after the first line, this may confuse clients. You are expected to read the entire request before responding.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      I downloaded and ran your code, and traversals with ".." definitely work:

      skx@gold:~$ telnet 127.0.0.1 4321 Trying 127.0.0.1... Connected to 127.0.0.1. Escape character is '^]'. GET /../../../../../../etc/passwd HTTP/1.0 root:x:0:0:root:/root:/bin/bash ..

      I notice too that you never return a content-type in your response which is surprising to say the least! (I was surprised that firefox displayed the resulting pages correctly.)

      Steve
      --
Re: how to improve my script
by toolic (Chancellor) on Jun 15, 2009 at 18:48 UTC
    PS: Am posting a link as am not sure if its allowed to post around 100+ lines of code in a post.
    It would be preferable in this case if you included the code in "readmore" tags (as described in Writeup Formatting Tips), rather than as a link. You may update your original post to do so.

    The way you handle your @files variable is a little confusing to me. You declare it in your main code as my @files;, and you fill it your getfiles sub. But then that sub does something unexpected by returning it as a reference to an array. You should decide whether you want to use it as an ordinary array or as an array reference, not both. I would be surprised if this does not give you problems.

Re: how to improve my script
by agn (Initiate) on Jun 16, 2009 at 05:22 UTC
    skx, .. traversals didn't work when I checked it in firefox. Maybe it was firefox that removed those ..'s . It works while using telnet. :( Thanks for the heads up. That'll be my first priority.

    And thanks, afoken, for the suggestions. I definitely have my hands full with your suggestions. For a novice like me, without any prior programming experience, it'll take a bit of time to correct all these. I'll be back after I make the recommended changes.

    I also want to make it clear that I don't plan to write a full fledged web server. Just want to support GET requests and write good code.

    Thank you all once again.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://771634]
Approved by rovf
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (9)
As of 2014-09-01 08:21 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (299 votes), past polls