Runaway CGI script

by Pascal666 (Scribe)
on Nov 19, 2014 at 16:15 UTC ( #1107792=perlquestion )

Pascal666 has asked for the wisdom of the Perl Monks concerning the following question:

tl;dr: Somehow a CGI script that doesn't write to disk kept running for about 16 hours after the client disconnected, filled up the disk about 10 hours in, and then freed the space when Apache was killed. Contents of script unknown.

Fully patched CentOS 7. Woke up this morning to "Disk quota exceeded" errors and:

# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/simfs       25G   25G     0 100% /
# du -sh
3.9G    .
Top indicated that I had plenty of RAM left and that a CGI script I wrote yesterday was the likely culprit:
KiB Mem:  1048576 total,   380264 used,   668312 free,        0 buffers
KiB Swap:  262144 total,    81204 used,   180940 free.    33856 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND
 5140 apache    20   0  239888   2348   1756 R  17.3  0.2 144:42.67 httpd
14980 apache    20   0   30840   1884   1228 S  15.6  0.2 153:43.94 bounced.cgi
I killed Apache and now my disk and cpu utilization are normal. I didn't have lsof installed so I couldn't see what file was causing the problem.

Access_log shows only me accessing the script, and error_log shows nothing since I wrote it.

I wrote this quickly yesterday with no error handling, but the worst I expected to happen on an error was for the script to die. I can't understand how the following could possibly fill up my disk. It appears to work as intended.

#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use CGI::Carp qw(fatalsToBrowser);

my $q = new CGI;
print $q->header;
print $q->start_html('Bounce summary');
my @files = </home/user/Maildir/cur/*>;
for (@files) {
    open(IN, $_);
    while (<IN>) {
        last if /The mail system/;
    }
    while (<IN>) {
        if (m#/domain.com$#) {
            print '<p>';
            last;
        }
        s/</&lt/g;
        print;
    }
    close IN;
}
print $q->end_html;
Edited to add:
Pulling the CGI components out gives nearly identical output to what the web browser tab displays, with no errors. The directories I ran (and will run) this against never have subdirectories.

Having thought about it today, I believe one of my initial assumptions when opening this thread was probably incorrect. As a CGI script, it only runs when I access it, and I only ran it a couple of times in its final state (above). The stuck version was probably an earlier one that I simply didn't notice; it could have run for many hours before crippling the server. I do not make a habit of confirming that scripts end when they stop loading or when I hit X in my web browser; I just assume Apache will kill them.

I just really don't understand how a CGI script could stay running without a client attached. I just created one with an intentional infinite loop, and as soon as I hit X, Apache killed it.

From /var/log/messages after I ran "service httpd stop" this morning:

Nov 19 10:38:44 systemd: httpd.service stopping timed out. Killing.
Nov 19 10:38:44 systemd: httpd.service: main process exited, code=killed, status=9/KILL
Nov 19 10:38:44 systemd: Unit httpd.service entered failed state.
"kill -9 14980" probably would have fixed the problem without killing Apache, but I didn't think of it at the time.

Update 2:
It is actually trivial to create a CGI script that won't die when the client disconnects. My test above contained a print inside the loop. It looks like Apache closes STDOUT when the client disconnects, so the next print kills the script (presumably SIGPIPE from writing to the closed pipe). For example, a CGI containing just:

#!/usr/bin/perl
sleep 1 while 1;
will keep running after the client disconnects, and a "service httpd stop" will yield the same errors as above; however, Apache will kill it after the CGI timeout. So apparently one of my interim scripts entered an infinite loop without a print, but with something that caused Apache's timeout not to kill it. Still no idea how that could use up all the free disk space, and then free it immediately when killed.
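For contrast, a minimal sketch of a variant that does die on disconnect (hypothetical, untested): the first print after Apache closes STDOUT writes to a closed pipe and kills the script, so Apache never needs to time it out.

#!/usr/bin/perl
# Hypothetical contrast to the sleep-only loop above: because this
# version prints inside the loop, the first write after the client
# disconnects hits the closed pipe, raises SIGPIPE, and ends the script.
$| = 1;                               # unbuffer STDOUT so each print hits the pipe
print "Content-type: text/plain\n\n";
while (1) {
    print "still here\n";             # dies here once STDOUT is closed
    sleep 1;
}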

I just tried writing to STDERR in the loop, both with print STDERR and by trying to read from a closed filehandle. In both cases error_log recorded the errors immediately and kept growing. When I hit the disk-full error yesterday, one of the first things I checked was the log files; error_log was only 7728 bytes.
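The closest mechanism I can find that fits both the df/du mismatch above (df showed 25G used while du found only 3.9G) and the instant recovery when Apache died is an open filehandle to a file that had already been deleted: the blocks stay allocated, invisible to du, until the holding process exits. A purely illustrative sketch (the /tmp/ghost path is made up):

#!/usr/bin/perl
# Illustrative only: space held by an unlinked-but-open file shows up
# in df but not in du, and is freed the instant this process exits.
open(my $fh, '>', '/tmp/ghost') or die "open: $!";
unlink '/tmp/ghost' or die "unlink: $!";       # no directory entry remains
print $fh 'x' x (1024 * 1024) for 1 .. 1024;   # ~1 GB that du cannot see
sleep 600;                                     # compare df and du meanwhile

I can't prove that's what the stuck script did, but it would explain why killing Apache freed the space immediately.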

Re: Runaway CGI script
by Eily (Monsignor) on Nov 19, 2014 at 16:40 UTC

    I don't know about your full disk (I actually never used CGI), but there are a few things you may want to change:
    - You should either use autodie or check the return value of open.
    - You probably meant s/</&lt;/g
    - m#/domain.com$# will also match "/domain/com", "/domainacom", etc., because the unescaped . matches any character (see quotemeta)
    - You could do only one while loop with the flip flop operator:

while (<IN>) {
    next if 1 .. /The mail system/;   # the condition is true from line 1 until "The mail system"
    # Code here
}

    Edit: and having two nested loops with $_ as the loop variable doesn't help readability. Besides, while does not localize $_, so you are actually changing the content of the @files array every time you read a line (each element of @files is aliased to $_, and reading a line writes to $_).

    Edit again: and just to prove what I'm saying:

use v5.14;
use Data::Dumper;

my @list = 1 .. 3;
for (@list) {
    print while (<DATA>);
}
say Dumper \@list;

__DATA__
    $VAR1 = [ undef, undef, undef ];
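    Putting those points together, here is one possible shape for the loop with the fixes applied (a sketch, untested against real mail files; it keeps the literal /domain.com suffix from the original):

#!/usr/bin/perl
use strict;
use warnings;
use autodie;    # unchecked opens now die with a useful message

my @files = </home/user/Maildir/cur/*>;
for my $file (@files) {                   # named loop variable; @files is not clobbered via $_
    open(my $in, '<', $file);
    while (<$in>) {
        next if 1 .. /The mail system/;   # skip everything up to and including the marker
        if (m#/domain\.com$#) {           # escaped dot: matches only a literal "."
            print '<p>';
            last;
        }
        s/</&lt;/g;                       # full entity, with the semicolon
        print;
    }
    close $in;
}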

Re: Runaway CGI script
by kennethk (Abbot) on Nov 19, 2014 at 18:47 UTC
    I would be very surprised if this script caused your issue. What kind of output do you see if you run the following in a TTY context:
#!/usr/bin/perl
use strict;
use warnings;

my @files = </home/user/Maildir/cur/*>;
for (@files) {
    open(IN, $_);
    while (<IN>) {
        last if /The mail system/;
    }
    while (<IN>) {
        if (m#/domain.com$#) {
            print "\n";
            last;
        }
        print;
    }
    close IN;
}
    There are a number of methodological changes I'd make to what you've done, particularly adding a check that what you are about to open is a plain file (-f or next; see the -f file test) and testing the result of the open (open(IN, '<', $_) or die "Open failure on $_: $!\n"), but these shouldn't be the cause of your issue unless you've got an unbelievably large set of subdirectories and have swamped your log file. Which you said you didn't.
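    For illustration, those two changes might look like this (a sketch, untested, using the same glob as above):

#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the two methodological changes: file-test before opening,
# and a checked three-argument open with a lexical filehandle.
my @files = </home/user/Maildir/cur/*>;
for (@files) {
    next unless -f;                                   # plain files only
    open(my $in, '<', $_) or die "Open failure on $_: $!\n";
    # ... process $in as before ...
    close $in;
}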

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: Runaway CGI script
by Anonymous Monk on Nov 19, 2014 at 23:41 UTC
    What kind of filenames do you have? Two-argument open can invoke the shell, which is why three-argument open is always recommended.
      Files are generated by Postfix and are of the format: 1416421709.VeaI30e0529M460919.domain.com:2,RSa
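      Those Postfix names look safe, but the hazard is easy to demonstrate with a hostile name (hypothetical example, not one of the names above):

# With two-argument open, a name ending in '|' is treated as a
# command to run rather than a file to read:
my $file = 'echo gotcha |';
open(IN, $file) or die $!;        # runs echo; <IN> reads its output
print <IN>;                       # prints "gotcha"

# Three-argument open treats $file strictly as a filename:
open(my $in, '<', $file) or warn "open failed: $!";   # fails cleanly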
