Update on controlling long-running processes via CGI

dannyhmg has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Update on controlling long-running processes via CGI by GrandFather (Saint) on Nov 12, 2014 at 20:22 UTC
It's been a while since I worked on that project. I think it ran on a *nix server for a while, though I'm not sure, and I don't have the code available here. However, a little Super Searching brings up a few hits that may help: "cgi,wait for long running external prog", "forking in a cgi script" and "Handling long running operations started by CGI requests". Playing with the search parameters will bring up more hits. If you don't have any joy in the next 8 hours, reply to this reply and I'll see if I can dig up the old code I had for solving this problem.

Perl is the programming world's equivalent of English
Re: Update on controlling long-running processes via CGI by scorpio17 (Canon) on Nov 13, 2014 at 17:07 UTC
I've done something similar. This runs on a RHEL server with Apache. I used CGI::Application and HTML::Template, so you may have to make a few slight changes, but this should (hopefully) help you get things working.

In my case, I'm displaying a large spreadsheet-like table of data. I only show 50 rows on a page, but there's a pager control at the bottom (first, prev, next, last). There's also a button users can click labeled "Download CSV" that will allow them to download all the data as a comma-separated-value file. If there's a LOT of data, dumping the file can take a relatively long time (the web page might time out and generate an error), or worse, the user might get impatient and click the button two or three more times!

So, here's how I did it. First, the initial page with the data table has this HTML near the bottom: `<form name="csvform" action="/myapp.pl/downloadcsv" method="POST"> <input type="submit" name="csv" id="csv" value="Generate CSV File" onclick="return SubmitTheCSVForm();" /> </form>`

The main thing to notice here is that, when clicked, the form posts to "downloadcsv", which is a run mode rather than a separate script (in CGI::Application, every page is defined as a "run mode", and run modes are just subroutines; all my run modes are in the myapp.pl script). The onclick event points to some JavaScript that disables the button, preventing multiple clicks.
It looks like this: `var submitted = false; function SubmitTheCSVForm() { if (submitted) { return false; } document.csvform.csv.value = 'working...'; document.csvform.csv.disabled = true; submitted = true; document.csvform.submit(); return false; }`

Inside downloadcsv, I have the following code: `sub downloadcsv : Runmode { my $self = shift; if (my $pid = fork) { # parent does this return $self->redirect("/myapp.pl/csv_status"); } elsif (defined $pid) { # child does this close STDOUT; close STDERR; open STDERR, ">&=1"; my $id = $self->session->id(); my $cmd = "$CFG{'PATH'}/make_csv.pl"; exec $cmd, $id; die "can't do exec: $!"; } else { die "cannot fork: $!"; } }`

Notice that I use fork here. The parent process redirects to another page, which will basically display a "please wait..." message (more on that later). The child process actually runs another script (the long-running process that does the real work; in my case, generating the file to be downloaded).

Things to note: I have a config file in which I define the path to where my script lives. My $cmd variable contains the command I would type on a Linux command line (it's not a URL). You have to make sure your permissions are set correctly. For example, if the web server runs as user 'nobody', then this script is run as user 'nobody'. Since it's writing a file, the location it's written to must be writable by user 'nobody', etc. Make sure you test your command as the correct user (if you only test as yourself, you may have different env variables, path settings, etc.). In my case, I'm running another Perl script, but $cmd could contain anything. This is a security risk - be careful, especially if you build the command using any input from the user. I pass a session ID, in case multiple users request different downloads at the same time. I'm skipping some of those details in order to try to stay on topic.

Also note that I close STDOUT and STDERR.
If you don't do this, Apache won't "let go" of the child process. This is very important! You must sever this connection for the child to be independent. Also, if exec works correctly, it will never return, so the die on the next line will never be reached.

Meanwhile, back in the parent process, we redirected to the "csv_status" page, which is defined something like this: `sub csv_status : Runmode { my $self = shift; my $id = $self->session->id(); my $path = $CFG{'CSV_TEMP'}; my $still_running = 0; if ( -e "$path/$id/csv.pid" ) { open my $in, '<', "$path/$id/csv.pid" or die "can't access $id/csv.pid file : $!"; my $pid = <$in>; close $in; if ( IsStillRunning($pid) ) { $still_running = 1; } else { $still_running = 0; } } my $template = $self->load_tmpl('csv_status.html'); $template->param( TITLE => "CSV Status", STILL_RUNNING => $still_running, ); return $template->output; }`

I've removed a lot of error checking to make things simpler. The basic idea is that my long-running script creates a process ID file when it starts up. I can use that PID to check whether it's still running. I pass this status to my template with the $still_running variable. Basically, there are two versions of the "status" page, depending on whether the process is still running or has finished.

The template (csv_status.html) contains the following: `<head> <TMPL_IF STILL_RUNNING> <meta http-equiv="refresh" content="5"> </TMPL_IF> </head> ... <TMPL_IF STILL_RUNNING> <img src="images/working.gif" /> <hr> <p>Please be patient... this might take a while.</p> <TMPL_ELSE> <h3> Job complete!</h3> </TMPL_IF>`

Again, I'm only showing the important bits. At the top, inside the header, IF the job is still running, I use a meta tag to force the page to reload every 5 seconds. Further down, in the body of the page, IF the job is still running, I display an animated gif (a little spinning icon), and a "please wait" message.
When the job completes, the meta tag is NOT written (so the page refresh stops), and the icon/"please wait" message gets replaced with a "job complete" message (in my case, I also generate a link to the CSV file that the user can click to download.) It would probably be better to use AJAX to refresh the page, instead of the meta tag, but I did this a long time ago before I knew how to use AJAX. Good luck, I hope this helps!
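The post above relies on two pieces it does not show: the long-running script writing its PID file, and the `IsStillRunning` check. A minimal sketch of both follows, assuming a Unix-like server. The names `write_pid_file` and `is_still_running` are hypothetical stand-ins, not the author's code.

```perl
use strict;
use warnings;

# Hypothetical helper for the long-running script: record our own PID
# in a file that the status page can read later.
sub write_pid_file {
    my ($file) = @_;
    open my $out, '>', $file or die "can't write $file: $!";
    print {$out} "$$\n";
    close $out or die "can't close $file: $!";
    return $$;
}

# Hypothetical stand-in for IsStillRunning(): true if a process with
# this PID currently exists.
sub is_still_running {
    my ($pid) = @_;
    return 0 unless defined $pid && $pid =~ /^\s*(\d+)\s*$/;
    $pid = $1;
    return 0 unless $pid > 0;      # 0 would address our own process group
    return 1 if kill 0, $pid;      # signal 0 sends nothing, it only checks
    return $!{EPERM} ? 1 : 0;      # exists, but belongs to another user
}
```

Two caveats: PIDs are eventually reused, so a stale PID file can point at an unrelated process, and a child that has exited but has not been reaped still counts as existing. Removing the PID file when the job finishes avoids most of this.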
Re: Update on controlling long-running processes via CGI by jhourcle (Prior) on Nov 13, 2014 at 16:51 UTC
If you can build a way for the process to report on its status (such that another script can then monitor it), you can likely convert the whole thing to use what's called 'server push'. There are a few different variations. One that I've used is the `multipart/x-mixed-replace` trick, where you send multiple HTML documents with status updates and then the final one when done. I've also used more 'web-app' type systems, where the page is set up, but populated/updated with JavaScript after the initial draw. I find the first one easier, but not all browsers (e.g., IE) support it. In any case, you need to make sure that your server is treating your CGIs as 'NPH' (non-parsed headers, i.e., it won't wait for all of the content to come down before it emits it to the client).
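As a rough illustration of the `multipart/x-mixed-replace` approach, here is a sketch that writes a series of status pages to a filehandle, each one replacing the previous in browsers that support it. The function names, the boundary string and the `$next_page` callback are all assumptions made for the example. A real NPH script would also have to print the HTTP status line itself.

```perl
use strict;
use warnings;

my $CRLF = "\r\n";

# Response header plus the opening boundary.
sub push_start {
    my ($boundary) = @_;
    return qq{Content-Type: multipart/x-mixed-replace;boundary="$boundary"}
         . $CRLF . $CRLF
         . "--$boundary$CRLF";
}

# One HTML document followed by its closing boundary. Emitting the
# boundary right after the body lets the browser render the part at once.
sub push_part {
    my ($boundary, $html, $is_last) = @_;
    my $close = $is_last ? "--$boundary--" : "--$boundary";
    return "Content-Type: text/html$CRLF$CRLF" . $html . $CRLF . $close . $CRLF;
}

# $next_page is a hypothetical callback returning ($html, $done).
sub push_status_pages {
    my ($out, $boundary, $next_page) = @_;
    my $old = select $out;
    $| = 1;                        # unbuffered, so each part goes out promptly
    select $old;
    print {$out} push_start($boundary);
    while (1) {
        my ($html, $done) = $next_page->();
        print {$out} push_part($boundary, $html, $done);
        last if $done;
    }
    return;
}
```

In a CGI script the filehandle would be `\*STDOUT`, and the callback would typically sleep for a few seconds, check on the job, and return the current status page.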
Re: Update on controlling long-running processes via CGI by Anonymous Monk on Nov 13, 2014 at 04:35 UTC
"I'm not sure how to properly initiate a child process on this server"

Proc::Background has a nice interface for that.
Re: Update on controlling long-running processes via CGI by jellisii2 (Hermit) on Nov 13, 2014 at 17:16 UTC
This ideally should be handled in AJAX using promises, assuming the end user is using something that resembles a modern browser. I link jQuery's stuff here because that's what I use, but if you want to roll your own, I don't see what's stopping you.
Re: Update on controlling long-running processes via CGI by brachtmax (Initiate) on Nov 13, 2014 at 10:44 UTC
Regarding the creation of a child process: fork() by itself creates just a clone of the current process, which for this purpose is useless without a subsequent exec(). So after the fork() you have two (almost) identical processes, both of them just returning from fork(). The fork return value is then used to determine whether you are in the child or the parent. In the child you then do the exec(), which replaces the clone with whatever process you like: `if ($pid = fork()) { # ...this is the parent ... } elsif (defined $pid) { # this is the child... exec(<whatever executable>) } else { # error - the fork() didn't work }`
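To make the three outcomes of fork() concrete, here is a small self-contained sketch. The helper name `run_in_child` is made up for illustration. Unlike a CGI handler, which would return to the browser straight away, the parent here waits for the child so that the result can be observed.

```perl
use strict;
use warnings;
use POSIX ();

# Fork, replace the child with the given command, and return the
# command's exit status to the caller.
sub run_in_child {
    my (@cmd) = @_;
    my $pid = fork;
    die "cannot fork: $!" unless defined $pid;   # undef: fork failed

    if ($pid == 0) {
        # Child: fork returned 0. exec only returns if it fails.
        exec { $cmd[0] } @cmd;
        warn "cannot exec $cmd[0]: $!";
        POSIX::_exit(127);     # leave without running the parent's cleanup
    }

    # Parent: fork returned the child's PID.
    waitpid $pid, 0;
    return $? >> 8;
}

# Example: run the current perl interpreter and make it exit with 7.
my $status = run_in_child($^X, '-e', 'exit 7');
print "child exited with status $status\n";
```

The block form `exec { $cmd[0] } @cmd` always bypasses the shell, which matters if any argument could contain user input.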
Re: Update on controlling long-running processes via CGI by sundialsvc4 (Abbot) on Nov 13, 2014 at 03:45 UTC
Broadly speaking, this sort of activity needs to be treated as “a background job.” The web-page ... Apache-based or otherwise ... therefore should be seen merely as a user interface by which the user can submit work to be processed, query the present status of the work in progress, and retrieve the output.

There are many “batch job monitoring systems” out there for all operating systems, including those that are designed to be cross-platform. There are CPAN modules, as well.

An elementary implementation of this idea ... which, by the way, is probably the most common one ... is to have one-or-more workers that are launched by means of `cron`, with an SQL database acting as the job-queue. The workers query the database to find work-to-do and, using an SQL transaction to provide atomicity, select one. Then they carry out the work, ensuring that any exceptions that may be thrown will be caught. And this they do forever.

A key aspect of this arrangement is that, no matter how many units of work may be requested, and no matter how rapidly they come in, the work is always carried out in a predictable and controlled way. The web page is, and the web page remains, only the user-interface: the means by which the user can interact with the batch system, but not a player in the game.
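The claim-one-job-atomically step can be sketched without a database. Note that this swaps in a different mechanism from the one described above: instead of an SQL transaction, it uses a directory of job files and relies on rename() being atomic within one filesystem. The directory layout and the names `claim_job` and `work_once` are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Spec;

# Try to claim one pending job by moving its file into the claimed
# directory. If two workers race for the same file, rename() succeeds
# for only one of them.
sub claim_job {
    my ($pending_dir, $claimed_dir) = @_;
    opendir my $dh, $pending_dir or die "can't read $pending_dir: $!";
    my @jobs = sort grep { !/^\./ } readdir $dh;
    closedir $dh;

    for my $job (@jobs) {
        my $from = File::Spec->catfile($pending_dir, $job);
        my $to   = File::Spec->catfile($claimed_dir, $job);
        return $to if rename $from, $to;   # we won this job
        # Otherwise another worker took it first; try the next one.
    }
    return;                                # nothing left to do
}

# Claim and process at most one job. An exception thrown by the handler
# is caught and reported, so the worker itself keeps running.
sub work_once {
    my ($pending_dir, $claimed_dir, $handler) = @_;
    my $job = claim_job($pending_dir, $claimed_dir) or return 0;
    eval { $handler->($job); 1 } or warn "job $job failed: $@";
    return 1;
}
```

A worker started from `cron` would call `work_once` in a loop until it returns 0. The web page only drops files into the pending directory and reads results back.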
Re: Update on controlling long-running processes via CGI by Anonymous Monk on Mar 18, 2017 at 01:16 UTC
Here is how I did it... I created another (unique) HTML file with meta tags which will cause the page to refresh itself every 15 seconds and with no cache saved. Once that placeholder HTML file is created, redirect the user to that placeholder HTML file and kick off a grandchild worker to update the HTML file dynamically. Once the grandchild is done updating the HTML file, we clear out the HTML one final time and write just the final HTML results you want shown.

CGI - Parent worker: /srv/www/cgi-bin/parent.pl
- obtain CGI params like normal
- create directory and html file and give proper permissions for both
- printf HTMLFILE "<meta http-equiv=\"refresh\" content=\"15\">"
- printf HTMLFILE "<META HTTP-EQUIV=\"Pragma\" CONTENT=\"no-cache\">"
- printf HTMLFILE "please wait message or an animated gif"
- print "Location: PlaceHolderURL\n\n"
- fork off a child worker (passing in the html filename)
- wait 3 seconds to allow the child to start up
- exit

CGI - Child worker:
- close stdout (close STDOUT;) and stderr (close STDERR;) so browser doesn't wait on child
- exec to a grandchild which will do the work, passing in the html filename

CGI - Grandchild worker: /srv/www/cgi-bin/grandchild.pl
- get html file as an argument
- append html file with any progress you want to show and do long-running processing
- once finished processing:
  - optional: close and reopen html file without appending to clear it
  - print out what you want shown (the html results)
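The placeholder-page idea can be sketched as a single helper that writes either the self-refreshing "please wait" page or the final results page. The name `write_page` and the temp-file-plus-rename step are additions for illustration and not part of the outline above. The rename is there so that a browser refresh never picks up a half-written file.

```perl
use strict;
use warnings;

# Write an HTML page to $file. When $refresh_seconds is given, the page
# asks the browser to reload itself and not to cache the result.
sub write_page {
    my ($file, $body, $refresh_seconds) = @_;
    my $meta = defined $refresh_seconds
        ? qq{<meta http-equiv="refresh" content="$refresh_seconds">\n}
        . qq{<meta http-equiv="Pragma" content="no-cache">\n}
        : '';

    # Write to a temporary name first, then move it into place.
    my $tmp = "$file.tmp";
    open my $out, '>', $tmp or die "can't write $tmp: $!";
    print {$out} "<html><head>\n$meta</head>\n<body>\n$body\n</body></html>\n";
    close $out or die "can't close $tmp: $!";
    rename $tmp, $file or die "can't rename $tmp to $file: $!";
    return;
}
```

The parent would call `write_page($file, '<p>Please wait...</p>', 15)` before redirecting, and the grandchild would finish with `write_page($file, $results_html)`, which drops the refresh tag so that reloading stops.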

