Managing a long running server side process using CGI

by GrandFather (Saint)
on Jan 27, 2007 at 09:13 UTC ( [id://596850]=perlquestion )

GrandFather has asked for the wisdom of the Perl Monks concerning the following question:

I have been asked by a friend to convert a command line application to have a web interface. As this is the first CGI programming I have done, I've taken Ovid's excellent introductory cgi_course and now know, if not all there is to know about CGI, at least a little about CGI and security.

While the web interface is fairly simple - gather a few parameters then kick off a simple process - the process itself comprises running a very large number of iterations of a simple calculation and then generating some statistical results. My problem is: how do I set up a CGI process to do work that may take between a few hours and a day, while letting the user who initiated the process monitor progress and possibly even update parameters and restart the process?

My initial thought is that this could be accomplished by using a file to pass information to a "child" process which performs the actual processing, updating a status file as it goes. The questions are: how do I spawn a child process (on a Windows or *nix server), and is there a better mechanism for interprocess communication than simply using a file?


DWIM is Perl's answer to Gödel

Replies are listed 'Best First'.
Re: Managing a long running server side process using CGI
by McDarren (Abbot) on Jan 27, 2007 at 10:00 UTC
Re: Managing a long running server side process using CGI
by Joost (Canon) on Jan 27, 2007 at 12:04 UTC
    A file can be a nice and simple way of interfacing with a background process. You can just have a daemon process update a file periodically, and then you have access to that info whenever you want. You could even use something like Data::Dumper or YAML to exchange complex data structures.
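
    A minimal sketch of that file-based exchange, assuming YAML (Data::Dumper or Storable would work just as well); the file path and field names here are only illustrative:

        use strict;
        use warnings;
        use YAML qw(DumpFile LoadFile);

        my $status_file = '/tmp/job_status.yml';   # hypothetical path

        # Daemon side: dump the current progress to the file periodically.
        # (For robustness you would write to a temp file and rename it.)
        sub update_status {
            my (%status) = @_;
            DumpFile($status_file, \%status);
        }

        # CGI side: read whatever the daemon last wrote, if anything.
        sub read_status {
            return -e $status_file ? LoadFile($status_file) : undef;
        }

        update_status(iteration => 42_000, total => 1_000_000, state => 'running');
        my $status = read_status();
        print "$status->{state}: $status->{iteration}/$status->{total}\n" if $status;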

    However, if you have multiple tasks to run and you're already using a database, I would use the database to do the communication. That is, set up one table where each record is a task (id, status, etc.) and run one or more daemon processes that poll that table looking for new tasks and update the records as they progress. Or run a single daemon that fork()s off new children for each task. Or run all tasks sequentially; it will all look the same from the CGI process's point of view.
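
    For the database variant, a rough sketch might look like this, assuming DBI with SQLite and a hypothetical tasks table (id, params, status, progress); the schema and column names are made up for illustration:

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect('dbi:SQLite:dbname=tasks.db', '', '',
                               { RaiseError => 1 });

        # CGI side: queue a new task for the daemon to pick up.
        $dbh->do('INSERT INTO tasks (params, status) VALUES (?, ?)',
                 undef, 'iterations=1000000', 'pending');

        # Daemon side: poll for pending work, claim it, and record progress.
        while (1) {
            my ($id, $params) = $dbh->selectrow_array(
                q{SELECT id, params FROM tasks WHERE status = 'pending' LIMIT 1});
            if ($id) {
                $dbh->do(q{UPDATE tasks SET status = 'running' WHERE id = ?},
                         undef, $id);
                # ... do the long calculation, updating progress as it goes ...
                $dbh->do(q{UPDATE tasks SET progress = ?, status = 'done' WHERE id = ?},
                         undef, 100, $id);
            }
            sleep 5;
        }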

Re: Managing a long running server side process using CGI
by kyle (Abbot) on Jan 27, 2007 at 13:15 UTC

    I don't know how the Windows folks spawn their child processes, but in *nix, it's fork. For example:

    my $pid = fork();
    if ( ! defined $pid ) {
        die "Can't fork: $!\n";
    }
    if ( $pid ) {
        # parent
    }
    else {
        # child
    }

    You could also look at Proc::Daemon which has this and more wrapped up in a tidy package.

    Using a file to do your communication is a fine idea. To avoid polling, you could also use signals to let the worker know when to look. Have a look at the "Signals" section of perlipc.
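
    A minimal sketch of the signal idea, assuming *nix and that the parent knows the worker's PID; SIGUSR1 here simply means "the file has changed, go look":

        use strict;
        use warnings;
        use POSIX qw(pause);

        # Worker: block until poked, then re-read the shared file.
        my $poked = 0;
        $SIG{USR1} = sub { $poked = 1 };

        while (1) {
            pause();                 # sleep until any signal arrives
            next unless $poked;
            $poked = 0;
            # ... re-read the parameter/status file here ...
        }

        # Parent (after rewriting the file):
        #     kill 'USR1', $worker_pid;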

      I don't know how the Windows folks spawn their child processes

      Win32::Job provides a high-level interface for controlling sub-processes. More low-level is Win32::Process. And there is also system(1, $program, @args) for something that is more akin to a fork/exec on unix.
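
      For instance, a tiny sketch of the system(1, ...) form, which returns immediately with the new process's identifier rather than waiting for it; the paths and script name are illustrative:

          use strict;
          use warnings;

          my $session_id = 'abc123';   # hypothetical session token

          # Spawn the worker asynchronously; system(1, ...) on Win32 Perl
          # returns without waiting for the child to finish.
          my $pid = system(1, 'C:/Perl/bin/perl.exe', 'spawned.pl', $session_id);
          print "started worker, process id $pid\n";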

      -xdg

      Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re: Managing a long running server side process using CGI
by SheridanCat (Pilgrim) on Jan 27, 2007 at 18:50 UTC
    When I've done this, I just have the CGI interface gather parameters from the user and then set a flag of some sort (presence of a file, a database value, etc) indicating that the process is ready to go.

    The main program is kicked off by cron (or task scheduler in Win32) and checks for whatever the flag is and runs with the supplied params if the flag is found. The main program logs its progress so the user can follow what's happening - presumably via another CGI interface.

    If there's a requirement that parameters can change, just have the main program reread its params whenever it finds they've changed.
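
    As a rough sketch of that arrangement (the file name and parameter format are made up), the CGI side drops the parameter file as the flag, and the cron-started worker consumes it:

        use strict;
        use warnings;

        my $flag_file = '/var/spool/myjob/params.ready';   # hypothetical path

        # CGI side: writing the parameter file doubles as setting the flag.
        sub queue_job {
            my (%params) = @_;
            open my $fh, '>', $flag_file or die "Can't write $flag_file: $!";
            print {$fh} "$_=$params{$_}\n" for sort keys %params;
            close $fh;
        }

        # Worker side (started by cron or Task Scheduler): run only if flagged.
        sub maybe_run_job {
            return unless -e $flag_file;
            open my $fh, '<', $flag_file or die "Can't read $flag_file: $!";
            my %params = map { chomp; split /=/, $_, 2 } <$fh>;
            close $fh;
            unlink $flag_file;   # consume the flag so the next cron run is a no-op
            # ... run the long calculation with %params, logging progress ...
        }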

Re: Managing a long running server side process using CGI
by talexb (Chancellor) on Jan 28, 2007 at 01:14 UTC

    I just presented to the Toronto Perlmongers (badly) about how I did this. The short answer is that I used a daemon that spawned a bunch of kids to do some work. Requests arrived at the mod_perl request handler asynchronously, and were handled the same way.

    I can highly recommend IPC::Run, with the following caveats (a minimal sketch follows the list):

    • Specify a timer when setting up the harness. Failing to do so seems to cause the start method to not return.
    • Don't forget to use either pump or pump_nb after sending a command or before looking for the response to a command -- it acts like a yield (remember non-preemptive multi-tasking?).
    • Do your IPC::Run testing with a simple, predictable program, to make sure that you've figured out the IPC::Run functionality before you try to run the actual Production application.
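
    As promised above, a minimal sketch of driving a child through IPC::Run, with a timeout on the harness and pump after each command; the child command here is just a stand-in that echoes its input:

        use strict;
        use warnings;
        use IPC::Run qw(start pump finish timeout);

        my ($in, $out) = ('', '');

        # A trivial, autoflushing child stands in for the real application.
        my @child = ('perl', '-e', '$|=1; while (<STDIN>) { print "got: $_" }');

        # Caveat 1: give the harness a timeout so start/pump cannot hang forever.
        my $h = start \@child, \$in, \$out, timeout(10);

        # Caveat 2: pump after sending a command, before reading the response.
        $in .= "do_work\n";
        pump $h until $out =~ /got: do_work/;
        print $out;

        finish $h or warn "child exited with status $?";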

    I hope to post soon on How Not To Make A Presentation to a Local User's Group, as well as Some Pointers on Using IPC::Run. The Toronto Perlmongers will know what I'm talking about.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

    Updated 1350 EST, January 28, 2007: Fixed three glaring spelling mistakes. Sorry about that.

Re: Managing a long running server side process using CGI
by shmem (Chancellor) on Jan 27, 2007 at 23:01 UTC
    is there a better mechanism for interprocess communication than simply using a file?

    On UNIX, you might want to allocate a shared memory segment and have the forked calculating process write its results to it. See perlipc. The CGI spawning the process sets up the segment and passes its shmid to the web client, so on subsequent calls a new CGI process knows which segment to read. If the size of the data isn't constant (e.g. growing), a database might be a better approach.
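
    A bare-bones sketch of that, using Perl's built-in SysV calls as shown in perlipc; the segment size and the message format are made up:

        use strict;
        use warnings;
        use IPC::SysV qw(IPC_PRIVATE S_IRUSR S_IWUSR);

        my $size = 1024;

        # CGI side: create the segment and remember $shmid for later requests.
        my $shmid = shmget(IPC_PRIVATE, $size, S_IRUSR | S_IWUSR);
        defined $shmid or die "shmget failed: $!";

        # Calculating process: overwrite the segment with the latest result.
        my $msg = "iteration=42000";
        shmwrite($shmid, $msg, 0, length $msg) or die "shmwrite failed: $!";

        # A later CGI request, given $shmid: read whatever is there now.
        my $buf = '';
        shmread($shmid, $buf, 0, $size) or die "shmread failed: $!";
        $buf =~ s/\0+\z//;    # trim the unused tail of the segment
        print "status: $buf\n";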

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Managing a long running server side process using CGI
by GrandFather (Saint) on Jan 31, 2007 at 10:41 UTC

    After reading through the various previous replies and the linked material I put together the following two test applications. The CGI app is heavily based on the sample code in merlyn's LinuxMag column Watching long processes through CGI (pointed to by McDarren++), with amendments so that it works under Windows (but no longer under *nix, although making both work in this "framework" ought to be trivial). In the real application the CGI app would manage the monitored app by writing to a second file.

    The managing app:

    #!Perl -w
    use strict;
    use CGI::Pretty qw(:standard :cgi-lib);
    use CGI::Carp qw(fatalsToBrowser); # Remove for production code
    use File::Cache;

    $CGI::DISABLE_UPLOADS = 1;     # Disable uploads
    $CGI::POST_MAX        = 10240; # Maximum number of bytes per post
    $| = 1;                        # Unbuffered output

    if (param('Spawn')) {
        # setup monitoring page then spawn the monitored process
        my $session = get_session_id();
        my $cache   = get_cache_handle();

        $cache->set($session, "wait ..."); # no data yet
        Delete_all();

        # parent redirects browser to monitor session
        param('session', $session);
        print redirect (self_url());
        close STDOUT;

        # Rest of this block alters in *nix context to spawn monitored process
        use Win32::Process;

        my $job;
        Win32::Process::Create (
            $job,
            'c:/Perl/bin/perl.exe',
            "perl.exe spawned.pl $session",
            0,
            NORMAL_PRIORITY_CLASS | DETACHED_PROCESS,
            '.'
        );
        exit 0; # all done

    } elsif (my $session = param('session')) {
        # display monitored data
        my $cache = get_cache_handle();
        my $data  = $cache->get($session);

        if (! $data) { # something is wrong
            showError ("Cache data not available");
            exit 0;
        }

        my $headStr = $data eq 'Completed' ? '' : "<meta http-equiv=refresh content=5>";

        print header();
        print start_html (-title => "Spawn Results", -head => [$headStr]);
        print h1("Spawn Results");
        print pre(escapeHTML($data));
        print end_html;

    } else {
        # display spawn form
        print header(), start_html("Spawn"), h1("Spawn");
        print start_form();
        print submit('Spawn', 'spawn');

        my %params = Vars ();
        for my $param (keys %params) {
            print br ("$param -> $params{$param}");
        }

        print end_form(), end_html();
    }

    exit 0;

    sub showError {
        print header(), start_html("SpawnError"), h1("Spawn Error");
        print p (shift);

        my %params = Vars ();
        for my $param (keys %params) {
            print br ("$param -> $params{$param}");
        }

        print end_html();
    }

    sub get_cache_handle {
        File::Cache->new ({
            namespace => 'Spawn',
            username => 'nobody',
            default_expires_in => '30 minutes',
            auto_purge_interval => '4 hours',
        });
    }

    sub get_session_id {
        require Digest::MD5;
        Digest::MD5::md5_hex(Digest::MD5::md5_hex(time().{}.rand().$$));
    }

    The monitored app:

    #!Perl -w
    use strict;
    use File::Cache;

    my $session = shift;
    my $cache = get_cache_handle();

    $cache->set($session, "wibble ..."); # no data yet

    my $end = time () + 20;
    my $count = 0;

    while (time () < $end) {
        $cache->set ($session, "Count: $count\n");
        ++$count;
        sleep (1);
    }

    $cache->set ($session, "Completed");
    exit 0; # all done

    sub get_cache_handle {
        File::Cache->new ({
            namespace => 'Spawn',
            username => 'nobody',
            default_expires_in => '30 minutes',
            auto_purge_interval => '4 hours',
        });
    }

    The code was tested using a local Apache server and seems to provide exactly the types of interaction I am looking for.

    Are there any glaring oversights?


    DWIM is Perl's answer to Gödel

      Using File::Cache seemed to produce strange behaviour, which I attribute to file locking or sharing issues and race conditions between the two processes. I've updated the code to use CGI::Session instead, which not only seems to have fixed the problem, but also removes the need to generate an ID explicitly.

      Note, however, the use of flush () in the main loop of the long-running process to ensure the data is flushed through to the monitoring process.

      #!Perl -w
      use strict;
      use CGI::Pretty qw(:standard :cgi-lib);
      use CGI::Carp qw(fatalsToBrowser); # Remove for production code
      use CGI::Session;

      $CGI::DISABLE_UPLOADS = 1;     # Disable uploads
      $CGI::POST_MAX        = 10240; # Maximum number of bytes per post
      $| = 1;                        # Unbuffered output

      if (param('Spawn')) {
          # setup monitoring page then spawn the monitored process
          my $cache   = CGI::Session->new ();
          my $session = $cache->id();

          $cache->param ('status', "wait ..."); # no data yet
          Delete_all();

          # parent redirects browser to monitor session
          param('session', $session);
          print redirect (self_url());
          close STDOUT;

          # Rest of this block alters in *nix context to spawn monitored process
          use Win32::Process;

          my $job;
          Win32::Process::Create (
              $job,
              'c:/Perl/bin/perl.exe',
              "perl.exe spawned.pl \"$session\"",
              0,
              NORMAL_PRIORITY_CLASS | DETACHED_PROCESS,
              '.'
          );
          exit 0; # all done

      } elsif (my $session = param('session')) {
          # display monitored data
          my $cache = CGI::Session->new ($session);
          my $data  = $cache->param ('status');

          if (! $data) { # something is wrong
              showError ("Cache data not available");
              exit 0;
          }

          my $headStr = $data eq 'Completed' ? '' : "<meta http-equiv=refresh content=5>";

          print header();
          print start_html (-title => "Spawn Results", -head => [$headStr]);
          print h1("Spawn Results");
          print pre(escapeHTML($data));
          print end_html;

      } else {
          # display spawn form
          print header(), start_html("Spawn"), h1("Spawn");
          print start_form();
          print submit('Spawn', 'spawn');

          my %params = Vars ();
          for my $param (keys %params) {
              print br ("$param -> $params{$param}");
          }

          print end_form(), end_html();
      }

      exit 0;

      sub showError {
          print header(), start_html("SpawnError"), h1("Spawn Error");
          print p (shift);

          my %params = Vars ();
          for my $param (keys %params) {
              print br ("$param -> $params{$param}");
          }

          print end_html();
      }

      The monitored (long running) process:

      #!Perl -w
      use strict;
      use CGI::Session;

      my $session = shift;
      my $cache = CGI::Session->load ($session);

      $cache->param('status', "configuring ..."); # no data yet

      my $end = time () + 20;
      my $count = 0;

      while (time () < $end) {
          $cache->param ('status', "Count: $count\n");
          $cache->flush ();
          ++$count;
          sleep (1);
      }

      $cache->param ('status', "Completed");
      $cache->flush ();
      exit 0; # all done

      DWIM is Perl's answer to Gödel
      Hello, I needed to add print end_html; after the line print redirect (self_url());
