http://www.perlmonks.org?node_id=1085442

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I have a simple web interface in PHP (just a textarea), and when the user enters some data, I call an external Perl program. This program relies on several additional standalone programs to produce its final result, and some of them are quite time-consuming. I was therefore wondering whether it is worth implementing a queuing system for this Perl program, so that if, for example, 5 users submit a query at the same time, the computer executes only one and puts the others "on hold".
I have zero experience with these things, so any starting tips would be greatly appreciated... Also, do you think the queue should be handled within the Perl script? Or can the PHP page somehow do it beforehand?
The only thing I can think of is some kind of variable that holds my result: once everything has run, an initially empty variable gets a value, and that value is the output shown to the user. That way I could "know" that the whole thing has finished executing (for the first user who submitted).
Thank you beforehand!

Replies are listed 'Best First'.
Re: Queuing system for running a perl program
by Corion (Patriarch) on May 08, 2014 at 11:36 UTC

    Personally, I would implement the queue in both PHP and Perl. The easiest approach, in my opinion, would be to have PHP append the new jobs to a text file, for example one JSON-encoded line per job, or to have PHP insert new jobs into a database table. Perl can then remove jobs either via Tie::File for the file approach or via DBI for the database approach.
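
    A minimal sketch of the file approach, written in Python as a stand-in for both the PHP producer and the Perl consumer. The function names, the job fields, and the use of flock() around each operation are assumptions for illustration, not something specified above; it also assumes a Unix-like system, since it relies on fcntl.

```python
import fcntl
import json


def enqueue(path, job):
    """Append one JSON-encoded job per line while holding an exclusive lock."""
    with open(path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        fh.write(json.dumps(job) + "\n")
        fh.flush()
        # The lock is released when the file is closed.


def dequeue(path):
    """Remove and return the oldest job, or None if the queue is empty."""
    with open(path, "a+", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        fh.seek(0)
        lines = fh.readlines()
        if not lines:
            return None
        # Rewrite the file without its first line.
        fh.seek(0)
        fh.truncate()
        fh.writelines(lines[1:])
        return json.loads(lines[0])
```

    Rewriting the whole file on every dequeue is acceptable for a handful of pending jobs; for anything larger, the database table is probably the better fit.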

    There is Queue::DBI, which does most of the queue stuff for the Perl side of things. It could be easier to do the queue-appending from Perl if you can't convert the relevant parts of Queue::DBI to PHP easily.

Re: Queuing system for running a perl program
by wjw (Priest) on May 08, 2014 at 12:12 UTC
    So many questions are generated by a question like this.

    For example, is the user told "the process is running, come back later for your answer", or "your results will be ready in about x seconds/minutes"?
    Why does processing take so long?
    etc...

    If it were me, I would start by considering changing that PHP script to Perl, if for no other reason than to reduce the number of languages by one.
    The queue system seems like a reasonable approach, as long as you can communicate it back to your user, so they are not left wondering if something is happening.
    That, to my way of thinking, implies that Perl should do the queuing and, if required, inform PHP and thus the user.

    Your back-end is already working, though slowly. I would also examine those 'stand-alone' software pieces and see if what they do can be done better (maybe by Perl).

    Another question: are these jobs that are running somehow attached to a session?
    How do you ensure the results come back to the user that submitted?
    What about jobs where the user leaves before the results come back? Is that job going to be de-queued?

    Hope these thoughts are helpful, and best of luck with your project.

    ...the majority is always wrong, and always the last to know about it...
    Insanity: Doing the same thing over and over again and expecting different results...
      Hi guys,
      thanks to all for your time... Some clarifications:
      1) No, the "extra" software that is being executed cannot be rewritten in Perl; these are complex programs and I can't touch them...
      2) What I am thinking of doing is having a random-string generator subroutine, so that the files generated for each user are kept separate from those of the others. Does that sound like a reasonable approach?
      Is there a way to have Perl tell me "Now I started processing this ID", associate this unique ID with the current process, and then tell me "Now I finished with this ID"? In that scenario, I would know when each full process has ended and could redirect the user to the output page... Maybe some kind of intermediate "Your query is being processed" page could be helpful? In that case, is this done by PHP, Perl, or JavaScript?

        You may be interested in Data::GUID for generating your unique process ID.

        If your processing code runs on a single system you can use PIDs to uniquely identify currently running processes. If you are running across a number of systems use a GUID to identify a task and map that to a system name and PID to allow monitoring task progress. This stuff is commonly enough done that much of it is likely already provided in a module, but my quick CPAN search didn't turn up anything promising.

        Perl is the programming world's equivalent of English
        As mentioned by another poster (and a good point), there is in fact a lot of 'prior art' out there.

        Also, as mentioned by yet another poster, there are in fact existing schedulers which would handle this.

        What you propose is certainly doable with Perl (and others). Again, I have to wonder how long the user will be waiting.
        If the user is expected to wait a few seconds, no problem. If longer...? Regardless, I personally would have Perl put out
        that "please wait for..." page. Your back-end knows what is going on, let it inform the front end. There is no penalty that I can
        think of for doing it that way. To keep the user more or less engaged, have Perl spit out some JavaScript which probes the server
        for the state of the job and updates the user every 10 seconds or so...
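
        One way the back-end could expose that state, sketched in Python rather than Perl: the worker writes a small status file per job, and whatever answers the polling request simply reads it back. The file name status.json, the state names, and both function names are assumptions made up for this sketch.

```python
import json
import os
import tempfile


def write_status(job_dir, state, detail=""):
    """Record the job's state; write to a temp file, then rename atomically."""
    fd, tmp = tempfile.mkstemp(dir=job_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"state": state, "detail": detail}, fh)
    # os.replace is atomic on the same filesystem, so a reader never
    # sees a half-written status file.
    os.replace(tmp, os.path.join(job_dir, "status.json"))


def read_status(job_dir):
    """Return the last recorded state, or 'unknown' if nothing was written yet."""
    try:
        with open(os.path.join(job_dir, "status.json"), encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {"state": "unknown", "detail": ""}
```

        The worker would call write_status(job_dir, "running") when it picks a job up and write_status(job_dir, "done") at the end; the page polled every 10 seconds or so only ever calls read_status.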

        Just a thought... .

        "What I am thinking to do is have a random-string generator subroutine, so that the files generated for each user are separated from those of the others. Sound reasonable approach?"

        It may seem like that now, but don't do this. Sooner or later the generator will repeat one of the random strings and then the trouble begins. Better to use a truly unique identifier (which could be as simple as an incrementing integer).
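
        Whichever ID scheme is chosen, the collision worry can be handed to the filesystem: create the per-job directory with a call that fails if the name already exists, and draw again in that case. A minimal sketch in Python (the helper name make_job_dir and the choice of a UUID as the ID are assumptions for illustration):

```python
import os
import uuid


def make_job_dir(base_dir):
    """Create a fresh per-job directory and return (job_id, path)."""
    while True:
        job_id = uuid.uuid4().hex
        path = os.path.join(base_dir, job_id)
        try:
            # mkdir fails if the name is already taken, so two jobs can
            # never end up sharing a directory.
            os.mkdir(path)
            return job_id, path
        except FileExistsError:
            continue  # name collision: draw a new ID and retry
```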

Re: Queuing system for running a perl program
by jmacloue (Beadle) on May 08, 2014 at 14:51 UTC

    Well, why re-invent the wheel? This looks like a task for beanstalkd or Gearman, actually. These are work queues or job servers: you connect to one from the PHP side and place a job request, and your Perl scripts connect to it to listen for such requests. It's up to you when and how to perform the task, and you can even give some feedback on the task result.

    As far as I know, clients for both applications are available for both PHP and Perl. I'd suggest starting with beanstalkd, as it is the simpler one.

      It is so informative to learn from your experience, fellow Monks!
      Sorry I didn't mention it earlier... My calculations (given the actual machine that the server will run on) show that a job might take something like 7-8 minutes, 10 at most in quite uncommon cases... Usually no more than 3-4 minutes. That's why I was thinking of the "Please wait" page, or is this too long?
      However, the 3-4 minutes I give here as the typical response time is an estimate based on the assumption that only 1 user's job runs at a time; otherwise it takes longer (I mean, if I don't put any queueing system underneath).
Re: Queuing system for running a perl program
by DrHyde (Prior) on May 09, 2014 at 10:07 UTC

    It's not entirely clear to me what you're trying to do. However, I find Parallel::ForkManager useful for controlling how many jobs get spawned at once and IPC::ConcurrencyLimit handy for making sure that I don't run multiple copies of my scripts at the same time.
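
    The "only one copy at a time" idea can be illustrated with a plain lock file. This is a Python sketch of the general technique, not the API of either module named above; the function name try_lock is hypothetical, and it assumes a Unix-like system with flock().

```python
import fcntl


def try_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    fh = open(lock_path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh  # keep this object alive for as long as the job runs
```

    A script would call try_lock at startup and exit (or leave the job queued) when it gets None; the lock disappears on its own when the holder closes the file or the process dies.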

    You can find examples of both in the scripts that keep cpXXXan up to date.

      Ok, I will try to analyze the whole process:
      So, I have a simple interface (textarea), where the user enters a protein sequence (I work with Biology data).
      Step 1: read this sequence and run external C program on it
      Step 2: based on the results of the C program, run a Java program on them (both the C and Java software cannot be altered and must be used as-is).
      Step 3: gather results from both runs through the Perl script, make some calculations and produce a final output
      The bottleneck in the whole process is that both steps 2 & 3 can take some time, and that time heavily depends on how many CPU cores I can dedicate to them. So, since the physical machine of the server has 2 cores, I was thinking that each time I could (should?) devote both of them to only one user-request (after all, my web-service is not Google, so I will not really get THAT many jobs at the same time...).
      Does it make more sense now maybe?
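
      The three steps above can be sketched as a small driver. This is Python rather than Perl, and it assumes (nothing above confirms it) that both external programs read their input on stdin and write their results to stdout; the command lists are placeholders supplied by the caller.

```python
import subprocess


def run_step(cmd, stdin_text, timeout_s=900):
    """Run one external program, feed it text on stdin, return its stdout."""
    proc = subprocess.run(
        cmd,
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout_s,  # give up on a step that hangs
        check=True,         # raise CalledProcessError on a non-zero exit
    )
    return proc.stdout


def run_pipeline(sequence, c_cmd, java_cmd):
    """Step 1: C program on the sequence. Step 2: Java program on its output."""
    c_out = run_step(c_cmd, sequence)
    java_out = run_step(java_cmd, c_out)
    # Step 3 (the final calculations) would combine these two results.
    return {"c": c_out, "java": java_out}
```

      Because each step blocks until the external program exits, running one such pipeline at a time from a single worker already gives the "both cores to one request" behaviour.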
Re: Queuing system for running a perl program
by sundialsvc4 (Abbot) on May 08, 2014 at 12:02 UTC

    On a more general level, this sort of thing is called a batch-job monitoring system, or maybe a workload scheduler, and there is already a lot of “prior art” out there on the Internet, ready for the taking.   However, this is one of the most-reinvented wheels, because most of the time people just homebrew something up ... without considering, for example, the issues related to enabling a cluster of computers to reliably process the work given that none of them might be totally reliable.

    The usual “homebrew” beginning starts with an SQL table (in a database system that supports transactions), and a cron job – or simply worker daemons who sleep() – who read that table as a queue of work-to-do.   You homebrew a user-interface screen that queries the table to tell users when the work is done.   (They hit Refresh periodically.)   The worker daemons retrieve work from the table, using transactions to avoid conflict with one another, and they post completion status when they are done.   The number of daemons dictates the number of units of work that can proceed in parallel.   The web-page or what-have-you is only a job-entry and job-observation system, as well as perhaps the way that completed output is retrieved for use.
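
    A minimal sketch of that homebrew table, using Python with SQLite in place of Perl with DBI. The table layout, column names, state names, and function names are all assumptions for illustration; the point is that a worker claims a job inside a transaction so two workers cannot take the same row.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id     INTEGER PRIMARY KEY AUTOINCREMENT,
    input  TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',
    result TEXT
)
"""


def open_queue(path):
    # isolation_level=None: autocommit, transactions are started explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    return conn


def submit(conn, text):
    """Job entry: insert a row and return its ID (an incrementing integer)."""
    return conn.execute("INSERT INTO jobs (input) VALUES (?)", (text,)).lastrowid


def claim_next(conn):
    """Worker side: atomically mark the oldest queued job as running."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, input FROM jobs WHERE status = 'queued' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row  # (id, input) or None when the queue is empty


def finish(conn, job_id, result):
    """Worker side: post completion status."""
    conn.execute(
        "UPDATE jobs SET status = 'done', result = ? WHERE id = ?", (result, job_id)
    )


def status(conn, job_id):
    """Job observation: what the refreshing status page would query."""
    return conn.execute(
        "SELECT status, result FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
```

    A worker daemon would loop over claim_next, sleeping when it returns None; starting one such daemon gives strictly sequential processing, and starting more raises the degree of parallelism.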

    It’s a bit strange to me that, whereas in the earlier mainframe days all work was done with batch jobs, when Un*x came along with its champion of “interactivity,” it never acquired a “standard” background-processing tool other than cron.   Which is woefully insufficient.   If, for example, that “extra standalone software” which is “time-consuming” could also have associated resource-conflicts, a simplistic solution will either constantly overcommit because it does not know about these limits (and thus, run slower than it could, or not at all); or, it will fail to initiate work as fast as it could because it does not know when more work could be started (somewhere) and it is not aware of multiple machines.   If there is a possibility that a unit-of-work might “abnormally end” (ABEND, as we used to say), a simplistic solution might not be able to recover at all.   When you start adding these to your homebrew, the re-invented wheel just gets bigger and bigger.   Therefore, I would canvass the Internet to look at available pre-existing alternatives, including those that are commercial products.   None of these will care if the workload is or isn’t Perl.

Re: Queuing system for running a perl program
by kbrannen (Beadle) on May 13, 2014 at 02:00 UTC
    Load and complexity of the situation are big factors in what to do. As sundialsvc4 points out, if you have a complex situation, go find a product to take care of this for you.

    We had a need for something like this; it was (and will remain) quite simple. Yes, a web front-end with possibly many inputs that must be worked on sequentially or else in very controlled batches. Parallel processing caused big problems for us. Interestingly, we could process 1 request in time T, but we could process 30 requests in time 2T, so it was pretty simple to stay caught up.

    Web pages insert the data into a DB table as a queue. That gives us persistence. The CGI then kicks a process with a signal to wake it up and then moves on. The CGI doesn't have to report a status, just queue the work.

    The worker sits there and when woken, starts by reading up to X items from the queue (X differs by time of day). It works on them for about 10 minutes before saving the results, then it removes the X items from the queue. Note, for us, if the work is done twice because something bad happened (which is extremely rare), it's OK because the earlier results are merely replaced with the same data.

    The program goes in a loop doing X items at a time until the queue is empty and then goes back to sleep. The program is 105 lines of Perl code including a number of comments and blank lines; it's not a hard program to write if you take care.

    *BUT* we have very simple needs and those needs haven't changed since some tweaking at the very beginning (basically trial and error to find the correct size of X).

    If your needs are simple, rolling your own can work fine. If I needed the complexity of cron+make+otherstuff, I'd seriously look to the commercial world for a product.

    HTH,
    Kevin
Re: Queuing system for running a perl program
by sundialsvc4 (Abbot) on May 08, 2014 at 14:38 UTC

    Let me please “harp on” my prior suggestion.   For one client, I prepared a formal research paper outlining the various batch processing / cluster processing systems that were available at that time.   And my recommendation was to buy one, because they had a tremendous amount of computational work to do and their home-grown cron stuff simply was not cutting the mustard.   Their problems were big, involved definite resource-contention scheduling concerns, and were mission-critical.   When you seriously start to “peel the onion-layers off of” this business requirement, you just keep finding more layers.   You have now touched upon only a few.   And that is precisely why I will again state that this has been done before; that it is more intricate than it first appears; and that it is an excellent “build vs. buy” decision.   It is extremely easy for a company to build a very costly and yet inferior solution because they didn’t want to spend any money.   The entirety of your requirement, including the PHP-driven screens that you now contemplate, might very well be bypassed completely.   (Interfaces that are specifically designed to snap-in to existing web applications are also common features.)

    When you finally put into place a really-good workload management system, you will quickly wonder how you got along without one.   And, in my humble, this is probably a better buy decision than a build decision.   (Be careful about saying, “well, no, our situation is really not that complicated ...” because it most-assuredly will become so.)