
Multi-client approaches

by Tanktalus (Canon)
on Jan 31, 2008 at 22:17 UTC (#665465)

Tanktalus has asked for the wisdom of the Perl Monks concerning the following question:

How do you generally approach multi-client/multi-application systems? This is a bit vague, so perhaps an example will help.

You have some data or actions (e.g., CSV/Oracle/DB2/MySQL/whatever, or sending email/triggering a CGI script/uploading a file) that can be invoked in a number of ways. For example, it could run at the command line, as well as in its own CGI script. And maybe from a cron job, or maybe in a Postfix filter. Of course, the inputs to each of these are both similar and different: similar in that the same data can be present (some of it may be optional), but different in that each source needs to be massaged into the format used by the core engine.
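To make the "massage the input" step concrete, here is a minimal Perl sketch. It assumes a hypothetical core routine `process_request` that only accepts named parameters, plus two illustrative front ends: one reading command-line options through Getopt::Long's `GetOptionsFromArray`, and one reading `Key: value` lines such as a mail filter might see. The field names (`to`, `subject`) are made up for the example.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical core engine: it only ever sees normalized named parameters.
sub process_request {
    my (%p) = @_;
    die "missing 'to'\n" unless defined $p{to};
    $p{subject} //= '(no subject)';    # optional field gets a default
    return "send to=$p{to} subject=$p{subject}";
}

# Front end 1: command-line arguments, e.g. --to bob --subject hi
sub params_from_argv {
    my @args = @_;
    my %p;
    GetOptionsFromArray(\@args, \%p, 'to=s', 'subject=s')
        or die "bad options\n";
    return %p;
}

# Front end 2: "Key: value" lines, e.g. headers read by a mail filter
sub params_from_lines {
    my %p;
    for my $line (@_) {
        next unless $line =~ /^(\w+):\s*(.*?)\s*$/;
        $p{ lc $1 } = $2;              # normalize key case
    }
    return %p;
}

print process_request(params_from_argv('--to', 'bob', '--subject', 'hi')), "\n";
print process_request(params_from_lines("To: bob\n", "Subject: hi\n")), "\n";
```

Each new front end (cron, CGI, Postfix) then only needs its own small adapter; the core never needs to know where the data came from.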

For me, the first two paradigms that come to mind are:

  a. Put all the logic and functionality into modules, allow the front-end to massage the input into parameters (perhaps via other modules, doesn't matter), and run in the front-end's process space.

    This brings up concurrency issues, I think. Of course, if the data is going into an RDBMS, then that largely goes away. Other actions may render concurrency moot, too, but in the general case concurrency may need to be dealt with.

    Upgrading the modules may result in some funky behaviour - clients that start up each time may be using a mix of modules (e.g., if they load while the new modules are being updated), and clients that are daemons themselves (say mod_perl) may simply be using out-of-date modules until reload.

  b. Put all the logic and functionality into modules, and put that into a daemon that listens on some socket and requires its input in its own defined format. All the other clients need to reformat their input into the format required by this server, open the socket, send it, possibly wait for a return value, and close.

    Here, concurrency is controlled by the daemon, which can fork if concurrency isn't an issue, or deal with one socket at a time if it is.

    Upgrades are tightly controlled - don't reload the daemon until the upgrade is done. And mod_perl won't be affected - no need to reload the Apache server. (OK, so reloading Apache isn't that big a deal - it even has a signal handler to do that gracefully, but it's just an example.)
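For the first approach, when the work isn't already serialized by an RDBMS, one common way to keep independent front-end processes from stepping on each other is an advisory file lock. A minimal sketch using Perl's built-in `flock`; the `with_lock` helper and the lock-file name are hypothetical. Advisory locks only protect anything if every client path (CLI, CGI, cron, mail filter) takes the same lock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Spec;

# Run $code while holding an exclusive advisory lock on $lockfile.
# Every front end must wrap its critical section in this, or the lock
# protects nothing.
sub with_lock {
    my ($lockfile, $code) = @_;
    open my $fh, '>>', $lockfile or die "open $lockfile: $!\n";
    flock($fh, LOCK_EX) or die "flock $lockfile: $!\n";   # blocks until free
    my @result = eval { $code->() };
    my $err = $@;
    close $fh;                     # closing the handle releases the lock
    die $err if $err;              # rethrow only after the lock is released
    return wantarray ? @result : $result[0];
}

# Hypothetical lock-file location shared by all front ends.
my $lock = File::Spec->catfile(File::Spec->tmpdir, 'engine.lock');
my $n = with_lock($lock, sub { 42 });   # placeholder for the real work
print "$n\n";
```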

So ... the question is: for an active production system, which paradigm do you go for? a? b? or some other one? I'm not particularly in love with either, but my limited comp-sci background means I may not have been exposed to other methodologies.

Re: Multi-client approaches
by Joost (Canon) on Jan 31, 2008 at 22:50 UTC
    I've used a variant of b) with good results on a production system but note that it only really has benefits if you have a substantial or concurrent (threaded, for instance) process that needs to be synchronized.

    Also note that you can always abstract away the communication by using a client library/module, but you'll still need to make sure the clients are using the current protocol, i.e. use the latest client module.

    Actually what I did was something like this:

        Multiple Apache        Dispatching and           Multithreaded Worker processes
        processes running      coordinating server       Running on multiple nodes
        on one or more nodes   running on a single       One connection per thread
                               node (using IO::Select)

        Client A - socket -+           +- socket --> Worker 1a
        Client B - socket -+-> Server -+- socket --> Worker 1b
        Client C - socket -+           +- socket --> Worker 2a
    Where the clients and workers connect to the server whenever they're ready, and pass simple messages around to start jobs and indicate job status.

    The server itself is about 150 lines of single-threaded Perl code that organizes requests so that duplicate requests are ignored, and schedules jobs to the first worker that becomes available.
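The two scheduling rules described here (ignore duplicate requests, hand each job to the first worker that becomes free) can be kept separate from the socket handling. A minimal sketch of that bookkeeping only, assuming an `IO::Select` loop elsewhere calls `submit` when a client line arrives and `worker_ready` when a worker reports in; the package and method names are hypothetical, not Joost's actual code.

```perl
use strict;
use warnings;

package Scheduler;

sub new {
    my ($class) = @_;
    return bless {
        queue => [],    # job ids waiting for a worker, in arrival order
        known => {},    # job id => 'queued' | 'running'
        idle  => [],    # workers waiting for a job, in arrival order
    }, $class;
}

# A client asks for a job. Returns [worker, job] if it can start now;
# returns nothing if it was queued or duplicates a queued/running job.
sub submit {
    my ($self, $job) = @_;
    return if $self->{known}{$job};          # duplicate: ignore
    $self->{known}{$job} = 'queued';
    push @{ $self->{queue} }, $job;
    return $self->_dispatch;
}

# A worker is free (just connected, or just finished $finished_job).
sub worker_ready {
    my ($self, $worker, $finished_job) = @_;
    delete $self->{known}{$finished_job} if defined $finished_job;
    push @{ $self->{idle} }, $worker;
    return $self->_dispatch;
}

# Pair the oldest idle worker with the oldest queued job, if both exist.
sub _dispatch {
    my ($self) = @_;
    return unless @{ $self->{queue} } && @{ $self->{idle} };
    my $job    = shift @{ $self->{queue} };
    my $worker = shift @{ $self->{idle} };
    $self->{known}{$job} = 'running';
    return [ $worker, $job ];
}

package main;

my $s = Scheduler->new;
$s->submit('report-17');                  # no worker yet: queued
$s->submit('report-17');                  # duplicate: ignored
my $go = $s->worker_ready('worker-1a');   # pairs worker-1a with report-17
print "start $go->[1] on $go->[0]\n";
```

Because everything runs in one single-threaded process, these hashes and arrays need no locking at all.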

    The protocol (client) mechanisms are really pretty simple. Just a few lines to connect to the server (using IO::Socket::INET) and a couple of methods that convert a request to a single line.
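A minimal sketch of such a client library, assuming a hypothetical wire format (a verb followed by tab-separated `key=value` pairs, one request per line, one-line reply) and a hypothetical server address. Only the `IO::Socket::INET` constructor and its `PeerAddr`/`PeerPort`/`Proto` options are real API; the rest is illustrative, not Joost's protocol.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Encode a request as one line: VERB, then key=value pairs, tab-separated.
# Keys are sorted so the same request always produces the same line.
sub encode_request {
    my ($verb, %args) = @_;
    for my $field ($verb, %args) {
        die "tab or newline in '$field'\n" if $field =~ /[\t\r\n]/;
    }
    die "'=' not allowed in a key\n" if grep { /=/ } keys %args;
    return join("\t", $verb, map { join '=', $_, $args{$_} } sort keys %args) . "\n";
}

# Inverse of encode_request; values may themselves contain '='.
sub decode_request {
    my ($line) = @_;
    chomp $line;
    my ($verb, @pairs) = split /\t/, $line;
    my %args = map { split /=/, $_, 2 } @pairs;
    return ($verb, %args);
}

# Send one request to the coordinating server and wait for a one-line reply.
sub send_request {
    my ($host, $port, $verb, %args) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "connect to $host:$port failed: $!\n";
    print {$sock} encode_request($verb, %args);
    my $reply = <$sock>;
    close $sock;
    die "no reply from $host:$port\n" unless defined $reply;
    chomp $reply;
    return $reply;
}

my $line = encode_request('START', job => 'report-17', priority => 2);
my ($verb, %args) = decode_request($line);
print "$verb $args{job}\n";    # prints "START report-17"

# With a server listening (hypothetical host/port):
# my $status = send_request('localhost', 9000, 'STATUS', job => 'report-17');
```

Keeping the format inside one module means a protocol change only requires upgrading that module on each client.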

    The reason for this setup was that a single worker process can take up to 6 GB of memory, so purely for memory efficiency the workers had to be multi-threaded (on multi-core machines) to get the best performance out of them. It also meant that we could spread the workers across other nodes.

Re: Multi-client approaches
by pc88mxer (Vicar) on Feb 01, 2008 at 02:11 UTC
    For simple business processes you probably can get away with direct database access. This is especially true of "read-only" kinds of processes. For your more complex procedures I'd look into implementing a message queuing system. Clients would simply add their requests to a queue that gets processed either periodically or on demand by other servers/threads/agents.

    The advantages of this approach are:

    • It's a simple architecture. You don't have to write robust, long-running daemons.
    • You can take down part of the request processing system for updates or maintenance without losing the ability to submit new requests.
    • It is easy to accommodate procedures that must be handled serially as well as those that can be handled in parallel - just decide on how many worker threads can handle each type of request.
    • The queue can also serve as a log of the changes made to the system which can be helpful.
    • The approach is especially beneficial when a request can take a long time: the client gets an immediate confirmation that the request has been queued and can check back periodically to see whether it has been acted on.
    There are disadvantages to this approach too, such as not getting an immediate response to your request. But it's a trade-off. In return you get more control over how your processes are executed, and you can always poll to see what the result of your request is.
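A minimal sketch of such a queue without extra infrastructure, assuming a hypothetical spool directory with `tmp/`, `new/`, `cur/`, and `done/` subdirectories on one local filesystem (a layout borrowed from Maildir). It relies on `rename` being atomic within a filesystem: requests only appear in `new/` once fully written, and when two workers race for the same file only one `rename` succeeds. A database table or a dedicated queue server would be the more usual production choice; cleanup of the spool is omitted.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec::Functions qw(catfile tmpdir);

# Hypothetical spool location; all four subdirectories share a filesystem.
my $SPOOL = catfile(tmpdir(), "request-spool-$$");
make_path(map { catfile($SPOOL, $_) } qw(tmp new cur done));

my $seq = 0;

# Write a file under tmp/, then atomically rename it to $dest.
sub publish {
    my ($name, $body, $dest) = @_;
    my $tmp = catfile($SPOOL, 'tmp', $name);
    open my $fh, '>', $tmp or die "write $tmp: $!\n";
    print {$fh} $body;
    close $fh or die "close $tmp: $!\n";
    rename($tmp, $dest) or die "rename $tmp: $!\n";
}

# Client: queue a request and return its id immediately.
sub enqueue {
    my ($body) = @_;
    my $id = sprintf '%010d.%d.%06d', time, $$, $seq++;
    publish($id, $body, catfile($SPOOL, 'new', $id));
    return $id;
}

# Worker: claim a request by renaming it from new/ into cur/.
# If another worker renamed it first, rename fails and we try the next one.
sub claim {
    opendir my $dh, catfile($SPOOL, 'new') or die "opendir: $!\n";
    my @ids = sort grep { !/^\./ } readdir $dh;
    closedir $dh;
    for my $id (@ids) {
        my $cur = catfile($SPOOL, 'cur', $id);
        next unless rename(catfile($SPOOL, 'new', $id), $cur);
        open my $fh, '<', $cur or die "read $cur: $!\n";
        my $body = do { local $/; <$fh> };
        return ($id, $body);
    }
    return;    # nothing waiting
}

# Worker: store the result, then drop the claimed request.
sub finish {
    my ($id, $result) = @_;
    publish("$id.result", $result, catfile($SPOOL, 'done', $id));
    unlink catfile($SPOOL, 'cur', $id);
}

# Client: poll for progress instead of blocking on a worker.
sub status {
    my ($id) = @_;
    return 'done'    if -e catfile($SPOOL, 'done', $id);
    return 'running' if -e catfile($SPOOL, 'cur',  $id);
    return 'queued'  if -e catfile($SPOOL, 'new',  $id);
    return 'unknown';
}

my $id = enqueue("START\tjob=report-17\n");
print status($id), "\n";    # queued
my ($claimed) = claim();
finish($claimed, "ok\n");
print status($id), "\n";    # done
```

Serial-only request types can be handled by running a single worker on their spool; parallel ones just get more workers.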
Re: Multi-client approaches
by traveler (Parson) on Jan 31, 2008 at 23:06 UTC
    I built a system similar in design to Joost's. The only significant difference was that once the client was assigned to a worker, the "server" got out of the way and let the two communicate directly. In my case each client could have had multiple workers (implemented as virtual machines) all on the same physical machine. The server was Perl, the workers were under control of a Perl executive, and the clients were a Java/C combination.
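One way to get that hand-off, sketched under assumptions: the server answers a client's request with the address of the worker it assigned (here a hypothetical `WORKER host port` reply line), and the client then opens its own connection to that worker, so the server never relays the traffic. The worker registry, names, and reply format are illustrative, not traveler's actual protocol.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Server side: pick an idle worker, mark it busy, and tell the client where
# it is. After this reply the server takes no further part in the session.
# $idle is a hypothetical registry: worker name => [host, port].
sub assign_worker {
    my ($idle) = @_;
    my @names = sort keys %$idle;
    return "BUSY\n" unless @names;
    my ($host, $port) = @{ delete $idle->{ $names[0] } };   # no longer idle
    return "WORKER $host $port\n";
}

# Client side: parse the server's reply.
sub parse_assignment {
    my ($reply) = @_;
    return unless defined $reply && $reply =~ /^WORKER (\S+) (\d+)$/;
    return ($1, $2);
}

# Client side: talk to the assigned worker directly.
sub connect_to_worker {
    my ($reply) = @_;
    my ($host, $port) = parse_assignment($reply)
        or die "no worker assigned\n";
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "connect to $host:$port failed: $!\n";
    return $sock;
}

my %idle = (vm1 => ['10.0.0.5', 7001], vm2 => ['10.0.0.6', 7002]);
my ($host, $port) = parse_assignment(assign_worker(\%idle));
print "client connects to $host:$port\n";   # client connects to 10.0.0.5:7001
```

The worker would need some way to tell the server it is idle again (another message to the server), which this sketch leaves out.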


      The only significant difference was that once the client was assigned to a worker, the "server" got out of the way and let the two communicate directly.
      I never considered doing that for my project, mainly because the jobs are usually (but definitely not always) finished within a fraction of a second, typically somewhere between 0.05 and 1 second, and because multiple clients can (and will, if the job queue is long enough) request the same job, which might already be running or queued.

      Putting all the coordination in the server made the clients and the workers very simple. Having all the synchronization in a single-threaded single process made the server relatively simple too.
