Need suggestion on problem to distribute work

by smarthacker67 (Beadle)
on Jun 14, 2020 at 18:47 UTC (#11118059, perlquestion)

smarthacker67 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I need some suggestions for tackling the following problem.

* Currently I have a program that loops, works with the database inside the loop, and sends a command to a remote server by connecting to it.

* Connecting to the remote server and sending the request takes about 10 ms because of latency, and this adds up since I am running in a loop.

* The command sent to the remote server does not need to return any result; it is assumed that the server will process it.

* I need to register the data in the database and send it to the remote server inside the while loop.

* This sending and database-insertion logic needs to be separated from the parent process.

* E.g. I want to set up a new process that keeps listening and, once I send it this work, finishes it.

* That way my main while loop will be much faster.

I am aware of fork but am looking for an alternative solution. I am not sure what to use; can someone help me?

I investigated and found Coro, AnyEvent, and POE, but I am not sure whether they will be useful.

I want to separate the sending of work to the remote server and the database insertion from the parent program, which runs in a while loop.

Can someone suggest what I should use?



Thanks in advance :-)

pseudo code
    while(1) {
        pick_batch_of_200();
        foreach(1..200) {
            ### this I want to fork out here and want to get it done by a different
            ### process or module so that the while loop can proceed with the next batch
            send_work_to_remote_client($_);

            #### this I want to fork out here and want to get it done by a different
            #### process or module so that the while loop can proceed with the next batch
            insert_into_db(@work_sent,@result)
        }
    }

    sub send_work_to_remote_client {
        #client list
        foreach(1..20) {
            connect_random_client_and_Send_work(send_work);
        }
    }

    insert_into_db {
        #get cached connection
        insert into XYX values( @result);
    }

Replies are listed 'Best First'.
Re: Need suggestion on problem to distribute work
by Marshall (Canon) on Jun 14, 2020 at 20:15 UTC
    I am trying to understand your situation. Before suggesting any actual code, I'd like to sanity check my understanding. I think that you want to have multiple DB and remote server interactions underway at the same time? A possible scenario is like below.

    The main program queues new work onto a queue that is accessible by multiple worker threads. There will be multiple Worker threads. If a worker thread is not busy and work is available, it accepts new work and processes the DB and remote server work items. If your DB and remote server can handle multiple operations at the same time, this will speed things up.

    Ultimately there will be a maximum throughput. Some sort of throttle will probably be necessary on the main program so that the work queue doesn't grow to an infinite size. I suspect there will be other complications with error handling. But is this general idea what you seek?

    Main Program: there is just one of these

        while (I don't know) {
            generate work item
            push work onto shared work queue
        }

    Worker Bee Thread: there will be N of these running in parallel

        # each worker gets its own connections
        connect to DB
        connect to remote server
        while (pop from work queue, if queue not empty) {
            manipulate DB
            send to remote server
        }

    Update: Clarification is needed on this point: "Remote server connection and sending request is taking about 10ms due to latency". Surely you are not connecting and disconnecting for each server request? Connect once, use many. However, 10 ms for remote communication overhead doesn't strike me as particularly long. I work with some networks where a simple ping response takes 60-70 ms. BTW, you don't mention DB processing time, but that can be very significant. A DB commit is "expensive" and requires multiple disk operations. Search for "ACID DB". I suspect the DB operation takes longer than the "send to remote server" operation.
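
The queue-plus-workers outline above could be sketched with the core threads and Thread::Queue modules roughly as follows. This is only a sketch under assumptions: the pool size, the queue limit, and process_item() are hypothetical stand-ins, and the limit and end methods need Thread::Queue 3.01 or later on a perl built with thread support.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $WORKERS = 4;                    # assumed pool size; tune to your DB and servers
my $work    = Thread::Queue->new;   # thread-safe FIFO shared by all threads
my $done    = Thread::Queue->new;   # workers report finished items here
$work->limit = 50;                  # throttle: enqueue() blocks while 50 items are waiting

# Hypothetical stand-in for "manipulate DB" plus "send to remote server".
sub process_item {
    my ($item) = @_;
    return "processed $item";
}

sub worker {
    # A real worker would open its own DB handle and server connection here, once.
    # dequeue() returns undef once the queue has been ended and is empty.
    while (defined(my $item = $work->dequeue)) {
        $done->enqueue(process_item($item));
    }
}

my @pool;
push @pool, threads->create(\&worker) for 1 .. $WORKERS;

$work->enqueue($_) for 1 .. 200;    # main program: hand over one batch
$work->end;                         # no more work is coming
$_->join for @pool;                 # wait for the workers to drain the queue

printf "%d items processed\n", $done->pending;
```

Because of the limit, the main program slows down by itself whenever the workers fall behind, which is the throttle mentioned above. Items come off the queue in FIFO order.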
      Work queue is also what I suggest. But don't use a database as the queue. Use something like redis' FIFO queue. You could get fancy and make a priority queue using sets, but sounds like you want straightforward, and I agree.

      The producer process puts work on the atomic queue, and worker daemons spin and pop off work to do. Sure, you could have the worker daemons fork off children to do the work, but as long as you have the atomic queue you can just have any number of worker daemons checking for work to do in a loop, so there is no need to get fancy with the worker processes. Redis (and the Perl client) is not the only way to do this, but it's the one I have the most experience with.

      As I stated above, don't use a database to serve the queue. You don't have to use Redis, but do not use a database (terribly inefficient for this type of middleware).

      If you wish for the worker process to communicate back to the work producer, you can use a private Redis channel specified in the chunk of work. However, if you want real messaging you will be best to go with something built for that, like RabbitMQ or something similar but lighter weight.

      Work can be shoved into the queue by the producer in JSON or some other easily deserialized format; it can include a "private" redis channel or "mailbox" for the worker thread to send a message to the producer or some other listening. You could actually set up a private mailbox scheme so that the initial contact with work on the queue allows the producer and consumer to have any sort of meaningful conversation you wish.

      Also note, the 6.x version of redis supports SSL natively and some level of access controls. I'd use them if going over public internet or crossing any sort of untrusted networks.
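
A minimal sketch of the producer and worker sides of such a Redis queue might look like this. It assumes $redis is a client object from the CPAN Redis module (created with something like Redis->new(server => '127.0.0.1:6379')) and relies only on its rpush and blpop commands; the key name work_queue and both function names are made up for illustration. JSON::PP is a core module.

```perl
use strict;
use warnings;
use JSON::PP;

my $QUEUE = 'work_queue';                    # hypothetical Redis list key
my $json  = JSON::PP->new->canonical;

# Producer side: serialize one work item and append it to the tail of the list.
sub push_work {
    my ($redis, $item) = @_;
    return $redis->rpush($QUEUE, $json->encode($item));
}

# Worker side: block for up to $timeout seconds waiting for the head of the list.
# BLPOP replies with (key, value); on timeout there is no value, so return undef.
sub pop_work {
    my ($redis, $timeout) = @_;
    my ($key, $payload) = $redis->blpop($QUEUE, $timeout);
    return defined $payload ? $json->decode($payload) : undef;
}
```

A worker daemon would then loop on pop_work($redis, 5) and process whatever comes back. Since BLPOP hands each element to exactly one caller, any number of workers can share the same list. A reply channel name could be carried as one more field in the JSON item.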

        Nowhere did I say to use the DB as the work queue. In Perl there are ways to push an item onto a "thread-safe" array. Likewise threads can get an item off of this array in a thread-safe way. I guess I should have said "shift off of the array" instead of "pop". I would process requests in roughly FIFO order.
        Sir, adding an MQ or a queue will slow the process further. I wish to separate the work out of the main queue and get it done in as little time as possible. I am looking into POE, which will scale itself by forking once we send more work. I have more than sufficient hardware resources, but I need to utilize them now :-)
      Thanks, all, for your responses.

      When I say remote server, I mean a FreeSWITCH server to which I want to send work. Sending takes time, so I can't send more work until the previous work has reached it. I wish to handle this separately.

      I connect to a single database only. E.g. I do 200 insertions plus 200 sends to the remote server, and because of this the next batch has to wait. I wish to fork this out of the main loop so that a separate thread takes care of it.

      Since this is call-related, I have X servers, so I need to connect to them in round-robin fashion to distribute the work. I hope this clarifies things :-)
      Updated the question as well.
Re: Need suggestion on problem to distribute work
by bliako (Monsignor) on Jun 15, 2020 at 23:19 UTC

    Forking every time, in the loop, is expensive (and by the way, make sure that you do not fork after you already have a huge data structure in memory, because it will be duplicated! Sort of, but see "System call doesn't work when there is a large amount of data in a hash"). And each of these newly created children will have to set up, from scratch, a one-off connection to the db or random_client, which is also expensive.

    Regarding sharing db connections between forked children, see forking and dbi but you need to make sure it is up-to-date. The answer is yes-and-no (at that time). As for sharing sockets (i.e. connections to random_client) between forks, it can be done. But I do not know the cost.

    So instead of spawning ephemeral children, perhaps you should consider a Pool of Workers (cue L'Internationale :) ) (see for example "Re^3: thread/fork boss/worker framework recommendation" and "Implementing Custom ThreadPool"), each listening on a separate port where you send them the data, either via shared memory or IPC. In this way each of the workers keeps its own connectivity with the db or random_client alive and tightly enclosed in its own space. But you need to create an enormous number of workers in order to be just 99.99% sure (can't be 100%!) that there will always be a worker free when data arrives, so that you do not need to implement a queue and throttler in the middle.

    Perhaps a webserver solved your problems already?
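
One way the pool-of-workers idea could look with plain fork and pipes is sketched below. This is an illustration, not a tested framework: it uses one pipe per worker instead of a port per worker, the pool size is arbitrary, and the line-per-item protocol and the counting loop in the child are placeholders for the real db and random_client work.

```perl
use strict;
use warnings;
use IO::Handle;

my $WORKERS = 3;                       # assumed pool size
my @workers;

for my $id (1 .. $WORKERS) {
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                   # child: a long-lived worker
        close $writer;
        close $_->{fh} for @workers;   # drop write ends inherited from earlier workers
        # A real worker would connect to the db and its random_client here, once.
        my $handled = 0;
        while (my $line = <$reader>) { # one work item per line; ends at EOF
            chomp $line;
            $handled++;                # placeholder for insert_into_db + send
        }
        exit 0;
    }
    close $reader;                     # parent keeps only the write end
    $writer->autoflush(1);
    push @workers, { pid => $pid, fh => $writer, sent => 0 };
}

# Parent: created before any big data structure exists, then fed round-robin.
my $next = 0;
for my $item (1 .. 200) {
    my $w = $workers[ $next++ % @workers ];
    print { $w->{fh} } "$item\n";
    $w->{sent}++;
}

for my $w (@workers) {
    close $w->{fh};                    # EOF tells the worker to finish up
    waitpid $w->{pid}, 0;
    $w->{status} = $? >> 8;
}
printf "dispatched %d items to %d workers\n", $next, scalar @workers;
```

The workers are forked once, before the main loop starts, so the per-item cost is a single write to a pipe. A full pipe buffer makes the print block, which acts as a crude throttle when a worker falls behind.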

Re: Need suggestion on problem to distribute work
by jo37 (Chaplain) on Jun 14, 2020 at 19:41 UTC

    By any chance, is the "command to the remote server" a mail to be sent?

    Greetings,
    -jo

    $gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$
      No, sir, it is work for a sipwise/FreeSWITCH server.
Re: Need suggestion on problem to distribute work
by Anonymous Monk on Jun 15, 2020 at 01:31 UTC

    Right now you are in the throes of an "XY Problem." You're describing what you need without addressing how to get there.

    I focus on this: "currently I have a program that loop and deal with the database within the loop and sends some command to the remote server by connecting to it" ...

    So is "the remote server" the database, or something else? We just can't tell.

    Now do this: tell us, first, who your server is and what he is to do. As this process is perceived by his external customers, exactly what is he expected to do?

    Then, tell us clearly about each external agent ... "the database server," maybe "the 'remote server'" ... exactly how he connects to each one. If there is some external party upon which "he" depends to accomplish a particular unit of work, please make that clear.

    Even though you are obviously momentarily-confused as to how to tackle this problem ... we have all been there. Describe it.

      Thanks, all, for your responses.

      When I say remote server, I mean a FreeSWITCH server to which I want to send work. Sending takes time, so I can't send more work until the previous work has reached it. I wish to handle this separately.

      I connect to a single database only. E.g. I do 200 insertions plus 200 sends to the remote server, and because of this the next batch has to wait. I wish to fork this out of the main loop so that a separate thread takes care of it.

        I am still not understanding your application. I alluded to this before, but running 200 insertions as a single DB transaction can take about the same time as 1 or 2 single transactions. The transaction overhead is HUGE. Doing 200 insertions at once doesn't take much more time than doing just a single insertion.

        I am not sure that forking or threading is your answer. I now suspect that your DB code is flawed.
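
To make the transaction point concrete, here is a rough sketch of inserting a whole batch inside one transaction with DBI. It assumes $dbh is a handle from DBI->connect opened with RaiseError => 1, and the table work_log with its two columns is a hypothetical placeholder for the real schema.

```perl
use strict;
use warnings;

# Insert all rows of a batch in ONE transaction, so there is a single commit.
# $dbh: connected DBI handle with RaiseError => 1 (assumption).
# $rows: array ref of [item, result] pairs.
sub insert_batch {
    my ($dbh, $rows) = @_;
    # Hypothetical table and columns; prepare once, execute many times.
    my $sth = $dbh->prepare('INSERT INTO work_log (item, result) VALUES (?, ?)');
    $dbh->begin_work;                  # turns AutoCommit off until commit/rollback
    my $ok = eval {
        $sth->execute(@$_) for @$rows;
        $dbh->commit;                  # the only expensive disk sync for the batch
        1;
    };
    unless ($ok) {
        my $err = $@ || 'unknown error';
        eval { $dbh->rollback };       # undo the partial batch
        die "batch insert failed: $err";
    }
    return scalar @$rows;
}
```

Called once per batch of 200, this pays the commit overhead once instead of 200 times. If the current code runs with AutoCommit on and executes each insert separately, that difference alone may account for much of the delay.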
