Apache or Daemon in Perl ?

by szabgab (Priest)
on Jan 18, 2005 at 12:50 UTC (node #423002, perlquestion)

szabgab has asked for the wisdom of the Perl Monks concerning the following question:

I am working on a project - a sort of proxy - where lots of devices are sending me messages via HTTP (right now about 3 messages per second in total, but that is expected to go up to 100 messages per second). After some buffering I have to resend them to a bunch of other devices via HTTP.

I have to make sure - for each sending device separately - that I don't change the order of the messages. (I might drop some of the messages if I don't have time to send all of them.)

The way it is currently implemented is that I receive the messages through Apache/mod_perl and put them in a MySQL memory table. In addition I have another child of the Apache that runs in a loop, fetching all the messages from the memory table and trying to send them one by one. If a send reaches its timeout I drop the message and forget about it.

The problem is that I am getting a lot more messages than I can send out (sending one out normally takes between 0.3 and 3 sec, with a 5 sec timeout). Even if I occasionally drop the surplus, a buffer always builds up quickly and the messages wait too long in the queue. For the application a delay of 20-30 sec in forwarding is already too late. Normally I should forward within 10 sec at the most.

I need to change the system so that in normal circumstances I drop only messages that I could not send due to reaching the timeout. I should also set the timeout to 15 sec.

So I have to have several processes that are emptying the buffer and sending out messages. The question is how? Should I generate more "sender" children of the Apache? Is there some other daemon (written in Perl) that would be recommended and would suit the task better than Apache?
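A rough sizing for "several processes", using Little's law (average number of sends in flight = arrival rate x average time per send). The rates and send times are the ones quoted in the question; `senders_needed` is a hypothetical helper, and the result is only a lower bound on the number of concurrent senders, not a measured figure.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Little's law: sends in flight = arrival rate * time per send.
# senders_needed() is a hypothetical helper, not part of any module.
# The send time is given in milliseconds to keep the arithmetic exact.
sub senders_needed {
    my ($msgs_per_sec, $ms_per_send) = @_;
    return ceil($msgs_per_sec * $ms_per_send / 1000);
}

for my $rate (3, 100) {
    for my $ms (300, 3000) {
        printf "%3d msg/sec at %4d ms per send: at least %d concurrent senders\n",
            $rate, $ms, senders_needed($rate, $ms);
    }
}
```

By the same formula, if every send ran into the proposed 15 sec timeout at 100 messages per second, about 1500 sends would be in flight at once, so a single sender loop cannot keep up at the expected load.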

Replies are listed 'Best First'.
Re: Apache or Daemon in Perl ?
by Corion (Pope) on Jan 18, 2005 at 13:02 UTC

    I have the following image of your setup:

    collector (mod_perl) -> MySQL -> sender (LWP?)

    If your clients support HTTP/1.1, it might be worthwhile to use LWP::ConnCache in your sender, so you don't have to reopen the connection to your client(s) for every new message. I don't know if Apache has a built-in method for connecting/sending via sockets - if it has, using that mechanism or fudging it into LWP::ConnCache might speed up the process a lot. Perhaps you should split the sender into several processes that all poll the database for new messages as soon as they've sent their current batch - this will generate a lot of load on the database, but you can easily decouple the reception and sending of the messages that way.

    Other than that, I don't really see much of a better way, as you have already decoupled your system and use a versatile scheduling mechanism, the database... You could try to separate the system further by putting the database and sender onto two different machines, or better on a 4 CPU machine, so you have three CPUs, enough RAM and enough network capability to keep all parts saturated.

      Is that 100 requests/sec sustained or burst?

      I have to agree with Corion's design here.

      collector (mod_perl) -> MySQL -> sender (LWP?)

      Use a mod_perl handler with persistent database connections (ala Apache::DBI) to write the incoming info to a suitably fast database (like MySQL). Have a separate daemon written in Perl (or something compiled if performance dictates) that reads from the DB and sends the responses.

      Make sure the DB stores timestamps for request received and response transmitted. Your sending daemon fetches a 'worklist' by querying for records w/o transmitted timestamp. The table can be archived/summarized/cubed periodically during slow periods. If there are no slow periods, you might need to investigate 'breaking-the-mirror' type solutions to get an offline snapshot for reporting.

      --Solo

      --
      You said you wanted to be around when I made a mistake; well, this could be it, sweetheart.

      The sender is some in-house written thing using IO::Socket and I think I cannot keep the sockets open to these devices as they might need that for other things. But that might be a good direction for improvement.
Re: Apache or Daemon in Perl ?
by Mutant (Priest) on Jan 18, 2005 at 13:35 UTC

    I'm not sure your architecture is exactly right for the volume of traffic you're trying to deal with. You've gone for a batch approach (i.e. put everything in a table, and check periodically), when perhaps a real-time approach (process messages as you receive them) would work better. The downside of a real-time approach, in this case, is that it's likely to be much more complex.

    I've actually implemented something very similar to this. I'm assuming you need to queue messages, but they only need to be queued based on some criteria, i.e. messages from two senders can run simultaneously, but messages from the same sender must be queued. (If all messages need to be in the exact same order, then you can't avoid full serialisation, so you'll have to stick with your current approach.)

    What we did was have Apache/mod_perl handle the requests, do some basic error checking, and then pass the request onto a custom written daemon via a socket.

    The daemon has two main processes. The first one listens for requests from Apache, and then forks to handle each request. The forked process then does its thing. However, before it can do its thing, it asks permission from the second main process - the serializer. The serializer has a hash table of processes currently running for each sender. If there is nothing running for that sender, it gives permission for the forked process to go. Otherwise, it tells it to wait. You can also handle timeouts in this process.
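A minimal sketch of the serializer's hash table as described above, not the code from the project mentioned. The names `%running`, `request_permission` and `release` are hypothetical, and the socket traffic between the forked workers and the serializer is left out.

```perl
use strict;
use warnings;

# Hypothetical bookkeeping for the serializer: sender id => pid of the
# worker that currently holds permission for that sender.
my %running;

# Returns true ("go") if no worker is active for this sender,
# false ("wait") otherwise.
sub request_permission {
    my ($sender, $pid) = @_;
    return 0 if exists $running{$sender};
    $running{$sender} = $pid;
    return 1;
}

# Called when a worker finishes or is timed out, so the next one may go.
# Only the worker that holds the permission can give it back.
sub release {
    my ($sender, $pid) = @_;
    delete $running{$sender}
        if exists $running{$sender} && $running{$sender} == $pid;
    return;
}

print request_permission('device-a', 101) ? "go\n" : "wait\n";
print request_permission('device-a', 102) ? "go\n" : "wait\n";
release('device-a', 101);
print request_permission('device-a', 102) ? "go\n" : "wait\n";
release('device-a', 102);
```

The sketch does not show how waiting workers are woken up. To keep the per-sender order they would presumably have to be released first in, first out.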

    The code for this is actually GPL'ed. I'm not sure if it'll be of any use, but you can have a look here (you want the fe-replication executable).

Re: Apache or Daemon in Perl ?
by BrowserUk (Pope) on Jan 18, 2005 at 13:41 UTC

    Sounds like an ideal application of threads.

    One or more receiver threads receive and enqueue the message + timestamp in memory. One or more sender threads read the queue, check the timestamp, drop any messages past their sell-by and forward the rest wherever.

    You can overlap the network delays of both receiving and sending, and tailor the number of senders and receivers as the project evolves using command line parameters.

    One process, with say 3 receivers and 6 senders and no middleman DB could probably saturate 1-mb bandwidth with maybe 15% cpu on a 2GHz machine.

    If you have higher bandwidth available, change 2 numbers and you match the requirements.

    It's simple, flexible and very scalable.
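A minimal sketch of the receiver/sender thread layout described above, assuming a Perl built with ithreads and a Thread::Queue recent enough to provide `end` (3.01 or later). The pool size, the 10 sec deadline taken from the question, the routing by device name and the `$forward` callback are all assumptions for illustration; the real receive and HTTP send are stubbed out. One queue per sender thread is used here so that messages from one device stay in order, which a single shared queue with several senders would not guarantee.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Time::HiRes ();

my $MAX_AGE = 10;   # seconds; forwarding deadline from the question
my $SENDERS = 3;    # hypothetical pool size

# One queue per sender thread. Every message from a given device goes to
# the same queue, so that device's messages are sent in arrival order.
my @queues = map { Thread::Queue->new } 1 .. $SENDERS;

sub queue_for {
    my ($device) = @_;
    my $sum = 0;
    $sum += ord($_) for split //, $device;
    return $queues[ $sum % $SENDERS ];
}

# Receiver side: stamp the message and enqueue it.
sub receive {
    my ($device, $body) = @_;
    queue_for($device)->enqueue([ $device, $body, Time::HiRes::time() ]);
    return;
}

# Sender side: drop anything past its deadline, forward the rest.
# $forward stands in for the real HTTP send. Returns the number forwarded.
sub sender_loop {
    my ($q, $forward) = @_;
    my $sent = 0;
    while (defined(my $item = $q->dequeue)) {
        my ($device, $body, $stamp) = @$item;
        next if Time::HiRes::time() - $stamp > $MAX_AGE;
        $forward->($device, $body);
        $sent++;
    }
    return $sent;
}

my @workers = map {
    my $q = $_;
    threads->create(sub {
        sender_loop($q, sub { print "forward $_[0]: $_[1]\n" });
    });
} @queues;

receive('device-a', "fix $_") for 1 .. 3;
receive('device-b', 'fix 1');

$_->end for @queues;    # no more input; senders exit once drained
my $total = 0;
for my $w (@workers) {
    my ($n) = $w->join;
    $total += $n;
}
print "forwarded $total messages\n";
```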


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re: Apache or Daemon in Perl ?
by jimrobertsiii (Scribe) on Jan 18, 2005 at 14:41 UTC
    HTTP is *not* the correct solution for an architecture where occasional dropped messages are tolerable.

    If the decision is yours you should consider using UDP. The problem with HTTP (TCP) in this situation is that it is inherently slow due to the handshaking in TCP. UDP has no such handshaking.

    Like I said, if you have the option of changing the protocol that you use...

    -Jim

      He also said that the messages must be processed in the same order, and UDP does not guarantee that. To do this with UDP, you would need the application-layer protocol to check some ordering number in the message.

      There are some other transport-layer protocols that might work, but TCP is probably good enough. The handshaking isn't much overhead for modern servers. The bottleneck for this application is likely the database, not the network connectivity.

      "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

        I'm curious ... Let's say, using UDP, message 1 comes in, then message 3, then message 5, and then message 2, and message 4 got dropped. How would you check ordering here? You process 1 immediately, but wait for a few messages to come in, then get message 2, process 2 and 3, and then ... when do you process 5? Or would you be proposing that with UDP, you would only process the messages that already came back in order, dropping all that were delayed out of order?
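One of the two policies raised in this question, processing whatever arrives in increasing order and dropping anything that shows up late, can be sketched as follows. The per-message sequence number is an assumption (the application protocol would have to add it), and `accept_message` is a hypothetical name. This is not necessarily what the earlier poster had in mind.

```perl
use strict;
use warnings;

# Highest sequence number already processed, per sending device.
# Assumes the application protocol puts a sequence number in each message.
my %last_seq;

# Returns true if the message should be processed, false if it arrived
# after a newer message from the same device (or is a duplicate).
sub accept_message {
    my ($device, $seq) = @_;
    return 0 if exists $last_seq{$device} && $seq <= $last_seq{$device};
    $last_seq{$device} = $seq;
    return 1;
}

# Arrival order from the example above: 1, 3, 5, 2 (4 is lost in transit).
for my $seq (1, 3, 5, 2) {
    printf "message %d: %s\n", $seq,
        accept_message('device-a', $seq) ? 'process' : 'drop (late)';
}
```

This version never waits for a missing message, so nothing is delayed, at the cost of discarding message 2 in the example. The alternative of holding 3 and 5 until 2 arrives or a timer expires needs a reorder buffer and a timeout per device.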

      The choice is not mine, but actually in our case dropping a message is tolerable if the receiving device does not answer in reasonable time.

      Just out of curiosity: in what kind of situations do you tolerate that messages can be lost?

        In our situation the devices are actually moving objects. They ask for their location. If I cannot send an answer in a timely manner I can as well drop the request or the response I am trying to forward. It has no value any more.

        Besides, above my head sits an application that will resend its messages after about 1 minute if it did not get a response earlier.

Re: Apache or Daemon in Perl ?
by ank (Scribe) on Jan 18, 2005 at 15:10 UTC
    The way you have chosen to do it so far looks a bit convoluted.
    Check out IO::Socket::INET - reimplementing it as a simple pre-forking TCP server could be the answer.
    Remember to set ReusePort / ReuseAddr while testing :)
Re: Apache or Daemon in Perl ?
by perrin (Chancellor) on Jan 19, 2005 at 00:35 UTC
    Here's a fairly simple technique to do what you want with only minor modifications to your mod_perl code: create a mod_perl cleanup handler that will check for any unsent messages in the database and send them. Make it loop until there are none left. This will mean that whenever messages come in, all of the processes that took the messages will stick around to send messages until the queue is cleaned out. The only danger of this approach is that you could end up tying up all of your processes sending messages and leave none to receive. To get around this, you could keep track of how many processes are in "send mode" at any given time, and have new processes skip the sending step if too many are already doing it. That may be unnecessary though.

    The advantage of using apache (via mod_perl) as your daemon is that it's fast, stable, secure, handles SSL, has flexible logging, speaks HTTP, etc. Don't switch to something else without a really good reason.

    If you need really great performance (it doesn't sound like you do), you would probably want to use some kind of single process non-blocking I/O approach, since most of the work is just waiting for I/O to complete. You could look at POE, although I'm not sure how well it performs under load.

Re: Apache or Daemon in Perl ?
by elwarren (Curate) on Jan 18, 2005 at 20:40 UTC
    Have you looked at doing anything to help speed up the reception on the receiving device? Shortening that TCP handshake over a (possibly?) slow link could save you valuable time as well. Maybe using IP addresses and a hard-coded /etc/hosts to eliminate reverse-name lookups? Just a thought.

    BTW: the gpsd daemon and friends use UDP for this very reason.
