PerlMonks  

HTTP server guidance

by DaveH (Monk)
on Nov 25, 2003 at 17:46 UTC

DaveH has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone.

I'm in the process of designing (and eventually building) an "interface" HTTP service. The idea is that this is a HTTP front-end to a traditional socket-server application. Hopefully the ASCII art below will illustrate what I'm trying to achieve.

,--------------.  ,---------------.  
| HTTP Request |  | HTTP Response | A
`--------------'  `---------------'  
        |               ^            
        |               |           B
        v               |            
,---------------------------------.  
|          Web Server             | C
`----- | --------------- ^ -------'  
       |                 |          D
,----- v --------------- | -------.  
|    Pool of socket connections   | E
`---------------------------------'  
      |  |  |       ^  ^  ^          
      |  |  |       |  |  |         F
      v  v  v       |  |  |          
,---------------------------------.  
| Multi-threaded backend system   | G
`---------------------------------'  

The point of this exercise is that I need to provide a HTTP front-end to the backend system (G). This system has a socket interface (e.g. listens on port 4567), and is designed to handle one long-running request-response stream (however, it is multi-threaded and can support multiple connections). The connection to this system (F) uses a proprietary TCP/IP protocol. I need to provide a standard HTTP interface (B) to this application.

The crux of my query is this: how would you recommend implementing steps C, D and E? There are no really complicated translations needed. All messages are ASCII (text/plain), and I don't need to wrap HTML around or anything else. It is really just a translation from one protocol to another (with a very thin layer of logic to rewrite message headers).

The options I have thought of so far are as follows:

  • Implement C, D and E myself (using HTTP::Daemon (C) and LWP::Socket (E)).
  • Use either Apache or IIS to handle the HTTP stuff (C), and write a CGI script (C, D & E) to do the requests to the backend system. This will involve establishing and then tearing down a backend TCP/IP connection (F) for each request...
  • Use Apache or IIS for HTTP (C), but use some sort of Inter-Process Communication (D) to another daemon (E) for all the proprietary socket stuff.

This is a common enough thing to want to implement; I'm just wondering if anyone can share any "gotchas" or suggestions about which way to go. Links to other people's implementations are just as welcome as your own personal experiences.

Thank you for your time. It is much appreciated. :-)

Cheers,

-- Dave :-)


$q=[split+qr,,,q,~swmi,.$,],+s.$.Em~w^,,.,s,.,$&&$$q[pos],eg,print

Re: HTTP server guidance
by perrin (Chancellor) on Nov 25, 2003 at 17:55 UTC
    Using Apache or IIS will mean that you could hit a different process every time. It is possible to keep a socket open with mod_perl or PerlEx, but that will only work if the proprietary server doesn't have any rules about flow, since one conversation may bounce between different connections on successive requests.

    If that's a problem, you can use the technique that merlyn discussed in one of his columns using HTTP::Daemon to keep a separate process alive and connected to the backend server for each HTTP client. This will not scale very well, but is ideal for a small load.

Re: HTTP server guidance
by l3nz (Friar) on Nov 25, 2003 at 19:38 UTC
    I think it would be hard to find an ideal solution from what you say, but I believe the most important thing here is the expected load and reliability of the solution. I'm writing a few ideas here:
    • If your web server C must serve other things apart from those requests (static web pages, images...) or is exposed to the internet, I'd go for a production web server and would not invent a kludge for myself.
    • If you can afford the cost of a separate TCP/IP connection F for each request, a CGI on a webserver will likely get the work done with the minimum expense.
    • If you cannot afford the cost of a separate F for each request, either you find a way to share a global connector object (I'd do it trivially in Java using the Application container, not sure how to do it with Perl) or you find a way to IPC to something else that will handle the connections. The point is that before going this way you have to prove that you can do IPC + marshalling + connection turnaround in less time than a separate F.
    • If a single request B translates to more than one F, you can create a new channel for each B and reuse it for all F's therefore spreading the setup cost.
    • If you can have C and G on the same machine, or connected by very low-latency hardware, you can probably go with separate TCP/IP connections; otherwise, if G is in a data center on the other side of the world and you cannot avoid the latency, you'll have to think more about this part.
Re: HTTP server guidance
by sgifford (Prior) on Nov 25, 2003 at 20:12 UTC

    Your problem is very similar to the problems with a Web frontend to the IMAP protocol; it is designed to be stateful and let users keep connections, which conflicts with HTTP's stateless model.

    A solution that some IMAP users use is a connection-caching IMAP proxy. It keeps a large cache of connections open, each identified somehow (with an ID or by username/password). When the Web app gets a request, it opens a new connection to the IMAP proxy. The proxy looks in its connection cache for the connection. If it finds it, it uses the cached connection; otherwise it opens a new one. Presumably the proxy will time out connections after a period of time, or else keep a maximum number of sessions open then close the least recently used one.

    You could do this by adding a SESSION command to your protocol, which the proxy could use to re-attach users to sessions. The session number could be kept as a cookie, form variable, session variable, etc.
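A minimal sketch, in Perl, of the connection-caching proxy idea above: connections are looked up by a session ID, reused on a hit, opened on a miss, and dropped when idle too long or when the cache is full (least recently used first). The package name ConnCache, the size limit, the idle timeout and the connect/close callbacks are all hypothetical; a real proxy would pass in code that opens and closes sockets to the backend.

```perl
use strict;
use warnings;

package ConnCache;

sub new {
    my ($class, %args) = @_;
    return bless {
        connect  => $args{connect},          # coderef: session id -> new connection
        close    => $args{close},            # optional coderef: close an evicted connection
        max      => $args{max}      // 100,  # hypothetical cap on cached connections
        idle_ttl => $args{idle_ttl} // 300,  # seconds an entry may sit unused
        entries  => {},                      # session id -> { conn, last_used }
    }, $class;
}

# Return the cached connection for a session, opening one on a miss.
sub get {
    my ($self, $id, $now) = @_;
    $now //= time;
    $self->_expire($now);
    my $entry = $self->{entries}{$id};
    unless ($entry) {
        $self->_evict_lru if $self->size >= $self->{max};
        $entry = $self->{entries}{$id} = { conn => $self->{connect}->($id) };
    }
    $entry->{last_used} = $now;
    return $entry->{conn};
}

sub size { return scalar keys %{ $_[0]{entries} } }

# Drop every entry that has been idle longer than idle_ttl.
sub _expire {
    my ($self, $now) = @_;
    my $e = $self->{entries};
    for my $id (keys %$e) {
        $self->_drop($id) if $now - $e->{$id}{last_used} > $self->{idle_ttl};
    }
}

# Drop the least recently used entry to make room for a new session.
sub _evict_lru {
    my ($self) = @_;
    my $e = $self->{entries};
    my ($oldest) = sort { $e->{$a}{last_used} <=> $e->{$b}{last_used} } keys %$e;
    $self->_drop($oldest) if defined $oldest;
}

sub _drop {
    my ($self, $id) = @_;
    my $entry = delete $self->{entries}{$id};
    $self->{close}->($entry->{conn}) if $self->{close};
}

package main;

# Usage with a fake connector; times are passed explicitly to keep it deterministic.
my @log;
my $cache = ConnCache->new(
    connect  => sub { push @log, "open $_[0]"; return "conn-$_[0]" },
    close    => sub { push @log, "close $_[0]" },
    max      => 2,
    idle_ttl => 60,
);
$cache->get('alice', 1000);   # miss: opens conn-alice
$cache->get('alice', 1010);   # hit: reuses conn-alice
$cache->get('bob',   1020);   # miss: opens conn-bob
$cache->get('carol', 1030);   # full: closes conn-alice (LRU), opens conn-carol
```

Passing the clock in explicitly is only for the demonstration; in a long-running proxy you would call `get` with just the session ID and let it use `time`.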

Re: HTTP server guidance
by hardburn (Abbot) on Nov 25, 2003 at 17:55 UTC

    If it can't be easily done by a CGI/mod_perl program, then you should probably use SOAP::Lite instead.

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    : () { :|:& };:

    Note: All code is untested, unless otherwise stated

Re: HTTP server guidance
by Abigail-II (Bishop) on Nov 25, 2003 at 19:00 UTC
    I don't think there's enough information to give a really useful answer (too many open ends), but I'd look into mod_perl, SOAP, LWP, and even into non-Perl solutions like libwww (C) or Tomcat (Java).

    Abigail

      I deliberately kept the detail down to reduce the size of the initial question. However I'm more than happy to give more details if people are interested.

      The situation is as follows:

      We have a backend application which listens on a particular port (not that it matters, but this is configurable). The message format is simple. Messages are supposed to each have a unique message number (but this is determined by the client). The message itself is a pipe-delimited format, which consists of a message header and one or more segments of data. There are different types of traffic in this protocol, for example, information (handshakes), keep-alives and data packets (of pipe delimited messages).
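The real wire format is proprietary and not given here, so the layout below is purely an assumption for illustration: a header line of MSGNO|TYPE|SEGMENT_COUNT followed by one pipe-delimited line per segment. It sketches how the thin translation layer could build and parse such messages, including the client-chosen message number. The type code DATA, the newline framing and the function names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical framing (the real protocol is not documented here):
#   header line : MSGNO|TYPE|SEGMENT_COUNT
#   each segment: field|field|...
# Lines end with "\n". No escaping of "|" inside fields is attempted.

my $next_msgno = 0;   # client-chosen message number, unique per connection

sub build_message {
    my ($type, @segments) = @_;   # each segment is an arrayref of fields
    my $msgno = ++$next_msgno;
    my @lines = (join('|', $msgno, $type, scalar @segments),
                 map { join('|', @$_) } @segments);
    return join("\n", @lines) . "\n";
}

sub parse_message {
    my ($text) = @_;
    my ($header, @lines) = split /\n/, $text;
    die "empty message\n" unless defined $header;
    my ($msgno, $type, $count) = split /\|/, $header;
    die "segment count mismatch\n" unless $count == @lines;
    # A limit of -1 keeps trailing empty fields such as "AMT|10.00|".
    my @segments = map { [ split /\|/, $_, -1 ] } @lines;
    return { msgno => $msgno, type => $type, segments => \@segments };
}

my $msg    = build_message('DATA', ['ACCT', '42'], ['AMT', '10.00', '']);
my $parsed = parse_message($msg);
```

Here `$msg` is `"1|DATA|2\nACCT|42\nAMT|10.00|\n"`, and parsing it gives back the message number, the type and two segments.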

      All I'm interested in is providing a HTTP interface to this protocol. The HTTP client is a fixed implementation (cannot be rewritten easily, it is legacy code), as is the backend system (which is closed source). Suggestions such as SOAP would be excellent, but don't fit well unfortunately. We basically need a "request-response" HTTP server which talks (somehow) to the backend system. The backend system works in exactly the same way ("request-response"), it just uses a different protocol. There is really no concept of "sessions" or logging in, so it is well suited to presentation over HTTP.

      We have limited time (as always), so I've managed to persuade my superiors that Perl should be the development language of choice (I know a number of languages, but Perl is my strongest). Despite being a fairly simple program, it will be revenue generating, so needs to be pretty stable and robust. The volume of transactions will be quite high, so something which forks off a new server for each request would pretty soon die I think.

      What I was really asking for were general approaches, perhaps pointers to more information (which everyone has already kindly given). The one approach I liked is using mod_perl, and maintaining a persistent connection to the backend system (a-la Apache::DBI). Besides reading the Apache::DBI source (which I might do), is there a general approach for doing this mod_perl "once-at-startup" connection pooling?

      There are numerous approaches (e.g. IPC with shared directory structures, databases, CGI scripts), but I was interested in other people's slant on this problem.

      Thanks again for everyone's time.

      Cheers,

      -- Dave :-)



        One other suggestion in addition to the others mentioned so far - POE.

        I've used POE to provide an HTTP API to a legacy protocol with persistent, stateful connections. It worked well for me. I actually expected to have to rewrite it in C once it had been prototyped, but it coped with the load well. Might be worth investigating.

        We had a POE process that sat there taking HTTP requests from a mod_perl server that was handling the application logic, and mapped them onto a pool of connections to the backend server.

        As for the HTTP API - you might want to consider trying to twist your stateful protocol into something in a more RESTful style. Then you'll be able to deal with scaling issues with more traditional web based caching/proxies.

        For example, in my project some of the information passed over the protocol was only stateful (and slow) because of its history as something that ran over a serial line. Once we got the data out we could map it onto a stateless set of URL-referenced documents that could then be proxied and cached, reducing the load on the POE backend server considerably and allowing it to spend most of its time on the data that had to have state.
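The bookkeeping behind mapping requests onto a bounded pool of backend connections, as described in this reply, can be sketched in plain Perl. This is not POE's API: BackendPool, checkout/checkin and the connect callback are hypothetical names. A POE version would call checkout from its request handler and queue the request whenever it returns undef.

```perl
use strict;
use warnings;

package BackendPool;

sub new {
    my ($class, %args) = @_;
    return bless {
        connect => $args{connect},   # hypothetical callback: () -> new backend connection
        max     => $args{max} || 4,  # upper bound on open connections
        idle    => [],               # connections ready for reuse
        busy    => 0,                # connections currently checked out
    }, $class;
}

# Hand out an idle connection, open a new one if under the limit,
# or return undef so the caller can queue the request.
sub checkout {
    my ($self) = @_;
    my $conn = pop @{ $self->{idle} };
    unless (defined $conn) {
        return if $self->{busy} >= $self->{max};
        $conn = $self->{connect}->();
    }
    $self->{busy}++;
    return $conn;
}

# Return a connection to the pool once its response has been relayed.
sub checkin {
    my ($self, $conn) = @_;
    $self->{busy}--;
    push @{ $self->{idle} }, $conn;
    return;
}

package main;

my $opened = 0;
my $pool = BackendPool->new(connect => sub { return ++$opened }, max => 2);
my $c1 = $pool->checkout;   # opens connection 1
my $c2 = $pool->checkout;   # opens connection 2
my $c3 = $pool->checkout;   # undef: both busy, the request must wait
$pool->checkin($c1);
my $c4 = $pool->checkout;   # reuses connection 1, nothing new is opened
```

Capping the pool keeps the number of backend connections predictable regardless of how many HTTP clients arrive at once, at the cost of queueing under load.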

Re: HTTP server guidance
by Anonymous Monk on Nov 26, 2003 at 05:11 UTC

    I worked on a very similar project about two years ago. The problem was to build a middleware layer to translate the data format of several legacy systems the company had built over 10 years to a common format. We debated many solutions including SOAP, but settled on XML over HTTP. We used Apache and mod_perl. The project was a success and the middleware server had no problems keeping up with demand.

    The first module was the hardest (a few weeks of programming) then with a framework in place everything went fast.

    The mod_perl code is proprietary, but one thing that really helped us was a good testing program. We wrote HTTPtest so that we could run regression tests against the system, and because it was soooooo boring reading the same output over and over again in a browser. HTTPtest was released to the public; it may prove useful to you as well.

    http://www.anomaly.org/wade/projects/httptest/index.html

    Good luck,
    --Burke
Re: HTTP server guidance
by Beechbone (Friar) on Nov 27, 2003 at 02:53 UTC
    To me this sounds almost the same as a database connection. If it were one, I'd recommend mod_perl and Apache::DBI. As it isn't, there are a few points to think about:

    Does the backend have a problem with idle connections?

    Does the backend have a problem with many connections?

    Is there any kind of state in the backend connection?

    If you can answer no to all these questions, then use mod_perl and just hold the connection in a global variable. It will stay there, one per webserver child, as long as the child lives.

        # Pseudocode
        our $conn;

        sub handler {
            my $r = shift;
            $conn ||= connect();
            send($conn, 'abc');
            print recv($conn);
            return OK;
        }

    And configure the maximum number of children, and the minimum and maximum number of idle children (MaxClients, MinSpareServers and MaxSpareServers in Apache 1.3), to values that match your setup...
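A slightly more concrete, runnable sketch of the "connection in a global variable" idea, using the core IO::Socket::INET module instead of the pseudocode connect(). The host, the port (4567 is only the example from the question) and the names backend_conn and reset_backend_conn are assumptions. The PID check guards against a socket opened in the Apache parent before it forks being shared by every child; each child then opens its own.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Assumed backend location; 4567 is only the example port from the question.
our $BACKEND_HOST = 'localhost';
our $BACKEND_PORT = 4567;

# One connection per process, plus the PID of the process that opened it.
my ($conn, $conn_pid);

sub backend_conn {
    # Reconnect if there is no socket yet, or if this process was forked
    # from the one that opened it (e.g. a socket opened in the Apache parent).
    if (!$conn || !defined $conn_pid || $conn_pid != $$) {
        $conn = IO::Socket::INET->new(
            PeerAddr => $BACKEND_HOST,
            PeerPort => $BACKEND_PORT,
            Proto    => 'tcp',
            Timeout  => 5,
        ) or die "cannot connect to $BACKEND_HOST:$BACKEND_PORT: $@\n";
        $conn_pid = $$;
    }
    return $conn;
}

# Drop the cached socket (e.g. after a failed send) so the next call reconnects.
sub reset_backend_conn {
    close $conn if $conn;
    undef $conn;
    return;
}
```

Inside a mod_perl handler you would call `my $sock = backend_conn();` on every request; the socket survives between requests for as long as that child lives.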

    Search, Ask, Know

      Hi Beechbone,

      You hit the nail on the head! The answer to all those questions is a resounding "no", so I think I'm going to go for some variation of your solution.

      The only problem might be "idle connections" (ideally, the protocol requires that keep-alive packets are sent on quiet lines), but I could easily solve this with a cron job which polls the server every minute for some 'no-op' request. Does anyone have a more elegant solution?

      Anyway, my thanks to everyone for your contributions. All responses were very gratefully received.

      Cheers,

      -- Dave :-)


        The cron job won't help. The problem is that the connections are spread across the webserver children, and when an HTTP request comes in (from a client or the cron job) it is handled by a (more or less) random child. There is no way to make sure every child gets a request every 5 minutes.

        What happens if the connection times out? Does the backend simply close it and carry on, or does it have a problem with that? And is there any kind of "ping" message the Perl wrapper could send to test whether the connection is still alive?

        I would just let the connections time out and not worry about it. If a Perl wrapper finds the connection dead, it just silently opens a new one.

        # Pseudocode, instead of: $conn ||= connect();
        if (not $conn or is_dead($conn)) {
            $conn = connect();
        }
        my $status = send($conn, 'abc');
        if ($status == CONN_DEAD) {
            $conn = connect();
            $status = send($conn, 'abc');
        }
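The reconnect-and-retry logic above can be made runnable by wrapping it in a closure. This is a sketch under stated assumptions: make_sender and the connect/send callbacks are hypothetical stand-ins for the real socket code, with send returning undef when the connection turns out to be dead. Retrying exactly once is only safe if resending the same message to the backend is harmless.

```perl
use strict;
use warnings;

# Wrap hypothetical connect/send callbacks so that a dead cached
# connection is replaced once and the message is resent.
sub make_sender {
    my (%args) = @_;
    my $connect = $args{connect};   # () -> new connection
    my $send    = $args{send};      # ($conn, $msg) -> reply, or undef if the connection is dead
    my $conn;                       # cached connection, private to this closure
    return sub {
        my ($msg) = @_;
        $conn //= $connect->();
        my $reply = $send->($conn, $msg);
        return $reply if defined $reply;
        # The backend probably dropped an idle connection: reopen and retry once.
        $conn  = $connect->();
        $reply = $send->($conn, $msg);
        die "backend unavailable\n" unless defined $reply;
        return $reply;
    };
}

# Usage with fake callbacks: connection ids are integers, %alive marks live ones.
my %alive;
my $opened = 0;
my $sender = make_sender(
    connect => sub { my $id = ++$opened; $alive{$id} = 1; return $id },
    send    => sub { my ($c, $m) = @_; return $alive{$c} ? "ack:$m" : undef },
);
print $sender->('first'), "\n";    # opens connection 1
$alive{1} = 0;                      # simulate the backend timing it out
print $sender->('second'), "\n";   # send fails, reconnects as 2, retries
```

This prints `ack:first` and then `ack:second`, with the second call transparently opening a new connection.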


Node Type: perlquestion [id://309980]
Approved by calin
Front-paged by sgifford