PerlMonks  

Develop a forward HTTP proxy server

by josef (Acolyte)
on Nov 04, 2013 at 16:27 UTC (#1061146=perlquestion)
josef has asked for the wisdom of the Perl Monks concerning the following question:

Hello PerlMonks,

I find it very interesting to develop a flexible and powerful forward HTTP proxy server in Perl. I imagine it is possible to write an HTTP proxy server that supports 400-1000 requests/second.
For example, the widely used proxy Squid uses non-blocking sockets with one big select() poll loop. For my purposes I tend toward either pre-forking or polling (select, epoll, kqueue), because they are easy to develop with. I'm not clear which technique is better suited. Which CPAN modules are suitable? What performance can I expect? Do I need blocking or non-blocking sockets? How does it scale?

I have tested some modules: forks, AnyEvent, EV, IO::Socket, IO::Async, IO::Select, IO::Poll, HTTP::Daemon, HTTP::Proxy, Net::Server, LWP::UserAgent, Furl and others, but I need a direction.

Google can't help me much; one running proxy server using pre-forking was published by Randal L. Schwartz (http://www.stonehenge.com/merlyn/WebTechniques/col34.html) 15 years ago.

Best regards

Josef

Re: Develop a forward proxy server
by oiskuu (Friar) on Nov 04, 2013 at 18:19 UTC
    Not that I've used either, but there is a perl-based reverse proxy called Perlbal using Danga::Socket.
    The latter (Danga::Socket) is epoll/kqueue backed and ought to scale well.
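
To make the event-loop approach concrete, here is a minimal sketch of a single-process polling server. It is not how Perlbal or Danga::Socket work internally: it swaps epoll/kqueue for plain select() via the core IO::Select module, and the names poll_once and serve_forever are made up for illustration. It also assumes a whole request arrives in one read and answers with a fixed body instead of forwarding upstream.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;
use Errno qw(EAGAIN EWOULDBLOCK);

# Hypothetical helper: service every socket that becomes readable within
# $timeout seconds. Returns the number of requests answered in this pass.
sub poll_once {
    my ($listener, $select, $timeout) = @_;
    my $served = 0;
    for my $fh ($select->can_read($timeout)) {
        if ($fh == $listener) {
            # New connection: make it non-blocking and watch it.
            my $client = $listener->accept or next;
            $client->blocking(0);
            $select->add($client);
            next;
        }
        my $n = sysread($fh, my $buf, 8192);
        if (!defined $n) {
            # Nothing to read yet: keep the socket and try again later.
            next if $! == EAGAIN || $! == EWOULDBLOCK;
        }
        if ($n) {
            # Assumption: the whole request fits in one read. A real proxy
            # would parse $buf and forward the request upstream here.
            my $body = "hello\n";
            syswrite($fh, "HTTP/1.0 200 OK\r\nContent-Length: "
                . length($body) . "\r\n\r\n" . $body);
            $served++;
        }
        # Request answered, EOF, or read error: drop the connection.
        $select->remove($fh);
        close $fh;
    }
    return $served;
}

# Hypothetical driver: listen on localhost and poll forever.
sub serve_forever {
    my ($port) = @_;
    my $listener = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => $port,
        Listen    => 128,
        ReuseAddr => 1,
    ) or die "cannot listen on port $port: $!";
    my $select = IO::Select->new($listener);
    poll_once($listener, $select, 1) while 1;
}

# Only start the server when a port is given on the command line.
serve_forever($ARGV[0]) if @ARGV;
```

Run it with a port number as the only argument. An epoll- or kqueue-backed loop has the same shape; only the readiness call changes, which matters mostly when thousands of sockets are open at once.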
Re: Develop a forward HTTP proxy server
by vsespb (Hermit) on Nov 09, 2013 at 21:54 UTC

    I got 250 req/s with HTTP::Daemon + fork() + LWP::UserAgent + my own (not small) code and logic + sharing data between the workers and the master process via sockets on _each_ request, on modern desktop hardware. I think I was testing with tiny requests on localhost, and I had around 100 worker processes.

    Note that the slowest part here is LWP::UserAgent (for doing outgoing requests to the upstream proxy).

    So I think 400 is possible if you replace LWP::UserAgent with something like .*Curl.*.

    Not sure about 1000.
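
For comparison, a pre-forking layout can be sketched with core modules only. This is not the HTTP::Daemon + LWP::UserAgent setup described above: it uses IO::Socket::INET directly, the names worker_loop and prefork are hypothetical, and each worker answers with a fixed body at the point where a real proxy would forward the request upstream.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical worker body: accept and answer up to $max_requests connections.
sub worker_loop {
    my ($listener, $max_requests) = @_;
    for (1 .. $max_requests) {
        my $client = $listener->accept or next;
        # Read the request head, up to the blank line that ends it.
        while (defined(my $line = <$client>)) {
            last if $line =~ /^\r?\n$/;
        }
        # A real proxy would issue the upstream request here.
        my $body = "handled by pid $$\n";
        print {$client} "HTTP/1.0 200 OK\r\nContent-Length: ",
            length($body), "\r\n\r\n", $body;
        close $client;
    }
}

# Fork $n workers that all block in accept() on the same listening socket.
# Returns the child pids so the caller can wait for (or respawn) them.
sub prefork {
    my ($listener, $n, $max_requests) = @_;
    my @pids;
    for (1 .. $n) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            worker_loop($listener, $max_requests);
            exit 0;
        }
        push @pids, $pid;
    }
    return @pids;
}

# Only start the server when a port is given on the command line.
if (@ARGV) {
    my $listener = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => $ARGV[0],
        Listen    => 128,
        ReuseAddr => 1,
    ) or die "cannot listen on port $ARGV[0]: $!";
    my @pids = prefork($listener, 10, 1000);
    waitpid($_, 0) for @pids;
}
```

The kernel hands each incoming connection to one of the waiting workers, so blocking sockets are enough here. A production master would also respawn workers as they exit; that part is left out.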

      Exactly, LWP::UserAgent is too slow for a proxy server.
      My benchmark with some user agents (LWP::UserAgent, LWP::Simple, HTTP::Lite, WWW::Curl::Easy and Furl):
      Test 1: run on localhost, ideal case for no network latency, no cache
      URI: http://localhost, GET method
      Compare COUNT: 10000
      Benchmark: timing 10000 iterations of Furl, curl, lite, lwp, lwp_simple...
            Furl: 5.90996 wallclock secs (1.92 usr + 0.62 sys = 2.55 CPU) @3926/s
            curl: 6.10753 wallclock secs (1.16 usr + 1.70 sys = 2.87 CPU) @3487/s
            lite: 5.68429 wallclock secs (2.53 usr + 0.55 sys = 3.09 CPU) @3240/s
             lwp: 16.2804 wallclock secs (12.20 usr + 0.50 sys = 12.70 CPU) @787/s
      lwp_simple: 16.7128 wallclock secs (12.92 usr + 0.54 sys = 13.46 CPU) @742/s
                   Rate lwp_simple   lwp  lite  curl  Furl
      lwp_simple  743/s         --   -6%  -77%  -79%  -81%
      lwp         788/s         6%    --  -76%  -77%  -80%
      lite       3241/s       336%  311%    --   -7%  -17%
      curl       3488/s       369%  343%    8%    --  -11%
      Furl       3926/s       429%  398%   21%   13%    --

      Test 2: run on internet, real case with network latency, cache active on remote server
      URI: http://google.de, GET method
      Compare COUNT: 1000
      Benchmark: timing 1000 iterations of Furl, curl, lite, lwp, lwp_simple...
            Furl: 99.9638 wallclock secs (0.75 usr + 0.52 sys = 1.27 CPU) @790/s
            curl: 42.9186 wallclock secs (0.24 usr + 0.39 sys = 0.63 CPU) @1580/s
            lite: 14.2738 wallclock secs (0.48 usr + 0.24 sys = 0.72 CPU) @1391/s
             lwp: 102.792 wallclock secs (3.88 usr + 0.44 sys = 4.31 CPU) @231/s
      lwp_simple: 432.2 wallclock secs (3.73 usr + 0.48 sys = 4.22 CPU) @237/s
                   Rate   lwp lwp_simple  Furl  lite  curl
      lwp         232/s    --        -2%  -71%  -83%  -85%
      lwp_simple  237/s    2%         --  -70%  -83%  -85%
      Furl        790/s  241%       233%    --  -43%  -50%
      lite       1391/s  500%       487%   76%    --  -12%
      curl       1580/s  581%       567%  100%   14%    --

      LWP::UserAgent and LWP::Simple are built on the same foundation, hence the nearly identical performance.
      Furl, Curl and HTTP::Lite achieve good performance for requests on localhost (test 1).
      In real life (test 2), all agents show a performance decrease of at least 50%:
      LWP -70%, Furl -80%, HTTP::Lite -58% and Curl -55%
      Furl shows the most dramatic drop, but it is still better than LWP. I think HTTP::Lite and Curl cut a very good figure and deserve more recognition, especially HTTP::Lite.
      Note: The test uses only the GET method; the picture may be different for POST.

      Josef
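
The benchmark script itself was not posted, so the following is only a guess at its general shape, using the core Benchmark module. The names available_clients and compare_clients are hypothetical, WWW::Curl::Easy is left out for brevity, and each client is included only if the corresponding module is installed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper: build one GET closure per installed HTTP client.
sub available_clients {
    my ($url) = @_;
    my %clients;
    if (eval { require LWP::UserAgent; 1 }) {
        my $ua = LWP::UserAgent->new;
        $clients{lwp} = sub { $ua->get($url) };
    }
    if (eval { require Furl; 1 }) {
        my $furl = Furl->new;
        $clients{Furl} = sub { $furl->get($url) };
    }
    if (eval { require HTTP::Lite; 1 }) {
        # A fresh object per request avoids state carried over between calls.
        $clients{lite} = sub { HTTP::Lite->new->request($url) };
    }
    return \%clients;
}

# Run each closure $count times and print the comparison table.
# Returns the table rows as produced by cmpthese().
sub compare_clients {
    my ($count, $clients) = @_;
    die "no clients to compare\n" unless %$clients;
    return cmpthese($count, $clients);
}

# Usage: perl bench.pl http://localhost/ 10000
if (@ARGV) {
    my ($url, $count) = @ARGV;
    compare_clients($count || 1000, available_clients($url));
}
```

Note that Benchmark computes its rates from CPU time by default, while a client waiting on the network spends most of its time idle, so wallclock seconds are the more meaningful column for a test over the internet.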

Node Type: perlquestion [id://1061146]
Approved by keszler
Front-paged by Corion