http://www.perlmonks.org?node_id=989939

amasidlover has asked for the wisdom of the Perl Monks concerning the following question:

We have a large (77,000 lines inc POD and comments) application written in Perl and our client has asked us to 'make it faster' we already have a number of strategies for doing this and are tackling performance from multiple angles. However, in a meeting last week they showed us some stats of a PHP app they use along side our app and showed it doing 2,000 concurrent requests on the same hardware that our app only manages 200 concurrent requests on.

Obviously comparing two apps doing different jobs is nonsense, but the scale of difference was so dramatic I decided to see what we would need to do to get our app to that kind of performance. Our assumption, based on timings from debugs, is that talking to the DB (Oracle) is the most time consuming part of our app, so I decided to write a quick test script that did an INSERT and SELECT and then call it lots of times to see how fast it will go on their hardware.

So I wrote the test handler (will paste code and set-up details at end) and ran it on our hardware (not the client's yet), it performed badly... As its a bank holiday and I don't have access to test on their hardware I decided to knock together a PHP script for comparison, the results were quite shocking when loaded with 300 concurrent requests the average response time from the mod_perl handler is 1900ms and from the PHP it is 400ms.

So bearing in mind rewriting our app in PHP is not likely to be cost effective, my questions are:

1) Is my test valid?

2) Can I make the mod_perl version faster (and how)?

3) How much faster?

Server details:- The hardware node is a Quad Core Xeon L5630 @ 2.13GHz with 24Gb RAM, the OS for the Apache virtual machine is Gentoo, the OS for Oracle is Centos 5,.

Versions: OSes both updated within last 2 weeks, Apache version 2.2.22, mod_perl version 2.0.4, DBI Version 1.622, DBD::Oracle version 1.50, Oracle instant client version 10.2.0.3, Oracle Database 10g Express Edition Release 10.2.0.1.0, PHP version 5.3

Things I checked: during the testing the only load was from the test app/oracle, neither virtual machine hit any of its bean counter limits, e.g., memory, Oracle showed 1 session per Apache child at all times, inserts had been done after each run.

Apache config - the mod_perl module was not loaded for the PHP tests and PHP not loaded for the perl tests

PerlRequire "/etc/apache2/startup.pl" PerlModule Apache::DBI PerlModule SQLLoad; PerlModule Apache2::Reload PerlInitHandler Apache2::Reload PerlSetVar ReloadAll Off PerlSetVar ReloadModules "SQLLoad" <Location /sqlload> SetHandler perl-script PerlHandler SQLLoad </Location> <IfModule mpm_prefork_module> ServerLimit 2000 StartServers 1 MinSpareServers 1 MaxSpareServers 1 MaxClients 2000 MaxRequestsPerChild 200 </IfModule>

The Perl code

startup.pl: use DBI (); use CGI (); CGI->compile(':all'); DBI->install_driver("Oracle"); SQLLoad.pm: package SQLLoad; use strict; use Apache2::RequestRec (); use Apache2::RequestIO (); use Apache2::Const -compile => qw(:common); use DBI; use CGI; our $dbh; sub handler { my $r = shift; # Connect to DB $dbh = DBI->connect( "DBI:Oracle:host=oracle.ourdomain.com;port=15 +21;sid=XE", "hr", "password" ) unless $dbh; my $err = ''; # Do the selects my $dbi_query_object = $dbh->prepare("SELECT id FROM zz_system_opt +ions"); $dbi_query_object->execute(); my $dbi_insert_object = $dbh->prepare("INSERT INTO zz_system_options (id,option_name +) VALUES (zz_system_optionsids.nextval,?)"); $dbi_insert_object->execute("load testing"); # Print out some info about this... $r->content_type('text/plain'); $r->print("Errors: \n"); return Apache2::Const::OK; }

The PHP code

$cs = "//oracle.ourdomain.com:1521/XE"; $oc = oci_pconnect("hr","password",$cs); if(!$oc) { print oci_error(); } $stid = oci_parse($oc, 'SELECT id FROM zz_system_options WHERE id = 1' +); oci_execute($stid); $stmt = oci_parse($oc, "INSERT INTO zz_system_options (id,option_name) + VALUES (zz_system_optionsids.nextval,'load testing')"); oci_execute($stmt); echo "hello world";

PHP Results

Concurrency Level: 300 Time taken for tests: 13.286 seconds Complete requests: 10000 Failed requests: 0 Write errors: 0 Total transferred: 2071035 bytes HTML transferred: 110055 bytes Requests per second: 752.69 [#/sec] (mean) Time per request: 398.569 [ms] (mean) Time per request: 1.329 [ms] (mean, across all concurrent reques +ts) Transfer rate: 152.23 [Kbytes/sec] received

Perl Results

Concurrency Level: 300 Time taken for tests: 63.716 seconds Complete requests: 10000 Failed requests: 0 Write errors: 0 Total transferred: 1541003 bytes HTML transferred: 130923 bytes Requests per second: 156.95 [#/sec] (mean) Time per request: 1911.488 [ms] (mean) Time per request: 6.372 [ms] (mean, across all concurrent reques +ts) Transfer rate: 23.62 [Kbytes/sec] received

I'm happy to answer more questions about the set-up and try any suggestions. I've done some searching myself and what I've found via Google seems to say a) mod_perl should be faster than mod_php and b) PHP has 'native' DB access code so 'can' be faster for DB access - I've got a fairly big problem though if its going to be 5 times worse under load...

Subsequent edit: SOLVED! Had done the PHP test without https on...

Replies are listed 'Best First'.
Re: mod_perl vs php performance when accessing Oracle under load
by moritz (Cardinal) on Aug 27, 2012 at 08:43 UTC

    The first job when optimizing is to find out what takes the most time. I highly recommend Devel::NYTProf, it looks like Devel::NYTProf::Apache is taylored for your use case.

    One difference is that you use different SELECT queries in PHP and in Perl, and never retrieve any values. It could be that PHP simply doesn't execute the query before you try to fetch a result, so to make the comparison useful, run the same query in both scripts and also retrieve the result.

      We've been using Devel::NYTProf::Apache for analysing the main application, so I can have a go at that.

      In the meantime I stripped out everything except the lines:

      my $r = shift; $r->content_type('text/plain'); $r->print("Errors: $err\n"); return Apache2::Const::OK;

      And it was still giving 1700ms per request - suggesting that its not the DB access at all...

      The SELECT difference was a cut and paste error, the scripts on the server do both have the WHERE clause.

        I'm now struggling to get NYTProf to do what I need it seems to be only writing one request's worth of data when actually I'm expecting it to write 300 files all with 10 or so requests in.

        PerlModule Devel::NYTProf::Apache PerlSetEnv NYTPROF file=/tmp/nytprof.out:addpid=1:endatexit=1

        I can't really set MaxClients to 1 as the large response times don't occur without the concurrency.

        The timings that are written out show a total of only just over 100ms even including the DB connect - which I assume I'm getting because its only logging my first request. Excluding the connect its about 30ms of time.

        What I'm struggling with is how to translate one request into concurrent requests (for the purposes of estimating timing) - particularly when I'm seeing requests that with no DB activity show 1700ms average response time and with DB calls in there show 1800ms average response time. It suggests to me that something outside of my handler is taking the majority of the response time up when concurrency gets 'high'

        The detailed results from 'ab' show the following:

        Connection Times (ms) min mean[+/-sd] median max Connect: 48 1223 1633.2 918 14262 Processing: 2 611 368.2 573 1542 Waiting: 2 228 167.0 192 679 Total: 75 1833 1626.0 1705 14860

        So the connect is definitely a big part...

        OOOPS!!! Just been back and double checked the PHP results - turns out the PHP version I'd missed the 's' in the URL i.e. http and not https.

        They now give similar results! Lets see if I can now do something productive with the other half of my day...

Re: mod_perl vs php performance when accessing Oracle under load
by liz (Monsignor) on Aug 27, 2012 at 08:45 UTC
    I would almost bet that is a CPU problem on Perl's side, specifically caused by CGI.pm. Could you check CPU usage during the test, and CPU burned by checking "times" ?

      I've tried taking out CGI entirely as the current version of the script doesn't use it; I get an average of 1800ms

      CPU Usage sits at about 30% during the test. Not sure how I'd go about checking 'times', short of starting apache with 'time' in debug mode - and then I can't do the concurrency tests.

Re: mod_perl vs php performance when accessing Oracle under load
by sundialsvc4 (Abbot) on Aug 27, 2012 at 11:03 UTC

    Even though you say, “Solved!” based on a difference of having or not-having https turned on, obviously there must be something more in play here.   PHP is a fine, well-built system, but it cannot be 10 times faster just on its own coding merits.   SSL should not knock-off that much speed, although it does pay a one-time cost at connect time for key-negotiation.   (Are your benchmarks connecting every single time?   Quite possibly, and that’s not realistic.)

    You should frankly but politely question why your customer is presenting a PHP vs. Perl comparison because it is an apples-to-oranges comparison ... but there must be some motivation for it.   Are they talking right now to a competitor who is promising them the moon?

    You need to continue to pursue the strategy of systematically testing how the application actually spends its time, and what gives rise to the perception of “response time.” as seen by the user.   It is often helpful to attach instrumentation to the back-end portions of the system, i.e. to the objects that it uses to do actual work, using test-rigs that are not associated with the web server processes.

    Many web-sites burn up a lot of time with “set-up / tear-down costs” which they pay every time for every request:   they connect to the database server, then tear that connection down a few milliseconds later.   They prepare dozens or even hundreds of individual queries which the database server has no choice but to parse “as though it had never seen it before,” set up, execute, then tear down.   Since these actions involve a lot of resources and multiple processes and/or computers (hence, fiber optics or patch cables), they not only add-up quickly, but they also degrade non-linearly under load.

    Bear in mind that it really makes no difference whether the application is done in mod_perl or not:   while this does implement “a persistent handler-process,” in this case by stuffing Perl right into the Apache worker-bees, the effective use of persistence is a fundamental design point that has to be engineered into the thing from the start.   Apps are frequently designed, then shoehorned into mod_perl or FastCGI “to make them run faster,” but the design doesn’t really change.   And they do run somewhat faster, especially on the too-lightly loaded developer’s systems, which superficially seems to exonerate the strategy.

Re: mod_perl vs php performance when accessing Oracle under load
by Anonymous Monk on Aug 27, 2012 at 15:22 UTC

    You should be able to gain a small speed boost by using a non-threaded version of Perl, plus, there's always the Advanced Perl DBI talk. DBI's prepare_cached might help for some cases of repeated statements, too.

    Anyway, I do not see anything wrong with your test code as you have laid it out (and was certain that the likely culprit was CGI.pm). I trust that your production code is of similar quality, and hope that the apples-to-oranges comparison by your client was, well, apples and oranges. And not HTTP to HTTPS...

      FWIW, when moving from a threaded perl to an identical unthreaded version of Perl at $work, in practice the difference in performance hardly made it above noise levels (as in less than 1%). This was quite disappointing.

      So, the Perl test-suite may run 15% faster without threads, but your real life application gains may be a lot less. I guess the test-suite just exercises the expensive unthreaded code paths on a threaded perl more than in the real world.