PerlMonks  

More than mod_cgi less than mod_perl.

by techcode (Hermit)
on Jun 07, 2005 at 10:16 UTC (#464209=perlquestion)
techcode has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow Monks.

I've been thinking about this for a while, and after some time I posted it on the devshed.com Perl forum. I got some nice responses and a hint that I should post it here (after refining it a little). So here I am.

Currently, to create web applications with Perl, you have two commonly used options: mod_perl and mod_cgi.

1. mod_perl - great and fast, but besides requiring you to be a good programmer, it isn't widely available (mostly because of security problems; some people also point out higher memory usage). So you need some form of dedicated hosting (a virtual private host is an option too) to be able to run it.

2. mod_cgi - It has served us for years, but it's getting quite old and, what most people care about, it's slow.

[There are other options such as FastCGI and Apache::PerlRun, but they are even less available than mod_perl, or they run under mod_perl, so it's basically the same thing.]


What I think would be good is something between these two: something that would be faster than mod_cgi, yet not as "complicated" as mod_perl. As you may know: "Sometimes, less is more!".

Ideally it would :

- be intended for ordinary people, regular John Does who don't have the money or knowledge to get dedicated hosting.

- be able to run unaltered Perl/CGI code, even if it's bad style ...

- be part of the Apache distribution. Actually, it should replace mod_cgi if it can comply with the previous point.

- be faster than mod_cgi by using some of the techniques of mod_perl, not necessarily as fast as mod_perl

That's about it. If anyone has something to add, I would like to hear it.

My idea :

- Embed Perl "inside" Apache so that it doesn't need to be reloaded on each request. It should help with the speed.

- Provide clean memory on each request, so that dirty CGI scripts would also run.

- Maybe cache things like the precompiled application. But then again, it shouldn't cache too much, because then memory usage would rise.

My first thought when I saw Apache::PerlRun was to simply use it. But as it turns out, it's not so great. I mean, if it were so great, then Apache would be configured that way, so that PerlRun would run all CGI scripts ...

I also wondered how PHP does it: faster than CGI, almost as fast as mod_perl, but using less memory. (I'm not sure whether PHP under Apache runs only under mod_php or if there is another way.)

Any ideas and/or comments are welcome.

PS. Please don't reply with comments about how mod_perl is great. I know! I know that it can do many more things than PHP, like Apache handlers and other things. Many replied with posts like that on devshed, wasting their time to tell me something I already know. Just consider that something faster than CGI, yet not as powerful as mod_perl, is needed: something for the masses to use.

Re: More than mod_cgi less than mod_perl.
by CountZero (Bishop) on Jun 07, 2005 at 10:42 UTC
    Provide clean memory on each request, so that dirty CGI scripts would also run
    I think that is exactly the difference that makes mod_perl so fast and mod_cgi so slow. Because you don't have to reload/re-compile the scripts on every request, you get the speed-up. I do not know of a way to keep the compiled code around (other than turning it into an *.exe file) after it has run while still providing a clean slate of variables and state each time it runs again. It's an interesting idea though, and perhaps worthy of further research.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: More than mod_cgi less than mod_perl.
by Gilimanjaro (Hermit) on Jun 07, 2005 at 12:08 UTC

    An embedded perl interpreter cannot replace mod_cgi, as mod_cgi is not limited to running perl-scripts; you can run any executable using mod_cgi. Could be compiled C binaries, or a shell script, as long as it 'talks' CGI.

    The two main ways to make scripts faster, are to keep the perl interpreter in memory, and to keep the compiled scripts in memory. In fact, I don't see how these two could ever be separated; the perl interpreter also maintains the variable space, and as far as I know, there is no way to 'reset' it. There are many ways to alter the symbol table, and no guaranteed way to track these alterations. Because of this, there is no way to provide each request with 'clean memory' without reloading the perl interpreter.

    Mod_perl already tries to solve these issues by creating a separate package for each cgi-script, thereby at least separating the variable space for the scripts. But it's trivial to manually access variables outside of this package. And it's not trivial to 'reset' the used variables after each request...
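
To make the persistence problem concrete, here is a minimal sketch in plain Perl, with no Apache involved. The script body and the package name `Sandbox::counter_cgi` are made up for illustration, and the code only imitates the compile-once, one-package-per-script idea described above; it is not mod_perl's actual loader.

```perl
use strict;
use warnings;

# Hypothetical "dirty" CGI script: it silently assumes $count is
# undefined at the start of every run, which is true under plain CGI.
my $script_body = q{
    our $count;
    $count++;
    return "visit number $count";
};

# Compile the script once into its own package and keep the compiled
# code around as a sub (an imitation of a registry-style loader).
my $handler = eval "package Sandbox::counter_cgi; sub { $script_body }"
    or die "compile failed: $@";

# Simulate three requests served by the same long-lived interpreter.
my @responses = map { $handler->() } 1 .. 3;
print "$_\n" for @responses;

# The separate package does not protect the variable: any other code
# in the interpreter can reach it through its fully qualified name.
{
    no warnings 'once';    # the name appears only once in this file
    $Sandbox::counter_cgi::count = 100;
}
my $tampered = $handler->();
print "$tampered\n";
```

Run as written, this should print "visit number 1", "visit number 2", "visit number 3" and then "visit number 101": the counter survives between simulated requests, and code outside the package can change it.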

    The nature of perl and its interpreter make it very hard to establish what you propose. It's possible for the most part, but would require all cgi-scripts to 'play nice', and any script that doesn't can cause big problems. And harnessing memory usage is always tricky with perl; you have to know quite a bit about the perl internals to understand where it's all going...

    PHP's use of variable space was designed for this purpose, which is why it is capable of so much.

    A possible way to do what you want may be to have a 'cgi-server' wrapped around all cgi-scripts, and have Apache pass all requests to this other process. This would only require ProxyPass'ing from Apache, and a simple server-wrapper around the cgi scripts. In fact, I can't imagine such a wrapper doesn't exist yet. This would require the server to run this extra process though, and would possibly create extra maintenance problems.

    A solution sometimes used is to run a separate mod_perl enabled Apache instance with a limited number of threads to handle the script requests, and have a 'big' non-mod_perl Apache as the front-end. This makes use of existing tools, but limits the number of problems that can arise from using mod_perl.

      A possible way to do what you want may be to have a 'cgi-server' wrapped around all cgi-scripts, and have Apache pass all requests to this other process. This would only require ProxyPass'ing from Apache, and a simple server-wrapper around the cgi scripts. In fact, I can't imagine such a wrapper doesn't exist yet.

      It does exist. Look at my response below about PersistentPerl.

      Fast CGI is another option (like Joost mentions), but I believe that you have to change your code for it to work. Although it is probably trivial to do so.
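
As a rough illustration of the kind of code change meant here, the sketch below restructures a script so that one-time setup happens once and the per-request work lives in a sub that is called repeatedly. It uses only core Perl; `handle_request` and the fake request list are hypothetical stand-ins. With the CGI::Fast module the loop is typically written as `while (my $q = CGI::Fast->new) { ... }`, but that module is not used here so the example stays self-contained.

```perl
use strict;
use warnings;

# Per-request work lives in one sub; everything it needs is passed in,
# so nothing depends on state left over from an earlier request.
sub handle_request {
    my ($params) = @_;
    my $name = $params->{name} // 'world';
    return "Content-Type: text/plain\r\n\r\nHello, $name\n";
}

# One-time setup (loading modules, reading config) would go here,
# outside the loop; doing it only once is where the speed-up comes from.

# Stand-in for the accept loop. A real FastCGI script would block here,
# waiting for the web server to hand over the next request.
my @fake_requests = ( { name => 'techcode' }, {} );
my @responses = map { handle_request($_) } @fake_requests;
print for @responses;
```

The point of the design is that a script written this way behaves the same whether the loop body runs once (plain CGI) or thousands of times in one process.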

        A little OT, but I believe PersistentPerl does not support Win32. I also couldn't use FastCGI on IIS. And both seem to be dead: FastCGI last release: 2003; PersistentPerl last release: 2003.
      OK. My bad. I thought it was clear that I'm trying to improve only Perl's performance. Of course, if other languages benefit, even better.

      And how about adding functionality to perl so that it cleans up memory by itself before the next run? Correct me if I'm wrong, but such a function must already be in perl and is currently called before it exits, although it would probably need minor modifications. Not counting

        The memory cleanup on perl's exit is a whole other type of cleanup, I'm afraid. Perl allocates a bunch of memory from the OS, but the problem is in how it uses this memory itself. The memory is returned to the OS at perl exit, of course.

        Doing the same kind of cleanup without exiting perl would probably not be a trivial change to perl. Just an example of a problem you'd encounter would be how to handle the special global variables that are linked to the perl parameters from the shebang (#!) line...

        A bigger problem though, would be that cleaning up all of the memory would mean all modules would have to be reloaded too, and the module loading and initialisation is often even slower than the perl interpreter loading...

        A perl module can change variables in *any* namespace, so we can't just clear the part of the namespace it's using and keep it loaded. There is just no proper way to 'reset' a running perl to its initial state. The 'flow' of compilation can change because of the existence of stuff like BEGIN blocks, so the line between compilation time and runtime can get blurry. This makes resetting perl's internal state even harder...

        It's for these reasons that mod_perl has the complexity it has, and that the programmer needs to be aware of the persistent behaviour some variables may exhibit. There's just no way around it I'm afraid... Not without removing a ton of functionality from perl itself, which in turn would break half the modules out there that people are using.

        It's a hard subject to explain without getting into deep technical detail, but I hope I've hinted in the right direction...

Re: More than mod_cgi less than mod_perl.
by Mutant (Priest) on Jun 07, 2005 at 12:22 UTC
    I'm pretty sure what you want *is* Apache::PerlRun. You don't give any clear reasons why it doesn't do what you want. AFAIK, this is pretty much how PHP achieves its faster processing than vanilla CGI: it only loads one instance per webserver. (Of course, a lot of PHP configurations actually run in vanilla CGI mode, so they don't get this performance benefit.)

    As you point out, to gain a further increase in speed requires pre-compiling code. This will only work properly if the code is written properly (primarily that it deals with the potential pitfalls of persistent variables).

    One option between Apache::PerlRun and full mod_perl is Apache::Registry, although this may require some modifications to existing CGI scripts (but not quite as much as migrating to a mod_perl handler).
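
One well-known example of the kind of modification a script may need: when a script is wrapped inside a handler sub, a named sub inside it that uses a file-level `my` variable becomes a nested named sub, and it keeps seeing the value from the first request. The sketch below reproduces that effect in plain Perl, without Apache; `wrapped_script`, `greeting_broken` and `greeting_fixed` are made-up names, and the wrapping is only an imitation of what a registry-style loader does.

```perl
use strict;
use warnings;
no warnings 'closure';    # silence "Variable will not stay shared" for the demo

# Imitation of a script wrapped in a handler sub. $user plays the
# role of a file-level "my" variable in the original CGI script.
sub wrapped_script {
    my ($user) = @_;

    # A named sub is compiled once and keeps the FIRST $user it saw.
    sub greeting_broken { return "Hello, $user" }

    return greeting_broken();
}

# The usual fix: pass the value in explicitly instead of relying on
# an outer lexical.
sub greeting_fixed { my ($user) = @_; return "Hello, $user" }

my @broken = map { wrapped_script($_) } qw(alice bob);
my @fixed  = map { greeting_fixed($_) } qw(alice bob);

print "broken: @broken\n";
print "fixed:  @fixed\n";
```

Run as written, the "broken" line should greet alice twice, while the "fixed" line greets alice and then bob. Without the `no warnings 'closure'` line, perl reports the problem at compile time, which is a good reason to keep warnings on when porting scripts.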

    If you're really concerned with performance, then you're always going to struggle unless you have flexibility around your webserver's configuration. For the 'masses', they probably could/should be using Apache::PerlRun. But even if they're not, I doubt the performance of their applications is actually an issue.
Re: More than mod_cgi less than mod_perl.
by cees (Curate) on Jun 07, 2005 at 12:53 UTC

    What you are looking for is PersistentPerl. This will keep the perl interpreter in memory, and keep your perl code compiled. It provides almost all of the speed benefits that mod_perl offers without needing to be integrated directly into Apache.

    There is also a mod_persistentperl module for Apache (and Apache2) to avoid the need to fork a process for each request (which is how CGI works). This means the only extra overhead over mod_perl is a socket connection to the PersistentPerl daemon for your script.

    Other benefits of PersistentPerl are that you can run multiple scripts under the same interpreter, and you can use the suexec features of Apache and run your code under a different username (unlike mod_perl).

Re: More than mod_cgi less than mod_perl.
by Joost (Canon) on Jun 07, 2005 at 12:59 UTC
    My first thought when I saw Apache::PerlRun was to simply use it. But as it turns out, it's not so great. I mean, if it were so great, then Apache would be configured that way, so that PerlRun would run all CGI scripts ...

    Apache has many ways to run perl scripts, and all of them have their advantages and disadvantages. If you install mod_perl, Apache::PerlRun is installed by default. The fact that you still need to configure it doesn't mean that it's a bad option.

    Also, you seem to want to embed perl in apache (that is mod_perl), but you don't want to use mod_perl (for some reason). You want to clean out all the variables and code, but you still want caching of some code, you want to use PerlRun, but you think it's bad, because it's not the default configuration.

    There is no replacement for CGI if you want to run every script unmodified. If you want more speed, your programs need to take into account that they can be run more than once (i.e. that not everything is reloaded for each request).

    Choose one: Apache::PerlRun (for "dirty" scripts), Apache::Registry (for clean CGI scripts; in my experience, almost all well-written CGI scripts will work with Apache::Registry) or FastCGI (language/webserver agnostic).

Re: More than mod_cgi less than mod_perl.
by derby (Abbot) on Jun 07, 2005 at 13:29 UTC

    2. mod_cgi - It has served us for years, but it's getting quite old and, what most people care about, it's slow.

    Well so am I ... but seriously, I think the slow argument is not as valid as it once was. Copy-on-write has made fork on modern *nixes very efficient. So before you start calling mod_cgi slow, take a look at your *own* code and ensure it's not slow.

    mod_cgi has a lot going for it *because* it's old - it's well understood, it's universal, it's cleaner, any script/executable can be run under it (I wouldn't be surprised to see Java progs under mod_cgi as the JVM start-up times continue to improve).

    -derby
      Well, actually I don't think my code is that bad/slow :)

      I'll try to find the exact code that recently caused problems for one of my "customers". It's a banner rotation script. He put ~8 banners on each page (which means that every page request triggers 8 extra calls of that CGI/Perl script), and his server went down.

      He believes that particular script caused the problems, since once he removed it the server was faster and hasn't gone down since. Nothing else was added/removed so ...

      Of course that doesn't mean it's the script's fault for sure, nor that he couldn't have done it some other way that wouldn't call it 8 times per page.


      So where can I find a way to test the speed of the script? I mean, how many requests per second can it process, or something like that? Someone wrote that forking has been sped up these days. So let's test it out. Maybe it's only bad code ...
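
One rough way to get a feel for this is to time the interpreter start-up separately from the script's own work. The sketch below is only that: `pick_banner` is a hypothetical stand-in for the banner script's logic, the run count is arbitrary, and the numbers say nothing about the web server itself. For end-to-end requests per second, a load-testing tool such as ApacheBench (`ab`, shipped with Apache) pointed at the real URL is the more usual choice.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for the per-request work of a small script.
sub pick_banner {
    my @banners = map { "banner$_.png" } 1 .. 8;
    return $banners[ rand @banners ];
}

my $runs = 20;    # keep small; each out-of-process run starts a new perl

# In-process: the interpreter and compiled code are already loaded.
my $t0 = time;
pick_banner() for 1 .. $runs;
my $in_process = time - $t0;

# Out-of-process: start a fresh perl each time, as plain CGI does.
$t0 = time;
for ( 1 .. $runs ) {
    system( $^X, '-e', '1' ) == 0 or die "could not start perl: $?";
}
my $per_spawn = ( time - $t0 ) / $runs;

printf "in-process total for %d runs: %.6f s\n", $runs, $in_process;
printf "average cost of one interpreter start: %.6f s\n", $per_spawn;
printf "upper bound from start-up alone: about %.0f runs/s\n", 1 / $per_spawn;
```

If the start-up cost dominates, the script's own code is probably not the bottleneck; if the in-process time is large, the code itself deserves a closer look first.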

      PS. The funniest thing about that script is that its creator said he should normally only place 2 - 3 banners per page :)
Re: More than mod_cgi less than mod_perl.
by kiat (Vicar) on Jun 07, 2005 at 16:19 UTC
    Great post, techcode!

    I'm also interested in finding out what intermediate solutions there are between CGI on the one hand and mod_perl on the other.

    mod_perl is not an option for me because I'm using a shared server. From what I've read, it also appears to be quite complicated.

    Something in-between would be nice.

    Btw, this post might interest you: Perl cgi without mod_perl, your experience.

Re: More than mod_cgi less than mod_perl.
by fizbin (Chaplain) on Jun 07, 2005 at 18:26 UTC
    As with everyone else, I can't seem to figure out what you want. It appears that what you want is a way to host perl-based web applications that are:
    1. Faster than the standard cgi-bin interface, possibly by eliminating the extra fork per request.
    2. More functional than the standard cgi-bin interface, though this seems to be a minor concern
    3. Easier to use than mod_perl; that is, without the standard mod_perl gotchas
    4. More available than fastcgi or mod_perl. That is, easier for the hosting provider to offer, so that more hosting providers will. Preferably, those providers that don't think about it should be offering whatever this new service is "by default". Certainly, it must be possible to set this up without allowing a clueless script to completely trash the webserver instance for everyone. (aka, "suexec would be nice")

    If this is what you want, I think that the last requirement there is going to be the kicker.

    First off, any module that requires the perl source code and/or libperl to build all its parts is never going to be in the default apache configuration. This includes Apache::PerlRun, since that depends on mod_perl. It also includes PersistentPerl, mentioned elsewhere, since that requires building a separate executable linked to libperl.

    Instead, may I ask: what's wrong with fastcgi?

    In the past, fastcgi was almost killed by licensing issues - I think the original developers were for a while wanting you to pay for access to the module - but that's not the case any longer. mod_fastcgi is open source, and there are fastcgi implementations for most webservers out there. (the webserver world doesn't begin and end with Apache) If your webhost isn't offering fastcgi, switch to one that does, and let your old host know why you're switching.

    -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
      You summed it up very well. And I agree, especially with the 4th.

      First off, any module that requires the perl source code and/or libperl to build all its parts is never going to be in the default apache configuration.
      The way I was thinking is exactly that.

      That's why I said that basically there is nothing other than mod_cgi and mod_perl. And when I checked FastCGI, it did require you to change the code of scripts that previously ran under mod_cgi.

      Anyway, since I got you all interested in this, the reason why I even started thinking about all this is that many people have a (wrong) belief that Perl is slow... Well, slower than PHP.

      Of course Perl is not slower than PHP; the thing that makes it slow is CGI itself.

      The other thing they say about Perl is that it's hard to set up the scripts (set the right path and chmod the files).

      So I started thinking about how this could be improved. I mentioned Apache because it's the most used web server (and it also has mod_perl); of course, if the solution could be applied to any other web server, even better.

      I just want to see more people using Perl on the web (for a start).

        CGI itself is not that slow, but the overhead of loading the perl interpreter and modules is.

        You may want to look at Juerd's PLP and use it to demonstrate to people that Perl kicks PHP's ass... As a sidenote, it only performs that well under mod_perl... ;)

        I agree that Perl is starting to gain a bad reputation on the web. mod_php is safe to use on shared hosting and mod_perl is not, which leads some people (even fairly experienced developers) to believe that PHP is just naturally faster. I've heard that mod_perl2 solves this with Apache 2's perchild MPM, which (I think) gives each script its own namespace and runs it as its owner, but I don't expect speedy adoption.
Re: More than mod_cgi less than mod_perl.
by blahblah (Friar) on Jun 08, 2005 at 05:41 UTC
    PersistentPerl (aka SpeedyCGI) does not run on perls compiled with threads. Other than that, it works great. I run my OpenWebMail on it everyday.

    2c
Re: More than mod_cgi less than mod_perl.
by jdhedden (Deacon) on Jun 08, 2005 at 13:10 UTC
    Compiling CGIs with the Perl bytecode compiler (B::Bytecode) would at least eliminate the compilation stage with each Perl CGI execution.

    Then, if you are able to make use of PersistentPerl to keep the Perl interpreter around after each execution, you've done everything you can short of going the mod_perl route.

      The last time I checked B::Bytecode, which was a few months ago, the only thing I was able to run under it was a simple Hello World. Anything more complicated than that simply didn't work ...

      Still it looks like you folks don't get it?!?

      I'm not looking into this for myself. I'm curious about a solution that anybody will be able to use. Especially ISPs, or should I say (Shared) Hosting Providers ...
