How do I make a PSGI program do costly initialisation only once per process, not per thread?

daxim has asked for the wisdom of the Perl Monks concerning the following question:

cross-post: https://stackoverflow.com/q/44257530/46395

Consider app.psgi:

#!perl
use 5.024;
use strictures;
use Time::HiRes qw(sleep);

sub mock_connect {
    my $how_long_it_takes = 3 + rand;
    sleep $how_long_it_takes;
    return $how_long_it_takes;
}
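sub main {
    my ($dsn) = @_; # passed in by the caller; the file-level $dsn is declared too late to be visible here
    state $db_handle = mock_connect($dsn);
    return sub { [200, [], ["connect took $db_handle seconds\n"]] };
}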
my $dsn = 'dbi:blahblah'; # from config file
my $app = main($dsn);

Measuring plackup (HTTP::Server::PSGI: Accepting connections at http://0:5000/):

› perl -MBenchmark=timeit,timestr,:hireswallclock -E"say timestr timeit 10, sub { system q(curl http://localhost:5000) }"
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
2.93921 wallclock secs ( 0.03 usr +  0.06 sys =  0.09 CPU) @ 107.53/s (n=10)

Measuring thrall (Starting Thrall/0.0305 (MSWin32) http server listening at port 5000):

› perl -MBenchmark=timeit,timestr,:hireswallclock -E"say timestr timeit 10, sub { system q(curl http://localhost:5000) }"
connect took 3.77111188120125 seconds
connect took 3.15455510265111 seconds
connect took 3.77111188120125 seconds
connect took 3.15455510265111 seconds
connect took 3.77111188120125 seconds
connect took 3.64333342488772 seconds
connect took 3.15455510265111 seconds
connect took 3.77111188120125 seconds
connect took 3.85268922343767 seconds
connect took 3.64333342488772 seconds
17.4764 wallclock secs ( 0.02 usr +  0.09 sys =  0.11 CPU) @ 90.91/s (n=10)

This performance is not acceptable because the initialisation happens several times, despite the state variable. How do you make it so it happens only once?

Replies are listed 'Best First'.
Re: How do I make a PSGI program do costly initialisation only once per process, not per thread? by Your Mother (Archbishop) on Jun 01, 2017 at 11:17 UTC
You’re measuring oddly or the wrong part, maybe? It will never take less than 3 seconds to connect to the mock DB server, because that’s just what the code calls for. The connection to the webapp and its responses are not related though: the `$app` sub is initialized, holding a DB handle, ready to execute, and seems to be perfectly zippy unless I’m missing something:

moo@cow[21]~>plackup pm-1191821
HTTP::Server::PSGI: Accepting connections at http://0:5000/
--
moo@cow[701]~>time curl http://0:5000/
connect took 3.1313986862815 seconds
0.004u 0.004s 0:00.00 0.0% 0+0k 0+0io 0pf+0w

Here is the same with some internal timing code:

use Time::HiRes qw( sleep gettimeofday tv_interval );
# ...
sub main {
    state $db_handle = mock_connect(+shift);
    sub {
        my $t0 = [ gettimeofday ];
        [ 200,
          [],
          [ sprintf "Application sub took %.6f seconds\n", tv_interval( $t0, [ gettimeofday ] ) ]
        ];
    };
}
# ...
__END__
moo@cow[56]~>curl http://0:5000/
Application sub took 0.000001 seconds
moo@cow[57]~>curl http://0:5000/
Application sub took 0.000002 seconds
Re^2: How do I make a PSGI program do costly initialisation only once per process, not per thread? by daxim (Curate) on Jun 01, 2017 at 13:10 UTC
"unless I’m missing something"

Indeed; you did not run thrall, so you could not observe for yourself the crucial difference from plackup.

With plackup, the initialisation happens only once, when `main` is called. It is easy to notice that the server only starts accepting connections three seconds after executing plackup. So the bad performance occurs at server start-up, only once, under the sysadmin's control. This is altogether acceptable. An end user never gets to experience it: either the server is down, or it's up and always responding fast. The example program does not show it, but there's one db connection.

With thrall, the server starts accepting connections almost immediately. However, each request runs the initialisation separately. I surmise this happens for each spawned thread until the pool is filled; afterwards requests are handled fast on each thread. So the bad performance occurs at unforeseeable times after the server has been started, up to `--max-workers` times, under no one's control. This is altogether not acceptable. The user experience is spotty: most requests are fast, especially once the server has already handled a number of requests, but when bad luck strikes and the request happens to be handled by a new thread, the server responds slowly. The example program does not show it, but there are up to `--max-workers` db connections, and the operations team does not appreciate that.
Re^3: How do I make a PSGI program do costly initialisation only once per process, not per thread? by Corion (Patriarch) on Jun 01, 2017 at 14:54 UTC
The simple approach is to prime the cache of worker threads by spawning your server and then making requests to your server from `localhost` just so that each thread connects to the database.

Otherwise, you will have to consult the documentation of your server as to what hooks it offers and how to take advantage of them. Looking at Thrall::Server, it seems that it simply creates a new thread in `->_create_thread`. You could override that or do your initialisation in a BEGIN block, or use a threads::shared variable to share information between your threads.
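The threads::shared idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything Thrall provides: `costly_init_once`, `$init_result` and `$init_count` are made-up names, `3 + rand` stands in for the slow connect, and the script needs a perl built with ithreads. Note that only plain data is shared here; a real database handle generally cannot be shared between threads this way.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical names, for illustration only.
my $init_result :shared;      # plain scalar result of the costly step
my $init_count  :shared = 0;  # how often the costly step actually ran

sub costly_init_once {
    lock($init_result);       # held until the sub returns; serialises the check below
    unless (defined $init_result) {
        $init_count++;
        $init_result = sprintf '%.3f', 3 + rand;  # stand-in for the slow connect
    }
    return $init_result;
}

# Start four threads that all ask for the initialised value, then collect their answers.
my @threads = map { threads->create(\&costly_init_once) } 1 .. 4;
my @seen    = map { $_->join } @threads;

print "initialised $init_count time(s); values: @seen\n";
```

The lock means that threads arriving during the initialisation wait for it instead of repeating it, which is the thundering-herd case; the first request still pays the full cost unless the value is set before the server starts its threads.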
Re^3: How do I make a PSGI program do costly initialisation only once per process, not per thread? by vr (Curate) on Jun 01, 2017 at 13:51 UTC
How about commenting out line #38 (`loader => 'Delayed'`) in thrall.bat? Setting the environment variable `PERL_THRALL_DEBUG=1` and adding something like `say "...in main, thread is ", threads->tid;` to "main" of your app might help to debug. Will this change to the .bat break anything? I don't know, but...
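A small standalone sketch of that kind of thread-id logging, assuming a perl built with ithreads; `where_am_i` is a made-up helper name and nothing here is specific to Thrall:

```perl
use strict;
use warnings;
use threads;

# Hypothetical helper: report which process and which thread is executing.
sub where_am_i {
    return sprintf 'pid %d, thread %d', $$, threads->tid;
}

# The main thread has tid 0; threads created later get other ids.
print 'main:   ', where_am_i(), "\n";

my $worker = threads->create(\&where_am_i);
print 'worker: ', $worker->join, "\n";
```

Putting a similar line inside `main` of app.psgi would show whether `main` runs once in the main thread or again in each worker thread.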
Re^4: How do I make a PSGI program do costly initialisation only once per process, not per thread? by daxim (Curate) on Jun 07, 2017 at 11:02 UTC
Re: How do I make a PSGI program do costly initialisation only once per process, not per thread? by Corion (Patriarch) on Jun 01, 2017 at 10:54 UTC
If your server really is using threads, just set up a shared variable maybe? You will still have to handle race conditions and thundering herds during the initialization.

If your server is forking, you're out of luck and you'll need some out-of-process way of caching things.

Update: The `$env` will tell you what your server does:

`psgi.multithread`: This is a boolean value, which MUST be true if the application may be simultaneously invoked by another thread in the same process, false otherwise.

`psgi.multiprocess`: This is a boolean value, which MUST be true if an equivalent application object may be simultaneously invoked by another process, false otherwise.

I didn't find any hooks for "run this on process startup" or "run this code on thread startup".

Update2: If you are really using threads, I would stare long and intently at the documentation of the database library to see whether it needs reinitialization per thread or does the initialization transparently. I wouldn't expect database handles to naturally work as shared resources across threads.
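To see what a given server reports, a tiny app along these lines could echo the two flags back. It is a sketch: only the two `$env` keys come from the PSGI text quoted above, the rest is illustrative, and the last two lines call the app with a hand-built environment instead of a real server.

```perl
use strict;
use warnings;

# A PSGI app that reports the concurrency model the server advertises.
my $app = sub {
    my ($env) = @_;
    my $body = sprintf "multithread=%d multiprocess=%d\n",
        $env->{'psgi.multithread'}  ? 1 : 0,
        $env->{'psgi.multiprocess'} ? 1 : 0;
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ $body ] ];
};

# Quick check without a server: invoke the app with a minimal fake environment.
my $res = $app->({ 'psgi.multithread' => 1, 'psgi.multiprocess' => 0 });
print $res->[2][0];    # prints "multithread=1 multiprocess=0"
```

To use it as an app.psgi file, drop the check at the bottom and make `$app` the last expression in the file.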