 
PerlMonks  

How do I make a PSGI program do costly initialisation only once per process, not per thread?

by daxim (Curate)
on Jun 01, 2017 at 10:07 UTC ( [id://1191821]=perlquestion )

daxim has asked for the wisdom of the Perl Monks concerning the following question:

cross-post: https://stackoverflow.com/q/44257530/46395

Consider app.psgi:

#!perl
use 5.024;
use strictures;
use Time::HiRes qw(sleep);

# Stand-in for an expensive database connect: sleeps for 3 to 4 seconds.
sub mock_connect {
    my $how_long_it_takes = 3 + rand;
    sleep $how_long_it_takes;
    return $how_long_it_takes;
}

sub main {
    my ($dsn) = @_;    # $dsn must be declared here to compile under strict
    state $db_handle = mock_connect($dsn);
    return sub { [200, [], ["connect took $db_handle seconds\n"]] };
}

my $dsn = 'dbi:blahblah'; # from config file
my $app = main($dsn);

Measuring plackup (HTTP::Server::PSGI: Accepting connections at http://0:5000/):
› perl -MBenchmark=timeit,timestr,:hireswallclock -E"say timestr timeit 10, sub { system q(curl http://localhost:5000) }"
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
connect took 3.0299610154043 seconds
2.93921 wallclock secs ( 0.03 usr + 0.06 sys = 0.09 CPU) @ 107.53/s (n=10)

Measuring thrall (Starting Thrall/0.0305 (MSWin32) http server listening at port 5000):
› perl -MBenchmark=timeit,timestr,:hireswallclock -E"say timestr timeit 10, sub { system q(curl http://localhost:5000) }"
connect took 3.77111188120125 seconds
connect took 3.15455510265111 seconds
connect took 3.77111188120125 seconds
connect took 3.15455510265111 seconds
connect took 3.77111188120125 seconds
connect took 3.64333342488772 seconds
connect took 3.15455510265111 seconds
connect took 3.77111188120125 seconds
connect took 3.85268922343767 seconds
connect took 3.64333342488772 seconds
17.4764 wallclock secs ( 0.02 usr + 0.09 sys = 0.11 CPU) @ 90.91/s (n=10)

This performance is not acceptable because the initialisation happens several times, despite the state variable. How do you make it so it happens only once?

Replies are listed 'Best First'.
Re: How do I make a PSGI program do costly initialisation only once per process, not per thread?
by Your Mother (Archbishop) on Jun 01, 2017 at 11:17 UTC

    You’re measuring oddly or the wrong part, maybe? It will never take less than 3 seconds to connect to the mock DB server, because that’s just what the code calls for. The connection to the webapp and its responses are not related though—the $app sub is initialized, holding a DB handle, ready to execute—and seems to be perfectly zippy unless I’m missing something–

    moo@cow[21]~>plackup pm-1191821
    HTTP::Server::PSGI: Accepting connections at http://0:5000/
    --
    moo@cow[701]~>time curl http://0:5000/
    connect took 3.1313986862815 seconds
    0.004u 0.004s 0:00.00 0.0% 0+0k 0+0io 0pf+0w

    Here is the same with some internal timing code–

    use Time::HiRes qw( sleep gettimeofday tv_interval );
    # ...
    sub main {
        state $db_handle = mock_connect(+shift);
        sub {
            my $t0 = [ gettimeofday ];
            [ 200, [],
              [ sprintf "Application sub took %.6f seconds\n",
                        tv_interval( $t0, [ gettimeofday ] ) ] ];
        };
    }
    # ...
    __END__
    moo@cow[56]~>curl http://0:5000/
    Application sub took 0.000001 seconds
    moo@cow[57]~>curl http://0:5000/
    Application sub took 0.000002 seconds
      unless I’m missing something
      Indeed; you did not run thrall and couldn't make the crucial observations of the difference to plackup yourself.

      With plackup, the initialisation happens only once, when main is called. It is easy to notice that the server only starts accepting connections three seconds after executing plackup. So the characteristics of the bad performance are at server start-up, only once, happening under the sysadmin's control - this is altogether acceptable. An end user never gets to experience it: either the server is down, or it's up and always responding fast. The example program does not show it, but there's one db connection.

      With thrall, the server starts accepting connections almost immediately. However, each request runs the initialisation separately. I surmise this happens once for each spawned thread until the pool is filled; afterwards, requests are handled fast on each thread. So the characteristics of the bad performance are at unforeseeable times after the server has been started, up to --max-workers times, happening under no one's control - this is altogether not acceptable. The user experience is spotty: it works fast for most requests, especially when the server has already been running for some number of requests, but when bad luck strikes and the request happens to be handled by a new thread, the server responds slowly. The example program does not show it, but there are up to --max-workers db connections, and the operations team does not appreciate that.

        The simple approach is to prime the cache of worker threads by spawning your server and then making requests to your server from localhost just so that each thread connects to the database.

        Otherwise, you will have to consult the documentation of your server as to what hooks it offers and how to take advantage of them.
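
The cache-priming idea above could be sketched roughly as follows. This is only an illustration: `warm_up` is a hypothetical helper, the URL and request count are assumptions taken from the command line, and nothing guarantees that a run of sequential requests reaches every worker thread.

```perl
#!perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper (not from the thread): send $count GET requests to
# $url and return how many succeeded. $fetch is optional and injectable,
# so the counting logic can be exercised without a running server.
sub warm_up {
    my ($url, $count, $fetch) = @_;
    $fetch //= do {
        my $http = HTTP::Tiny->new(timeout => 30);
        sub { $http->get($_[0])->{success} };
    };
    my $ok = 0;
    for (1 .. $count) {
        $ok++ if $fetch->($url);
    }
    return $ok;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my ($url, $count) = @ARGV;
    $count //= 10;    # assumption: at least as many as --max-workers
    my $ok = warm_up($url, $count);
    print "$ok of $count warm-up requests succeeded\n";
}
```

It would be run right after server start, for example as `perl warm_up.pl http://localhost:5000/ 10`. The slow responses still happen, but under the sysadmin's control rather than in front of an end user.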

        Looking at Thrall::Server, it seems that it simply creates a new thread in ->_create_thread. You could override that or do your initialisation in a BEGIN block, or use a threads::shared variable to share information between your threads.
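
A minimal sketch of the threads::shared suggestion, assuming a perl built with ithreads and assuming the result of the costly step is plain data that can be shared. The names `costly_init`, `get_connect_info`, `$connect_info` and `$init_count` are hypothetical. It does not share a live database handle, which may well not be possible.

```perl
#!perl
use strict;
use warnings;
use threads;
use threads::shared;

# Visible to every thread in this process; undef until initialised.
my $connect_info :shared;
my $init_count   :shared = 0;

# Stand-in for the expensive step. Assumption: it returns plain data
# (here a string), not a live handle.
sub costly_init {
    $init_count++;    # only ever called while the lock below is held
    return 'initialised';
}

sub get_connect_info {
    lock($connect_info);                # serialises first-time callers
    $connect_info //= costly_init();    # runs at most once per process
    return $connect_info;
}

# Simulate several worker threads asking for the value at the same time.
my @workers = map { threads->create(\&get_connect_info) } 1 .. 4;
my @results = map { $_->join } @workers;
print "costly_init ran $init_count time(s) for ", scalar(@results), " threads\n";
```

The lock is held for the whole check-and-initialise step, so late arrivals wait for the first thread instead of starting their own initialisation.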

        How about commenting out line #38 (loader => 'Delayed') in thrall.bat?

        Setting the environment variable PERL_THRALL_DEBUG=1 and adding something like say "...in main, thread is ", threads->tid; to "main" of your app might help to debug.

        Will this change to .bat break anything? I don't know, but...
Re: How do I make a PSGI program do costly initialisation only once per process, not per thread?
by Corion (Patriarch) on Jun 01, 2017 at 10:54 UTC

    If your server really is using threads, just set up a shared variable maybe? You will still have to handle race conditions and thundering herds during the initialization.

    If your server is forking, you're out of luck and you'll need some out-of-process way of caching things.

    Update

    The $env will tell you what your server does:

    psgi.multithread: This is a boolean value, which MUST be true if the application may be simultaneously invoked by another thread in the same process, false otherwise.

    psgi.multiprocess: This is a boolean value, which MUST be true if an equivalent application object may be simultaneously invoked by another process, false otherwise.
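
To see what a given server reports, a small app along these lines could echo the two flags back. It is a sketch only; `concurrency_model` is a hypothetical helper name, and the two `$env` keys are the ones quoted above.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper: summarise the concurrency model the server
# advertises through the PSGI environment.
sub concurrency_model {
    my ($env) = @_;
    my $threads   = $env->{'psgi.multithread'}  ? 1 : 0;
    my $processes = $env->{'psgi.multiprocess'} ? 1 : 0;
    return "multithread=$threads multiprocess=$processes";
}

# A PSGI application is a code reference that receives $env.
# As the last statement, it is also the value the .psgi file returns.
my $app = sub {
    my ($env) = @_;
    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        [ concurrency_model($env) . "\n" ],
    ];
};
```

Saved as a .psgi file and started under each server in turn, a single curl request shows which of the two cases applies.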

    I didn't find any hooks for "run this on process startup" or "run this code on thread startup".

    Update2: If you are really using threads, I would stare long and intently at the documentation of the database library to see whether it needs reinitialization per thread or does the initialization transparently. I wouldn't expect database handles to naturally work as shared resources across threads.
