PerlMonks  

problem porting to threaded mode

by cmac (Monk)
on Jan 01, 2009 at 05:03 UTC ( [id://733599]=perlquestion )

cmac has asked for the wisdom of the Perl Monks concerning the following question:

Trying to shift our largely mod_perl2 web site to an Apache2 threaded MPM and perl ithreads. The following works under the non-threaded prefork MPM:
use DB_File;

my @dbs;        # array of hash references
my @dbModTime;  # mod times of db files
my @dbfn;       # array of database pathnames

# executed before fork into child processes
sub post_config {
    my $db;
    my $s = $_[3];
    # tie the DBs and get their mod times
    for ($db = 0; $db < @dbfn; $db++) {
        $dbs[$db] = {};
        tie %{$dbs[$db]}, "DB_File", $dbfn[$db], O_RDONLY
            or die ((caller 0)[3]. " can't tie " . $dbfn[$db] . ": $!");
        $dbModTime[$db] = (CORE::stat($dbfn[$db]))[9]
            or die ((caller 0)[3]. " can't stat " . $dbfn[$db] . ": $!");
    }
}
The routines that use the databases re-stat the DB files and untie and re-tie a DB that has changed. Each child process must do this for itself.
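
A minimal sketch of the per-process re-stat check described above. The helper names `db_changed` and `refresh_db` are hypothetical, not from the original code, and the actual untie and re-tie is left to a code ref supplied by the caller, so that the sketch does not depend on DB_File itself.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $path's mtime differs from the cached
# value in $$mtime_ref; the cache is updated as a side effect.
sub db_changed {
    my ($path, $mtime_ref) = @_;
    my $mtime = (CORE::stat($path))[9];
    defined $mtime or die "can't stat $path: $!";
    return 0 if defined $$mtime_ref && $$mtime_ref == $mtime;
    $$mtime_ref = $mtime;
    return 1;
}

# Hypothetical wrapper: call $reopen only when the file has changed.
# In the code from the post, $reopen would untie and re-tie the hash.
sub refresh_db {
    my ($path, $mtime_ref, $reopen) = @_;
    return 0 unless db_changed($path, $mtime_ref);
    $reopen->($path);
    return 1;
}
```

Under prefork each child would keep its own cached mtime, which is why every child process has to repeat the check for itself.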

In the threaded environment, any thread within a process may discover that such an untie and re-tie is necessary, but such an operation should be effective for the other threads in the process as well. This means that @dbs and @dbModTime should be shared among the threads:
use threads;
use threads::shared;

my @dbs :shared;        # array of hash references
my @dbModTime :shared;  # mod times of db files
Making only the changes above makes perl complain "Invalid value for shared scalar" about the $dbs[$db] = {}; line. This error message can be fixed as follows:
for ($db = 0; $db < @dbfn; $db++) {
    $dbs[$db] = shared_clone({});
    tie %{$dbs[$db]}, "DB_File", $dbfn[$db], O_RDONLY
        or die ((caller 0)[3]. " can't tie " . $dbfn[$db] . ": $!");
    $dbModTime[$db] = (CORE::stat($dbfn[$db]))[9]
        or die ((caller 0)[3]. " can't stat " . $dbfn[$db] . ": $!");
    $s->log->notice ($dbfn[$db]." has " .scalar(keys(%{$dbs[$db]}))." entries");
}
Unfortunately, when this is done the DBs look empty, and the log notices for each DB show "0 entries".

Removing the ':shared' tag for @dbs and the 'shared_clone()' wrapper for '{}' causes the log notices to show the proper number of entries for each DB, but blows up the Apache configuration process (before the 'resuming normal operations' message) with
httpd in free(): error: chunk is already free
in error_log and the following on the terminal:
Abort trap (core dumped) Error invoking apachectl start command
I guess not having databases is better. I've tried using @dbs as an array of references to named, shared hashes: also no database content. The 'worker' and 'event' MPMs work identically w/r/t this problem.

They say monks aren't into fancy threads, but suggestions of things to try will be very welcome.

Happy New Year, cmac

Replies are listed 'Best First'.
Re: problem porting to threaded mode
by tilly (Archbishop) on Jan 01, 2009 at 11:45 UTC
    The most obvious question is why you want to shift to Apache2 threaded MPM and perl ithreads. It seems like a source of trouble to me with no obvious benefits.

    Anyways you are trying to do it, so what is going wrong in your attempts? Well the first problem was that {} can't be shared. You can wrap it with share or shared_clone to solve that problem, and you did. But then you have a situation where you have declared a shared variable and then try to tie it. But as Liz says in Things you need to know before programming Perl ithreads, sharing already works through a tie so you can't tie a shared variable. So your dbs continue to look like empty hashes after the tie. Then you got rid of the share and ran into the fact that DB_File is a C-level library that was never designed to be thread safe.
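
A small sketch of the first point only, assuming a perl built with ithreads (the guard at the top simply exits otherwise). It shows an unshared `{}` being refused by a shared array while `shared_clone({})` is accepted. It does not try to reproduce the DB_File behaviour, and the `answer` key is just illustrative data.

```perl
use strict;
use warnings;
use Config;
BEGIN {
    # ithreads are a build-time option; stop quietly where unavailable.
    unless ($Config{useithreads}) {
        print "this perl was built without ithreads; nothing to show\n";
        exit 0;
    }
}
use threads;
use threads::shared;

my @dbs :shared;

# Storing an ordinary (unshared) hash ref in a shared array is refused.
my $ok  = eval { $dbs[0] = {}; 1 };
my $err = $ok ? '' : $@;

# A shared, initially empty hash is accepted and can be filled normally.
$dbs[0] = shared_clone({});
$dbs[0]{answer} = 42;
```

Here `$err` carries the "Invalid value for shared scalar" message quoted in the question. Per the explanation above, it is the later `tie` on the shared hash that does not take effect.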

    How then can you solve this? Your best bet is Thread::Shared. From the documentation you should not share the variables and instead write something like this:

    tie %{$dbs[$db]}, "Thread::Shared", {module => "DB_File"}, $dbfn[$db], O_RDONLY
        or die ((caller 0)[3]. " can't tie " . $dbfn[$db] . ": $!");
    Alternately in the unshared version you can try to replace DB_File with something else. BerkeleyDB is a more sophisticated version of the same, but I doubt it is thread-safe. The pure Perl DBM::Deep is more promising. It should be easy to do a one-time copy from one format to the other. I'd be somewhat concerned about whether seeks on the database file in different threads could interfere with each other, but you can test that fairly easily and it probably works. (I'm paranoid though, and would test it.)

    Update: Also don't forget to use locking where appropriate!
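
One place where locking would matter, sketched under assumptions: if several threads notice a changed mod time at once, only one of them should go on to re-open the database. The helper `claim_reload` is hypothetical, and the sketch needs a perl built with ithreads (the guard exits otherwise).

```perl
use strict;
use warnings;
use Config;
BEGIN {
    # ithreads are a build-time option; stop quietly where unavailable.
    unless ($Config{useithreads}) {
        print "this perl was built without ithreads; nothing to show\n";
        exit 0;
    }
}
use threads;
use threads::shared;

my @dbModTime :shared;    # last mod time seen per database, as in the post

# Hypothetical helper: returns 1 to exactly one caller per new mod time,
# 0 to every other caller that reports the same value.
sub claim_reload {
    my ($i, $mtime) = @_;
    lock(@dbModTime);     # held until the sub returns
    return 0 if defined $dbModTime[$i] && $dbModTime[$i] == $mtime;
    $dbModTime[$i] = $mtime;
    return 1;
}
```

The thread that gets 1 back would do the re-open. How the other threads then see the new handle is the separate problem discussed above.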

      Why? Mostly so I can use a lower-cost hosting plan from my IHP, whose plans include a maximum number of processes. When I get it working I mean to run some performance tests between the 'event' version and the 'prefork' version, and decide which to stay with.

      Are you sure you mean to say "Thread::Shared"? Its documentation on CPAN includes nothing like the "tie" usage that you recommend. The article by Liz mentions a "proof of concept" module that she wrote called "Thread::Tie" that has such use noted, including {method => ...} as the 3rd operand. But then the Thread::Tie doc says that it should be superseded by an XS/C version for production use.

      I'm pretty sure that @dbs has to be shared by some method, so that any thread can untie and retie a hash/db when the db changes, and have this action be effective for all of the threads in the process. I only backed away from having it shared as an experiment/data point.

      An alternative that just occurred to me is to create one thread that does all of the DB accesses for the other threads. No sharing needed! Your thoughts (or anyone else's) on this approach? Can I create the thread at post-config time and have it carried over into all the Apache child processes?
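
A rough sketch of the single-DB-thread idea, assuming a perl built with ithreads and the core Thread::Queue module. The names `lookup` and `stop_server` are hypothetical, and a plain hash stands in for the tied DB_File hash, which would be tied inside the server thread only. Since fork() duplicates only the calling thread, a thread created at post-config time would presumably not survive into the child processes, so it would likely have to be started in each child instead.

```perl
use strict;
use warnings;
use Config;
BEGIN {
    # ithreads are a build-time option; stop quietly where unavailable.
    unless ($Config{useithreads}) {
        print "this perl was built without ithreads; nothing to show\n";
        exit 0;
    }
}
use threads;
use threads::shared;
use Thread::Queue;

my $requests = Thread::Queue->new;   # keys going to the DB thread
my $replies  = Thread::Queue->new;   # values coming back
my $mutex :shared;                   # pairs each request with its reply

# The only thread that touches the data. %db is stand-in content.
my $server = threads->create(sub {
    my %db = (apple => 'red', banana => 'yellow');
    while (defined(my $key = $requests->dequeue)) {
        $replies->enqueue($db{$key});    # undef when the key is missing
    }
});

# Called from any thread: one request/reply round trip at a time.
sub lookup {
    my ($key) = @_;
    lock($mutex);
    $requests->enqueue($key);
    return $replies->dequeue;
}

# An undef request ends the server loop.
sub stop_server {
    $requests->enqueue(undef);
    $server->join;
}
```

The single mutex keeps the sketch short but serializes all lookups. Whether that is acceptable would be one of the things to measure in the planned performance tests.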

      If anything, my mod_perl2 code probably contains more locks than necessary at this point. I have gotten several answers from module authors that their modules are thread-safe (or "can't see any reason why not") and have taken locks out accordingly.

      Thanks and HNY, cmac
Re: problem porting to threaded mode
by monarch (Priest) on Jan 01, 2009 at 23:09 UTC
    Apache provides a number of hooks that can be used during the different stages of process spawning and request handling (mostly relevant to the worker model). I've done this using the APR C API but not with mod_perl. (There seems to be little documentation on such advanced use in mod_perl).

    If you are intending to open a resource that you want to share across threads within a process have you considered writing the appropriate code for the PerlChildInitHandler callback? Once you've opened your resource you could store a handle to it in the Process (see Apache2::Process) memory pool (see Apache2::Pool). When you get to your PerlResponseHandler you would then make use of the handle stored in the Process memory pool (with appropriate locking, of course, to protect against multiple thread access).

    I cannot find any concrete examples to show you, nor do I know if this is actually technically possible using mod_perl.

    Update: adjusted formatting.

Node Type: perlquestion [id://733599]
Approved by Corion
Front-paged by Arunbear