Sharing database handle

by Marcello (Hermit)
on Apr 18, 2002 at 13:45 UTC ( [id://160200] )

Marcello has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have an application which forks off child processes to do all the work. At the moment, every child process creates its own database connection, does its work and closes the database connection again. This causes a lot of connections to be made to the database.

What I would like to do is to create one database connection, and let all child processes use this same connection. But sometimes the database connection drops, and a new one needs to be created.

Question 1: Is it wise to use the same database handle for all child processes?
Question 2: How do I check if a database handle has been destroyed (e.g. connection lost) without performing a query? I want only the server process to create the database connection and to check whether it is still usable by the child processes, restoring it otherwise.
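
Roughly, the current pattern looks like this (the DSN, credentials, and job list are placeholders):

    use DBI;

    # Current approach: every child opens and closes its own connection.
    foreach my $job (@jobs) {              # @jobs stands in for the real work
        defined(my $pid = fork) or die "fork: $!";
        next if $pid;                      # parent: keep spawning children

        my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'secret',
                               { RaiseError => 1 });
        # ... SELECTs and INSERTs for $job ...
        $dbh->disconnect;
        exit 0;
    }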

Replies are listed 'Best First'.
Re: Sharing database handle
by Fletch (Bishop) on Apr 18, 2002 at 13:52 UTC

    Unless you're very cozy with the database backend (enough so that you can figure out how to do some sort of locking of whatever communications mechanism it's using to talk to its server), you really don't want to do this. This is why things such as Apache::DBI usually set up persistent handles after the child processes have forked.

    Check out POE::Component::DBIAgent and POE for another possibility.

    As for the second, DBI has a ping method. Check the docs.

Re: Sharing database handle
by perlplexer (Hermit) on Apr 18, 2002 at 13:57 UTC
    "Have I told you... lately..." that (Super Search)++ ?

    Check this out - Sharing a database handle over multiple processes.

    From 'perldoc DBI':

        ping

            $rc = $dbh->ping;

        Attempts to determine, in a reasonably efficient way, if the
        database server is still running and the connection to it is
        still working. Individual drivers should implement this function
        in the most suitable manner for their database engine.
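
    A minimal check-and-reconnect helper built on ping (the DSN and credentials are placeholders):

        use DBI;

        my @connect_args = ('dbi:mysql:database=test', 'user', 'secret',
                            { RaiseError => 1 });
        my $dbh = DBI->connect(@connect_args);

        # Hand out a live handle; if the server dropped us, reconnect first.
        sub get_dbh {
            $dbh = DBI->connect(@connect_args) unless $dbh && $dbh->ping;
            return $dbh;
        }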

    --perlplexer
Re: Sharing database handle
by TheHobbit (Pilgrim) on Apr 18, 2002 at 13:59 UTC

    Hi,
    you do not say so, but I suppose you use DBI. In that case, if your database driver implements it, you could use the ping() method to check the connection.

    What you would like to do is (more or less) the same thing that the Apache::DBI module does. Maybe you could get inspiration from that module (I only use it; I never went into its bowels).

    Cheers


    Leo TheHobbit
    GED/CS d? s-:++ a+ C++ UL+++ P+++>+++++ E+ W++ N+ o K? !w O? M V PS+++
    PE-- Y+ PPG+ t++ 5? X-- R+ tv+ b+++ DI? D G++ e*(++++) h r++ y+++(*)
Re: Sharing database handle
by Elian (Parson) on Apr 18, 2002 at 15:13 UTC
    It's not wise, and it won't always work. Some databases, Oracle specifically though there may be others, will fail horribly if you try this--the data behind the handle is process-specific, and the handles won't work right, if at all, across processes.

      Hi,
      well, this only implies that you'll have to devote a process (either the father or one of the siblings) to handling the handle (bad pun, sorry), and the (other) children will have to talk with that process to use the connection.

      IIRC, that's the way Apache::DBI works.

      Cheers
      Leo TheHobbit
      GED/CS d? s-:++ a+ C++ UL+++ P+++>+++++ E+ W++ N+ o K? !w O? M V PS+++
      PE-- Y+ PPG+ t++ 5? X-- R+ tv+ b+++ DI? D G++ e*(++++) h r++ y+++(*)
        No, Apache::DBI doesn't do any inter-process communication. It's dirt simple. It just opens a separate database handle in each process and keeps it open.
Re: Sharing database handle
by mamboking (Monk) on Apr 18, 2002 at 18:16 UTC
    Instead of sharing a connection you probably want to do connection pooling. You could have your app open up a bunch of connections at application startup then as a child process is spawned it gets a connection from the pool. When it is finished it returns the connection to the pool. If there aren't any available connections then your process can block until one becomes available. Since opening and closing database connections is very costly this should greatly help your performance. Also, by pre-allocating the connections, you always have control over how many connections are out there.
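
    A bare-bones pool might look like the sketch below (placeholders throughout; note the caveat in the follow-up that DBI handles generally can't be passed between processes, so a pool like this has to live in the process that opened the handles):

        use DBI;

        # Pre-open a fixed number of handles at startup.
        my @pool = map {
            DBI->connect('dbi:mysql:database=test', 'user', 'secret',
                         { RaiseError => 1 })
        } 1 .. 5;

        sub checkout { shift @pool or die "no free connections\n" }
        sub checkin  { push @pool, shift }

        # Typical use: borrow a handle, work, give it back.
        my $dbh = checkout();
        $dbh->do('INSERT INTO log (msg) VALUES (?)', undef, 'hello');
        checkin($dbh);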
      I don't think that will actually work. The trouble is, you can't easily pass DBI objects (which involve open sockets and XS code) back and forth between processes.
      Here is a connection pooler, sqlrelay. Good thing is it has interfaces for numerous languages as well (so you can pool what would otherwise be per-language puddles). I haven't gotten around to setting it up yet (tuit deficit), but look forward to doing so.

      --
      perl -pew "s/\b;([mnst])/'$1/g"

Re: Sharing database handle
by gav^ (Curate) on Apr 18, 2002 at 22:28 UTC
    It would have been very handy to know which database you're talking about. With MySQL, connecting is fast; in fact, they list it when talking about optimization.

    You might find it more portable (and perhaps faster) to tweak your database to allow more connections and to do anything suggested to improve performance on that end.

    Another option may be for the child processes not to write to the database at all (especially in the case of logging) and for the parent to dump the data into the db at regular intervals.
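
    A sketch of that idea, with one child reporting over a pipe and the parent flushing in batches (the table, column, and batch size are made up):

        use DBI;

        pipe(my $reader, my $writer) or die "pipe: $!";

        defined(my $pid = fork) or die "fork: $!";
        unless ($pid) {                 # child: report results, never touch the db
            close $reader;
            print {$writer} "one row of data\n";
            exit 0;
        }
        close $writer;

        # parent: the only process holding a db connection
        my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'secret',
                               { RaiseError => 1 });
        my $sth = $dbh->prepare('INSERT INTO log (msg) VALUES (?)');

        my @batch;
        while (my $line = <$reader>) {
            chomp $line;
            push @batch, $line;
            if (@batch >= 100) {        # flush every 100 rows
                $sth->execute($_) for splice @batch;
            }
        }
        $sth->execute($_) for @batch;   # flush whatever is left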

    Hope this helps...

    gav^

      Another option may be for the child processes not to write to the database at all (especially in the case of logging) and for the parent to dump the data into the db at regular intervals.

      Well, this may be an option, but it's probably not a good idea. You'd hate to have a bunch of inserts or updates queued in the parent and then have that process go down before it can finish the queue, or find out that the connection dropped for whatever reason since the last time it processed a queue.

      Has anyone (esp. Marcello) seen data about how much it costs loadwise to have a bunch of child connections? My own (anecdotal, not systematic) experience has been that the number of connections is less important than what those connections are doing. If the SQL demands of any one process are heavy, having more than a couple going at once will hurt, no matter how you do it. If it's easy stuff, then number of concurrent requests is not much of an issue.

Re: Sharing database handle
by Zapawork (Beadle) on Apr 18, 2002 at 19:58 UTC
    Here is another idea,
    You are already spawning child processes, so why not set up either a bidirectional pipe or a separate IPC channel to allow the children to send requests to the parent. In this fashion the parent could have one main routine to poll the child processes for queries, test the connection ($dbh->ping; DBI is just a great package) and then execute the query and return the information to the child.

    In this model you would seem to accomplish your goal: all of your child processes would communicate through one ODBC connection, you'd have only one connection to the database, and for fun you'd have one place to check for improper data within queries.
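
    A rough sketch of one child talking to the parent's single handle over a socketpair (the DSN, credentials, and the one-line-per-message protocol are invented for illustration):

        use DBI;
        use Socket;
        use IO::Handle;

        socketpair(my $child_end, my $parent_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
            or die "socketpair: $!";
        $child_end->autoflush(1);
        $parent_end->autoflush(1);

        defined(my $pid = fork) or die "fork: $!";
        unless ($pid) {                       # child: send SQL, read the answer
            close $parent_end;
            print {$child_end} "SELECT id, name FROM users\n";
            my $answer = <$child_end>;
            print "child got: $answer";
            exit 0;
        }

        close $child_end;
        my @connect_args = ('dbi:mysql:database=test', 'user', 'secret',
                            { RaiseError => 1 });
        my $dbh = DBI->connect(@connect_args);

        while (my $sql = <$parent_end>) {     # parent: the only db talker
            chomp $sql;
            $dbh = DBI->connect(@connect_args) unless $dbh->ping;
            my $rows = $dbh->selectall_arrayref($sql);
            print {$parent_end} join('|', map { join ',', @$_ } @$rows), "\n";
        }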
    Dave -- Saving the world one node at a time
      This is a nifty thought, but I'd worry that it would make the parent/server a bit monstrous and hard to get right. I guess if each child request is read, executed and answered back to the child in one block, it's certainly manageable.

      There still is the empirical question: is it better to serialize all activity through a single, very active connection, as opposed to having a bunch of connections (leaving it to the DBMS to coordinate the actions, which is, in part, what it was written to do)? I don't really know that yet (sorry).

        Sorry about the late response,
        The proof of the pudding, so to speak, for this type of practice goes back to the ODBC models that all of your modern-day websites use.
        Having one central connection allows you to control access while guaranteeing performance. If you need data on a large scale, you usually implement application servers to cache and give performance enhancements.
        Hope this helps
        Dave -- Saving the world one node at a time
Re: Sharing database handle
by elwarren (Priest) on Apr 18, 2002 at 21:02 UTC
    Sounds like it might be a good use for DBD::Proxy.
Re: Sharing database handle
by BUU (Prior) on Apr 18, 2002 at 18:33 UTC
    Is there any way to do something similar to connection pooling just using CGI?
Re: Sharing database handle
by d_i_r_t_y (Monk) on Apr 21, 2002 at 03:10 UTC

    hi marcello,

    I have an application which forks of child processes to do all the work. At the moment, every child process creates its own database connection, does its work and closes the database connection again. This causes a lot of connections to be made to the database.

    how many children are you forking?!
    seriously, we have done the same thing for our biological data since the amount of work preceding each insert/update justifies the parallel model.

    i don't think there is a problem with many db connections as long as the work for each thread justifies the connection time overhead. on mysql, connections are cheap and fast and i wouldn't even think twice about doing it (though concurrency/locking with mysql is more of an issue...). on db2, where a connection costs about 750msec (that's client/server on the same machine!), you would want to have at least 750msec worth of work for each process to justify the connection overhead. that said, divide and conquer works a treat if you can break down your workload into mutually exclusive chunks.

    btw, have to say it's sure satisfying to see those CPUs and disks crying for mercy when you have 8 concurrent processes hammering away inserts to the db...

    matt

      Hi Matt,

      I'm forking about 1 process every second. And indeed, I use DBI to connect to a MySQL database. The child processes only perform simple SELECT and INSERT queries, so no heavy stuff.

      Judging from the reactions above, I guess it's best to keep it the way it is. I was just wondering if I could be more efficient.

      Grtz Marcello
        i'm forking about 1 process every second.

        continuously?! surely not?!!... perhaps you should post the code?

        if divide and conquer is not applicable then i have one other suggestion:

        1. open a local UNIX socket/pipe
        2. fork a predefined set of children (say, 8), have each of the children open a DBI handle to mysql
        3. set up the parent process to loop over your insert data, spooling it to the socket that the children are all listening on; meanwhile each of the children is blocking in the $server->accept() call waiting for the parent to send it data
        4. upon reading a chunk of data, enter the insert routine and loop back to the accept
        5. continue until parent runs out of data

        an example of this approach (a pre-forking server, like apache) is in the perl cookbook in the chapter on sockets or IPC, though i don't have the book in front of me at the moment, so i can't give a specific recipe.
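
        a rough sketch of steps 1-5 above (socket path, table, and data are placeholders):

            use IO::Socket::UNIX;
            use DBI;

            my $path = '/tmp/inserter.sock';            # placeholder path
            unlink $path;
            my $server = IO::Socket::UNIX->new(Local => $path, Listen => 10)
                or die "listen: $!";

            for (1 .. 8) {                              # pre-fork a fixed set of children
                defined(my $pid = fork) or die "fork: $!";
                next if $pid;

                # each child opens its own handle *after* the fork
                my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'secret',
                                       { RaiseError => 1 });
                my $sth = $dbh->prepare('INSERT INTO stuff (data) VALUES (?)');
                while (my $conn = $server->accept) {    # block until work arrives
                    chomp(my $chunk = <$conn>);
                    $sth->execute($chunk);
                }
                exit 0;                                 # shutdown handling omitted
            }

            # parent: spool the data; each connect hands a chunk to one child
            for my $chunk ('alpha', 'beta', 'gamma') {
                my $c = IO::Socket::UNIX->new(Peer => $path) or die "connect: $!";
                print {$c} "$chunk\n";
                close $c;
            }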

        but you're probably right; unless you're loading >1Gb of data (which does or does not need transformation prior to insert), it's probably fast enough. ;-)

        d_i_r_t_y
