PerlMonks  

Re: Sharing data structures among http processes?

by jbert (Priest)
on Jun 28, 2001 at 12:40 UTC (#92211)


in reply to Sharing data structures among http processes?

Other answers have described Unix copy-on-write memory and explained that you need shared memory (SysV SHM or mmap'd files) to share data between processes. A wrapper around these, such as Apache::SharedMem, is a good way to use them.

Shared memory is tricky stuff in the same way that threads are tricky: you open yourself up to race conditions where two processes alter the shared memory and violate each other's assumptions.

As a simple example, a process might increment a variable held in shared memory by 1 and assume that it still has that value later on in the same routine, whereas another process might have incremented it in the meantime. These are hard-to-find bugs, and the cure is to add locks (semaphores, mutexes, whatever) to define critical sections of code which only one process at a time may execute. Ugh.
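The increment example can be sketched as follows. This is a minimal illustration in Python rather than Perl, assuming a Unix-like system where processes are forked (as httpd children are). The names `bump` and `run` are hypothetical; the point is only that the read-modify-write on the shared counter sits inside a lock.

```python
import multiprocessing as mp

# Assumption: a Unix-like system, so the "fork" start method is available.
ctx = mp.get_context("fork")


def bump(counter, lock, times):
    """Increment a counter held in shared memory `times` times."""
    for _ in range(times):
        # `counter.value += 1` is a read, an add and a write. The lock turns
        # those three steps into one critical section.
        with lock:
            counter.value += 1


def run(workers=4, times=10_000):
    """Fork `workers` processes that all bump the same shared counter."""
    counter = ctx.RawValue("i", 0)  # shared memory, no built-in locking
    lock = ctx.Lock()
    procs = [
        ctx.Process(target=bump, args=(counter, lock, times))
        for _ in range(workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run())
```

If the `with lock:` line is removed, increments from different processes can overwrite each other and the final count may come out lower than `workers * times`, which is exactly the kind of bug described above.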

There might be a simpler solution for you though. You mention that you want changes to your data to go directly to persistent store (i.e. on disk) but you also want your data to live in memory.

I'll assume that you want the data in memory for performance reasons, i.e. you don't want to suffer a disk access per request. But operating systems are smart: if you have sufficient RAM on your box (say, enough to build the data structure you were talking about) and you are repeatedly accessing this data, then the OS should keep it all nicely in cache for you. So whilst you might be accessing hash values in a GDBM tie'd hash, the OS doesn't need to touch the disk to serve the reads. When you change data, the OS has the job of getting it to disk. If your data store is a relational database, similar arguments apply.
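As a rough illustration of the file-backed hash idea, here is a small sketch using Python's standard `dbm` module as a stand-in for a GDBM tie'd hash in Perl. The helper names `build_store` and `lookup_many` and the key layout are assumptions made for the example. Each round reopens the file and reads through the normal file API, so repeated reads are served by whatever the OS has cached; the timings are only there to let you observe that on your own box, not to promise any particular numbers.

```python
import dbm
import os
import tempfile
import time


def build_store(path, n=1000):
    """Create a small on-disk key/value store (hypothetical sample data)."""
    with dbm.open(path, "c") as db:
        for i in range(n):
            db[f"user:{i}".encode()] = f"value-{i}".encode()


def lookup_many(path, keys, rounds=5):
    """Reopen the store read-only each round and look up every key.

    No private in-memory copy is kept between rounds; any speed-up on
    later rounds comes from the OS file cache.
    """
    values, timings = [], []
    for _ in range(rounds):
        start = time.perf_counter()
        with dbm.open(path, "r") as db:
            values = [db[k] for k in keys]
        timings.append(time.perf_counter() - start)
    return values, timings


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "store")
    build_store(path)
    keys = [f"user:{i}".encode() for i in range(0, 1000, 10)]
    _, timings = lookup_many(path, keys)
    print([f"{t * 1000:.2f} ms" for t in timings])
```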

The nice thing about this is that you get it for free. You still need to be careful in that different processes may change the underlying data store at a time which might be inconvenient for the other processes - this is where atomic transactions on databases come into play...
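To show what an atomic transaction buys you here, the sketch below uses SQLite through Python's standard `sqlite3` module. SQLite is only a convenient stand-in for whatever database is actually in use, and the `counters` table and the `open_store` and `increment` helpers are hypothetical. The read and the write happen inside one transaction, so another process cannot change the value in between.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open the database file; each process would hold its own connection."""
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(path, timeout=10, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counters ("
        "name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
    )
    return conn


def increment(conn, name):
    """Read-modify-write a counter as a single atomic unit."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT value FROM counters WHERE name = ?", (name,)
        ).fetchone()
        new_value = (row[0] if row else 0) + 1
        conn.execute(
            "INSERT OR REPLACE INTO counters (name, value) VALUES (?, ?)",
            (name, new_value),
        )
        conn.execute("COMMIT")
        return new_value
    except Exception:
        conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "counters.db")
    a, b = open_store(path), open_store(path)  # two independent connections
    print(increment(a, "hits"), increment(b, "hits"))
```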

There might be other reasons why you want the in-memory structure, but I thought this was worth considering.


Re: Re: Sharing data structures among http processes?
by sutch (Curate) on Jun 28, 2001 at 15:40 UTC
    You are correct, the shared memory structure is for performance reasons. It is for an application that I expect to be accessed often. The queries against the database are complex and will probably overload the database server so much that the required performance will not be met with a database alone.

    Your GDBM idea sounds good enough, as long as the OS can be made to share the cache among all of the processes. Will the GDBM tied hash be automatically shared (through the OS), or does that need to be shared using shared memory? Or does this method require that each process have a separate tied hash?

      The OS-level caching I mentioned was simply good old file-level caching. If your data store is held in files accessed through the file system (as is the case for simple databases like GDBM, flat files, etc.) then often-used data is kept around in RAM and shared between processes.

      You still need to spend some CPU cycles on lookups and so on, but you don't spend any I/O, which is a win.

      OK - so your back-end data store is in a database which you wish to protect from the load which your hits are likely to generate. Do you know for certain that this is going to be a problem? If not, can you simulate some load to see?

      Presumably you don't want to cache writes - you want them to go straight to the DB.

      So you want a caching layer in front of your data which is shared amongst processes and invalidated correctly as data is written.

      I don't know which DB you are using, but I would imagine most/many would have such a caching layer. If this isn't possible, or it doesn't stand up to your load, then the route I would probably choose is to funnel all the DB access through one daemon process which can implement your caching strategy and hold one copy of the cached info.
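      The core of such a daemon could look something like the sketch below. It is a minimal, in-process illustration only: `CachedStore`, `backend_read` and `backend_write` are hypothetical names, the network/IPC front end is left out, and the invalidation policy (drop the cached entry on every write) is just the simplest one that matches the requirements above. Reads go through the cache, writes go straight to the database.

```python
import threading


class CachedStore:
    """Read-through cache in front of a slower backend.

    Assumption: `backend_read(key)` and `backend_write(key, value)` are
    callables supplied by the caller, e.g. thin wrappers around DB queries.
    """

    def __init__(self, backend_read, backend_write):
        self._read = backend_read
        self._write = backend_write
        self._cache = {}
        self._lock = threading.Lock()  # the daemon may serve several clients
        self.backend_reads = 0         # counts cache misses, for inspection

    def get(self, key):
        with self._lock:
            if key not in self._cache:
                # Cache miss: hit the backend once and remember the answer.
                self.backend_reads += 1
                self._cache[key] = self._read(key)
            return self._cache[key]

    def put(self, key, value):
        with self._lock:
            self._write(key, value)     # writes are not cached
            self._cache.pop(key, None)  # invalidate the stale entry


if __name__ == "__main__":
    db = {"a": 1}  # a dict standing in for the real database
    store = CachedStore(db.get, db.__setitem__)
    store.get("a")
    store.get("a")
    print(store.backend_reads)
```

      Because only the one daemon process holds the cache, there is a single copy to invalidate and no shared-memory locking between the web server processes.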

      But I wouldn't do that until it was clear that I couldn't scale my DB to meet my load or do something else...say regularly build read-only summaries of the complex queries.

      I guess it all kind of depends on the specifics of your data structures...sorry to be vague. There is a good article describing a scaling-up operation at webtechniques which seems informative to me.
