http://www.perlmonks.org?node_id=466491

Marcello has asked for the wisdom of the Perl Monks concerning the following question:

A large Perl application currently queries a certain MySQL table for records. As soon as records are found, they are processed by the application and DELETEd from the database. So basically the table acts as a queue.

This process needs to be as fast as possible. Therefore, the idea is to use multiple Perl processes (2 or more) to query this database and process the records.

The requirements are as follows:
  • Process the records as fast as possible
  • Avoid different processes handling the same record

Sometimes a process cannot handle records when a connection to a remote server is lost (which could take hours to resolve). At this point, the other processes should handle its records; otherwise the records would stay in the queue for too long.

I am not sure what kind of logic to implement to meet the requirements above.

For 2 processes, one could handle the records with odd IDs and the other the records with even IDs, but then the second requirement is still not met when one process cannot handle its records.

If processes SELECT the same records, there is significant overhead, because 50% or more of the selected records are already being handled by other processes and therefore should not be handled.

What is the best approach I can use to meet these requirements?

TIA!
    Re: Database queue logic
    by ruoso (Curate) on Jun 14, 2005 at 12:20 UTC

      I've dealt with a similar problem once...

      The key to the solution is that the $sth->rows method will return the number of rows affected by the last query-like statement. SELECT *and* UPDATE are query-like statements, so here it goes...

      You have to create a control field in the table with an int value and set its default value to some non-null value.

      Then you do the following...

      -- fetch 10 to avoid querying too many times...
      SELECT * FROM tablex ORDER BY dateofinsertion LIMIT 10;

      -- Mark the register as owned by you
      UPDATE tablex SET controlfield = controlfield + 1
        WHERE id = $id AND controlfield = $controlfieldvalue;

      When you execute the second query, you test the rows.

      if ($sth->rows) {
          # it did the update, so the row is mine
      }
      else {
          # someone else took the register, let's try another one
      }

      Simple, isn't it? If some process fails, the next time a process fetches the 10 registers it will get that row again and can try to process it.
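
      Tied together with DBI, a rough sketch might look like this (the DSN is a placeholder, and do()'s return value is used in place of $sth->rows since both report the affected row count; untested):

      use DBI;

      my $dbh = DBI->connect('dbi:mysql:database=queue', $user, $pass,
                             { RaiseError => 1 });

      # fetch a small batch of candidate rows
      my $rows = $dbh->selectall_arrayref(
          q{SELECT id, controlfield FROM tablex
            ORDER BY dateofinsertion LIMIT 10},
          { Slice => {} });

      for my $row (@$rows) {
          # try to claim the row; do() returns the number of rows updated
          my $claimed = $dbh->do(
              q{UPDATE tablex SET controlfield = controlfield + 1
                WHERE id = ? AND controlfield = ?},
              undef, $row->{id}, $row->{controlfield});
          next unless $claimed > 0;   # someone else took it, try the next one
          # the row is mine: process it, then DELETE it from the queue
      }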

      daniel
    Re: Database queue logic
    by trammell (Priest) on Jun 14, 2005 at 20:20 UTC
      I'd use something like the following:
      # pseudocode
      while (1) {
          acquire_lock();     # blocks
          choose_record();    # choose an unflagged record
          flag_record();      # this one is mine
          release_lock();     # let others in
          process_record();   # do work
      }

      MySQL has "advisory" locking that is suitable for this (see MySQL functions GET_LOCK() and RELEASE_LOCK()).

    Re: Database queue logic
    by monarch (Priest) on Jun 14, 2005 at 12:11 UTC

      Update 2: As Foxcub correctly points out below, it is futile to protect the SELECT with a MUTEX without wrapping the DELETE in the same call. Thus I am completely wrong; my logic is entirely flawed. Apologies.

      Update 3: Added readmore tags.

        This was an idea that crossed my mind, but it adds overhead because processes have to wait for the MUTEX to be released and cannot process records simultaneously.

        Marcello

          Update 2: As pointed out by Foxcub's reply to this post, I was wrong to ignore the fact that the DELETE also needs to occur within the same atomic action as the SELECT. Apologies.

          Update 3: added readmore tags.

    Re: Database queue logic
    by Solo (Deacon) on Jun 14, 2005 at 17:33 UTC
      I would have a single process watch and update the table, farming records out to worker processes--pre-forking if performance dictates.

      How fast do you want to check for new records? Rather than repeatedly querying the table, you might consider an insert trigger (not sure if that could work) or tailing a log file instead.

      --Solo

      --
      You said you wanted to be around when I made a mistake; well, this could be it, sweetheart.
    Re: Database queue logic
    by astroboy (Chaplain) on Jun 14, 2005 at 19:56 UTC

      I'd have a single master process monitoring the queue in a tight loop. As soon as there is a new record to deal with, the master process would spawn a worker to perform the desired tasks. Spawned processes can only deal with the record they have been allocated. If a spawned process cannot handle the task, it alerts the master process, which will reallocate the task as required.

      Of course, a lot of this comes down to how much resource you've got. You don't want to spawn more processes than the server can handle. You may wish to pre-create a finite pool of workers if resources are a problem.
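
      One way to bound the number of workers, sketched with Parallel::ForkManager (the module choice and the helper sub names are my own illustration, not the poster's):

      use Parallel::ForkManager;

      my $pm = Parallel::ForkManager->new(5);       # at most 5 concurrent workers

      while (my $record = next_queued_record()) {   # hypothetical: master polls the queue
          $pm->start and next;                      # parent: keep watching the queue
          process_record($record);                  # child: handle just this one record
          $pm->finish;                              # child exits
      }
      $pm->wait_all_children;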

    Re: Database queue logic
    by Ben Win Lue (Friar) on Jun 14, 2005 at 12:12 UTC
      I would add a timestamp column to the queue table.
      Any process would:
      • select all records that have a null timestamp or a timestamp older than 10* minutes
      • update all selected records with the timestamp of now
      • process all selected records
      • delete the processed records

      *the 10 minutes are subject to fiddling
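
      In rough DBI terms (table and column names invented, assuming an open $dbh), the steps above would look something like the following -- though note that the reply below points out the race between the SELECT and the UPDATE:

      # grab records nobody is working on, or whose claim has gone stale
      my $ids = $dbh->selectcol_arrayref(
          q{SELECT id FROM queue
             WHERE claimed_at IS NULL
                OR claimed_at < NOW() - INTERVAL 10 MINUTE});

      if (@$ids) {
          my $in = join ',', ('?') x @$ids;    # one placeholder per id
          $dbh->do(qq{UPDATE queue SET claimed_at = NOW() WHERE id IN ($in)},
                   undef, @$ids);
          # process the claimed records here, then remove them
          $dbh->do(qq{DELETE FROM queue WHERE id IN ($in)}, undef, @$ids);
      }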
        You're still racing between the select and the update, although something like this might work:

        • Update records where the 'owner' column is null, setting it to *my* value and the timestamp to now().
        • Select records with my owner value *or* where timestamp is older than threshold. This catches the case where some records have gone stale because a process died or what-have-you.
        • Process & delete selected records.
        This way you're letting the database server handle updating the correct records. Depending on how speedily new records are added to the queue, it seems likely that processes will be working with records more than waiting for selects. Of course this approach doesn't help with splitting the records evenly between processes, but neither will a mutex.
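
        A minimal DBI sketch of that claim-then-select pattern (the 'owner' and timestamp column names, the 10-minute threshold and the batch size are all illustrative, assuming an open $dbh):

        my $me = "$0-$$";    # some value unique to this worker process

        # claim a batch: the database decides atomically which rows become mine
        $dbh->do(
            q{UPDATE queue SET owner = ?, claimed_at = NOW()
               WHERE owner IS NULL
               LIMIT 10},
            undef, $me);

        # fetch my rows, plus any that have gone stale under a dead worker
        my $mine = $dbh->selectall_arrayref(
            q{SELECT * FROM queue
               WHERE owner = ?
                  OR claimed_at < NOW() - INTERVAL 10 MINUTE},
            { Slice => {} }, $me);

        for my $row (@$mine) {
            # process the record, then remove it from the queue
            $dbh->do(q{DELETE FROM queue WHERE id = ?}, undef, $row->{id});
        }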
          Good solution,

          I will try this approach shortly.

          Thanks all!
    Re: Database queue logic
    by thcsoft (Monk) on Jun 14, 2005 at 13:30 UTC
      Anyway, you will have to perform some kind of lock on the data. A mutex seems to be the correct approach to me - in a non-threaded environment.

      @Marcello
      What about threading? Is it a must for your concept to run in different processes? If, using threads, you take one shared scalar as a lock for each of the tables your threads act upon, your problem will possibly no longer be one.
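
      A bare-bones sketch of that idea with ithreads (the worker count, helper subs and single queue table are my assumptions; note that each thread would also need its own DBI connection, since handles cannot be shared across threads):

      use threads;
      use threads::shared;

      my $queue_lock :shared;    # one shared lock variable per table being worked on

      sub worker {
          while (1) {
              my $record;
              {
                  lock($queue_lock);               # only one thread claims at a time
                  $record = claim_next_record();   # hypothetical: SELECT and mark as mine
              }
              last unless defined $record;
              process_record($record);             # hypothetical: do the work, then DELETE
          }
      }

      my @workers = map { threads->create(\&worker) } 1 .. 4;
      $_->join for @workers;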

      language is a virus from outer space.