
eduardo's scratchpad

by eduardo (Curate)
on Jun 03, 2004 at 13:22 UTC (#360123)

Why did we use a relational database... well, there is really a simple answer for that. We didn't for 1/2 of our project, and we did for the other 1/2. :) Let's look at what we built, shall we?

The "project" at surfari consisted of two main products: a website and a search engine. The website needed to keep basic user information, such as username, password, favorite sites, personal bookmarks, etc. Now, I understand your "stunned outrage" at a statement like "Since we didn't need transactions...", but let's analyze why it is that our esteemed friend jeffa would make such a dangerous statement.

When does one "need" transactions? Well, first and foremost, when your data is valuable :) Let's analyze the different dimensions of data that we were storing, and let's see how transactions would have added value...

  • User information - Nope... it was a single "add user" button that pretty much just inserted or did not insert a user into the user table. Maverick's code checked the return status of that insert, and it was impossible for the user table to be left in an indeterminate state.
  • User bookmarks - Well, let's see, you could either remove a bookmark from, or add a bookmark to, the bookmark table. This consisted of "look up the user id" and "add a bookmark ID to the bookmark table with a fake foreign key to the user id." Hm... doesn't really seem like the data could have been left in an indeterminate state here either...
  • Visit tracking - Let's see, each time the user hit a webpage, we stored where they'd come from (thank you, HTTP referer!), where they were, etc. We didn't store how long they'd taken to get there, because we could batch process the time dimension offline and not negatively impact user experience. So, let's see, another atomic addition to a single table... yet again, transactions would not have helped us out here.
  • User preferences - Ah, customization! This was going to be what was going to make my 10% of the company in stock options enough to retire... you see, the user would love to spend hours on a shopping mall website setting up their own private color scheme! Well, let's see, each time they changed a parameter, an update was fired off to their "customizations" entry for that particular parameter. Hm... another single-statement atomic update. Crap.
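To make the "single statement" point concrete, here is a minimal sketch in Python with the stdlib `sqlite3` module standing in for MySQL. The schema, the column names, and the `add_user` / `add_bookmark` helpers are all invented for illustration; they are not the surfari code. Each write is one statement in autocommit mode, so it either lands or it doesn't, and the caller just checks the result.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions, not the original ones.
# isolation_level=None puts the connection in autocommit mode, so every
# statement commits on its own (no multi-statement transactions anywhere).
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " username TEXT UNIQUE NOT NULL,"
    " password_hash TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE bookmarks ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_id INTEGER NOT NULL,"  # "fake" foreign key: nothing enforces it
    " url TEXT NOT NULL)"
)


def add_user(username, password_hash):
    """Insert one user; return True if the row went in, False if it did not."""
    try:
        conn.execute(
            "INSERT INTO users (username, password_hash) VALUES (?, ?)",
            (username, password_hash),
        )
        return True
    except sqlite3.IntegrityError:  # e.g. the username is already taken
        return False


def add_bookmark(username, url):
    """Look up the user id, then add one bookmark row pointing at it."""
    row = conn.execute(
        "SELECT id FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    conn.execute(
        "INSERT INTO bookmarks (user_id, url) VALUES (?, ?)", (row[0], url)
    )
    return True
```

If the insert in `add_bookmark` fails, nothing else has been written, so there is no half-finished state to clean up. That is the whole argument in the list above.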

Hm... So, why were we using a relational database in the first place?

We were using the ability to store "fake foreign keys" (referential integrity wasn't ensured by MySQL, but that's ok, it didn't change the value of our data) with great ease. We had an *incredibly* nice interface to a very customizable persistent data store... We had a really nice query language for our data store... and when we started doing collaborative filtering (other users that like this stuff liked this other stuff), having it all in a relational database made that development *super fast*! Not to mention that with the "time to market" constraints of Internet time, the fact that it provided all of this super duper functionality in an API we all already knew and loved made choosing MySQL a no-brainer. None of the features of the, admittedly incredible, PostgreSQL would have made one lick of difference... after all, it's not like we were:

Removing 100 dollars from debit table.
Adding 100 dollars to credit table.

:) God I love classical examples.
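For contrast, here is roughly what the classical example needs and why. This is a sketch only, again using Python's stdlib `sqlite3` as a stand-in; the `debit` and `credit` tables, the account names, and the `crash_midway` flag (which simulates a failure between the two statements) are all made up for the demonstration.

```python
import sqlite3

# Autocommit mode, so the transaction boundaries below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE debit (account TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("CREATE TABLE credit (account TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO debit VALUES ('alice', 150)")
conn.execute("INSERT INTO credit VALUES ('bob', 0)")


def transfer(amount, crash_midway=False):
    """Move `amount` from the debit table to the credit table, all or nothing."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE debit SET balance = balance - ? WHERE account = 'alice'",
            (amount,),
        )
        if crash_midway:
            # Hypothetical failure between the two statements.
            raise RuntimeError("simulated failure between the two statements")
        conn.execute(
            "UPDATE credit SET balance = balance + ? WHERE account = 'bob'",
            (amount,),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # undo the debit so no money disappears
        raise


def balances():
    """Return (debit balance, credit balance) for the two demo accounts."""
    d = conn.execute("SELECT balance FROM debit WHERE account = 'alice'").fetchone()[0]
    c = conn.execute("SELECT balance FROM credit WHERE account = 'bob'").fetchone()[0]
    return d, c
```

Without the `BEGIN` / `ROLLBACK` pair, a failure after the first `UPDATE` would leave 100 dollars removed and never added. None of the website's writes had that two-step shape.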

So, where didn't we use relational databases? In our search engine! We took a DEC Alpha optimized B-Tree library, put an advanced forest-and-trees tree balancing and distributing decision walker in front of it, added a *sweet* splay-like aging cache mechanism in front of that to alleviate the von Neumann bottleneck, and took advantage of log2(n) lookups as much as we could :)
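The general shape (an ordered index with logarithmic lookups, plus a cache that keeps recently used keys close at hand) can be sketched in a few lines of Python. To be clear about the substitutions: a sorted list with `bisect` stands in for the B-Tree library, and a plain LRU cache stands in for the splay-like aging cache. The `CachedIndex` class and its names are hypothetical, and nothing here reflects the actual DEC Alpha code.

```python
import bisect
from collections import OrderedDict


class CachedIndex:
    """Sorted-key index with O(log n) lookups behind a small LRU cache."""

    def __init__(self, items, cache_size=2):
        pairs = sorted(items.items())          # items: dict of key -> value
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]
        self._cache = OrderedDict()            # oldest entry first
        self._cache_size = cache_size
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)       # mark as most recently used
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        # Binary search over the sorted keys: about log2(n) comparisons.
        i = bisect.bisect_left(self._keys, key)
        found = i < len(self._keys) and self._keys[i] == key
        value = self._values[i] if found else None
        self._cache[key] = value
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)    # age out the least recently used
        return value
```

A real splay tree folds the "recently used" bookkeeping into the tree itself by rotating accessed nodes toward the root; the separate LRU dictionary here is just the simplest thing with a similar effect.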

So... did we care about the integrity of our data? Sure! We were going to retire from the IPO... however, in *so many* of the applications I have faced in my professional life (which, granted, have *not* been in the financial sector), all "logical transactions" have been of single statement cardinality.

I advocate the right tool for the right job... and in this case, the only reason a relational database was used was the really sweet query interface to a persistent data store, and the handy dandy functionality provided by auto-increments, indexing, etc... I can guarantee you that any time I see a more complex data model, I reach for a transactional data store...
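As an illustration of that "sweet query interface", the collaborative filtering mentioned earlier ("other users that like this stuff liked this other stuff") comes down to a self-join. The sketch below uses Python's stdlib `sqlite3` rather than MySQL, and the `bookmarks` table, its sample rows, and the `also_liked` helper are all invented; it also assumes each (user, site) pair is stored at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per (user, site) pair.
conn.execute("CREATE TABLE bookmarks (user_id INTEGER NOT NULL, site TEXT NOT NULL)")
conn.execute("CREATE INDEX bookmarks_site ON bookmarks (site)")
conn.executemany(
    "INSERT INTO bookmarks VALUES (?, ?)",
    [
        (1, "shoes"), (1, "books"), (1, "music"),
        (2, "shoes"), (2, "books"),
        (3, "shoes"), (3, "music"), (3, "books"),
        (4, "music"),
    ],
)


def also_liked(site, limit=5):
    """Sites bookmarked by the same users who bookmarked `site`, most shared first."""
    return conn.execute(
        """
        SELECT other.site, COUNT(*) AS fans
        FROM bookmarks AS mine
        JOIN bookmarks AS other
          ON other.user_id = mine.user_id
         AND other.site <> mine.site
        WHERE mine.site = ?
        GROUP BY other.site
        ORDER BY fans DESC, other.site
        LIMIT ?
        """,
        (site, limit),
    ).fetchall()
```

It is a read-only query over data that was written one statement at a time, so it gets all the convenience of SQL without needing a single transaction.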
