http://www.perlmonks.org?node_id=57759


in reply to Re: Databases and tied hashes.
in thread Databases and tied hashes.

Some clarifications on those points.
  1. If your final version won't need an SQL database, then a dbm database is fine for the concept stage.
  2. What separates the need for a relational database from a dbm is your data model. If you are starting to get into relationships and correlations between data (e.g. taking sales figures and getting reports of sales by customer, by product, etc.) then you clearly want a relational database. If you want a simple lookup, then a dbm is just fine.
  3. Berkeley DB is indeed an industrial-strength database. It is particularly well suited to situations which need very high performance for simple tasks. (It is also great for embedded use, but I digress.) The bottlenecks that you will hit first have to do with the CGI model, not with the database.
  4. Yes. GDBM may have BTrees as well. The wins of BTrees here are that they keep data in order (hashes do not) and get better locality of reference (a very organized access pattern). If your data fits in memory then hashes are generally faster. If not, then BTrees are.
  5. Yes. In high performance read-write situations, locking is important and how it is done is going to be your bottleneck. Most web applications are write seldom, read many times.
  6. Yes. Back up. And don't expect that binary data formats will be portable from machine to machine.
  7. If you want a website to scale, definitely. It is much easier to balance a load across 5 webservers than to keep 5 databases in sync. However, if you are anticipating this need, using a dbm solution will likely involve some custom work. Relational databases generally have the data access segregated into its own process, so the database can be moved to another machine. dbm libraries traditionally do not.
  8. I think you are dramatically overestimating the needed resources.
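
To make point 2 concrete, here is a rough sketch of the two kinds of access. It is written in Python only because its standard library ships both a dbm interface and SQLite; the dbm object plays the role a tied hash would play in Perl. The file name, table layout, and sales rows are all invented for illustration.

```python
import dbm.dumb
import os
import sqlite3
import tempfile

# Hypothetical sample data: (customer, product, amount).
SALES = [
    ("alice", "widget", 10),
    ("alice", "gadget", 5),
    ("bob", "widget", 7),
]


def dbm_lookup(path, key):
    """Simple key -> value lookup: the case where a dbm is just fine."""
    with dbm.dumb.open(path, "c") as db:
        db["alice"] = "alice@example.com"
        db["bob"] = "bob@example.com"
    with dbm.dumb.open(path, "r") as db:
        return db[key].decode()  # dbm stores and returns bytes


def sales_report(group_by):
    """Totals by customer or by product: the case that wants SQL."""
    if group_by not in ("customer", "product"):
        raise ValueError(group_by)
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (customer TEXT, product TEXT, amount INTEGER)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES)
        query = (
            f"SELECT {group_by}, SUM(amount) FROM sales "
            f"GROUP BY {group_by} ORDER BY {group_by}"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


workdir = tempfile.mkdtemp()
email = dbm_lookup(os.path.join(workdir, "contacts"), "bob")
by_customer = sales_report("customer")
by_product = sales_report("product")
```

The lookup needs nothing more than a key. The two reports are the same data regrouped, which is one changed column name in SQL but would mean maintaining a second hand-built index in a dbm.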
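
On point 4, the "keeps data in order" win is easiest to see with a range query. The sketch below is not a BTree: a sorted list of keys searched with `bisect` stands in for the ordered keys a BTree would give you, and a plain dict stands in for a hash-based dbm. The date-style keys are made up.

```python
import bisect

# Hypothetical records keyed by date string.
records = {
    "2001-02-10": "c",
    "2001-01-05": "a",
    "2001-03-01": "d",
    "2001-01-20": "b",
}


def range_scan_hash(store, lo, hi):
    """A hash has no key order, so a range query must test every key."""
    return sorted(k for k in store if lo <= k < hi)


# Stand-in for a BTree: the same keys, kept in sorted order.
sorted_keys = sorted(records)


def range_scan_ordered(keys, lo, hi):
    """With ordered keys, binary-search to the start and read a contiguous run."""
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_left(keys, hi)
    return keys[start:end]


january = range_scan_ordered(sorted_keys, "2001-01-01", "2001-02-01")
january_via_hash = range_scan_hash(records, "2001-01-01", "2001-02-01")
```

Both calls return the same keys, but the ordered version touches only the neighbouring entries it needs, which is the locality-of-reference advantage once the data no longer fits in memory.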
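
For point 5, one common way to handle a write-seldom, read-often file is advisory locking: readers take a shared lock, writers take an exclusive one. The following is a minimal sketch under the assumption of a Unix system with `flock` available; the counter file and function names are hypothetical, and a real dbm would need the lock held around every open of the database file.

```python
import fcntl
import os
import tempfile


def read_counter(path):
    """Readers take a shared lock, so many can read at the same time."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as fh:
        fcntl.flock(fh, fcntl.LOCK_SH)
        try:
            return int(fh.read() or 0)
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def increment_counter(path):
    """Writers take an exclusive lock for the whole read-modify-write."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            value = int(fh.read() or 0) + 1
            fh.seek(0)
            fh.truncate()
            fh.write(str(value))
            fh.flush()
            return value
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


counter_path = os.path.join(tempfile.mkdtemp(), "hits")
increment_counter(counter_path)
increment_counter(counter_path)
current = read_counter(counter_path)
```

Because readers only block while a writer holds the exclusive lock, a read-mostly site rarely waits; it is the heavy read-write case where this lock becomes the bottleneck.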