Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation

Answer: Where is the bottleneck in flat file databases?

by MarkM (Curate)
on Jan 23, 2003 at 04:45 UTC ( #229222=categorized answer: print w/replies, xml ) Need Help??

Q&A > database programming > Where is the bottleneck in flat file databases? - Answer contributed by MarkM

The two primary problems with flat file (text) databases are:

  1. Fields are not usually indexed.
  2. If the flat file uses variable length records, updates to existing records that require the record length to change require a re-write of all records below the record being updated.

Fields in a flat file database can be indexed. Either the index can be stored in the flat file database, or in an adjacent file in a known location. However, this is not usually done, as the effort of writing routines to index a flat file database can likely be better spent using an existing binary database format, or toolset.

An additional issue with the requirement that much of the database be re-written if updates are necessary, is the problem of contention. What if multiple processes need to update the database? If process A needs to update the database, it must lock the database, re-write a large section of the database, and unlock the database. Process B needs to lock the database, re-read the database to ensure that it incorporates Process A's change, then re-write a large section of the database including Process B's change, and so on.

In general, flat file databases should only be used if they are very small (I would say less than 100Kbyte, not less than 1Mbyte), and if they only need to be updated by one process, and read by others. The 100Kbyte number is a question of fitting the database content into cache memory, as even having to fallback to RAM can be quite expensive.

I have had a great deal of success with one process writing common data to a text file, and many (500+) clients reading from the text file. The trick here is to ensure that the operation is atomic from the perspective of the client. The client must always see a complete file at the configured location. Also, the client should not rely on the data being fully up-to-date. From a network perspective, for a small amount of data, it is far more efficient to use a network protocol such as NFS to serve a static 'flat file database' to clients, than to serve the data from a generalized database server such as MySQL.

  • Comment on Answer: Where is the bottleneck in flat file databases?
Replies are listed 'Best First'.
Re: Answer: Where is the bottleneck in flat file databases?
by Wysardry (Pilgrim) on Jan 24, 2003 at 00:15 UTC

    I think that one of the reasons so few programmers bother to try and index flat file databases is that they view them as a temporary measure until they can implement a "real" DBMS.

    Your points are well taken, but I have specific reasons for wanting to use flat text files, and it would be beneficial to make the system as efficient as is practical, given the limitations you describe. Some users may well expand on the basic program, so I wish to create a resilient core.

    In this particular case, the database itself would only ever be modified by a single administrator. Modifying existing records would likely be a case of reading in the existing file a record at a time and writing them to a new (modified) file. Once that process is complete, the original file would be renamed to become a backup, and the new file renamed to replace the original file.

    This solves the problem of having to fit the entire database in memory, and also creates a backup. File locking would still be a good idea, of course.

    Your idea of using MySQL to create flat text files is intriguing (and one I may consider for future projects), but unfortunately the program is intended for users who do not have access to an external RDBMS.

    "Every program has at least one bug and can be shortened by at least one instruction -- from which, by induction, one can deduce that every program can be reduced to one instruction which doesn't work." -- (Author Unknown)

Log In?

What's my password?
Create A New User
[chacham]: experience is 8286. reminds me of our first computer
[Corion]: :D

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (5)
As of 2017-03-22 22:11 GMT
Find Nodes?
    Voting Booth?
    Should Pluto Get Its Planethood Back?

    Results (276 votes). Check out past polls.