|Think about Loose Coupling|
Joint Database Technologyby erichansen1836 (Acolyte)
|on Mar 24, 2015 at 21:40 UTC||Need Help??|
erichansen1836 has asked for the
wisdom of the Perl Monks concerning the following question:
Is anyone else besides me using? what I call JOINT DATABASE TECHNOLOGY, i.e. tandem/dual use of: Flat "text" File databases (w/fixed-length records) for massive data storage, along with binary Perl SDBM databases of key/value pairs (tied to "in memory" program hash tables) for persistent random access indexing into the Flat File records. I believe this falls under the categories of both: Relational, ISAM (Indexed Sequential Access Method), and NoSQL, and Embedded Database. Parent/Child one-to-many record relationships are supported as well as Lookups to eliminate redundant data in the Flat Files.
Suited best for: Data Warehouses, Data Marts, Decision Support Systems (DSS), & Executive Information Systems (EIS) or Executive Support Systems (ESS).
The KEY in the SDBM KEY/VALUE pair is one or more fields and/or partial fields contained within the fixed-length records of a Flat File.The VALUE is the record offset (in bytes) to set the file pointer in a Flat File before performing READ/WRITE operations. Supported are Unique Primary Key, Unique Alternate Keys, and Alternate Keys with Duplicates.
I use fixed-length record flat file databases to store large amounts of data with large record lengths. In tandem with this, I use Perl SDBM database files to store the integer(pos+/neg-) record offsets (in bytes) used to seek to flat file records by the file pointer, from either the top or end of file. If the integer used is a negative number, the byte position is relative to the END of FILE.
I tie the SDBM database file to a Perl Hash Table, for persistent random access indexing to my Flat File records of max file size 4 Gig each file, with unlimited nbr of files - or whatever nbr is practical for you. A single 4 GIG Flat File may contain multi-millions of records (Example: 5.6 million records where the record lengths are fixed at 760 characters). 4 GIG is a limitation of my older binary distribution of Win32 Perl 5.6.1 build 638 (ActiveState ActivePerl) because I can only use an integer value to randomly seek to a byte offset (+/- 2 GIG from TOP or END of file). Not everyone will experience this limitation.
I use Native Windows API, O/S specific I/O, but you could use portable: sysopen, close, sysseek, sysread, syswrite to navigate about the flat file and read/write data. I also use pack/unpack to read/write the data in or back out. TO BE CLEAR... You can use ANY programming language you like (e.g. Perl, Python, C++, etc.) with SDBM support, and ANY batch or GUI DB user-interface you like (e.g. portable Perl TK, Perl Win32-GUI by Aldo Calpini), and ANY file handling syntax you like (Win32API::File, portable sysopen/sysseek/sysread/syswrite/close, etc.).
My Flat file databases approach 1 terabyte in size, and 1 Billion "text" records, and are made up of a FILE SYSTEM of (DAT for Flat, PAG/DIR for SDBM) file extensions.
Please check out my latest comment additions in my 2nd message published on this thread, where I address Bitmap inclusion, and multi-user concurrency. That is a great topic for discussion, concurrency. The topic of data encoding/encryption/compression is also discussed.
When editing Flat File records, the changes are made "in place" overwriting existing data in the flat file record. A DELETE flag indicator field can be employed to mark records in both the Flat Files and SDBM files (for later BATCH deletion Server-side during off hours) for an application program to recognize as a BYPASS indicator until physically removed.
For ADDING/APPENDING records in a concurrent multi-user environment, to avoid race conditions, it is recommended that you employ a semaphore file for locking during these edits instead of placing a lock directly on the file to be edited. This "Perl" strategy is discussed in depth by an expert/author on the subject here: http://www.informit.com/articles/article.aspx?p=28258&seqNum=3 . For UPDATES to existing records within the database, I would use a RECORD LEVEL locking strategy. For Parent/Child relationships, I would design my DB user-interface to only allow edits to the child records when the parent record was locked for edit. The CHILD records would be housed in a separate FLAT FILE(s) and would have a foreign key index pointing to the PARENT. EXAMPLE: If you had a PARENT flat file of LOANS and a CHILD flat file of COLLATERAL securing those LOANS, such as motor vehicles, you could have a PRIMARY UNIQUE KEY on the CHILD flat file by VIN (Vehicle ID Number) and an ALT KEY w/DUPS on the LOAN NUMBER + UNIQUE SEQUENCE NBR (i.e. LoanNbr|SeqNbr) where the SEQ NBR is an incremented number starting at one(1) and is incremented by one(1) for each COLLATERAL record added to that LOAN. As a rule-of-thumb, you want to retain the ability to rebuild your binary SDBM indexes from the "text" data stored within your Flat File records while at the same time retaining the ties between CHILD records and their PARENT record. Thus, in the example just provided, you would want to store LOAN NBR within the Flat File CHILD records to tie back to the corresponding Flat File PARENT record.
A primitive attempt to index the records within a variable-length record, delimited field Flat File can be found here: https://docstore.mik.ua/orelly/perl4/cook/ch08_28.htm
The only problem with this primitive method/technique is that it only shows you how to randomly access a particular Flat File, variable-length record by the sequential record number. That is only useful if you know the sequential record number to go to. This does not give you the benefit of arbitrarily seeking to a particular record(s) based upon information contained within the fields of a record (such as the compound KEY: BirthDate, + LastNameFirst4Chars, + FirstNameInitial).