Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

SDBM databases: Alternate Keys with Duplicates

by erichansen1836 (Acolyte)
on Oct 22, 2017 at 18:39 UTC ( #1201847=CUFP: print w/replies, xml ) Need Help??

Perl SDBM databases of key/value pairs (tied to program hash tables) can hold/house multiple format KEYS - which is convenient for persistent random access indexing to Flat File database records. <Note: The VALUE in the KEY/VALUE pairs is used to store the byte offsets of the Flat File records indexed>. For EXAMPLE... If you have a Flat File database (of millions of records) having the fixed-length records, random access indexed, by Social Security Number (UNIQUE PRIMARY KEY), you may also wish to have an ALTERNATE KEY WITH DUPLICATES too, in case the Social Security Number is not known for Look Up. The below code snippet (incomplete, used just to illustrate a methodology) shows how this may be accomplished to setup the indexing. Once the indexing is setup, you can use a FOR LOOP iterated from: 1 to $NUM_RECS, to random access retrieve all the Flat File database records matching any arbitrary compound KEY, composed of info contained with the fields of the records. Note: The FILE POINTER is set to any record byte offset before performing READ/WRITE operations. This is ISAM(Indexed Sequential Access Method), NoSQL, Embedded database technology. This indexing stays persistent, so that Lookup is immediately available every time you launch your database user-interface (or batch) application program. SDBM is in the public domain, so you can distribute your FlatFile/SDBM database files and Perl Application Code, FREE of CHARGE to as many companies and end-users as you like. For more discussion and Perl code examples at Perl Monks, see "JOINT DATABASE TECHNOLOGY" thread.

#-- YYYYMMDD #-- Key example: BirthDate|LastNameFirst4Chars|FirstNameInitia +l|StateCode #-- "19591219|Will|K|TX" #-- $KEY without a Seq Nbr is used to increment the number of rec +ords saved to the database #-- having a particular ALT KEY w/DUPS - in this example: "1959 +1219|Will|K|TX" $KEY=$BirthDate . "|" . $LastNameFirst4Chars . "|" . $FirstNameIn +itial . "|" . $StateCode; $Hash{$KEY}=0; #-- Now index the first record encountered in the Flat File datab +ase with this particular ALT KEY w/DUPS $num_recs = $Hash{$KEY}; $num_recs++; #-- i.e. one(1) $Hash{$KEY}=$num_recs; $newKEY=$KEY . "|" . $num_recs; #-- produces: "19591219|Will|K|TX|1" $Hash{$newKEY}= #-- The VALUE would be set to the byte offset o +f the Flat File record just indexed #-- Now index the second record encountered in the Flat File data +base with this particular ALT KEY w/DUPS $num_recs = $Hash{$KEY}; $num_recs++; #-- i.e. two(2) $Hash{$KEY}=$num_recs; $newKEY=$KEY . "|" . $num_recs; #-- produces: "19591219|Will|K|TX|2" $Hash{$newKEY}= #-- The VALUE would be set to the byte offset o +f the Flat File record just indexed #-- and so on...

Replies are listed 'Best First'.
Re: SDBM databases: Alternate Keys with Duplicates
by Corion (Pope) on Oct 23, 2017 at 08:02 UTC

    Have you considered using DBD::SQLite instead, to convert your database from a key/value store to relational tables? That way, you get many benefits including ad-hoc queries and also make reordering and handling of keys much easier.

    Also note that using an SSN as primary key isn't really a great choice, as SSNs have duplicates.

      Thanks "Corion" for the reply and suggestion. I am sure your suggestion will benefit some, especially those inclined towards SQL. My suggestion is just a NoSQL option for some inclined towards Perl, NoSQL/ISAM, FREE, embedded database technology. Anyone interested in further discussion and Perl code examples can search at PerlMonks on "Joint Database Technology". This is very simple Relational Flat File Database Technology (RDMS) capable of storing perhaps up to 1 Billion records in practicality for READONLY or READ/WRITE databases.

      Each of my Flat Files in my Relational Flat File database system may contain millions of records each (Example: 5.6 million records where fixed-length records are 760 characters in length - with no CR/LF). It would take 200 Flat Files in your database each containing 5 million records to reach 1 Billion records. These records could be logically/physically sorted/segregated across all 200 Flat Files for optimized sequential access within your Indexed Sequential Access Method (ISAM) database system. Parent/Child one-to-many record relationships are supported. The Child records stored in separate Flat Files from the Parent records. Note: Always store the foreign key in the Child records pointing back to the Parent Record, so that the ties are maintained, and the indexing can be rebuilt from the Flat File data should a binary SDBM file become corrupted (unlikely I think).

      I have a 4-GIG limitation on each of my FLAT FILES due to my binary distribution of Win32 PERL only being able to SEEK TO and set the FILE POINTER within a Flat File +/- 2-GIG relative to TOP or END of FILE. This is because I only have support for INTEGER values up to pos/neg 2-GIG. If you have DOUBLE INTEGER SUPPORT, your FLAT FILES could be much larger than 4-GIG. This may be Operating System dependent too?? I'm not savvy on that.

      When editing/modifying records within this Flat File database system, changes are made "in place". This means it is practical to random access and modify a single record within a 1 Billion record database, and do it within a matter of a few seconds. Changes ARE NOT written out to a new Flat File. A record may be overwritten in its entirety, or you may only overwrite the contents of a single or multiple fields within a single record. Record changes may be logged to a separate Flat File(s) including the user-name, node, and datetime. THIS IS YOUR PERL EMBEDDED DATABASE SYSTEM. You can make it as complex or simple as you need. You can incorporate SEMAPHORE files/programming into your PERL APPS to indirectly protect your Flat File database against RACE CONDITIONS within a CONCURRENT MULTI-USER READ/WRITE DATABASE ACCESS ENVIRONMENT. I have not tried this. It may also be necessary to protect the SDBM files in this same manner?

      The SDBM database(s), tied to Perl in-memory hash table(s), hold the byte offsets of each record to be random accessed within this Flat File database system. This indexing can be setup for "each record" or the "first record in a sorted group of records" where the first record is randomly accessed, and the remaining records are sequentially accessed (or) where the entire sorted group of records is read in one massive READ and UNPACKED into an ARRAY for processing. Records are arbitrarily random accessed by the data contained with the fields of the records of your Flat File database system. You can store multiple KEY formats within the same SDBM database file. One SDBM file can contain all your UNIQUE PRIMARY KEYS, UNIQUE ALTERNATE KEYS, and ALTERNATE KEYS w/DUPLICATES. As I have previously mentioned, the KEY in the KEY/VALUE SDBM pair can be one or more fields and/or partial fields contained within the Flat File records. UNIQUE PRIMARY KEY, UNIQUE ALTERNATE KEYS, and ALTERNATE KEYS w/DUPLICATES are all supported.

      You can even store Bitmaps within your Flat File "text" database system, storing them as inline text, and making the conversion "on the fly" back to an image within your Perl DB application user-interfaces. PLEASE SEE: Perl Win32::GUI module by Aldo Calpini for the INLINE BITMAP CONVERSION UTILITY.

      I use Perl MIMEBASE64 encoding/decoding to store the Flat File data so it is non-readable. I also have a CUSTOM word-to-code encryption/compression routine (using SDBM) to store the data that way. Words are converted to Codes completely at random, and no two conversions of the same data will generate the same Word to Code mappings.

      You don't have to worry about paying for 3rd Party Software(and endless upgrades) nor end-user/site licensing, nor maintaining Data Access Components at each workstation. Nor is a DBA needed to maintain such a database system. The Perl Developer is in complete control.

      AS FAR AS REPORTING GOES... If the end-users already have access to MS-EXCEL, you could easily use Win32 PERL's, Win32::OLE module to setup an interprocess communication connection to MS-Excel in the background to send your data to formatted reports, pivot tables, charts, graphs, etc. I have done this for many years. If MS-Excel is not available, you can output to an RTF format WORDPAD file which comes standard on Windows O/S systems. Other FREE reporting options may be available (I have not researched this), including FREE Perl modules.

        So windows specific with zero advantages to existing database systems described in response your previous posts about this setup?

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: CUFP [id://1201847]
Approved by beech
help
Chatterbox?
[M4ver1k]: I feel like being able to toggle a running script on the fly by updating variables from a config file using Config::Simple. But in the while loop, it seems like it doesn't update the variable after the first past. Is this generally just a bad idea?
[M4ver1k]: pass*

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (5)
As of 2017-11-20 03:12 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    In order to be able to say "I know Perl", you must have:













    Results (283 votes). Check out past polls.

    Notices?