key value in text format

by pwagyi (Monk)
on Nov 06, 2019 at 03:49 UTC

pwagyi has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!

I would like to seek advice from the Monks regarding a problem I'm trying to solve. I am developing a Perl application to extract some information from raw binary data and store the extracted data, i.e. persist the data somehow.

So I have a choice of using a relational database, a flat text file, or CSV/JSON/XML. The nature of the data is that each record has fixed fields plus optional key-value pairs (which may also expand in the future).

Example record: field1, field2, field3, [ key1=value1, key2=value2, ... ]

There may be multiple records (10K or so) extracted from each binary file. I need quick lookup of a record by key (the key is a composite of some fixed fields) across the records generated from all the binary files. It's also required to know which binary file (file name) each record comes from.

So what data file format would experienced Monks recommend? Should I use Storable or Berkeley DB? Should I have a flat text file for each binary file, storing some header info (like the binary source file name) once in the header?

Thanks!

Replies are listed 'Best First'.
Re: key value in text format
by GrandFather (Saint) on Nov 06, 2019 at 06:13 UTC

    Sounds like a good candidate for DBD::SQLite. Text-based formats aren't great for storing binary data and are pretty bad for "quick lookup". SQLite is lightweight to install and get going. In fact, it is built into DBD::SQLite, so there is no other installation or configuration!
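
    For instance, a minimal sketch of that setup (the table and column names here are made up to match the example record in the question):

        use DBI;

        # the database file is created on first connect
        my $dbh = DBI->connect('dbi:SQLite:dbname=records.db', '', '',
            { RaiseError => 1, AutoCommit => 1 });

        $dbh->do(q{
            CREATE TABLE IF NOT EXISTS record (
                field1      TEXT NOT NULL,
                field2      TEXT NOT NULL,
                field3      TEXT,
                source_file TEXT NOT NULL,    -- which binary file it came from
                PRIMARY KEY (field1, field2)  -- the composite lookup key
            )
        });

        # quick lookup by the composite key
        my $row = $dbh->selectrow_hashref(
            'SELECT * FROM record WHERE field1 = ? AND field2 = ?',
            undef, 'a', 'b');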

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

      Yes, it sounds great. SQLite is quite lightweight too. Just to clarify, the record data are textual (numeric/string) only. But how would a relational DB like SQLite handle the potentially sparse key-value pairs?

        I think ideally you will model your (composite) key as separate columns. Then you can query either all key columns or just a subset of them.

        If you want to keep things simple, keep the optional key-value pairs at the end as a single string. If you also want to query them, a slightly better approach is to format and store them as JSON; then you can query them in the database almost as if they were additional columns. The ideal way is to convert these optional pairs either into a fixed set of additional columns or into another table that consists of three columns (row-key, keyname, value). But doing that makes the queries somewhat uglier.
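
        A sketch of the JSON variant (assuming a DBD::SQLite build with the JSON functions available; 'extras' is a made-up column name):

            use DBI;
            use JSON::PP qw(encode_json);

            my $dbh = DBI->connect('dbi:SQLite:dbname=records.db', '', '',
                { RaiseError => 1 });

            $dbh->do(q{CREATE TABLE IF NOT EXISTS record
                       (field1 TEXT, field2 TEXT, extras TEXT)});

            # store the optional pairs as one JSON string
            $dbh->do('INSERT INTO record VALUES (?,?,?)', undef,
                     'a', 'b', encode_json({ key1 => 'value1' }));

            # query them almost like real columns
            my $rows = $dbh->selectall_arrayref(
                q{SELECT field1, field2 FROM record
                  WHERE json_extract(extras, '$.key1') = ?},
                undef, 'value1');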

        I second SQLite. I use it a lot, and if your dataset is big(ish), you don't want to be parsing flat files for every lookup.

        How you store this depends on just how many [ key1=value1, key2=value2 ] pairs there are.
        If there is a fixed list of 'key1', 'key2' and it's not too many: bite the bullet and add the columns.

        Otherwise, you get to use a linked table (this is Database 101; search for 'database normalization').

        each record should be
        [id], field1, field2

        then in a separate table
        [link_id], key1, value1
        [link_id], key2, value2

        You can also have a 3rd table with
        [link_id], source_file_name
        or whatever other metadata you need to keep.

        SQLite very kindly has a magic ROWID column. It won't return it on SELECT *, but will on SELECT ROWID,*.
        Of course you get to do two queries for each lookup: SELECT ROWID,* FROM main_data and then SELECT * FROM extra_data WHERE link_id = ?

        HTH.

        To clarify: if your 'real' data is
        name=bob, age=75, [ hair=balding, glasses=bifocal ]
        name=john, age=20, [ sport=chess ]


        Then you get:
        Primary table:
        1, bob, 75
        2, john, 20


        lookup table:
        1,hair,balding
        1,glasses,bifocal
        2,sport,chess
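
        A sketch of the whole thing in DBI terms, using the table names from above (the ROWID link is made via last_insert_id):

            use DBI;

            my $dbh = DBI->connect('dbi:SQLite:dbname=records.db', '', '',
                { RaiseError => 1 });

            $dbh->do('CREATE TABLE IF NOT EXISTS main_data (name TEXT, age INTEGER)');
            $dbh->do('CREATE TABLE IF NOT EXISTS extra_data
                      (link_id INTEGER, keyname TEXT, value TEXT)');

            # insert a record, then its optional pairs linked by ROWID
            $dbh->do('INSERT INTO main_data VALUES (?,?)', undef, 'bob', 75);
            my $id = $dbh->last_insert_id(undef, undef, 'main_data', undef);
            $dbh->do('INSERT INTO extra_data VALUES (?,?,?)', undef, $id, @$_)
                for [ hair => 'balding' ], [ glasses => 'bifocal' ];

            # the two queries per lookup
            my ($rowid, $name, $age) = $dbh->selectrow_array(
                'SELECT ROWID, * FROM main_data WHERE name = ?', undef, 'bob');
            my $extras = $dbh->selectall_arrayref(
                'SELECT keyname, value FROM extra_data WHERE link_id = ?',
                undef, $rowid);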

      I'm not sure whether I should create a new post. Let's say I'm using SQLite: SQLite supposedly cannot be reliably locked on NFS, so multiple processes could not safely manipulate the database without corruption. The application may be running multiple instances to crunch data files and insert into a database file.

      If I create an individual database file for each raw file, then I have to merge them and manage the parent/foreign key constraints. The final database file should reside on NFS so that other applications can access (read) it.

        That just raises the DB requirements and puts any text-file-based solution right out of the running. The solution is essentially the same, but use one of the other DBD:: modules to access your database of choice. Once you get beyond simple requirements that can easily be met by SQLite, you will find fairly strong divisions between proponents of different databases.

        The saving grace is that you can do a lot of development against DBD::SQLite, then migrate to using the more serious database of your choice when the system needs to hit the NFS.
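
        Because DBI abstracts the driver, that migration can be as small as a DSN change. A sketch (DBD::Pg here is just one hypothetical choice of server-side driver):

            use DBI;

            # development: a local SQLite file
            my $dsn = 'dbi:SQLite:dbname=records.db';
            my ($user, $pass) = ('', '');

            # later: a server database that handles concurrent writers itself,
            # e.g. PostgreSQL via DBD::Pg (any server-side DBD works the same way)
            # my $dsn = 'dbi:Pg:dbname=records;host=dbhost';

            my $dbh = DBI->connect($dsn, $user, $pass, { RaiseError => 1 });
            # the rest of the DBI code stays the same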

        Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
