PerlMonks  

Wisdom sought on migrating from text files to Berkeley DB or SQL

by punkish (Priest)
on Dec 23, 2004 at 02:34 UTC (#416983=perlquestion)
punkish has asked for the wisdom of the Perl Monks concerning the following question:

I have a program that processes emails and extracts info from them. So far I have stored all the config info and the log of actions (datetime, info about the emails, success of processing, total number of messages processed, etc.) in text files. This mostly works fine.

I have been refining this work and am considering moving all of the above to a more structured database. The obvious first answer is SQLite: everything self-contained, blah blah. However, I have been trying Bdb (using just DB_File). It works fine with the small modifications I have made so far, e.g., moving the config info from the text file to a file-based hash.
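A minimal sketch of that kind of file-based config hash, assuming DB_File is built against Berkeley DB (it usually is). The file location and the config keys are hypothetical; the point is that a tied hash reads and writes the database file transparently, so a second process (such as the web interface) can open the same file.

```perl
use strict;
use warnings;
use DB_File;                                  # exports $DB_HASH, $DB_BTREE, ...
use Fcntl qw(O_RDWR O_CREAT O_RDONLY);
use File::Temp qw(tempdir);

# Hypothetical location; a temporary directory keeps the sketch self-contained.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/config.db";

# Every assignment to the tied hash is written to the Berkeley DB file.
tie my %config, 'DB_File', $file, O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot open $file: $!";
$config{mail_host}     = 'mail.example.com';    # hypothetical config keys
$config{poll_interval} = 300;
untie %config;                                   # flush and close the file

# A second, read-only tie (e.g. from the web interface) sees the same values.
tie my %saved, 'DB_File', $file, O_RDONLY, 0644, $DB_HASH
    or die "Cannot open $file: $!";
my %copy = %saved;                               # snapshot into a plain hash
untie %saved;

printf "%s = %s\n", $_, $copy{$_} for sort keys %copy;
```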

I've also built a web-based interface to set the config values, and eventually I would like to build one to monitor the progress of the program as well. The kinds of questions I would like answered are:

- config params: hash
- total number of messages processed: int
- total number of messages processed today: int
- total number of messages processed in this run: int
- total number of messages processed between ? and ?: int
- started program on: datetime
- action record: array of hashes
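For the simple integer questions, here is a sketch of one way to keep the counters in the same kind of tied hash. The key names ("processed:total", "processed:day:...", "processed:run:...") and the message_processed helper are hypothetical; the "between ? and ?" question and the action records do not map onto single keys this neatly.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);
use POSIX qw(strftime);

my $dir = tempdir(CLEANUP => 1);
tie my %stats, 'DB_File', "$dir/stats.db", O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot open stats.db: $!";

# "started program on": an ISO-8601 timestamp string.
my $run_started = strftime('%Y-%m-%dT%H:%M:%S', localtime);
my $today       = substr($run_started, 0, 10);     # YYYY-MM-DD
$stats{started} = $run_started;

# Hypothetical helper: bump one counter per question for each message.
sub message_processed {
    my ($stats, $day, $run) = @_;
    $stats->{'processed:total'}++;
    $stats->{"processed:day:$day"}++;
    $stats->{"processed:run:$run"}++;
}

message_processed(\%stats, $today, $run_started) for 1 .. 3;

my %snapshot = %stats;     # copy out before closing the file
untie %stats;

print "$_ = $snapshot{$_}\n" for sort keys %snapshot;
```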

I seek the following wisdom: is Bdb a good, or even a viable, tool here?

On a related note: I find the whole concept of Bdb fascinating, perhaps because it is a novelty to me after years of getting bored by RDBMSs and SQL. However, I find little or no discussion of using Bdb as the backend of websites. The reasons seem obvious: Bdb is not relational, and while some relational operations can be emulated, well, heck, it simply was not designed to answer the kinds of questions an RDBMS can. Still, there is something elegant about everything being contained in a hash that can be loaded into memory, all self-contained and clean, like a single, shiny object.

Any insights?

Re: Wisdom sought on migrating from text files to Berkeley DB or SQL
by aquarium (Curate) on Dec 23, 2004 at 03:48 UTC
    Unless you are storing a single fact per key, your shiny object will quickly turn into routines for parsing and manipulating fields within a packed value, with all the nasties to get around, especially if you start packing more than one relationally normalized table into this single Bdb file.
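    To make the "packed field" problem concrete, here is a minimal sketch using Storable's freeze/thaw. Storable is one serializer among several; MLDBM, mentioned further down the thread, automates the same step. The keys and fields are hypothetical. Note that every change is a full read-modify-write of the packed value.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);
use Storable qw(freeze thaw);

my $dir = tempdir(CLEANUP => 1);
tie my %actions, 'DB_File', "$dir/actions.db", O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot open actions.db: $!";

# A DBM value is just a byte string, so a record with several fields has
# to be serialized on the way in (hypothetical key and fields) ...
$actions{'msg-0001'} = freeze({ from => 'alice@example.com', status => 'ok' });

# ... and updating one field means thaw, modify, freeze, store again.
# Changing a field of a thawed copy alone would NOT change what is on disk.
my $rec = thaw($actions{'msg-0001'});
$rec->{status} = 'retried';
$actions{'msg-0001'} = freeze($rec);

my $status = thaw($actions{'msg-0001'})->{status};
print "status: $status\n";
untie %actions;
```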
Re: Wisdom sought on migrating from text files to Berkeley DB or SQL
by perrin (Chancellor) on Dec 23, 2004 at 04:35 UTC
    Berkeley DB, if used via the BerkeleyDB API rather than a tied DB_File approach, is faster than an RDBMS. An RDBMS is flexible, and lets you answer more questions without changing things around. The kinds of questions you want to answer may be possible to handle by using a sorted BTree for your Berkeley DB database, but this will involve stepping through the records. They would likely be simpler with an RDBMS.
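    A minimal sketch of the sorted-BTree approach for the "between ? and ?" question, using DB_File's $DB_BTREE and its seq() cursor method, since DB_File is what the OP is already using (the BerkeleyDB module has its own cursor interface, not shown here). It assumes a hypothetical key layout of ISO-8601 timestamp plus message id, so that the default lexical key order is also chronological order.

```perl
use strict;
use warnings;
use DB_File;                       # exports $DB_BTREE, R_CURSOR, R_NEXT, ...
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
# Keep the object returned by tie: seq() is a method on it, not on the hash.
my $db = tie my %log, 'DB_File', "$dir/log.db", O_RDWR | O_CREAT, 0644, $DB_BTREE
    or die "Cannot open log.db: $!";

# Hypothetical keys: "timestamp message-id", sorted lexically by the BTree.
$log{$_} = 'ok' for (
    '2004-12-21T23:59:59 msg-0001',
    '2004-12-22T08:15:00 msg-0002',
    '2004-12-22T17:40:12 msg-0003',
    '2004-12-23T02:34:00 msg-0004',
    '2004-12-24T00:00:00 msg-0005',
);

# Count keys in the half-open range [$from, $to).
sub count_between {
    my ($db, $from, $to) = @_;
    my ($key, $value) = ($from, '');
    my $count = 0;
    # R_CURSOR jumps to the first key >= $from; R_NEXT steps forward.
    for (my $status = $db->seq($key, $value, R_CURSOR);
         $status == 0 && $key lt $to;
         $status = $db->seq($key, $value, R_NEXT)) {
        $count++;
    }
    return $count;
}

my $n = count_between($db, '2004-12-22', '2004-12-24');
print "processed on 2004-12-22 or 2004-12-23: $n\n";   # 3

undef $db;       # drop the extra reference before untie
untie %log;
```

    The range query never touches keys before $from, but it still steps through every record inside the range; that stepping is the part an RDBMS does for you.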

    For the record, eToys.com used Berkeley DB for caching data, and imdb.com used to use it extensively. I don't know if they still do.

      For the record, eToys.com used Berkeley DB for caching data, and imdb.com used to use it extensively. I don't know if they still do.

      So did they use a regular RDBMS to store the data and to construct "answers" to complicated but frequently asked questions, and then cache those answers in Berkeley DB for quick retrieval and display?

      I can imagine that even simple RDBMS operations like MAX(), MIN(), BETWEEN, and GROUP BY would be headache-inducing with BDB.
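      A rough sketch of how MIN()/MAX() and GROUP BY look over the same kind of hypothetical timestamp-keyed BTree file: the minimum and maximum key fall out of the sort order for free, but grouping means walking every record. MIN or MAX of a field stored in the value, rather than the key, would also need the full walk.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $db = tie my %log, 'DB_File', "$dir/log.db", O_RDWR | O_CREAT, 0644, $DB_BTREE
    or die "Cannot open log.db: $!";
$log{$_} = 'ok' for (
    '2004-12-22T08:15:00 msg-0002',
    '2004-12-22T17:40:12 msg-0003',
    '2004-12-23T02:34:00 msg-0004',
);

my ($key, $value) = ('', '');

# MIN()/MAX() of the *key* are cheap: the BTree keeps keys sorted.
$db->seq($key, $value, R_FIRST) == 0 or die "empty database";
my $first = $key;
$db->seq($key, $value, R_LAST) == 0 or die "empty database";
my $last = $key;

# GROUP BY day has no shortcut: walk every record, tally by key prefix.
my %per_day;
for (my $status = $db->seq($key, $value, R_FIRST);
     $status == 0;
     $status = $db->seq($key, $value, R_NEXT)) {
    $per_day{ substr($key, 0, 10) }++;
}

print "first: $first\nlast:  $last\n";
printf "%s  %d\n", $_, $per_day{$_} for sort keys %per_day;

undef $db;
untie %log;
```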

        In the case of eToys, it was a cache of frequently used data from an Oracle RDBMS. There are people doing serious work with Berkeley DB as their primary database, but I think they are more in the scientific or embedded systems fields.
Re: Wisdom sought on migrating from text files to Berkeley DB or SQL
by archen (Pilgrim) on Dec 23, 2004 at 14:24 UTC
    I use BerkeleyDB for a LOT of stuff, especially configuration info. It's fast, easy, and certainly less of a pain in the ass to set up, but keep in mind that there are some drawbacks as well. For a time I found myself creating pseudo-logical structures by accessing data as $db{"$key.$val"}. Such a structure is cute but quickly becomes hard to manage. The other issue is fixing incorrect data. Because you can't open the database file and edit it by hand the way you can a text file, you need to plan for this. I wrote a program that lets me fix/add/delete entries in the database, but as far as I know nobody has written a de facto standard tool for it.
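    A small sketch of the $db{"$key.$val"} pattern, with hypothetical message ids, field names, and helper names (record_for, delete_record), showing where the management cost comes from: in an unordered DB_HASH file, reading, fixing, or deleting one logical record means scanning every key.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
tie my %db, 'DB_File', "$dir/mail.db", O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot open mail.db: $!";

# Pseudo-structure: one logical record spread over "id.field" keys.
$db{'msg-0001.from'}   = 'alice@example.com';
$db{'msg-0001.status'} = 'ok';
$db{'msg-0002.from'}   = 'bob@example.com';
$db{'msg-0002.status'} = 'failed';

# Reassemble one record: a DB_HASH file has no key order, so every key
# in the file is examined to find the ones with the right prefix.
sub record_for {
    my ($db, $id) = @_;
    my $prefix = "$id.";
    my %rec;
    for my $k (grep { index($_, $prefix) == 0 } keys %$db) {
        $rec{ substr($k, length $prefix) } = $db->{$k};
    }
    return \%rec;
}

# Deleting (or renaming, or fixing) a record means hunting down all its keys.
sub delete_record {
    my ($db, $id) = @_;
    delete $db->{$_} for grep { index($_, "$id.") == 0 } keys %$db;
}

my $rec = record_for(\%db, 'msg-0001');
delete_record(\%db, 'msg-0002');
my @left = sort keys %db;
untie %db;

print "$_ => $rec->{$_}\n" for sort keys %$rec;
print "remaining keys: @left\n";
```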

    The other big drawback is that BerkeleyDB databases aren't very portable. I'd generally assume that once you create a db file on one server, it's only going to work on that server. A raw export to text is worth considering as a backup plan.
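    One way to do that raw text export, sketched under assumptions: a hypothetical tab-separated format with backslash escapes, and helper names (dump_db, load_db) made up for this example. Berkeley DB distributions also ship db_dump and db_load command-line utilities that serve the same purpose.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);

# Escape backslash, tab and newline so each key/value pair fits on one line.
sub escape   { my $s = shift; $s =~ s/\\/\\\\/g; $s =~ s/\t/\\t/g; $s =~ s/\n/\\n/g; return $s }
sub unescape { my $s = shift; $s =~ s/\\(.)/$1 eq 't' ? "\t" : $1 eq 'n' ? "\n" : $1/ges; return $s }

# Write every pair as a "key<TAB>value" line, sorted so dumps diff cleanly.
sub dump_db {
    my ($db, $file) = @_;
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} escape($_), "\t", escape($db->{$_}), "\n" for sort keys %$db;
    close $fh or die "Cannot close $file: $!";
}

# Read such a dump back into any tied (or plain) hash.
sub load_db {
    my ($db, $file) = @_;
    open my $fh, '<', $file or die "Cannot read $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($k, $v) = split /\t/, $line, 2;
        $db->{ unescape($k) } = unescape($v);
    }
    close $fh;
}

my $dir = tempdir(CLEANUP => 1);
tie my %src, 'DB_File', "$dir/src.db", O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot open src.db: $!";
$src{poll_interval} = 300;
$src{signature}     = "line one\nline two\twith a tab";
dump_db(\%src, "$dir/backup.txt");
untie %src;

# Restore into a brand-new file, e.g. on another machine.
tie my %dst, 'DB_File', "$dir/restored.db", O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot open restored.db: $!";
load_db(\%dst, "$dir/backup.txt");
my %restored = %dst;
untie %dst;

print "restored ", scalar(keys %restored), " keys\n";
```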

    Now, for what you want to do, storing simple things that you access by key (like configuration info and a simple mail dump), BerkeleyDB works REALLY well. Where it sucks is when you need to get meaningful information out of the data locked up in your database, which means you end up writing rather messy loops and such. So for what you do now it would be great; for what you want to do in the future, maybe not. I guess that depends on how many db files you plan on breaking this up into, and how you structure your keys.

    As an aside, another alternative to consider is XML: although slower, it's a lot more fault tolerant.
      BerkeleyDB is very prone to corruption, and recovery is difficult. In fact, we could sum it up as: "Use Berkeley DB only if your apps never crash, and even then only if you don't care about your application data after a crash."

        BerkeleyDB is very prone to corruption, and recovery is difficult. In fact, we could sum it up as: "Use Berkeley DB only if your apps never crash, and even then only if you don't care about your application data after a crash."

        malarky

        Not if you use BerkeleyDB 4.4 and higher
Re: Wisdom sought on migrating from text files to Berkeley DB or SQL
by jZed (Prior) on Dec 23, 2004 at 15:11 UTC
    As you mention, "some relational stuff can be emulated": the DBI distribution now includes DBD::DBM, which layers a DBI/SQL interface over any DBM file, including BerkeleyDB files. It can operate either directly on the file, or make use of MLDBM with a serialization method of your choice (Storable, Data::Dumper, etc.). The SQL is limited, but certainly covers the kinds of aggregate functions you mention as well as several kinds of joins.
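    A minimal sketch of DBD::DBM, assuming DBI is installed. The table and column names are hypothetical. Without the separate SQL::Statement distribution, DBD::DBM falls back to the much smaller DBI::SQL::Nano engine, so the sketch sticks to plain CREATE/INSERT/SELECT; the aggregates and joins mentioned above need SQL::Statement, and more than two columns per table needs MLDBM (the dbm_mldbm attribute).

```perl
use strict;
use warnings;
use DBI;                      # DBD::DBM ships with the DBI distribution
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $dbh = DBI->connect('dbi:DBM:', undef, undef, {
    f_dir      => $dir,          # directory holding the DBM files
    dbm_type   => 'DB_File',     # store tables as Berkeley DB files
    RaiseError => 1,
    PrintError => 0,
});

# Without MLDBM a DBD::DBM table has exactly two columns: key and value.
$dbh->do('CREATE TABLE app_config (cfg_key TEXT, cfg_val TEXT)');
my $ins = $dbh->prepare('INSERT INTO app_config VALUES (?, ?)');
$ins->execute('mail_host', 'mail.example.com');
$ins->execute('poll_interval', 300);

my ($interval) = $dbh->selectrow_array(
    "SELECT cfg_val FROM app_config WHERE cfg_key = 'poll_interval'");
print "poll_interval = $interval\n";

$dbh->disconnect;
```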

Approved by monkfan