http://www.perlmonks.org?node_id=830879

Steve_BZ has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I've searched PM, but I can't see a good spirited discussion of Linux databases of the type that occur for IDE's from time to time.

So here it is:

I expect a good Linux or Cross-Platform database to:

  1. I guess I've missed a lot, so what do you expect from a database?
  2. What is your favourite Linux or cross-platform database?

Have a good day

Regards

Steve

Replies are listed 'Best First'.
Re: What is your favourite Linux or cross-platform database? (PostgreSQL)
by erix (Prior) on Mar 25, 2010 at 15:22 UTC

    PostgreSQL (http://www.postgresql.org/, see also http://wiki.postgresql.org/)

    • BSD license ( ultra-free )
    • the pgAdmin (gui) is nice enough; psql, the commandline client, is great.
    • *very* active & helpful community; reported bugs often fixed within hours. There is often instant help at #postgresql / freenode.
    • SQL:2011 ( see: SQL Conformance )
    • ultrareliable: yes
    • DBD::Pg (see: DBD::Pg )

    Postgres is supported on a wide range of platforms (see the distributed buildfarm status page (which continually tests if compiling is still faultless) - not even all platforms are represented)

    SQLite is only good for 1-person use, and then only with limited data size (the SQLite website honestly admits this). It is therefore IMHO not really a serious candidate for the epithet 'database', but YMMV, and you can see from the other replies that opinions vary wildly and widely. SQLite /is/ good for embedded use (which is a usage that postgres does offer).

    update: Added mild SQLite bashing & some wordsmithing.

    update (2012.10.27): s/SQL:2008/SQL:2011/g;

      Hi Erix,

      I guess I am being increasingly drawn to PostgreSQL, but I hadn't really considered SQLite until it came up in this discussion so prominently. I looked at the limits http://sqlite.org/limits.html, but they are really design limits, not data limits. For instance it doesn't seem to limit the number of rows in a table, just the number of columns. Have I got that right? And maybe the blob limit is a bit low if you want to embed videos or HiRes photos, but not if you just store a URI to the media which could be held in a directory.

      I'd be interested in any other details you have.

      As you see from my other posts I currently have Firebird, with DBD::InterBase, and the DBD just crashes too regularly for comfort.

      Thanks.

      Steve

        Per the 80/20 rule, I would go with SQLite for the majority case and then postgresql for the minority case. For the outlier 110+, Cassandra is looking good these days for NoSQL. For the majority NoSQL solution, I would vote for the module called y_serial at http://yserial.sourceforge.net -- definitely the easiest to implement, and fast in terms of development time. As for the BLOB issue, yserial seems to handle it well via compression. From a database performance perspective, you want BLOB under 2M, otherwise one is better off using files.
Re: What is your favourite Linux or cross-platform database?
by BrowserUk (Patriarch) on Mar 25, 2010 at 16:24 UTC

    FWIW: I'll testify to (on windows) the portability of PostGreSQL; the competence and helpfulness of the support (including Win32 issues!); the usability of the graphical tools; the completeness of the implementation (triggers, cursors; stored procedures (including the ability to write them in Perl)); simple and fast (binary) install/uninstall.

    I did have some trouble compiling DBD::Pg at one point (circa 5.8.5), but even that was bypassable because of DBD::PgPP. I had no problem building DBD::Pg for 5.10 64-bit or 5.8.9 32-bit.

    I don't have much occasion to use it, but when I have, it has been a pleasure.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: What is your favourite Linux or cross-platform database?
by Corion (Patriarch) on Mar 25, 2010 at 15:20 UTC

    I use SQLite. The files are cross-platform, there are viewers for many operating systems and even Firefox nowadays, and it doesn't require any server software to be constantly running. The SQL support is Good Enough. Reliability is OK, as SQLite journals its changes into a separate file until a transaction is committed. Concurrency is very limited though.

Re: What is your favourite Linux or cross-platform database?
by moritz (Cardinal) on Mar 25, 2010 at 15:26 UTC
    Another thing I expect is: Get a good error message when something goes wrong.

    Sounds trivial, but is far from being wide spread - for example there are cases where violated permissions lead postgres to silently do nothing, instead of throwing an error. (At least a colleague of mine observed that; which is the reason why we did the last project with Firebird. Firebird is great in many ways, but the Perl driver (DBD::InterBase) sucks). Also Mysql with the myisam driver parses lots of things and then ignores them - not good, IMHO.

    Anyway, SQLite is quite good, Postgres is quite good (I've never encountered the permissions problems myself, so I don't know how much of a problem they are in the wild). Firebird is also quite good (and rock solid), but in some regards a bit ancient, and as mentioned above, the Perl driver is not so good.

    Perl 6 - links to (nearly) everything that is Perl 6.

      I agree about Firebird and DBD::InterBase, Firebird seems great (and on Windows it works very well with ODBC), but on Linux DBD::InterBase crashes randomly. I also tried unixODBC, but that was even worse. So it seems a little sad that what seems to be such a nice database with good support and tools, is marred by such a poor Perl interface.

      Steve

        Out of interest, what ODBC driver did you use and why was it so bad?

Re: What is your favourite Linux or cross-platform database?
by marto (Cardinal) on Mar 25, 2010 at 15:26 UTC
Re: What is your favourite Linux or cross-platform database?
by jpepersack (Initiate) on Mar 25, 2010 at 16:41 UTC
    There's no simple answer - it depends on your application.
    • What is the size of the data set? How fast is it growing?
    • What is the access pattern? (reads vs inserts vs updates). How is usage volume going to scale over time?
    • Is it transaction-oriented, batch-oriented, or both?
    • What resources are available (EG dedicated DB hardware, full time DBA)?
    • What are your backup, disaster recovery, and availability requirements?
    • Do you need to support ad-hoc queries (versus having a set of pre-defined operations)?
    • Do you need replication? If so, what kind, how much latency, etc?

    Postgres and MySQL are good for moderately sized data sets under moderate transactional usage and ad-hoc queries. Heavy transactional usage, larger sets of data, complex multi-site replication, or high availability are best served by one of the commercial databases (Sybase, Oracle, etc). Small to moderate sets of transactional data with low utilization are fine using something like SQLite.

    If you don't need ad-hoc queries or strict ACID transactions, a NoSQL type solution might be better. For smaller data sets, a non-relational database like BerkeleyDB may be a better choice. IF you need extreme scalability (multi-terabyte data sets), a bigtable implementation (HBase, Cassandra, Voldemort, etc) is probably something you should consider. For ad-hoc data analysis against a large set of data, a columnar database like Sybase IQ or Vertica might be the best choice.

Re: What is your favourite Linux or cross-platform database?
by JSchmitz (Canon) on Mar 25, 2010 at 17:40 UTC
    Since you didn't specify relational databases I am going to say Cassandra DB

    Cassandra can be thought of as a huge 4-or-5-level associative array, where each dimension of the array gets a free index based on the keys in that level. The real power comes from that optional 5th level in the associative array, which can turn a simple key-value architecture into an architecture where you can now deal with sorted lists, based on an index of your own specification. That 5th level is called a SuperColumn, and it's one of the reasons that Cassandra stands out from the crowd.

    Cassandra has no single points of failure, and can scale from one machine to several thousands of machines clustered in different data centers. It has no central master, so any data can be written to any of the nodes in the cluster, and can be read likewise from any other node in the cluster.
    It provides knobs that can be tweaked to slide the scale between consistency and availability, depending on your particular application and problem domain. And it provides a high availability guarantee, that if one node goes down, another node will step in to replace it smoothly.

    Writing about all the features of Cassandra is a whole different post, but I am convinced that its data model is rich enough to support a wide variety of applications while providing the kind of extreme scalability and high availability features that few other databases can achieve--all while maintaining a lower latency than other solutions out there.

    Cheers,

    Jeffery Schmitz

      You say:

      Since you didn't specify relational databases

      but he explicitly stated:

      • Support SQL92

      update: (Btw, you should be ashamed to copy & paste such a large block of text from someone else, and not even mention where it comes from)

        You are correct I missed that
        The huge block of txt you speak of is from here:
        http://www.planeterlang.org/en/planet/article/My_Thoughts_on_NoSQL

        Cheers -
Re: What is your favourite Linux or cross-platform database?
by JavaFan (Canon) on Mar 25, 2010 at 17:58 UTC
    I guess I've missed a lot, so what do you expect from a database?
    There's only one thing that's important: it should keep my data safe. (This implies having ACID, reliability, etc). Everything else is a secondary feature - although depending on the project, those secondary features could be essential as well.

    Features you haven't mentioned: scalability, replicatability, accountability (having the option of turning on auditing), fine grained permission system, vendor support.

    Totally irrelevant features for determining whether a database is 'good' or not: its price (your first point); whether it has a good support community; whether there's a Perl interface (that may be important to you, but the existence of such an interface doesn't alter the quality of the database).

    As for my favourite database, I'd go with Sybase. But I've only experience with a handful of databases, some of which are unavailable to the general public. And there are many databases I've never worked with. But I've no idea whether my favourite database is suitable for your project. A database for the website of the local chapter of the association of amateur chicken breeders will have different requirements than the database an international telco uses for billing purposes.

    Now, tell me again, what does this have to do with Perl?

Re: What is your favourite Linux or cross-platform database?
by nikosv (Deacon) on Mar 28, 2010 at 16:00 UTC

    I am biased towards Ingres since I use it at work.

    It is cross platfform as it supports many Linux and Unix flavors as well as Windows

    SQL92 with propriety extensions but can be tweaked to allow for strict 92 compliance

    Many providers (DBD,ODBC,.NET,JDBC) to interact with, as well as embedded SQL with C and a legacy propriety language called 4GL that can create 'green screen' "guis" in Unix which now is revamped into Openroad which can be used to built rich GUIs

    ANSI isolation level of course but in version 10 it will incoorporate MVCC (Oracle like) transaction model

    Great active support forums and newsgroups. When you post a question you get an immediate reply from their support

    It comes in two licenses, open source and commercial. It is being promoted given emphasis to the open source part but frankly I think that it is just a publicily stunt as I don't think that anyone outside Ingres can provide code for the dbms internals/backend

    The weak point is that there are no books available;the first book on Ingres (sql) was released last year and it's a good one too. So there is no book like step by step guide or howtos; 5 years back you would excusivelly rely on the manuals but nowdays they release a lot of information online (wiki style,web presentations,guides etc)

Re: What is your favourite Linux or cross-platform database?
by Steve_BZ (Chaplain) on Mar 27, 2010 at 20:14 UTC

    So in summary, updated at as at 02:20 hrs 1/4/2010 GMT we have:
    DatabaseFans and votes
    PostgreSQL & DBD::Pg37 + 27 = 64
    SQLite & DBD::SQLite27 + 20 = 47
    Cassandra DB11
    Sybase10
    Ingress6
    MySQL0

    on this basis, I'll download PostgreSQL and give it a trial. It's interesting that nobody mentioned MySQL.

    Thanks

    Steve

      It's interesting that nobody mentioned MySQL.

      Hmmm, sounds like good news. *SCNR*

      No, really. MySQL has some strengths, especially its raw read performance is mentioned repeatedly. But it has a lot of gotchas due to over-optimizing for some rare use cases.

      Oracle also wasn't mentioned here. There are some free versions of Oracle, I don't know how they are called today, I think something like "Personal Edition" or "XE". I've worked with Oracle in at least three different projects, and I like it, except for some minor annoyances. Empty strings sometimes mutate into NULL, you can't (always) use column aliases in WHERE or HAVING conditions, and so on. And Oracle is *FAT*. Why does a relational database have to install its own versions of Apache, Perl, Java, and tons of other software?

      I really like PostgreSQL. Nice, clean, fast and lean. Free as in beer and free as in speech. The pgAdminIII GUI often comes bundled, or you can download it as source. Compiling PostgreSQL is just fun, pgAdminIII has some dependencies that have to be resolved. And newer versions of PostgreSQL have the Auto-Vaccum feature, so you don't have to run VACUUM manually any more. Alas, there are some minor gotchas, too.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        IMHO, that (2007) postgresql-'gotchas' webpage is not very useful anymore: almost all of these so-called 'gotchas' were relevant only to pretty old versions of Pg.

        Here is a list of release dates:

        pg version 7.4:  2003
        pg version 8.0:  2005
        pg version 8.1:  2005
        pg version 8.2:  2006
        pg version 8.3:  2008
        pg version 8.4:  2009
        

        Running a version older than 8.3 is rarely necessary.

        Re: comparison PostgreSQL - Oracle: important things that are missing or not good enough in current (=8.4) postgresql:

        • Replication: there are (good) replication solutions, but all are outside projects. Version 9.0, scheduled for later this year, will contain native replication.
        • In-place upgrades: Upgrades need a dump and restore - and not all databases can afford that downtime. (Version 9.0 will probably have an in-place upgrade facility.)
        • Partitioning: possible, but limited to a few hundred partitions.

        Then again, compared to Oracle, PostgreSQL is much more immune to over-deployment ;-)

        Update (2012.10.27):

        PostgreSQL now has native replication (both asynchronous and synchronous (it cannot do synchronous multimaster).

        In-place upgrade is also now provided.

        Partitioning remains a somewhat weak point. (although it works well enough for many common scenarios)

        Update (2018.10.27):

        PostgreSQL 10 has logical replication (i.e., you can limit replication to only the necessary tables)

        PostgreSQL 10 has much improved partitioning.