PerlMonks  

Need DBM file that holds data up to 50,000 bytes

by bulrush (Scribe)
on Aug 11, 2014 at 14:33 UTC ( [id://1096995]=perlquestion )

bulrush has asked for the wisdom of the Perl Monks concerning the following question:

Perl 5.8.8 on Redhat RHEL Linux 5.5.56

I'm using SDBM to store data, but it can only store a record when the combined length of key and data is less than 1008 bytes. I've hit that limit. I'd like to keep using hash-like storage, but what are my options here?

  • Super Search on PerlMonks going back to 2005 did not help.
  • ODBM, NDBM, GDBM all have the same limit of 1008 bytes.
  • MLDBM seems to have a file size limit of 2GB. That may not be enough. See this Perlmonk question.
  • BerkeleyDB seems promising but I couldn't find a limit on the data per key or number of keys.
  • The data field should hold 50,000 bytes just in case. There will be about 1000 keys. The largest data size for a single key is unknown since I've never done this before.
  • For the future, I'd like to use a hash-tied DB which has very large limits. I might need them.
  • Speed is not critical.
  • Does DBD::SQLite require installing SQL or anything else?
  • I'd prefer to do a hash-like structure, but I'm willing to use a database via Perl also. I know I need to be flexible and plan for the future.
  • I'm looking for any and all ideas. Please don't stop commenting just because someone proposed a solution. There are many ways to do things.
  • I'm the only programmer (and user of the software), so I don't have to adhere to all programming rules specific to a large group of programmers.

Thank you!

EDIT: Looks like DBD::SQLite has no limit on TEXT fields. Is there anything I should watch out for when using DBD::SQLite? Through very basic testing I did notice:

  • I have to create my own table for a new db file.
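For illustration, a minimal sketch of the plain DBI route with DBD::SQLite, assuming both modules are installed. The table name `kv`, its columns `k` and `v`, and the file name are made up for the example; nothing in the thread prescribes them.

```perl
use strict;
use warnings;
use DBI;
use File::Temp qw(tempdir);

# Hypothetical location for the database file (temporary, for the demo only).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/store.sqlite";

my $dbh = DBI->connect("dbi:SQLite:dbname=$file", "", "",
                       { RaiseError => 1, AutoCommit => 1 });

# A new database file is empty, so the table has to be created once.
$dbh->do("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)");

# Store a 50,000-byte value under one key, using placeholders.
my $value = 'x' x 50_000;
$dbh->do("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
         undef, 'report', $value);

# Read it back.
my ($back) = $dbh->selectrow_array(
    "SELECT v FROM kv WHERE k = ?", undef, 'report');
print "read back ", length($back), " bytes\n";

$dbh->disconnect;
```

`CREATE TABLE IF NOT EXISTS` means the same script can run against a new or an existing file without a separate setup step.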
Perl 5.8.8 on Redhat Linux RHEL 5.5.56 (64-bit)

Replies are listed 'Best First'.
Re: Need DBM file that holds data up to 50,000 bytes
by marto (Cardinal) on Aug 11, 2014 at 14:37 UTC

    "Does DBD::SQLite require installing SQL or anything else?"

    From the DBD::SQLite docs:

    "DBD::SQLite is a Perl DBI driver for SQLite, that includes the entire thing in the distribution. So in order to get a fast transaction capable RDBMS working for your perl project you simply have to install this module, and nothing else."

Re: Need DBM file that holds data up to 50,000 bytes
by Tux (Canon) on Aug 11, 2014 at 14:50 UTC

    I will also second DBD::SQLite. You can ease the transition using Tie::Hash::DBD, which should be a drop-in replacement for SDBM.

    use Fcntl;
    use SDBM_File;
    tie my %hash, "SDBM_File", "file.sdbm", O_RDWR | O_CREAT, 0666;

    -->

    use Tie::Hash::DBD;
    tie my %hash, "Tie::Hash::DBD", "dbi:SQLite:dbname=file.sqlite";

    Read the manual for persistence options.
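As a rough sketch of what persistence can look like: my understanding of the Tie::Hash::DBD documentation is that a table named through the `tbl` option is kept after the session, while an unnamed table is temporary. Treat that as an assumption and check the manual. The table name `t_store` and the temporary file location are made up for the example.

```perl
use strict;
use warnings;
use Tie::Hash::DBD;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $dsn = "dbi:SQLite:dbname=$dir/file.sqlite";

# Assumption: naming the table via "tbl" keeps it after untie.
tie my %hash, "Tie::Hash::DBD", $dsn, { tbl => "t_store" };
$hash{report} = 'x' x 50_000;
untie %hash;

# Tie again to the same table and read the value back.
tie my %again, "Tie::Hash::DBD", $dsn, { tbl => "t_store" };
my $len = length $again{report};
print "value survived re-tie: $len bytes\n";
untie %again;
```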


    Enjoy, Have FUN! H.Merijn
Re: Need DBM file that holds data up to 50,000 bytes
by jfroebe (Parson) on Aug 11, 2014 at 14:43 UTC

    I second using DBD::SQLite.

    If you're stuck using some sort of *DBM, then consider doing what we did in the old dBase days: split the field into two or more fields and combine them in your Perl code.
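A minimal sketch of that splitting idea on top of SDBM_File, using only core modules. The helper names `store_big` and `fetch_big`, the `key#N` naming scheme, and the chunk size of 400 bytes are all made up for the example; the chunk size is simply chosen to stay well below the limit mentioned in the question.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

# Hypothetical chunk size, well below SDBM's per-record limit.
my $CHUNK = 400;

my $dir = tempdir(CLEANUP => 1);
tie my %db, 'SDBM_File', "$dir/chunks", O_RDWR | O_CREAT, 0666
    or die "Cannot tie SDBM file: $!";

# Store $value as "$key#count" plus the pieces "$key#0", "$key#1", ...
sub store_big {
    my ($db, $key, $value) = @_;
    my @parts = unpack "(a$CHUNK)*", $value;   # fixed-size pieces
    $db->{"$key#count"} = scalar @parts;
    $db->{"$key#$_"} = $parts[$_] for 0 .. $#parts;
    return scalar @parts;
}

# Reassemble the pieces written by store_big().
sub fetch_big {
    my ($db, $key) = @_;
    my $count = $db->{"$key#count"};
    return undef unless defined $count;
    return join '', map { $db->{"$key#$_"} } 0 .. $count - 1;
}

my $big = 'x' x 50_000;
my $n   = store_big(\%db, 'report', $big);
my $out = fetch_big(\%db, 'report');
print "stored in $n chunks, round trip ",
      ($out eq $big ? "ok" : "FAILED"), "\n";
```

The trade-off is that every read and write now touches many records, and a key can no longer be deleted with a single `delete`.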

    Jason L. Froebe


Re: Need DBM file that holds data up to 50,000 bytes
by tangent (Parson) on Aug 11, 2014 at 15:24 UTC
    BerkeleyDB seems promising but I couldn't find a limit on the data or number of keys
    The limit on data is somewhere in the terabyte range. I don't think there is a limit on the number of keys (I have one with 70 million keys). SQLite is excellent but if you want a drop-in replacement DB_File or BerkeleyDB is the way to go.
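For comparison with the SDBM tie, a minimal DB_File sketch, assuming DB_File and the underlying Berkeley DB library are available on the system. The file name and temporary directory are made up for the example.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use DB_File;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Same shape as the SDBM_File tie, plus the $DB_HASH format argument.
tie my %hash, 'DB_File', "$dir/file.db", O_RDWR | O_CREAT, 0666, $DB_HASH
    or die "Cannot tie DB_File: $!";

$hash{report} = 'x' x 50_000;   # far beyond the SDBM limit
my $len = length $hash{report};
print "stored and read back $len bytes\n";

untie %hash;
```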

      I agree with that statement for BerkeleyDB, but I wrote Tie::Hash::DBD as I ran into serious limitations when I used DB_File on a system with low resources. Those limitations caused the complete hash to be invalid.


      Enjoy, Have FUN! H.Merijn
        I haven't come up against those limitations myself, but then again I migrated all of my DB_File usage to BerkeleyDB some time ago.
        what limitations, got link?
      Are you saying I can have a single hash key that holds terabytes of information, limited only by my hardware and OS capabilities?
      Perl 5.8.8 on Redhat Linux RHEL 5.5.56 (64-bit)
        Max file size is 256 terabytes.

        From the FAQ:

        Are there any constraints on table size?
        The table size is generally limited by the maximum file size possible on the file system.

        Are there any constraints on the number of records that can be stored in a table?
        There is no practical limit. It is possible to store the number of records that can be indexed in a signed 64-bit value.

        Are there any constraints on record size?
        There is no practical constraint. The maximum length of a string or blob field is 1 billion bytes.
        If you've got a 64-bit Perl build running on a 64-bit OS, then yes, that's indeed the case. That said, tangent was talking about BerkeleyDB's limitations, not Perl's.
Re: Need DBM file that holds data up to 50,000 bytes
by erix (Prior) on Aug 11, 2014 at 15:41 UTC

    DBD::SQLite is very easy. Not so scalable but as you talk about only 1000 keys that's unlikely to become a problem.

    Alternatively, you could go for a real database server and use PostgreSQL, which has the excellent hstore datatype. hstore is basically an associative array (=hash).

    Of course, with Postgres you have a server on your hands that will need some maintenance. It offers many more possibilities, better performance, and far fewer limitations, but it is not nearly as easy as SQLite.

      I was asking myself exactly the same question: 'why not one of the bigs like mysql or postgres or firebird or...?'

      Alternatively you could consider a NoSQL database (e.g. Mongo) or even plain Perl for this. As you said that speed is secondary, and those use text files for storage, a 'terabyte level' file size (probably much more than what you need) is guaranteed on most systems.

      mongo tutorial (CPAN)

        Though maybe interesting, Postgres' hstore feature is a language unto itself and does not easily integrate with how other access methods work. There is Pg::hstore, but the API is IMHO not very obvious. It is certainly not an easy replacement for DB_File.

        In my perception *all* databases suck. Not all suck the same way, but there is no perfect database (yet). You will need to investigate your needs before making a choice. Oracle has NULL problems (and is costly), MySQL does not follow ANSI in its default configuration and uses stupid quoting, Postgres will return too much by default on big tables, Unify does not support varchar, CSV is too slow, SQLite does not support multiple concurrent sessions, Firebird has no decent DBD (yet), DB2 is bound to IBM, Ingres does not have many users in the Perl community, etc.

        Too many factors to think about. For a single-user easy DB_File replacement, BerkeleyDB comes first, then Tie::Hash::DBD in combination with DBD::SQLite. I say so because neither needs any special environment or configuration. Once you choose a major DB (whatever you choose), you will need additional knowledge or services. My choice then would be Postgres, as it is the easiest to work with and confronts me with the least irritation.

        Nobody mentioned other alternatives yet:


        Enjoy, Have FUN! H.Merijn
Re: Need DBM file that holds data up to 50,000 bytes
by erix (Prior) on Aug 12, 2014 at 12:36 UTC
    Looks like DBD::SQLite has no limit on TEXT fields. Is there anything I should watch out for using DBD::SQLite?

    Datatype 'text' is unlimited; if you want a limited text column, I think you have to use VARCHAR, e.g.: VARCHAR(100).
