PerlMonks  

DB_File vs flat file in terms of speed

by kiat (Vicar)
on Oct 25, 2002 at 15:35 UTC ( [id://208042]=perlquestion )

kiat has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I'm new to Perl's database modules and am now experimenting with 'use DB_File'.

I'm used to parsing data from a flat text file but am considering using a proper database such as DB_File.

Let's say I have 1000 entries stored as follows:

orange:fruit
bicycle:transport
movie:entertainment
abc:alphabet
.
.
and so on.

Let's say I want to retrieve the entry 'movie' and modify it. With a flat file, I would need to loop through the entire file to find a match and then make the modification.
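A minimal sketch of that flat-file approach, assuming the "key:value" layout shown above. The sub name update_flat_file and the temporary file are made up for illustration; the point is only that every line has to be read, because nothing tells you where 'movie' lives in the file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Rewrite the value for one key in a "key:value" flat file.
# Every line is read, because there is no index to jump to the key.
# (update_flat_file is a hypothetical helper name.)
sub update_flat_file {
    my ($path, $key, $new_value) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";
    my @lines = <$in>;
    close $in;

    my $changed = 0;
    for my $line (@lines) {            # $line aliases the array element
        chomp(my $copy = $line);
        my ($k) = split /:/, $copy, 2;
        next unless defined $k && $k eq $key;
        $line = "$key:$new_value\n";   # replace the matching line in place
        $changed++;
    }

    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} @lines;
    close $out or die "Cannot close $path: $!";
    return $changed;                   # number of lines rewritten
}

# Demo with the sample entries from the question.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "orange:fruit\nbicycle:transport\nmovie:entertainment\nabc:alphabet\n";
close $fh;

update_flat_file($file, 'movie', 'film');
```

For 1000 short lines this scan is cheap; the cost grows with the file, and the whole file is rewritten for a single change.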

With 'use DB_File' and using 'tie', I can locate the entry simply by using the key 'movie' as in
$key='movie'; $retrieved = $hash{$key};
to retrieve its corresponding value.

It appears, then, that using DB_File is a lot faster and more efficient. Am I on the right track with regard to how DB_File via 'tie' works and the merits of DB_File as a method of storing and retrieving data?
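For comparison, here is a small sketch of the tied-hash version, assuming DB_File is installed and using its hash format ($DB_HASH). The file name entries.db and the replacement value 'film' are placeholders, not anything from the original post.

```perl
use strict;
use warnings;
use Fcntl;                         # O_RDWR, O_CREAT, O_RDONLY
use DB_File;                       # exports $DB_HASH
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);
my $dbfile = "$dir/entries.db";    # hypothetical file name

# Tie a hash to the database file, creating the file if needed.
tie my %hash, 'DB_File', $dbfile, O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot tie $dbfile: $!";

# Load the sample entries from the question.
my %seed = (
    orange  => 'fruit',
    bicycle => 'transport',
    movie   => 'entertainment',
    abc     => 'alphabet',
);
$hash{$_} = $seed{$_} for keys %seed;

# Look up by key and modify; no loop over the other entries is written here.
my $key       = 'movie';
my $retrieved = $hash{$key};
$hash{$key}   = 'film';

untie %hash;                       # flush and close the file

# Re-open read-only to confirm that the change reached the disk.
tie my %again, 'DB_File', $dbfile, O_RDONLY, 0644, $DB_HASH
    or die "Cannot re-tie $dbfile: $!";
my $persisted = $again{$key};
untie %again;
```

Whether this is noticeably faster than a scan for only 1000 entries depends on the workload; the clearer gain is that an update touches one record instead of rewriting the whole file.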

I look forward to hearing your feedback and comments :)

cheers,

kiat

Replies are listed 'Best First'.
Re: DB_File vs flat file in terms of speed
by MZSanford (Curate) on Oct 25, 2002 at 15:44 UTC

    Am I on the right track with regard to how DB_File via 'tie' works and the merits of DB_File as a method of storing and retrieving data?

    You have it working, so all of the syntax is correct. As for the usage of DB_File, it seems to me you have found its exact usefulness. I tend to work in RDBMSs, so I don't normally use DB_File, but you seem to have its number ... keep it up.


    from the frivolous to the serious
      Thanks, MZSanford! I'll check out RDBMSs :)
Re: DB_File vs flat file in terms of speed
by perrin (Chancellor) on Oct 25, 2002 at 15:51 UTC
    You are correct. Here's another tip for you: SDBM_File is much faster than DB_File when dealing with small keys (less than 2k each). Also, if you want to use a dbm file for both reading and writing in a multi-process situation (like CGI), use MLDBM::Sync.
      Thanks, perrin! Is SDBM_File a standard Perl module just like DB_File?
        Yes, it is.
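A minimal sketch of the same tie with SDBM_File, which ships with Perl. The base name "entries" is a placeholder. One caveat worth knowing, if I recall the SDBM_File documentation correctly: the length of a key plus its value may not exceed 1008 bytes, so it only suits small records.

```perl
use strict;
use warnings;
use Fcntl;                         # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/entries";         # hypothetical base name; SDBM adds its own extensions

# Note: no $DB_HASH argument here, unlike DB_File.
tie my %h, 'SDBM_File', $base, O_RDWR | O_CREAT, 0644
    or die "Cannot tie $base: $!";

$h{movie} = 'entertainment';
my $value = $h{movie};

untie %h;

# On most platforms SDBM keeps its data in two files, .pag and .dir.
my @files = sort glob "$dir/entries.*";
print "$_\n" for @files;
```

Because the tie interface is the same, switching between DB_File and SDBM_File is mostly a matter of changing the class name and dropping the last argument.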
Re: (nrd) DB_File vs flat file in terms of speed
by newrisedesigns (Curate) on Oct 25, 2002 at 15:49 UTC

    MZSanford++

    Not only will DB_File keep your little duckling hashes in the proverbial row, but it's a much easier method of retrieval than a flat file database.

    There is a small file size difference. I recently moved away from flat files for storing my simple programs' information. My data files are now around 40KB for files that were once 2-4KB flat files. Granted, on most PCs today, this kind of difference is insignificant, but on a hosted webserver, it may become a nuisance (if you have a lot of databases).

    Best of luck in your database endeavors.

    John J Reiser
    newrisedesigns.com

      Thanks, John! The increase in file size (from 2 to 40 KB) is atrocious :) So assuming server space is at a premium, one will have to weigh the speed and efficiency gained against the increase in file size...?

        I don't believe the database size is directly related to the flat file size. If you tie a %hash, put nothing in it, then untie, the db file will be around 32KB (at least for me: Win2K/ASPerl). I assume this 32KB is the data structure with no data. It grows slowly (not exponentially, thank God) as you put more data into the tied hash.
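        One way to check this on your own system is to measure the file right after an empty tie/untie and again after adding data. This is only a sketch; the file name size_test.db is made up, and the numbers will differ with the platform and the Berkeley DB version, so no figures are claimed here.

```perl
use strict;
use warnings;
use Fcntl;                         # O_RDWR, O_CREAT
use DB_File;                       # exports $DB_HASH
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);
my $dbfile = "$dir/size_test.db";  # hypothetical file name

# Tie and untie without storing anything, then measure the file.
tie my %h, 'DB_File', $dbfile, O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot tie $dbfile: $!";
untie %h;
my $empty_size = (stat $dbfile)[7];    # size in bytes

# Store 1000 small entries and measure again.
tie %h, 'DB_File', $dbfile, O_RDWR, 0644, $DB_HASH
    or die "Cannot re-tie $dbfile: $!";
$h{"key$_"} = "value$_" for 1 .. 1000;
untie %h;
my $full_size = (stat $dbfile)[7];

printf "empty: %d bytes, with 1000 entries: %d bytes\n",
    $empty_size, $full_size;
```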

        Size isn't a pressing issue, just a thought. I have 125MB of data on my webserver, but have only used about 10 for the past year.

        Next on your plate should be learning SQL and the DBI module, so you can interface with MySQL, Postgres, Oracle, and Access. Dominus has a good article on SQL and DBI here

        Glad to be of service,

        John J Reiser
        newrisedesigns.com

      Hi John. My app has 100k entries. In a flat file it looks like this (matched users): 3052002:1054002 3052002:2050002 1054004:2050002 1054004:1054002 and its size is 4M. But when I use SDBM like this: $all_match{'3052002:1054002'}=1; $all_match{'3052002:2050002'}=1; $all_match{'1054004:2050002'}=1; $all_match{'1054004:1054002'}=1; the file size gets to 1G. Any idea how I can get it smaller? Thanks, Eyal
      Why is it all on one line? This is the format of one line: 3052002:1054002
Re: DB_File vs flat file in terms of speed
by princepawn (Parson) on Oct 25, 2002 at 16:23 UTC
    I can't help but answer a question you didn't ask. Now, if flexibility of slicing and dicing the data were the issue, then DBD::AnyData would put you far ahead of the game with flat files.

    But then again, wait, if DBD::AnyData is indeed for AnyData, then why not dbm_files as well?!

Node Type: perlquestion [id://208042]
Approved by fireartist