Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Which is quicker - writing to a file or inserting into a db?

by Anonymous Monk
on Dec 13, 2007 at 23:00 UTC ( [id://656936]=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a form on a website and I need to save the email of the user that's submitting the form.

Which is quicker and less likely to break if a deluge of visitors come to the site all at one time: Writing the email address to a file or inserting the email into a mysql db?

Thank you!

Replies are listed 'Best First'.
Re: Which is quicker - writing to a file or inserting into a db?
by Joost (Canon) on Dec 13, 2007 at 23:13 UTC
    Oh, assuming the actual disk speed is the same; well let's assume the database is on the same machine and writes to the same disk, I would guess that appending to a file is much quicker than inserting a record in the database.

    All bets are off once you need to check against existing records (basically starting when you need to see if an email has already been registered). For simple cases, it's still relatively trivial to beat mysql, but you're going to have to build a lot more infrastructure yourself, or use some simpler, faster (probably non-relational) database system, like BerkelyDB.

    update: you are already using mod_perl or some other persistent processing system, right? otherwise there's really not much point in even thinking about this.

Re: Which is quicker - writing to a file or inserting into a db?
by eserte (Deacon) on Dec 13, 2007 at 23:08 UTC
    Writing to a plain file is probably faster. Talking to a database has some overhead like passing data through a network/socket connection, parsing and interpreting SQL statements (although excessive parsing can be avoided by using a prepared statement). Also there's additional overhead in the underlying file structure of databases, and possible the index overhead.
      If I'm not checking whether the emails are duplicates or not (meaning duplicate emails are getting appeneded to the file)...does it matter that the files keeps getting bigger and bigger? Will that affect performance at all?

      I was thinking of appending to the file and then when the time comes just hashing everything to find the unique emails.

        There's no performance penalty in appending to a large file.

        But you should probably think about locking, as multiple processes simultaneously writing to the same file may cause problems. Or just make sure that each process writes to its own file, e.g. by appending $$ to the file name.

      Don't forget about file locking, a datbase can typically handle (with some caching) simultaneous writes to the same table, but simultaneous writes to a single text file from different processes is likely to fail.

      just another cpan module author
Re: Which is quicker - writing to a file or inserting into a db?
by KurtSchwind (Chaplain) on Dec 14, 2007 at 00:33 UTC

    There seems to be an assumption that the over-head of the db + the disk io of the db is going to be greater than the disk io alone. This is most likely true.

    However, it's entirely possible to cache db writes so that you can essentially subtract the disk io from the db. RAM is orders of magnitude faster than disk. And if you have extensive caching on the db, it really can out-perform straight disk io.

    I'm just sayin . . .

    --
    I used to drive a Heisenbergmobile, but every time I looked at the speedometer, I got lost.
      True, but (and this is partially a response to the post right above): once you've opened the file in a mod_perl or equivalent persistent process, there's no need to close and re-open it between requests, which may or may not shift a lot of IO into RAM anyway. And on an OS designed for high performance open/close may not even make much difference (except for the extra 2 calls).

      Which is to say, a RAM disk will probably not gain you much (if anything) once you much once you've got the basic IO strategy optimized. Provided your system isn't completely overloaded. And you really don't want to use a RAM disk or any serious write-caching in a situation where you're pushing the whole system to the limit. Assuming you would actually want to store the data somewhere where it's safe.

      speaking of, a ram-disk is another option if appending to a file seems slow.

      but that does mean having to write some file management bits to move your ram cache to disk at a faster rate than it can be filled.

Re: Which is quicker - writing to a file or inserting into a db?
by naChoZ (Curate) on Dec 14, 2007 at 03:37 UTC

    I recently did a poor man's benchmark just for kicks. The results might at least be interesting to you.

    I wrote this little script to populate a db with 50,000 rows into a db table of three columns (one just an autoinc id).

    #!/usr/bin/perl use strict; use warnings; use Data::Dumper; $Data::Dumper::Indent = 3; use Digest::MD5 qw/md5_base64/; use base qw/DBIx::Class::Schema::Loader/; DBIx::Class::Schema::Loader->loader_options( relationships => 1 ); my $schema = DBIx::Class::Schema::Loader->connect('dbi:SQLite:/tmp/tes +t'); #my $schema = DBIx::Class::Schema::Loader->connect('dbi:Pg:dbname=test +', 'postgres', '' ); #my $schema = DBIx::Class::Schema::Loader->connect('dbi:mysql:dbname=t +est', 'mysql', '' ); my @inserts; for ( 1 .. 50000 ) { push @inserts, { foo => md5_base64( $_ . 'foo' ), bar => md5_base64( $_ . 'bar' ) }; } my $new_row = $schema->resultset('TestTable'); $new_row->populate( \@inserts );

    Loading to postgresql-server-8.1.9-2.1 took just under 8 minutes, mysql-5.0.26-12 about 35 seconds, sqlite-3.3.8-14 even with synchronous mode turned off couldn't even seem to handle 1000 rows a minute (I didn't wait for it to finish all 50,000 rows). I reused this script just now for writing 50,000 lines to a flat text file and it was done almost before I pressed enter.

    Obviously YMMV. My workstation is a sun ultra 40 dual opteron with a pair of sata drives that I striped into a raid 0 and I'm running opensuse 10.2.

    --
    naChoZ

    Therapy is expensive. Popping bubble wrap is cheap. You choose.

Re: Which is quicker - writing to a file or inserting into a db?
by aquarium (Curate) on Dec 14, 2007 at 00:26 UTC
    typically you'd use a database when there is a need to do so...i.e. there's more to the data than a single table with a single column, and you want to query/update the data in different ways.
    as to performance...mysql is very fast...but for a single field append, you're not going to see any substantial difference in performance between mysql or plain file anyway.
    one thing a database does give you, without lifting a finger, is locking
    the hardest line to type correctly is: stty erase ^H
Re: Which is quicker - writing to a file or inserting into a db?
by chromatic (Archbishop) on Dec 14, 2007 at 07:53 UTC
    Which is quicker and less likely to break if a deluge of visitors come to the site all at one time: Writing the email address to a file or inserting the email into a mysql db?

    Your network connection is most likely to get saturated, unless you run out of file descriptors or available database connections first.

Re: Which is quicker - writing to a file or inserting into a db?
by pajout (Curate) on Dec 14, 2007 at 18:36 UTC
    I think that solution depends on some factors:
    1. How many requests do you want to REALLY serve? (10/sec? 1000/sec?)
    2. Should be saved data on-line immediately for further use? If not, how long time gap from their save to their usability is acceptable?
    3. How many http listeners will do this service?
    4. Are there some requirements for data consistency and integrity?
    5. Are there some scenarios of HW/OS/SW crash?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://656936]
Approved by Joost
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (4)
As of 2024-03-19 10:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found