Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re: RDB updates: rebuild from scratch vs. incremental

by jfroebe (Parson)
on Aug 30, 2005 at 13:20 UTC ( [id://487745]=note: print w/replies, xml ) Need Help??


in reply to RDB updates: rebuild from scratch vs. incremental

Hi,

Performing incremental changes of your data is generally preferred over rebuilding the database every month. During the recreation of your database, your database would be unavailable for your users.

There are several good books on both Postgres and MySQL... my recommendation is to take a weekend and read one of the books. It would be good if you were to pick up a book on database design, else you will likely wish to redesign the database later on.

hope this helps :)

Jason L. Froebe

Team Sybase member

No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil, Stargate SG-1

  • Comment on Re: RDB updates: rebuild from scratch vs. incremental

Replies are listed 'Best First'.
Re^2: RDB updates: rebuild from scratch vs. incremental
by jhourcle (Prior) on Aug 30, 2005 at 15:27 UTC
    During the recreation of your database, your database would be unavailable for your users.

    Depending on how you rebuild your data, that may not be true.

    I reload a number of tables every morning, and the process takes about 25 minutes for me to acquire the data, and do the pre-processing for the application.

    For each table, as I am processing it, I first remove tables named backup_$name. I next load the data into temp_$name, and perform whatever processing needs to be done, such as sanitizing the data, generating derived values, and indexing. Once all of the tables are loaded, I loop through the list of tables and rename $name to backup_$name and rename temp_$name to $name.

    Although there's added load on the database every morning (both the one I'm exporting from, an the one that I'm importing to), both databases remain available through the process (I've yet to have anyone complain about issues when the tables are getting renamed, but there is that possibility ... it's still significantly less than the time that it takes to recreate the database).

    In my situation, I have the luxury of disk space -- it's possible that someone with a larger database may not have that. (I keep the backups not just for the ability to roll back, but also to make sure that other folks on the machine don't try to use the space that I'm going to need for the reprocessing)

    I'll go back to one my overused mantras -- There is no one best solution for every scenario. There are ways of doing this with either incremental updates, or a complete rebuild. Without knowing more about the nature of the data, I don't think it'd be good to recommend either one, because this data may be completely different from everything that I've ever worked with.

      I agree :) It would also depend on the chosen DBMS on how much time the tables would be inaccessable to the users. In your case, it is probably less than a second. On low utilized servers, this is probably more than acceptable.. in other scenerios, the loss of a half second could amount to be thousands of dollars (trading systems for example).

      Definitely, the possible solutions will vary :)

      Jason L. Froebe

      Team Sybase member

      No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil, Stargate SG-1

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://487745]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (4)
As of 2025-07-14 08:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?
    erzuuliAnonymous Monks are no longer allowed to use Super Search, due to an excessive use of this resource by robots.