Re: RDB updates: rebuild from scratch vs. incremental

by anonymized user 468275 (Curate)
on Aug 30, 2005 at 15:01 UTC


in reply to RDB updates: rebuild from scratch vs. incremental

Although the updates should be performed incrementally, they need to be calculated in bulk. The approach I usually take to this is as follows:

Process the files from the other sources into a load format that suits the DBMS you choose. You might have a directory for this named by the download date (most operations go for a daily feed from the other sources rather than monthly, both to avoid the delay of processing a big backlog and to stay up to date).
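A minimal sketch of that step, assuming a comma-separated raw feed file per table and a pipe-delimited load format; the directory names, table names and PK positions below are placeholders for illustration, not part of the original recipe:

    use strict;
    use warnings;

    my $date    = '20050830';              # download date, YYYYMMDD
    my $raw_dir = "raw/$date";             # files as delivered by the source
    my $out_dir = "path/$date";            # load-format files, one per table
    mkdir $out_dir unless -d $out_dir;

    # table => positions of its PK columns in the raw feed (illustrative only)
    my %pk_cols = ( customer => [0], order_line => [ 0, 1 ] );

    for my $table ( keys %pk_cols ) {
        open my $in,  '<', "$raw_dir/$table.csv" or die "$raw_dir/$table.csv: $!";
        open my $out, '>', "$out_dir/$table"     or die "$out_dir/$table: $!";
        my %is_pk = map { $_ => 1 } @{ $pk_cols{$table} };
        my @rows;
        while (<$in>) {
            chomp;
            my @col = split /,/;
            # PK columns first, remaining columns after, pipe-delimited
            push @rows, join '|',
                @col[ @{ $pk_cols{$table} } ],
                @col[ grep { !$is_pk{$_} } 0 .. $#col ];
        }
        print {$out} "$_\n" for sort @rows;    # comm needs sorted input
        close $in;
        close $out;
    }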

These files should therefore be one per table, with the PK columns in their proper order and the rows sorted, ready for the next step:

To calculate an incremental load, use the Unix comm command on the previous day's file and today's file (in that order): option -23 gives the deletes (rows only in the old file) and -13 gives the inserts (rows only in the new file), so an update shows up as a delete plus an insert. E.g. for the deletes:

    for ( glob "path/YYYYMMDD/*" ) {       # today's load files, one per table
        my @split = split '/', $_;
        my $file  = pop @split;
        # previous day's file first, today's second, so -23 gives the deletes
        open my $comm, "comm -23 path/$prv_bus_day/$file $_ |"
            or die "can't run comm: $!";
        for my $delrow ( <$comm> ) {
            # build the delete statement from $delrow
        }
        close $comm;
    }
To build the delete statements, you will need to query the database's data dictionary for the table's PK, which then has to be matched against the columns retrieved from the above pipe.
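For example, a minimal sketch of that lookup using DBI's primary_key method and bind placeholders; the DSN and connection details are made up, and it assumes the load-file layout from the sketch above (PK columns first, pipe-delimited):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:Oracle:mydb', 'user', 'pass',
                            { RaiseError => 1 } );

    sub delete_row {
        my ( $dbh, $table, $delrow ) = @_;

        # PK column names straight from the data dictionary;
        # how catalog/schema/case are handled depends on your driver
        my @pk = $dbh->primary_key( undef, undef, $table );

        # the comm pipe delivers the PK columns first, pipe-delimited
        chomp $delrow;
        my @val = split /\|/, $delrow;

        my $sql = "delete from $table where "
                . join( ' and ', map { "$_ = ?" } @pk );
        $dbh->do( $sql, undef, @val[ 0 .. $#pk ] );
    }

Inside the loop above this would be called as delete_row( $dbh, $file, $delrow ), assuming each load file is named after its table.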

One world, one people
