You would? Not me.
Did you take a good look at all the data conversions and substrings and stuff going on in that SQL? SQL can be pretty efficient at performing comparisons, that's its bread and butter work, but those types of data manipulations and conversions are not its strong suit.
I attempted to verify my suspicions, but about half of the syntax in that article doesn't seem to be valid with the only SQL database I have available. Still, I'm betting (a coffee:) that it ain't quick on any platform.
I would hazard that dumping the table using the export facility and using a dedicated binary digest(or) program would be considerably faster.
Either way, once the determination of difference is made, you still have to correct it, and that means transmitting the data. Easier, surer and possibly quicker to just zip up the dumped table and send it, I think.
Unless the data involved is already compressed binary--jpgs or similar--the 100GB would probably reduce to 25% or so, and transmitting 25GB at 100Mb/s will take about 34 minutes, assuming no contention.
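For anyone who wants to check that arithmetic, here is a rough sketch in Python. The function name `transfer_minutes` is made up for illustration, and the 25% compression ratio is only my guess from above, not a measurement. It also assumes binary gigabytes and a link that really delivers its nominal rate.

```python
def transfer_minutes(size_gb, ratio, link_mbps):
    """Estimate minutes to send size_gb of data over a link_mbps link.

    ratio is the assumed compressed size as a fraction of the original
    (0.25 means the dump shrinks to a quarter). No contention or
    protocol overhead is modelled.
    """
    compressed_mb = size_gb * 1024 * ratio   # GB -> MB, after compression
    megabits = compressed_mb * 8             # bytes -> bits
    seconds = megabits / link_mbps
    return seconds / 60


# 100GB table, compressing to 25%, over a 100Mb/s link
print(round(transfer_minutes(100, 0.25, 100)))  # 34
```

Change the ratio to 1.0 for data that won't compress and the estimate rises to well over two hours, which is why the caveat about jpgs matters.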
Running a dedicated md5 executable on 1GB takes around 20 seconds, so around 1/2 hour for 100GB, but that is calculating a single hash from a contiguous datastream.
You're suggesting calculating 2 hashes for every piece of data, retrieved in itty-bitty chunks, and doing all the math in SQL?
In the absence of evidence to the contrary, my money would be on the transmission finishing long before the checksumming.
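To make the comparison concrete, this is roughly what "a single hash from a contiguous datastream" looks like, sketched in Python with the standard `hashlib` module rather than a dedicated md5 executable. The function name `md5_of_file` and the 1MB chunk size are my own choices for illustration. I haven't timed this against the 20 seconds per GB figure above.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read sequentially in chunks.

    One pass, one digest object, constant memory: the file is never
    loaded whole, and no per-row or per-field work is done.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b""
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

You would run it once on the dumped table at each end and compare the two hex strings. That tells you whether the copies differ, not where, which is why you still end up shipping the data.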
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.