http://www.perlmonks.org?node_id=965886


in reply to Re^3: Testing database updates.
in thread Testing database updates.

Values from sequences (e.g. postgres' serial datatype) can (validly) vary and will often thwart the otherwise so attractively simple dump & diff approach.

And IMHO tobyink is right: without index, order is not garantueed, and I'd consider it unwise to rely on order in the database on the basis of the order of the source material.

Just a general remark: the OP doesn't give much detail, and doesn't mention what system he works with, but in case the system is PostgreSQL: it has a md5() function which can be handy for directly comparing data (and searching for differences via SQL).

Replies are listed 'Best First'.
Re^5: Testing database updates.
by BrowserUk (Patriarch) on Apr 19, 2012 at 08:37 UTC
    Values from sequences (e.g. postgres' serial datatype) can (validly) vary and will often thwart the otherwise so attractively simple dump & diff approach.

    Assuming a known starting position and the same sequence of operations applied to that starting position; and the same versions; would you not expect even such serial types to attain the same values and be dumped in similar order?

    Of course, the OP said: "I don't care about the order of inserts, or about any reads, just that the overall state at the end is what is should be.", which could mean he would choose to ignore subtle differences in sequence data-types; but were I refactoring such code, I'd at least like to be informed of any differences, before choosing to dismiss them. Assuming the problem actually exists in his dataset.

    I'd consider it unwise to rely on order in the database on the basis of the order of the source material.

    There is no risk attached because any such differences would be flagged as different; even if they turn out to be false negatives.

    My personal take is that I'd much rather start out looking at everything and explicitly choose what I can disregard -- probably through post processing the dumps -- than to prejudge the issues by preselecting what to compare.

    I'd also steer clear of tying myself to the facilities of any given DBMS if there is any chance that the process might need to be ported to others.

    I'd also avoid reliance upon digest type comparisons -- except as a quick check to avoid more detailed inspection, which can equally be applied directly to the dump files -- as a failure of digests to compare only tells you something is different. Not what.

    But this is just the way I would go about tackling the OPs problem. He is obviously free to choose between this and all the other suggestions offered; his own ideas; or any combination thereof.

    Feel free to add yours to the list.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?