Re: Verifying 2 databases are in sync with brute force

in reply to Verifying 2 databases are in sync with brute force

Here is a comparison between two large tables in two different instances. One instance is postgres 9.0.3, the other 9.1beta (from git://git.postgresql.org/git/postgresql.git).

The table compared (named "public.x") has 230M rows.

This approach may not particularly fast, but it is simple. (Make sure the order by phrase is such that the order is determined.) And it just compares the table data itself -- there may still be differences, in indexes, for example.

(Obviously, -h and -p parameters are specific to my setup.)


--
--  (psql): showing rowcount:
--

$ select to_char(count(*), repeat('G999',4)) as rowcount from public.x
+;
    rowcount         
-------------------
       232,765,672
(1 row)

#
#  (bash): dump tab-delimited file from the 9.1 instance table
#
$ time echo "copy (select * from public.x order by 1,2,3)
    to stdout delimiter E'\t'" \
  | psql -qtA -h /tmp -p 6542 -d testdb > public.x_on_testdb.txt

real    42m47.107s

#
#  (bash): dump tab-delimited file from the 9.0.3 instance table
#
$ time echo "copy (select * from public.x order by 1,2,3)
    to stdout delimiter E'\t'" \
  | psql -qtA -h localhost -p 55432 -d ms > public.x_on_ms.txt
    
real    57m49.028s

#
#  (bash): compare the two output files
#

$ time md5sum public.x_on_testdb.txt public.x_on_ms.txt
3313cba4d2a1cb14b5ba3dcc5378c33e  public.x_on_testdb.txt
3313cba4d2a1cb14b5ba3dcc5378c33e  public.x_on_ms.txt
[download]

Another approach would be to use postgres' md5 function (documentation), and dump the md5 of a text representation of every row, and compare those.

HTH... and I am interested how you eventually do the comparison; perhaps you could let us now how it goes?

Comment on Re: Verifying 2 databases are in sync with brute force Download Code

In Section Seekers of Perl Wisdom