http://www.perlmonks.org?node_id=907857


in reply to Verifying 2 databases are in sync with brute force

Here is a comparison between two large tables in two different instances. One instance is postgres 9.0.3, the other 9.1beta (from git://git.postgresql.org/git/postgresql.git).

The table compared (named "public.x") has 230M rows.

This approach may not particularly fast, but it is simple. (Make sure the order by phrase is such that the order is determined.) And it just compares the table data itself -- there may still be differences, in indexes, for example.

(Obviously, -h and -p parameters are specific to my setup.)

-- -- (psql): showing rowcount: -- $ select to_char(count(*), repeat('G999',4)) as rowcount from public.x +; rowcount ------------------- 232,765,672 (1 row) # # (bash): dump tab-delimited file from the 9.1 instance table # $ time echo "copy (select * from public.x order by 1,2,3) to stdout delimiter E'\t'" \ | psql -qtA -h /tmp -p 6542 -d testdb > public.x_on_testdb.txt real 42m47.107s # # (bash): dump tab-delimited file from the 9.0.3 instance table # $ time echo "copy (select * from public.x order by 1,2,3) to stdout delimiter E'\t'" \ | psql -qtA -h localhost -p 55432 -d ms > public.x_on_ms.txt real 57m49.028s # # (bash): compare the two output files # $ time md5sum public.x_on_testdb.txt public.x_on_ms.txt 3313cba4d2a1cb14b5ba3dcc5378c33e public.x_on_testdb.txt 3313cba4d2a1cb14b5ba3dcc5378c33e public.x_on_ms.txt

Another approach would be to use postgres' md5 function (documentation), and dump the md5 of a text representation of every row, and compare those.

HTH... and I am interested how you eventually do the comparison; perhaps you could let us now how it goes?