http://www.perlmonks.org?node_id=607776


in reply to locales, encodings, collations, charsets... how can I match a given MySQL collation?

Yes, sorting can be tricky between two different environments. Note that this is not just between Perl and MySQL, because the way a database gets sorted can be subject to locale on the database server side (so 2 databases can be sorted differently) and it can get even worse.

So you need a solution that does all sorting in one environment or does not rely on sorting at all. Here are some options:

1) if the tables are relatively small so that they could fit into memory consider to have Perl do the matching for you. Load the row data of table A into a hash A (key=col1), then get each row from table B (the other server), and check if it exists/matches data in hash A. If so then data matches and you can delete the hash A key. If there was no match then you have hit an entry unique to one table.

2) for larger tables you will need to rely on the database server to perform match operations for you. See if you can copy the content of the table in database A to a temporary table in database B. Next compare the content of the two tables in database B via a select statement that returns rows that do not match.

3) if all that is not feasible then your last resort is to select all rows from table A, and foreach row in table A do a lookup in table B via a SELECT WHERE on the indexed key in table B to see if there is a corresponding row. When finished you will have to do the reverse as well to find rows in table B that do not have a matching row in table A.
make sure to prepare the SELECT WHERE sql outside of the loop, because to reissue the same sql statement with a new bind value is relatively inexpensive, especially when searching over an indexed column.

Update: a little trick to use MySQL's sorting rules within Perl would be to do a database call with no table (or the dummy table 'dual') with a select with two bind variables, e.g.:

SELECT ? > ?;
The result of this query will be true (1) or false (0)

Now why didn't I think of that option before?