http://www.perlmonks.org?node_id=607776


in reply to locales, encodings, collations, charsets... how can I match a given MySQL collation?

Yes, sorting can be tricky between two different environments. Note that this is not just between Perl and MySQL, because the way a database gets sorted can be subject to locale on the database server side (so 2 databases can be sorted differently) and it can get even worse.

So you need a solution that does all sorting in one environment or does not rely on sorting at all. Here are some options:

1) if the tables are relatively small so that they could fit into memory consider to have Perl do the matching for you. Load the row data of table A into a hash A (key=col1), then get each row from table B (the other server), and check if it exists/matches data in hash A. If so then data matches and you can delete the hash A key. If there was no match then you have hit an entry unique to one table.

2) for larger tables you will need to rely on the database server to perform match operations for you. See if you can copy the content of the table in database A to a temporary table in database B. Next compare the content of the two tables in database B via a select statement that returns rows that do not match.

3) if all that is not feasible then your last resort is to select all rows from table A, and foreach row in table A do a lookup in table B via a SELECT WHERE on the indexed key in table B to see if there is a corresponding row. When finished you will have to do the reverse as well to find rows in table B that do not have a matching row in table A.
make sure to prepare the SELECT WHERE sql outside of the loop, because to reissue the same sql statement with a new bind value is relatively inexpensive, especially when searching over an indexed column.

Update: a little trick to use MySQL's sorting rules within Perl would be to do a database call with no table (or the dummy table 'dual') with a select with two bind variables, e.g.:

SELECT ? > ?;
The result of this query will be true (1) or false (0)

Now why didn't I think of that option before?

  • Comment on Re: locales, encodings, collations, charsets... how can I match a given MySQL collation?
  • Download Code

Replies are listed 'Best First'.
Re^2: locales, encodings, collations, charsets... how can I match a given MySQL collation?
by xaprb (Scribe) on Apr 02, 2007 at 13:09 UTC

    Slight clarification: I don't know about other RDBMS products, but in MySQL's case, the sorting is only dependent on the column's collation, which is easy to find with SHOW FULL COLUMNS. If my tool discovers that the two columns have different collations, I think it will have to die with an error.

      Note that even in MySQL if not defined explicitly collation will be dependent upon collation settings for database and if that is default then even upon collation for server instance.

      BTW the handling of collation is vendor dependent so if you want to stick to vendor-neutral DBI then you don't want to count on it.

Re^2: locales, encodings, collations, charsets... how can I match a given MySQL collation?
by xaprb (Scribe) on Apr 02, 2007 at 14:03 UTC

    Oooh, that's a great idea! It would still require a potentially large amount of network traffic, but there's an additional benefit: if I specify the collation explicitly when I create the $sth, I can let MySQL handle any coercion between collations if the columns in the two tables do have different collations. Then there's a much better shot at getting consistent behavior.

Re^2: locales, encodings, collations, charsets... how can I match a given MySQL collation?
by xaprb (Scribe) on Apr 02, 2007 at 12:46 UTC

    Yep, I had thought about all these things too, but was trying my best to find a way to do it in Perl to be as network and time efficient as possible. *sigh* It may be an impossible dream. There are a lot of variables, even in options 2 and 3.