|Perl: the Markov chain saw|
locales, encodings, collations, charsets... how can I match a given MySQL collation?by xaprb (Scribe)
|on Apr 02, 2007 at 01:03 UTC||Need Help??|
xaprb has asked for the
wisdom of the Perl Monks concerning the following question:
I've read enough about encodings and collations and such to make my brain squishy. However, I still can't figure out how to solve my problem.
The problem is I need to compare character data in a way consistent with MySQL. I am comparing data in a table across two servers. Assume the tables have identical structures in every way, but may contain different data. I connect to each server and select col1 from each table. col1 is a varchar column with charset latin1 and collation latin1_swedish_ci (these are defaults). Now I loop through the rows returned from each table, comparing the data to determine which rows are extra or missing between the two tables.
MySQL sorts case-insensitive (hence latin1_swedish_ci) of course. I can duplicate this without problem: I just use "lc $a cmp lc $b".
This works okay until I come to a word like éclair. MySQL knows that this should be sorted just before "e". Perl thinks é is greater than "z".
I can go the other way, and make MySQL sort things the same way as Perl with "ORDER BY BINARY col1", but this defeats indexes and makes MySQL sort inefficiently, and that is highly undesirable for my purposes.
I feel sure there's a way to do this, but danged if I can figure out how. I have 8 Firefox tabs open with everything from perluniintro to Encode::Byte, and my brain is full :-) Anyone? Anyone?
UPDATE The idea to reach back to MySQL and ask it to compare the strings is how I'm going to try to do it. Thank you all very much for the great ideas!