Even after adding in the default-collation=utf8_unicode_ci, I'm still getting all question marks for multibyte characters. What a headache.
Not sure if anyone here has the answer, but assuming the DB is set up appropriately with UTF-8 encoding throughout and the data appears valid in the tables, it wouldn't really matter how it's getting into the database to begin with, right?
We're using a third-party program as a scraper, and its underlying Java is dumping the data to the DB. I haven't looked into it much, just because the data appears right in the DB with all UTF-8 encoding configured, so I assumed it wasn't the issue.
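
For reference, here is a minimal sketch of how literal question marks can come out of Java when a string passes through a charset that can't represent its characters. The class and method names (CharsetDemo, roundTrip) are hypothetical and have nothing to do with the scraper's actual code. It only illustrates the general mechanism, which may or may not be what is happening in this setup.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the scraper.
public class CharsetDemo {

    // Encode the text to bytes with the given charset, then decode it back.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for characters it cannot map; for ISO-8859-1 that byte is '?'.
    static String roundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        // Three CJK characters, written as escapes so the source file's
        // own encoding does not affect the result.
        String multibyte = "\u65e5\u672c\u8a9e";

        // A Latin-1 hop loses the characters for good.
        System.out.println(roundTrip(multibyte, StandardCharsets.ISO_8859_1)); // prints "???"

        // A UTF-8 hop preserves them.
        System.out.println(roundTrip(multibyte, StandardCharsets.UTF_8).equals(multibyte)); // prints "true"
    }
}
```

If something like this happens on any hop between the database and the client, the question marks are real 0x3F bytes by that point, so the original characters can't be recovered downstream of it.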