I'm trying to import a corpus of spam mail into a PostgreSQL database, using Email::Folder to read an mbox mail store. The parsing works and I'm extracting the information I want, but I get an error when I try to put the data into the database.
I'm using a prepared statement for the INSERT, and it works for most of the records, but some fail with this error:
DBD::Pg::st execute failed: ERROR: invalid byte sequence for encoding "UTF8": 0x95
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_e
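The 0x95 byte in the error cannot start a character in UTF-8 (it has the bit pattern of a continuation byte, so it is only valid after a lead byte). To find which records will fail before running the INSERT, a minimal diagnostic sketch using the core Encode module could look like this. `first_invalid_utf8_byte` is a hypothetical helper name, and the sketch assumes each field is a raw byte string as read from the mbox.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical diagnostic helper: report where a byte string stops being
# valid UTF-8, which is roughly what PostgreSQL rejects on INSERT.
# Returns an empty list if the whole string is valid UTF-8, otherwise
# (byte offset, byte value) of the first byte that could not be decoded.
sub first_invalid_utf8_byte {
    my ($bytes) = @_;
    my $rest = $bytes;    # decode() with FB_QUIET overwrites its input

    # FB_QUIET stops at the first malformed sequence and leaves the
    # undecoded remainder in $rest.
    decode('UTF-8', $rest, Encode::FB_QUIET);

    return if $rest eq '';    # everything decoded cleanly
    return (length($bytes) - length($rest), ord($rest));
}
```

For example, `first_invalid_utf8_byte("ab\x95cd")` returns `(2, 0x95)`, while a valid UTF-8 string such as `"caf\xc3\xa9"` returns an empty list.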
How can I convert or otherwise clean the data before inserting it? Losing some characters is acceptable: this is only for my own personal use, so a little lost data in foreign-language emails that I can't read anyway is fine.
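One common approach, sketched here under assumptions: keep the bytes if they are already valid UTF-8; otherwise decode them as Windows-1252 (a frequent guess for Western mail, and the encoding in which 0x95 is a bullet, U+2022), let anything unmappable become U+FFFD, and re-encode to UTF-8 before binding. `to_valid_utf8` is a hypothetical name, the Windows-1252 fallback is a guess rather than something read from the message headers, and NUL characters are stripped because PostgreSQL text values cannot contain them.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn an arbitrary byte string from a mail header or
# body into a valid UTF-8 byte string that PostgreSQL will accept.
sub to_valid_utf8 {
    my ($bytes) = @_;

    # Explicit undef (not a bare "return") so the helper still yields one
    # value when used in list context, e.g. inside execute(); undef binds as NULL.
    return undef unless defined $bytes;

    # Try strict UTF-8 first. FB_CROAK makes decode() die on malformed
    # input; LEAVE_SRC keeps $bytes intact for the fallback below.
    my $chars = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };

    # Assumption: anything that is not valid UTF-8 is Windows-1252.
    # With the default CHECK, unmappable bytes become U+FFFD.
    $chars = decode('cp1252', $bytes) unless defined $chars;

    # PostgreSQL text columns cannot store NUL characters, so drop them.
    $chars =~ tr/\0//d;

    # Hand the database UTF-8 encoded bytes.
    return encode('UTF-8', $chars);
}
```

Each field would then go through the helper before binding, e.g. `$sth->execute(map { to_valid_utf8($_) } $subject, $body)`, where `$sth`, `$subject` and `$body` are hypothetical stand-ins for whatever the import script already uses. A more careful version would read the charset from each message's Content-Type header and decode with that instead of guessing.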
Any help is appreciated. I can post my (admittedly ugly and unpolished) code if necessary.