I'm also working on something somewhat related (How best to avoid mojibake when attempting to automatically convert documents to UTF-8?). I mention it because the Perl module Unicode::Tussle was suggested. It includes a couple of utilities that you might find helpful for this.

