What's the best way to detect character encodings? (Redux)

by Jim (Curate)
on Jun 10, 2013 at 04:36 UTC ( [id://1037983] )

Jim has asked for the wisdom of the Perl Monks concerning the following question:

Two years ago, I posted What's the best way to detect character encodings, Windows-1252 v. UTF-8? to SoPW. I got plenty of helpful answers to my question then. Now, I need to solve essentially the same problem again, but with UTF-16/UTF-16LE/UTF-16BE added to the mix.

Is there a Perl module that will automatically detect text files in these character encodings and normalize them to UTF-8 with byte order marks?

  • ASCII
  • ISO-8859-1 (Latin 1)
  • Windows-1252 (ANSI)
  • UTF-8 (with or without a byte order mark)
  • UTF-16
  • UTF-16LE
  • UTF-16BE

For my purposes, I can assume that text in a single-byte "legacy" encoding (i.e., not Unicode) consisting solely of characters in the ranges 01-7F and A0-FF is ISO-8859-1. If it has characters in the ranges 80-9F as well, it's Windows-1252. In other words, I can pretend there's no such thing as C1 control codes. (This is what all modern web browsers do, and it's what's specified in the draft HTML5 specification.)

UPDATE: I also want to know which of the lowest common denominator encodings each text file is in. For example, a file that consists solely of bytes in the range 01-7F is, for my purposes, ASCII. Sure, it's also ISO-8859-1, Windows-1252, UTF-8, and dozens of other encodings besides. But it's strictly in the ASCII character encoding, so that's what I want it to be identified as.
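A rough sketch of the heuristic described above, in the order that yields the "lowest common denominator" answer: BOM checks first, then a crude NUL-pattern test for BOM-less UTF-16, then ASCII, then strict UTF-8 validation, then the single-byte ranges. The `classify_bytes` name and the 40% NUL threshold for BOM-less UTF-16 are my own inventions, not anything standard; only `Encode::decode` with `FB_CROAK` is a real, documented API.

```perl
use strict;
use warnings;
use Encode ();

# Classify a raw byte string by the narrowest encoding it fits.
# A sketch of the byte-range heuristic, not a drop-in module.
sub classify_bytes {
    my ($bytes) = @_;

    return 'UTF-8'  if $bytes =~ /^\xEF\xBB\xBF/;   # UTF-8 BOM
    return 'UTF-16' if $bytes =~ /^\xFF\xFE/        # BOM present, either order:
                    || $bytes =~ /^\xFE\xFF/;       # call it plain "UTF-16"

    # BOM-less UTF-16: mostly-ASCII text shows NULs in alternating
    # positions. The 40% threshold is an arbitrary guess.
    if (length($bytes) >= 4) {
        my @b = unpack 'C*', $bytes;
        my ($even_nul, $odd_nul) = (0, 0);
        for my $i (0 .. $#b) {
            next if $b[$i];
            $i % 2 ? $odd_nul++ : $even_nul++;
        }
        my $half = @b / 2;
        return 'UTF-16LE' if $odd_nul  > $half * 0.4;  # high bytes are NUL
        return 'UTF-16BE' if $even_nul > $half * 0.4;
    }

    return 'ASCII' if $bytes =~ /^[\x01-\x7F]*\z/;

    # Strict UTF-8 validity check: decode() croaks on malformed input.
    # Decode a copy, since decode() may consume its argument.
    my $copy = $bytes;
    my $ok = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
    return 'UTF-8' if $ok;

    return 'ISO-8859-1' if $bytes =~ /^[\x01-\x7F\xA0-\xFF]*\z/;
    return 'Windows-1252';   # bytes in the 80-9F range are present
}
```

For example, `"caf\xE9"` is malformed as UTF-8 (a dangling lead byte) and contains nothing in 80-9F, so it classifies as ISO-8859-1, while `"caf\xE9 \x93hi\x94"` picks up Windows-1252 from the curly quotes.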


Replies are listed 'Best First'.
Re: What's the best way to detect character encodings? (Redux)
by jakeease (Friar) on Jun 10, 2013 at 08:11 UTC
Re: What's the best way to detect character encodings? (Redux)
by gnosti (Chaplain) on Jun 10, 2013 at 05:02 UTC
    I've heard that detecting encodings is a hard problem.

      It gets difficult when it's arbitrarily any "legacy" character encoding you're trying to detect. For that problem, fancy algorithms that use character frequencies, n-grams, dictionary look-ups, and other methods are required. But I'm able to factor out most of this complexity because I know that any text in a "legacy" single-byte encoding is ASCII/ISO-8859-1/Windows-1252. If it's not, then the damage I do converting it to UTF-8 will be what the provider of the text files deserves, because she didn't provide me Unicode text as she was supposed to in the first place.
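Given that simplification, the normalization step is small. Here is a sketch of converting one file to UTF-8 with a BOM once its encoding is known, using only the real `Encode` API and PerlIO layers. The `normalize_to_utf8` name and the `.utf8` output suffix are my own choices; note that treating ASCII and ISO-8859-1 as cp1252 is safe here because cp1252 is a superset of both over the byte ranges allowed above.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Rewrite $path as UTF-8 with a byte order mark, given the encoding
# detected for it ('ASCII', 'Windows-1252', 'UTF-8', 'UTF-16', ...).
sub normalize_to_utf8 {
    my ($path, $encoding) = @_;

    open my $in, '<:raw', $path or die "$path: $!";
    my $bytes = do { local $/; <$in> };
    close $in;

    $bytes =~ s/^\xEF\xBB\xBF//;    # strip an existing UTF-8 BOM, if any
    $encoding = 'cp1252'
        if $encoding =~ /^(?:ASCII|ISO-8859-1|Windows-1252)$/;

    # decode('UTF-16', ...) consumes the BOM and picks the byte order;
    # 'UTF-16LE'/'UTF-16BE' handle the BOM-less cases.
    my $text = decode($encoding, $bytes, FB_CROAK);

    open my $out, '>:raw:encoding(UTF-8)', "$path.utf8" or die "$path.utf8: $!";
    print {$out} "\x{FEFF}", $text;  # leading U+FEFF becomes the UTF-8 BOM
    close $out;
    return "$path.utf8";
}
```

Writing U+FEFF through the `:encoding(UTF-8)` layer emits the three BOM bytes `EF BB BF`, so the output files are uniformly BOM-marked regardless of what the input looked like.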

Node Type: perlquestion [id://1037983]
Approved by vinoth.ree
Front-paged by Old_Gray_Bear