find out file charset and encoding?

DreamT has asked for the wisdom of the Perl Monks concerning the following question:

Re: find out file charset and encoding? by GrandFather (Saint) on Aug 06, 2011 at 11:35 UTC
Actually, you can't prove that a file is encoded in any particular way; you can only show that it isn't. However, a text file will very likely reveal its line endings (assuming they are consistent) within a modest number of characters, say a few hundred. *nix uses the line feed character \n as its line ending. The other common line endings are Windows (carriage return, line feed: \r\n) and classic Mac OS (carriage return: \r).

Note that for files opened with default processing, Perl translates the OS-specific line ending sequence into the single character represented by \n, so \n may be used as the line end character across platforms. Thus, to determine the actual line ending sequence used by a file, it may be necessary to use binmode to ensure no line end translation takes place.

Confirming that you have latin1 (probably you mean ISO/IEC 8859-1) is much harder, and probably requires that the file contain some suitable foreign language text that you can check against an appropriate dictionary. However, it may be that all you need is to check that the file's contents are not inconsistent with that particular character coding. It may help to take a look at ISO/IEC_8859-1.

True laziness is hard work
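As a rough illustration of the binmode advice, here is a minimal sketch that reads the first few kilobytes of a file as raw bytes and counts each line-ending style. The helper names (count_line_endings, sniff_line_endings) and the 4096-byte default are assumptions made up for this example, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: count each line-ending style in a chunk of raw bytes.
sub count_line_endings {
    my ($bytes) = @_;
    my %count = (CRLF => 0, LF => 0, CR => 0);
    # Try CR LF first so a pair is not counted as a lone CR plus a lone LF.
    while ($bytes =~ /(\x0D\x0A|\x0D|\x0A)/g) {
        my $key = $1 eq "\x0D\x0A" ? 'CRLF'
                : $1 eq "\x0A"     ? 'LF'
                :                    'CR';
        $count{$key}++;
    }
    return \%count;
}

# Hypothetical helper: sample the start of a file with no line-end translation.
sub sniff_line_endings {
    my ($path, $limit) = @_;
    $limit //= 4096;    # assumed sample size; a few hundred bytes is often enough
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;        # raw bytes, so CR LF is not collapsed on Windows
    my $buf = '';
    defined(read $fh, $buf, $limit) or die "Cannot read $path: $!";
    close $fh;
    return count_line_endings($buf);
}
```

A file with a single consistent style shows one non-zero counter; mixed counts suggest inconsistent endings. Note that a CR LF pair split across the end of the sample would be counted as a lone CR.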
Re: find out file charset and encoding? by Khen1950fx (Canon) on Aug 06, 2011 at 16:21 UTC
Take a look at piconv. It's a very useful tool. For example, to resolve the alias "latin1":
`piconv -r latin1`
It'll return the canonical name iso-8859-1. If you want to make sure that the output is encoded in latin1:
`piconv -t iso-8859-1 $file`
It'll print the file, converted to iso-8859-1, to STDOUT. Note that this converts rather than checks: unless you name the source encoding with -f, piconv assumes the encoding of your current locale.
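The same alias lookup is available from Perl through the core Encode module, which piconv is built on. The sketch below also follows the "you can only show that it isn't" idea from the earlier reply: since every byte value is a legal ISO-8859-1 character, it only collects reasons for doubt. The helper name latin1_doubts and its two heuristics are assumptions for illustration, not an established test.

```perl
use strict;
use warnings;
use Encode qw(decode resolve_alias);

# Resolve an alias to its canonical Encode name, as `piconv -r` does.
my $canonical = resolve_alias('latin1');    # iso-8859-1

# Hypothetical helper: list reasons why raw bytes look unlike ISO-8859-1 text.
# An empty list means "not inconsistent", not "proven Latin-1".
sub latin1_doubts {
    my ($bytes) = @_;
    my @doubts;

    # 0x80-0x9F are C1 control codes in ISO-8859-1 and are rare in real text;
    # their presence often points to Windows-1252 or a multi-byte encoding.
    push @doubts, 'contains bytes in 0x80-0x9F' if $bytes =~ /[\x80-\x9F]/;

    # Non-ASCII data that passes a strict UTF-8 decode is more likely UTF-8.
    if ($bytes =~ /[^\x00-\x7F]/) {
        my $ok = eval {
            decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
            1;
        };
        push @doubts, 'decodes as valid UTF-8' if $ok;
    }
    return @doubts;
}
```

Read the file with binmode first so the helper sees the bytes as stored on disk. Pure ASCII input returns no doubts, because ASCII is consistent with Latin-1, UTF-8, and many other encodings alike.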
Re: find out file charset and encoding? by cormanaz (Deacon) on Aug 06, 2011 at 19:46 UTC
How about using Encode::Detect?
Steve

