PerlMonks
What encoding am I (probably) using?
by tphyahoo (Vicar) on May 13, 2005 at 12:25 UTC ( [id://456690]=perlquestion )
tphyahoo has asked for the wisdom of the Perl Monks concerning the following question:
O wise monks,
Let's say I want to process some text whose encoding is uncertain, except that it is probably text, and probably in a Western language with a single-byte character encoding. I want to do some text processing on it, such as extracting all the words. Before doing anything else, I want to use Encode to put everything into iso-8859-1 in (probably) good form.
Is there anything I can use that will give me the "probable encoding" for a file, a string, or whatever?
I was led in this direction by the venerable Thundergnat's answer to my question "matching german characters output from system call", where he suggested I run Encode::from_to($latinresult, 'cp437', 'iso-8859-1'); before matching the output of a system call on my German WinXP box. But how did he know to use 'cp437'?
UPDATE: Thanks, monks. Encode::Guess looks good. I'm going to go try it out.
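For anyone following along, here is a minimal sketch of how trying Encode::Guess might look. The helper name to_latin1 and its $fallback parameter are made up for illustration. The sketch assumes that when the default suspects (ascii, utf8, and BOM-marked UTF-16/32) do not match, the caller already has a likely single-byte encoding in mind, such as the 'cp437' from Thundergnat's answer. As far as I can tell from the Encode::Guess documentation, single-byte encodings make poor suspects because nearly any byte sequence is valid in them, so the fallback is applied only after the guess fails instead of being passed in as a suspect.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper: return $bytes re-encoded as iso-8859-1.
# $fallback is an assumed single-byte encoding (e.g. 'cp437') that is
# used only when Encode::Guess cannot settle on an encoding.
sub to_latin1 {
    my ($bytes, $fallback) = @_;

    # guess_encoding() returns an encoding object on success and a plain
    # error string (not a reference) on failure.
    my $enc = guess_encoding($bytes);

    my $text = ref($enc)
        ? $enc->decode($bytes)          # guess succeeded
        : decode($fallback, $bytes);    # guess failed: trust the caller

    # Characters that iso-8859-1 cannot represent are replaced with a
    # substitution character by default.
    return encode('iso-8859-1', $text);
}

# The same letter (u with umlaut) as cp437 bytes and as UTF-8 bytes.
for my $input ("\x81", "\xC3\xBC") {
    my $latin1 = to_latin1($input, 'cp437');
    printf "%s -> %s\n", unpack('H*', $input), unpack('H*', $latin1);
}
```

This is a heuristic, not a guarantee: a cp437 byte sequence that happens to be valid UTF-8 would be taken for UTF-8, so the "(probably)" in the title still applies.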