PerlMonks

Re: What's the best way to detect character encodings, Windows-1252 v. UTF-8?

by grantm (Parson)
on Jun 18, 2011 at 00:44 UTC


in reply to What's the best way to detect character encodings, Windows-1252 v. UTF-8?

You might want to look at Encoding-FixLatin - I created it for a very similar situation. In my case I had a Postgres database from an application that had treated text as 8-bit binary strings. Each record was one of: ASCII, UTF-8, ISO-8859-1 or CP1252, but the DB dump as a whole was a mixture of all these. The documentation for Encoding::FixLatin describes the heuristics it uses.
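As I understand the Encoding::FixLatin documentation, the heuristic works byte by byte. ASCII passes through unchanged, well-formed UTF-8 sequences are kept, and any other high byte is treated as Latin-1/CP1252. (CP1252 agrees with Latin-1 for bytes 0xA0-0xFF and adds printable characters in 0x80-0x9F.) Below is a minimal sketch of that idea using only the core Encode module. `naive_fix_latin` is a hypothetical name, and this is not the module's code: it ignores things the module handles, such as over-long UTF-8 sequences and the `ascii_hex` option.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# One well-formed UTF-8 multi-byte sequence (RFC 3629 ranges: no
# over-long forms, no surrogates, nothing above U+10FFFF).
my $utf8_seq = qr/
      [\xC2-\xDF][\x80-\xBF]
    | \xE0[\xA0-\xBF][\x80-\xBF]
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
    | \xED[\x80-\x9F][\x80-\xBF]
    | \xF0[\x90-\xBF][\x80-\xBF]{2}
    | [\xF1-\xF3][\x80-\xBF]{3}
    | \xF4[\x80-\x8F][\x80-\xBF]{2}
/x;

# Hypothetical helper (not Encoding::FixLatin's implementation):
# takes a byte string mixing ASCII, UTF-8 and CP1252 and returns
# a Perl character string.
sub naive_fix_latin {
    my ($bytes) = @_;
    my $out = '';
    while ( length $bytes ) {
        if ( $bytes =~ s/\A([\x00-\x7F]+)// ) {
            $out .= $1;                           # ASCII passes through unchanged
        }
        elsif ( $bytes =~ s/\A($utf8_seq)// ) {
            my $seq = $1;
            $out .= decode( 'UTF-8', $seq );      # valid UTF-8 sequence is kept
        }
        else {
            $bytes =~ s/\A(.)//s;
            my $byte = $1;
            $out .= decode( 'cp1252', $byte );    # any other byte: assume CP1252
        }
    }
    return $out;
}

binmode STDOUT, ':encoding(UTF-8)';
print naive_fix_latin("caf\xC3\xA9 and \x93quotes\x94"), "\n";
# prints: café and “quotes”
```

Working per character rather than per string is what lets this cope with a dump where different records, or even parts of one record, use different encodings.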


Re^2: What's the best way to detect character encodings, Windows-1252 v. UTF-8?
by Khen1950fx (Canon) on Jun 18, 2011 at 11:37 UTC
    I tried your module on ikegami's CP1252 example, and it works for me:
    #!/usr/bin/perl
    use Modern::Perl;    # strict, warnings and 'say'
    use Search::Tools::UTF8;
    use Encoding::FixLatin qw(fix_latin);
    use Encode::Locale;
    use Encode;

    # When attached to a terminal, use the console's encoding for I/O
    if ( -t ) {
        binmode(STDIN,  ":encoding(console_in)");
        binmode(STDOUT, ":encoding(console_out)");
        binmode(STDERR, ":encoding(console_out)");
    }

    my $text = "\xC9ric";    # "Éric" as a single-byte Latin-1/CP1252 string

    # Top-level code cannot 'return', so stop with 'exit' instead
    if ( is_latin1($text) ) {
        say "$text is latin1";
    }
    else {
        exit;
    }

    # fix_latin() decodes the bytes into a Perl character string
    my $fix = fix_latin( $text, ascii_hex => 0 );
    if ( !looks_like_cp1252($fix) ) {
        say "$fix no longer looks like cp1252 :-)";
    }
    else {
        exit;
    }

    say is_flagged_utf8($fix);
    say is_sane_utf8($fix);
    say is_valid_utf8($fix);
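If each record is in a single encoding, as in grantm's Postgres dump, a coarser check needs only the core Encode module. Try a strict UTF-8 decode and fall back to CP1252 when it fails. The sketch below is hypothetical (`classify_record` is not part of Search::Tools::UTF8 or Encoding::FixLatin). It is also only a heuristic: some CP1252 byte strings happen to be valid UTF-8 (for example the bytes C3 A9, which CP1252 reads as "Ã©"), and those will be classified as UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: classify one record of raw bytes and return
# (kind, decoded character string).
sub classify_record {
    my ($bytes) = @_;
    return ( 'ascii', $bytes ) if $bytes !~ /[^\x00-\x7F]/;

    # FB_CROAK makes malformed UTF-8 die instead of being replaced
    # with U+FFFD. decode() may modify its input when a CHECK value
    # is given, so pass a copy.
    my $copy  = $bytes;
    my $chars = eval { decode( 'UTF-8', $copy, FB_CROAK ) };
    return ( 'utf8', $chars ) if defined $chars;

    # Not valid UTF-8: assume the Windows single-byte encoding
    return ( 'cp1252', decode( 'cp1252', $bytes ) );
}

binmode STDOUT, ':encoding(UTF-8)';
for my $rec ( "plain", "caf\xC3\xA9", "caf\xE9" ) {
    my ( $kind, $text ) = classify_record($rec);
    print "$kind: $text\n";
}
```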
