PerlMonks  

Find encoding that should have been used

by ikegami (Pope)
on Oct 21, 2009 at 19:08 UTC (#802507=CUFP)

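Trying to debug an encoding problem? Given the text you expected and the garbled text you actually got, the following tries every pair of encodings: it encodes the expected text with one, decodes the result with the other, and reports each pair that reproduces what you got. That tells you what encoding was used and what encoding should have been used.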

    use 5.008;
    use strict;
    use warnings;

    use Encode    qw( encode decode );
    use charnames qw( :full );

    my $expected = "\N{LATIN CAPITAL LETTER I WITH CIRCUMFLEX}";
    my $got      = "\N{LATIN CAPITAL LETTER A WITH TILDE}"
                 . "\N{LATIN CAPITAL LETTER Z WITH CARON}";

    my @encs = (
       'US-ASCII',
       ( map "UTF-$_", qw( 7 8 16be 16le 32be 32le ) ),
       ( map "UCS-$_", qw( 2be 2le 4be 4le ) ),
       ( map "iso-8859-$_", 1..11, 13..16 ),
       ( map "Windows-$_",
          437, 737, 775, 850, 852, 855,  # OEM pages
          857, 858, 860, 861, 862, 863,
          865, 866, 869,
          874, 932, 936, 949, 950,       # ANSI pages
          1250..1258,
       ),
    );

    for my $enc_for_enc (@encs) {
       my $encoded = encode($enc_for_enc, $expected);
       for my $enc_for_dec (@encs) {
          my $decoded = decode($enc_for_dec, $encoded);
          next if $decoded ne $got;

          print("$enc_for_enc as $enc_for_dec:\n");
          for ($decoded =~ /./sg) {
             my $code = ord;
             my $name = charnames::viacode($code);
             printf("(U+%04X) %s\n", $code, $name);
          }
          print("\n");
       }
    }

Output:

    UTF-8 as Windows-1252:
    (U+00C3) LATIN CAPITAL LETTER A WITH TILDE
    (U+017D) LATIN CAPITAL LETTER Z WITH CARON
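
To see why that pair is reported, it can help to look at the bytes involved. The following is a small sketch, not part of the original program; the helper names hex_bytes and code_points are made up for illustration. It encodes the expected character as UTF-8 and then decodes those same bytes as Windows-1252.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes { join ' ', map { sprintf('%02X', ord($_)) } split //, $_[0] }

# Hypothetical helper: list the code points of a decoded string.
sub code_points { join ' ', map { sprintf('U+%04X', ord($_)) } split //, $_[0] }

my $expected = "\x{CE}";                        # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
my $bytes    = encode('UTF-8', $expected);      # the bytes that were written
my $misread  = decode('Windows-1252', $bytes);  # what a Windows-1252 reader sees

print(hex_bytes($bytes), "\n");      # C3 8E
print(code_points($misread), "\n");  # U+00C3 U+017D
```

The single character becomes two bytes in UTF-8, and Windows-1252 maps each of those bytes to its own character, which is how one expected character turns into the two characters in $got.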

Known bugs and limitations:

  • Doesn't provide a means to specify input without modifying the program.
  • Doesn't handle different codepoints that produce similar graphemes.
  • Should display nearest matches if there aren't any exact matches.
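
The first two limitations could be approached along the following lines. This is only a sketch under stated assumptions, not the author's code: the sub name find_pairs, the script name in the usage comment, the shortened encoding list, and the choice to compare after NFC normalization are all made up for illustration. NFC only covers canonically equivalent sequences (a precomposed letter versus a base letter plus a combining mark), not look-alike characters in general, and nearest-match reporting is not attempted here.

```perl
use strict;
use warnings;
use Encode             qw( encode decode );
use Unicode::Normalize qw( NFC );

# Hypothetical helper: return every [ $enc_for_enc, $enc_for_dec ] pair
# that turns $expected into $got, comparing after NFC normalization.
sub find_pairs {
   my ($expected, $got, @encs) = @_;
   my $want = NFC($got);
   my @pairs;
   for my $enc_for_enc (@encs) {
      my $encoded = encode($enc_for_enc, $expected);
      for my $enc_for_dec (@encs) {
         my $decoded = decode($enc_for_dec, $encoded);
         push @pairs, [ $enc_for_enc, $enc_for_dec ]
            if NFC($decoded) eq $want;
      }
   }
   return @pairs;
}

# Shortened list for illustration; the full list from the post works too.
my @encs = ( 'UTF-8', 'iso-8859-1', 'Windows-1252' );

# Usage (hypothetical script name): perl find_enc.pl EXPECTED GOT
# Assumes the shell passes the arguments as UTF-8 bytes.
if (@ARGV == 2) {
   my ($expected, $got) = map { decode('UTF-8', $_) } @ARGV;
   print("$_->[0] as $_->[1]\n") for find_pairs($expected, $got, @encs);
}
```

Returning the pairs instead of printing inside the loops keeps the search separate from the reporting, so the per-character dump from the original program can still be applied to each result.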
