Re: Composite Charset Data to UTF8?

by Corion (Pope)
in reply to Composite Charset Data to UTF8?

What do you mean by "composite charset"?

The only sane approach is to Encode::decode all data as you read it into Perl, and to Encode::encode the data to the intended target encoding as you write it.
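A minimal sketch of that decode-on-input, encode-on-output pattern, using the core Encode module. The sample bytes and the choice of ISO-8859-1 as source encoding are assumptions for illustration only; substitute whatever your data really uses.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed sample input: raw bytes as they might arrive from a Latin-1 file.
my $bytes = "caf\xE9";

# Input boundary: bytes in, Perl character string out.
my $text = decode('ISO-8859-1', $bytes);

# ... do all processing on $text as characters here ...

# Output boundary: characters in, UTF-8 bytes out.
my $utf8 = encode('UTF-8', $text);
```

When reading from files, an I/O layer such as `open my $fh, '<:encoding(ISO-8859-1)', $file` does the decoding step for you.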

If you don't know the input encoding yet, you have to either use one of the existing guessing modules or come up with your own way of finding the "best" input encoding for your file(s). For example, if you have a dictionary of your source language, you can guess the encoding of a document by looking for byte sequences that correspond to a word or phrase in that language. There is very little we can do here without further information.
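The dictionary idea could look roughly like the sketch below. The helper name `guess_encoding_by_words`, the candidate list and the word list are all made up for illustration; a real word list would have to come from your source language and contain words with non-ASCII characters, since pure ASCII words look the same in most encodings.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode the bytes with each candidate encoding and
# count how many known words show up in the result. Highest count wins.
sub guess_encoding_by_words {
    my ($bytes, $candidates, $words) = @_;
    my ($best, $best_score) = (undef, 0);
    for my $enc (@$candidates) {
        # FB_CROAK makes decode die on malformed input. Work on a copy,
        # because decode may modify its argument when a CHECK is given.
        my $copy = $bytes;
        my $text = eval { decode($enc, $copy, Encode::FB_CROAK()) };
        next unless defined $text;
        my $score = grep { index($text, $_) >= 0 } @$words;
        ($best, $best_score) = ($enc, $score) if $score > $best_score;
    }
    return $best;    # undef if no candidate matched any word
}

# Assumed test data: two German words containing non-ASCII characters.
my @words      = ("Gr\x{FC}\x{DF}e", "Stra\x{DF}e");
my @candidates = ('UTF-8', 'ISO-8859-1');
my $sample     = "Viele Gr\x{FC}\x{DF}e aus der Stra\x{DF}e";

my $guess_utf8   = guess_encoding_by_words(encode('UTF-8', $sample),      \@candidates, \@words);
my $guess_latin1 = guess_encoding_by_words(encode('ISO-8859-1', $sample), \@candidates, \@words);
```

Ties go to whichever candidate comes first in the list, so order the candidates by how likely you think they are.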

Update: According to your update, you have not exactly mojibake but still a horrible mess of encodings. Maybe you can still employ the approach of having well-known words/phrases to determine where a new encoding starts, but it will be much, much uglier and harder.

Re^2: Composite Charset Data to UTF8?
by AlexTape (Monk) on Jun 18, 2013 at 14:27 UTC
    topic update :)
    $perlig =~ s/pec/cep/g if 'errors expected';

