Re^2: Using encoding

by nikosv (Chaplain)
on Jan 14, 2013 at 12:29 UTC

in reply to Re: Using encoding
in thread Using encoding

On Windows, the translation is from Unicode (UTF-16) to the ANSI code page chosen by your system's "Language for non-Unicode programs" setting, so the pound sign will be broken down into bytes according to that code page.
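To make this concrete, here is a minimal sketch using the core Encode module that shows which bytes the pound sign (U+00A3) becomes under a few encodings. The encodings chosen are an assumption for illustration: cp1252 is the usual ANSI code page on Western European Windows systems, and cp850 is a common OEM (console) code page there. The helper name pound_bytes is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Return the bytes that U+00A3 (POUND SIGN) encodes to, as space-separated hex.
sub pound_bytes {
    my ($enc) = @_;
    my $bytes = encode($enc, "\x{A3}");
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

# Same character, different bytes depending on the encoding.
for my $enc (qw(cp1252 cp850 UTF-8 UTF-16LE)) {
    printf "%-9s %s\n", $enc, pound_bytes($enc);
}
```

It should print A3 for cp1252, 9C for cp850, C2 A3 for UTF-8 and A3 00 for UTF-16LE, which is why the same character can look different depending on how its bytes are read back later.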

Re^3: Using encoding
on Jan 14, 2013 at 12:37 UTC

    OK, I think that makes sense. So ord is not what I'm after.

    What's the best way to find 'funny' characters in a text file and translate them into meaningful characters in a text/Unicode file?
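One way to find the 'funny' characters is to read the file as raw bytes and report every byte that is not printable ASCII, a tab, or a newline, together with where it occurs. A minimal sketch, assuming the file name is passed on the command line; funny_bytes is a hypothetical helper name:

```perl
use strict;
use warnings;

# Scan a filehandle opened in :raw mode and report every byte that is not
# printable ASCII, a tab, or a newline. Returns [line, column, hex] triples.
sub funny_bytes {
    my ($fh) = @_;
    my @found;
    while (my $line = <$fh>) {
        while ($line =~ /([^\x20-\x7E\t\n])/g) {
            # pos() points just past the one-character match, so it equals
            # the 1-based column of that character.
            push @found, [ $., pos($line), sprintf('%02X', ord($1)) ];
        }
    }
    return @found;
}

if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    printf "line %d, col %d: 0x%s\n", @$_ for funny_bytes($fh);
    close $fh;
}
```

The hex codes it prints can then be looked up in a code page table to decide what each byte (or byte pair) should become.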

    I'm assuming that it's me making this difficult, and it's probably quite straightforward.

        I tried UTF-16LE, but it produced the error "UTF-16LE:Partial character", so that didn't work.

        Taking all encoding off the input file (presumably letting Perl read it as raw bytes rather than as ASCII) allowed me to substitute

        $col =~ s/\x{B6}\x{9C}/\x{A3}/g;

        as long as my output file was encoded with ISO-8859-1
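A minimal, self-contained version of that approach might look like the sketch below: read the input as raw bytes, apply the same substitution, and write the result through an ISO-8859-1 layer. The helper names fix_pounds and convert are hypothetical, and using :raw on both handles is an assumption made so that Windows CRLF translation does not alter the line endings in either direction.

```perl
use strict;
use warnings;

# Apply the substitution from the post to one raw line: the byte pair B6 9C
# becomes the single character U+00A3 (pound sign).
sub fix_pounds {
    my ($col) = @_;
    $col =~ s/\x{B6}\x{9C}/\x{A3}/g;
    return $col;
}

# Copy lines from one handle to another, fixing pound signs on the way.
sub convert {
    my ($in_fh, $out_fh) = @_;
    while (my $line = <$in_fh>) {
        print {$out_fh} fix_pounds($line);
    }
}

if (@ARGV == 2) {
    my ($in_name, $out_name) = @ARGV;
    open my $in,  '<:raw', $in_name or die "Cannot read $in_name: $!";
    # :raw first so no CRLF layer sits under the encoding layer.
    open my $out, '>:raw:encoding(ISO-8859-1)', $out_name
        or die "Cannot write $out_name: $!";
    convert($in, $out);
    close $out or die "Cannot close $out_name: $!";
}
```

Because every character left after the substitution has a code below 256, the ISO-8859-1 layer writes each one back out as a single byte (U+00A3 becomes the byte A3), which is presumably why that output encoding works where UTF-8 would turn each high character into two bytes.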

        That encoding (ISO-8859-1) works for the input file as well, so I guess I was leading people astray by suggesting utf8.

        I now have pound signs appearing in the finished file. However, it also contains little squares, which seem to come from the line breaks (returns) the users have typed in.

        I would have thought that, since this file can be read without an encoding, it would just be a case of s/\r\n/\n/, but this doesn't seem to work.
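One possible reason: on Windows, Perl's default :crlf I/O layer already turns each CR LF pair into a plain LF as lines are read, so there is usually no \r\n left for that substitution to match, and whatever still shows up as a square is some other control character (a lone CR, for example). A minimal sketch for checking this, with hypothetical helper names: normalise CRLF and lone CR to LF, then list any control characters that remain as hex codes.

```perl
use strict;
use warnings;

# Normalise line endings: CRLF and any lone CR both become LF.
sub normalise_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

# Return the distinct control characters (other than tab and newline) still
# present in a string, as sorted hex codes, to identify the "little squares".
sub leftover_controls {
    my ($text) = @_;
    my %seen;
    while ($text =~ /([\x00-\x08\x0B-\x1F\x7F])/g) {
        $seen{ sprintf('%02X', ord($1)) }++;
    }
    return sort keys %seen;
}
```

If leftover_controls returns anything after normalising, those codes are what needs a dedicated substitution (or removal).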

        Alternatively, is there a way to say "Perl, if you don't recognise the character, blitz it!"?
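A minimal sketch of the "blitz it" approach: tr with the /c (complement) and /d (delete) flags removes every character not in the listed set. The whitelist here (printable ASCII, tab, newline, and the pound sign) is an assumption; widen it to suit the data, and run it after the pound-sign substitution so the B6 9C pair is not deleted first. The helper name blitz_unknown is hypothetical.

```perl
use strict;
use warnings;

# Delete every character that is not printable ASCII, a tab, a newline, or
# the pound sign (U+00A3). The whitelist is an assumption; adjust as needed.
sub blitz_unknown {
    my ($text) = @_;
    $text =~ tr/\x20-\x7E\t\n\xA3//cd;
    return $text;
}
```

Deleting characters silently discards data, so it is worth listing what is actually in the file before deciding how aggressive the whitelist should be.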

        I've read quite a few web pages about the whole thing, but I don't seem to be quite getting it.

        Any further help is much appreciated.

