PerlMonks  

Re: Composite Charset Data to UTF8?

by Khen1950fx (Canon)
on Jun 18, 2013 at 15:02 UTC (#1039571)


in reply to Composite Charset Data to UTF8?

Encode::StdIO is what you're looking for. For example:

#!/usr/bin/perl
use strict;
use warnings;
use Encode::StdIO encoding => 'utf-8';

Your STDOUT and STDERR will automatically be encoded in utf8.

Also, note that the author recommends Term::Encoding, so I would install that first, then Encode::StdIO.
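
If pulling in a CPAN module is not an option, roughly the same effect can be had with core Perl. The following is a minimal sketch, not a description of what Encode::StdIO does internally: it attaches an :encoding(UTF-8) layer to STDOUT and STDERR with binmode. The helper bytes_via_utf8_layer is a hypothetical name, added only to make the effect of the layer visible on an in-memory handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Core-only sketch: encode everything printed to the standard handles as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
binmode(STDERR, ':encoding(UTF-8)');

# Hypothetical helper: push a character string through the same layer on an
# in-memory handle and return the octets that were actually written.
sub bytes_via_utf8_layer {
    my ($chars) = @_;
    my $buf = '';
    open(my $fh, '>:encoding(UTF-8)', \$buf)
        or die "Cannot open in-memory handle: $!";
    print {$fh} $chars;
    close($fh) or die "Cannot close in-memory handle: $!";
    return $buf;
}

# U+00E9 (e with acute accent) leaves the program as the two octets C3 A9.
print "caf\x{e9}\n";
printf "%vX\n", bytes_via_utf8_layer("\x{e9}");
```

Note that this only affects output. Input that is already in the wrong encoding still has to be decoded correctly when it is read.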


Re^2: Composite Charset Data to UTF8?
by AlexTape (Monk) on Jun 19, 2013 at 11:56 UTC
    OK, that's like my first approach:

    use utf8;
    use open ':std', ':encoding(UTF-8)';
    use open IO => ':encoding(UTF-8)';

    But OK, I still get an internal error like this:
    utf8 "\xA9" does not map to Unicode at /usr/local/share/perl/5.14.2/XML/Tidy.pm line 780.
    utf8 "\xAE" does not map to Unicode at /usr/local/share/perl/5.14.2/XML/Tidy.pm line 782.

    Anyway, that is not really the main part of the problem. Does anybody have a quick way to test whether a file uses one consistent charset, e.g. a true/false answer for "is this file UTF-8 or not"? Can I say that the file is UTF-8 if

    utf8::decode($_) or die "Input is not valid UTF-8";

    succeeds, just to find out whether there is more than one charset in the file? Or is that part of the problem?

    kindly perlig
    $perlig =~ s/pec/cep/g if 'errors expected';
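
One way to get the true/false answer asked for above is to attempt a strict decode and trap the failure. This is only a sketch under assumptions: is_valid_utf8 is a hypothetical helper name, and it uses Encode's strict 'UTF-8' encoding with FB_CROAK rather than utf8::decode, which as far as I know accepts Perl's more permissive internal variant of UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns 1 if the octets in $bytes form valid UTF-8,
# 0 otherwise. $bytes must be raw octets, not an already decoded string.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on the first malformed sequence;
    # LEAVE_SRC stops decode() from modifying $bytes while it checks.
    my $ok = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 1 : 0;
}

for my $sample ("plain", "caf\xC3\xA9", "copyright \xA9") {
    printf "%-16s %s\n",
        join(' ', map { sprintf '%02X', ord } split //, $sample),
        is_valid_utf8($sample) ? 'valid UTF-8' : 'NOT valid UTF-8';
}
```

A false result only says that the data is not UTF-8 throughout. It does not say which other charset is mixed in.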

      Have a look at the encoding rules of UTF-8.

      A valid UTF-8 sequence starts with either 0b0xxxxxxx or 0b11xxxxxx. So an octet of the form 0b10xxxxxx is invalid at the start of a sequence (it can only be a continuation octet), and both of your octets have that form:

      > perl -wle "print sprintf '%08b', $_ for (0xa9,0xae)"
      10101001
      10101110
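
The bit patterns above can be turned into a small classifier. This is a sketch for illustration only; octet_kind is a hypothetical name, and it looks at the top bits alone, so it does not reject lead octets that never occur in valid UTF-8 (0xC0, 0xC1, 0xF5 to 0xFF).

```perl
use strict;
use warnings;

# Hypothetical helper: classify one octet (an integer 0..255) by its top bits.
sub octet_kind {
    my ($octet) = @_;
    return 'ascii'        if ($octet & 0x80) == 0x00;   # 0xxxxxxx
    return 'continuation' if ($octet & 0xC0) == 0x80;   # 10xxxxxx
    return 'lead';                                      # 11xxxxxx
}

# 0xA9 and 0xAE are the octets from the XML::Tidy error messages.
printf "0x%02X  %08b  %s\n", $_, $_, octet_kind($_) for 0x41, 0xA9, 0xAE, 0xC3;
```

A continuation octet that does not follow a lead octet is what makes the input invalid.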

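      An untested easy check could be to match your string against /[\x80-\xBF]/, which are the hex representations of the bit patterns we've identified. Keep in mind that these octets are also legitimate continuation octets inside a valid multi-byte sequence, so a match only indicates invalid UTF-8 when the octet does not follow a suitable lead octet: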

      > perl -wle "print sprintf '%08b - %02x', $_,$_ for (0b10000000,0b10111111)"
      10000000 - 80
      10111111 - bf
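
Since a bare character-class match also fires on valid multi-byte sequences, a line-by-line strict decode may be a more reliable way to locate the places where another charset is mixed in. The following is a sketch under assumptions: classify_line and non_utf8_line_numbers are hypothetical helper names, and the sample lines stand in for the lines of a real file.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: label a line of raw octets as 'ascii', 'utf8' or 'other'.
sub classify_line {
    my ($bytes) = @_;
    return 'ascii' unless $bytes =~ /[\x80-\xFF]/;   # no high octets at all
    my $ok = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 'utf8' : 'other';
}

# Hypothetical helper: 1-based numbers of the lines that are not valid UTF-8.
sub non_utf8_line_numbers {
    my @lines = @_;
    return grep { classify_line($lines[$_ - 1]) eq 'other' } 1 .. @lines;
}

my @sample = ("plain ascii\n", "caf\xC3\xA9\n", "(c) as Latin-1: \xA9\n");
print "Suspicious lines: @{[ non_utf8_line_numbers(@sample) ]}\n";
```

When reading a real file, open it with the :raw layer so that the lines arrive as undecoded octets.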
