PerlMonks  

Re: UTF-8 to Latin1 - unmatched characters?

by samtregar (Abbot)
on Mar 20, 2008 at 16:48 UTC (#675252)


in reply to UTF-8 to Latin1 - unmatched characters?

Text::Unidecode, optionally used through PerlIO::via::Unidecode, should do the trick.

UPDATE: Yup, it does:

$ perl -MText::Unidecode -le 'print unidecode("\x{201c} \x{2013}")'
" -

-sam

Re^2: UTF-8 to Latin1 - unmatched characters?
by uncommon13 (Novice) on Mar 27, 2008 at 15:08 UTC
    Thanks, Sam. I used this; however, it also converts the other, valid Latin-1 characters to ASCII.

    So I found this, which converts the UTF-8 characters that have no Latin-1 match to a substitute: http://linuxgazette.net/117/tag/4.html

    So basically, the code would be something like:

    # Converted UTF codes for non-matching ISO-8859-1
    # Strip it down to basic ASCII
    %utf_entity = (
        "\x{2019}", "'",
        "\x{201c}", '"',
        "\x{201d}", '"',
        "\x{2026}", "...",
        "\x{fffd}", "",
    );

    s/(\X)/ exists $utf_entity{$1} ? $utf_entity{$1} : $1 /eg;
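    For anyone who wants to try the table approach end to end, here is a minimal sketch using only core modules. The sub name to_latin1 and the sample string are made up for illustration; the table is the one from the post above. Characters that are neither in the table nor in Latin-1 would fall through to Encode's default substitution character.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Replacement table: only characters that have no Latin-1 equivalent.
my %utf_entity = (
    "\x{2019}" => "'",
    "\x{201c}" => '"',
    "\x{201d}" => '"',
    "\x{2026}" => '...',
    "\x{fffd}" => '',
);

# to_latin1 is a hypothetical helper name, not part of any module.
sub to_latin1 {
    my ($text) = @_;

    # Swap in the replacement for mapped characters; leave the rest alone,
    # so valid Latin-1 characters such as U+00E9 are not reduced to ASCII.
    $text =~ s/(\X)/ exists $utf_entity{$1} ? $utf_entity{$1} : $1 /eg;

    # Encode the character string to ISO-8859-1 bytes.
    return encode('iso-8859-1', $text);
}

my $bytes = to_latin1("caf\x{e9} \x{201c}hi\x{201d}\x{2026}");
```

    The e-acute survives as the single byte 0xE9, which is the difference from running the whole string through unidecode.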
      I was going to recommend passing only the characters that don't exist in iso-latin-1 to unidecode, by using a fallback handler when encoding. It works, but I'm getting an error (Close with partial character.) when the file handle is closed, and I have no idea how to fix it.

      Here's the code anyway:

      use strict;
      use warnings;

      use PerlIO::encoding qw( );
      use Text::Unidecode  qw( unidecode );

      use constant FB_UNIDECODE => sub { unidecode(chr($_[0])) };

      my $file = '...';

      local $PerlIO::encoding::fallback = FB_UNIDECODE;
      open(my $fh, '>:encoding(iso-8859-1)', $file)
         or die("Unable to create file \"$file\": $!\n");

      print $fh "abc\x{201C}def\x{2013}ghi";
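      A possible variation on the same idea, offered as a sketch rather than a fix for the close-time error: Encode's encode() accepts a code reference as its CHECK argument and calls it with the ordinal of each character it cannot map. Encoding in memory this way does not involve the PerlIO encoding layer at all. To keep the example to core modules, a small hypothetical lookup table (%fallback) stands in for unidecode here; replacing the lookup with unidecode(chr($ord)) should behave like the handler above, assuming Text::Unidecode is installed.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for unidecode(): ordinal => ASCII replacement.
my %fallback = (
    0x201C => '"',   # left double quotation mark
    0x2013 => '-',   # en dash
);

# Encode calls this only for characters that have no ISO-8859-1 mapping,
# passing the character's ordinal as the first argument.
my $handler = sub {
    my ($ord) = @_;
    return exists $fallback{$ord} ? $fallback{$ord} : '?';
};

my $text  = "abc\x{201C}def\x{2013}ghi";
my $bytes = encode('iso-8859-1', $text, $handler);

# $bytes can now be written to a handle opened in raw mode.
```

      Because the result is already a byte string, the output handle needs no :encoding layer; opening it with '>:raw' would be enough.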
        Dear ikegami,

        This is exactly what I wanted :)

        It's absolutely brilliant to think about using the fallback handler.

        I don't get the error (Close with partial character.) that you mentioned, though.

        Many thanks again :)
