PerlMonks  

print unicode characters from hex format

by sistermaryguacamole (Initiate)
on Aug 31, 2016 at 20:38 UTC ( [id://1170927]=perlquestion )

sistermaryguacamole has asked for the wisdom of the Perl Monks concerning the following question:

Greetings,

I have what I thought would be a simple issue. I'm reading a YAML file full of user names, email, phone, etc. Many are French-Canadian, and have accented characters. In the file I'm reading, it looks like:

jean-fran\xe7ois chr\xe9tien

I know that \xe7 is "ç", and \xe9 is "é", etc., but it prints to the terminal as just \xe7, \xe9.

I've looked up all sorts of stuff: use utf8; use Encode; binmode(STDOUT, ":utf8"), blah blah blah.

I just want to print the stupid messed up characters the way they're supposed to look; for the love of God, please, help me.

(The next step, of course, is to forbid our French-Canadian employees to use ridiculous non-English characters when creating user accounts - but one thing at a time).

Regards & God Bless,

Sister Mary Guacamole

Replies are listed 'Best First'.
Re: print unicode characters from hex format
by choroba (Cardinal) on Aug 31, 2016 at 21:01 UTC
    You can use substitution to replace the codes with actual characters, and set the IO layer of the output to accept the encoding:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    my $string = 'jean-fran\xe7ois chr\xe9tien';
    # Replace each literal \xHH escape with the character it names.
    $string =~ s/\\x(..)/chr hex $1/ge;
    # Encode the output as UTF-8 so the terminal shows the accented letters.
    binmode STDOUT, ':encoding(UTF-8)';
    say $string;

    Also note that telling anyone their name is "ridiculous" might sound impolite.

    Update: Simplified using binmode.

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: print unicode characters from hex format
by kennethk (Abbot) on Aug 31, 2016 at 22:17 UTC
    How are you importing the YAML? The issue here is not that the data is being written incorrectly (you have the literals properly represented in the in-memory string) but that whatever is doing your data import is not doing the necessary character unescaping. You could fix this as choroba recommends with a regex post-fix, but this should really be handled in your serialization/deserialization layer.
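
    A minimal sketch of what letting the deserialization layer do the unescaping could look like. It assumes YAML::XS from CPAN is installed and that the file stores the names as double-quoted scalars, which is the only YAML scalar style in which \xHH escapes are interpreted. The sample document is made up from the name in the question, not taken from the real file:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use YAML::XS qw(Load);

# Assumed input: a double-quoted YAML scalar containing \xHH escapes.
# The single-quoted heredoc keeps the backslashes literal for the parser.
my $yaml = <<'END_YAML';
name: "jean-fran\xe7ois chr\xe9tien"
END_YAML

# The parser resolves the escapes, so no regex post-processing is needed.
my $data = Load($yaml);

# Encode the character string as UTF-8 on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print $data->{name}, "\n";
```

    If the names are unquoted or single-quoted in the file, the backslashes are ordinary characters as far as YAML is concerned, and a substitution like choroba's is probably still needed.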

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: print unicode characters from hex format
by RonW (Parson) on Aug 31, 2016 at 23:17 UTC
Re: print unicode characters from hex format
by Anonymous Monk on Sep 07, 2016 at 14:02 UTC

    In case you are talking about Windows, there are some obstacles:

    I recently found that Data::Dumper on a Windows console might mess it up. Maybe you will find
    $Data::Dumper::Useperl = 1;
    helpful in that case, or you can avoid Data::Dumper altogether. I haven't dug deeper than that, as it was just the debugging output that was not as expected.

    To change the console's codepage on Windows to UTF-8:
    chcp 65001
    and the window should use a UTF-8-capable font like Lucida Console.

    On Linux, check that a UTF-8 locale is set. If you connect through a terminal emulator such as PuTTY, check that it is set to UTF-8 as well. When in doubt, print the output, convert it to hex, and see what the bytes actually are and how they change at each step.
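
    For the hex check, something along these lines might do. It is only a sketch using the core Encode module; hex_bytes is a made-up helper name, and the sample string is just part of the name from the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: show the bytes a character string turns into
# under a given encoding, as space-separated hex pairs.
sub hex_bytes {
    my ($string, $encoding) = @_;
    my $bytes = encode($encoding, $string);
    return join ' ', unpack '(H2)*', $bytes;
}

my $name = "fran\x{e7}ois";

# In UTF-8 the c-cedilla is the two bytes c3 a7 ...
print hex_bytes($name, 'UTF-8'), "\n";
# ... while in ISO-8859-1 it is the single byte e7.
print hex_bytes($name, 'ISO-8859-1'), "\n";
```

    If the terminal expects UTF-8 and the dump shows a lone e7, the output layer is the likely culprit rather than the data.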

    If those characters are formatted right in memory, saving them to a file requires setting that file's output mode to UTF-8.

    e.g. for File::Slurp
    write_file ($filename, {binmode => ':utf8'}, $string);
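
    Without File::Slurp, the same thing can probably be done with a plain open and an encoding layer. A sketch using only core modules; write_utf8 is a made-up helper name, and the temporary file is there only so the demo does not overwrite anything real:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write a character string to a file as UTF-8.
sub write_utf8 {
    my ($filename, $string) = @_;
    # The :encoding(UTF-8) layer converts characters to bytes on output.
    open my $fh, '>:encoding(UTF-8)', $filename
        or die "Cannot open $filename: $!";
    print {$fh} $string;
    close $fh or die "Cannot close $filename: $!";
    return;
}

# Demo against a temporary file that is removed when the script exits.
my (undef, $filename) = tempfile(UNLINK => 1);
write_utf8($filename, "chr\x{e9}tien\n");
```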
