PerlMonks  

Printing a UTF8 code

by bobt_1234 (Initiate)
on Nov 08, 2019 at 12:59 UTC ([id://11108475]=perlquestion)

bobt_1234 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks. I'm hoping to gain some Perl wisdom about outputting Unicode or "utf8" codes. I've tried things like
    open FO, ">test.txt";
    binmode FO, ":utf8";
    print FO "\u00b9";
But it doesn't work. The output is "00b9". I've tried other things like
    print FO "\u-00b9";   # Outputs -00b9
    print FO "\u{00b9}";  # Outputs {00b9}
    print FO "\x{00b9}";  # Outputs a strange character (the code for 00b9 is a superscript 1)
I've tried searching the Internet to no avail. Every page on this topic requires you to read an encyclopedia. All I want is the code to output a unicode literal. Thanks everyone!

Replies are listed 'Best First'.
Re: Printing a UTF8 code
by hippo (Bishop) on Nov 08, 2019 at 13:53 UTC
    print FO "\x{00b9}"; # Outputs a strange character (the code for 00b9 is a superscript 1)

    This works fine for me on Perl v5.20.3:

    $ cat uniprint.pl 
    #!/usr/bin/env perl 
    use strict;
    use warnings;
    
    open my $out, '>:utf8', 'test.txt';
    print $out "Hello\x{00b9}\n";
    close $out;
    $ ./uniprint.pl 
    $ cat test.txt
    Hello¹
    $ hexdump test.txt
    0000000 6548 6c6c c26f 0ab9                    
    0000008
    $
    

    If you get different output from the hexdump, that would be informative to see. If the hexdump is the same but the cat output doesn't match, then your terminal or locale is probably to blame.
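For readers without a hexdump tool at hand, the same byte-level check can be sketched in Perl itself. This is only an illustration: the helper name file_bytes_hex and the file name uniprint_check.txt are made up for this example, and the trailing newline is left out so the byte count does not depend on the platform's line endings.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the raw bytes of a
# file as a space-separated hex string.
sub file_bytes_hex {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                          # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return join ' ', unpack '(H2)*', $bytes;
}

# Assumed scratch file name, chosen only for this sketch.
my $path = 'uniprint_check.txt';

open my $out, '>:utf8', $path or die "Cannot open $path: $!";
print {$out} "Hello\x{00b9}";          # no newline, to keep the bytes portable
close $out or die "Cannot close $path: $!";

print file_bytes_hex($path), "\n";     # 48 65 6c 6c 6f c2 b9
```

If the last two bytes are c2 b9, the file holds correct UTF-8 for U+00B9 and any odd-looking character comes from whatever is displaying the file.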

Re: Printing a UTF8 code
by haj (Vicar) on Nov 08, 2019 at 13:31 UTC
    Actually, the following is correct:
    open FO, ">test.txt";
    binmode FO, ":utf8";
    print FO "\x{00b9}";

    If you see a strange character, then probably the program you're using to display your test file doesn't handle UTF-8 correctly, or must be told to do so.

    I tend to forget the correct syntax because I almost never need it. But a Google search for "Perl unicode literals" turns up the nice article "Perl Unicode Cookbook: Unicode Literals by Number" by Tom Christiansen. That's how I got it right just now.

      Actually, that worked!
      Thanks!
      My mistake was using the wrong case.
      binmode FO, ":UTF8" # Does not work!
      binmode FO, ":utf8" # This works!

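A mistake like this is easier to spot when the return value of binmode is checked. The following is a minimal sketch, assuming (as the binmode documentation describes) that binmode returns false when a layer cannot be applied; the helper name layer_ok is invented for the example, and an in-memory handle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): try to apply $layer to a fresh
# in-memory handle and report whether binmode accepted it.
sub layer_ok {
    my ($layer) = @_;
    open my $fh, '>', \my $buf or die "Cannot open in-memory handle: $!";
    no warnings 'layer';    # keep the "Unknown PerlIO layer" warning out of the demo
    return binmode($fh, $layer) ? 1 : 0;
}

for my $layer (':utf8', ':UTF8', ':encoding(UTF-8)') {
    printf "%-18s %s\n", $layer, layer_ok($layer) ? 'accepted' : 'rejected';
}
```

In real code, writing binmode(FO, ":utf8") or die "binmode failed: $!"; together with use warnings; should surface the misspelled layer name immediately instead of silently leaving the handle without the layer.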
Re: Printing a UTF8 code
by haukex (Archbishop) on Nov 08, 2019 at 19:43 UTC

    Since it hasn't been mentioned yet, note that \u in Perl actually means "titlecase the next character" (I assume you got \u from JS) - see Quote and Quote-like Operators. You're looking for either \xB9 (two hexadecimal digits only), \x{00B9}, \N{U+00B9}, or \N{SUPERSCRIPT ONE} (on older versions of Perl, charnames needs to be loaded for the last one to work), or of course adding use utf8; to the script, saving it as UTF-8, and using a literal ¹ character. Also, I'd suggest having a look at "open" Best Practices.
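The escapes listed above can be compared directly. This is a small sketch rather than anything from the original post; the variable names are arbitrary, and use charnames ':full' is included only so that \N{SUPERSCRIPT ONE} also works on older Perls.

```perl
use strict;
use warnings;
use charnames ':full';   # only needed on older Perls for \N{SUPERSCRIPT ONE}

# Four spellings of the same one-character string.
my @spellings = ("\xB9", "\x{00B9}", "\N{U+00B9}", "\N{SUPERSCRIPT ONE}");

my $all_same = !grep { $_ ne $spellings[0] } @spellings;
print "All four spellings agree: ", ($all_same ? "yes" : "no"), "\n";
printf "Code point: U+%04X\n", ord $spellings[0];

# \u titlecases the next character. A digit has no titlecase form, so
# "\u00b9" is simply the four characters 00b9.
print "\\u00b9 gives: ", "\u00b9", "\n";
print "\\uperl gives: ", "\uperl", "\n";
```

Only ASCII is printed here, so the output does not depend on the terminal's encoding.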

Re: Printing a UTF8 code
by Anonymous Monk on Nov 09, 2019 at 07:50 UTC
    print FO "\x{00b9}"; # Outputs a strange character
    With no :encoding layer set on the file handle, Perl outputs characters with codes below 0x100 directly as bytes. This is done for byte transparency (to allow strings to consist of arbitrary bytes) and backwards compatibility:
    $ perl -e 'print "\x{00b9}"' | hd
    00000000  b9                                                |.|
              ^^-- notice the literal byte 0xB9
    $ perl -CO -e 'print "\x{00b9}"' | hd
    00000000  c2 b9                                             |..|
              ^^^^^-- this is the UTF-8 byte sequence for U+00B9
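The same comparison can be sketched without a shell by printing to in-memory filehandles, once with no layer and once with :utf8. The helper name bytes_written is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): print $text to an in-memory
# handle opened with $mode and return the bytes that were written, as hex.
sub bytes_written {
    my ($mode, $text) = @_;
    my $buf = '';
    open my $fh, $mode, \$buf or die "Cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh;
    return join ' ', unpack '(H2)*', $buf;
}

print "no layer: ", bytes_written('>',      "\x{00b9}"), "\n";   # b9
print ":utf8:    ", bytes_written('>:utf8', "\x{00b9}"), "\n";   # c2 b9
```

The single byte b9 is not valid UTF-8 on its own, which would explain the "strange character" when a viewer expects UTF-8.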

Node Type: perlquestion [id://11108475]
Front-paged by Corion