a new example

by Raymond (Novice)
on Jul 28, 2013 at 22:40 UTC ( [id://1046754]=perlquestion )

Raymond has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl
use strict;
use warnings;

print "\xC4 and \x{0394} look different\n";

errors

Wide character in print at 4.pl line 5.

+ä and +ö look different

Replies are listed 'Best First'.
Re: a new example (Wide character in print ) (diagnostics)
by Anonymous Monk on Jul 28, 2013 at 22:59 UTC
Re: a new example
by kcott (Archbishop) on Jul 29, 2013 at 04:27 UTC

    G'day Raymond,

    Here are a few examples that may clarify the differences between handling UTF-8 in your output and in your source code. Here's how those characters should render: &#xc4; = Ä and &#x394; = Δ. I've used <pre>...</pre> tags so that the characters (e.g. Δ), and not the entities (e.g. &#x394;), are displayed.

    Baseline code generating "Wide character" warning:

    $ perl -Mstrict -Mwarnings -E '
        say "\xC4 and \x{0394} look different";
    '
    Wide character in say at -e line 2.
    Ä and Δ look different
    

    Using binmode function to specify UTF-8 output:

    $ perl -Mstrict -Mwarnings -E '
        binmode STDOUT => ":utf8";             
        say "\xC4 and \x{0394} look different";
    '
    Ä and Δ look different
    

    Using the open pragma to specify UTF-8 output:

    $ perl -Mstrict -Mwarnings -E '
        use open qw{:std :utf8};
        say "\xC4 and \x{0394} look different";
    '
    Ä and Δ look different
    

    Attempting to use UTF-8 in the source code without letting Perl know:

    $ perl -Mstrict -Mwarnings -E '
        binmode STDOUT => ":utf8";
        say "\xC4 and \x{0394} look different";
        say "Ä and \x{c4} look different";
        say "Δ and \x{0394} look different";
    '
    Ä and Δ look different
    Ä and Ä look different
    Δ and Δ look different
    

    Using the utf8 pragma to tell Perl there's UTF-8 in the source code:

    $ perl -Mstrict -Mwarnings -E '
        use utf8;
        binmode STDOUT => ":utf8";
        say "\xC4 and \x{0394} look different";
        say "Ä and \x{c4} look the same";
        say "Δ and \x{0394} look the same";
    '
    Ä and Δ look different
    Ä and Ä look the same
    Δ and Δ look the same
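
The effect of the utf8 pragma on a literal can be sketched without putting any non-ASCII characters in the source. This is a rough illustration, not what the pragma does internally: it assumes a UTF-8 encoded source file, in which "Ä" is stored as the two bytes C3 84, and uses the core function utf8::decode to imitate the decoding step. The helper name code_points is made up for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: list the code points Perl sees in a string.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $string;
}

# The two bytes a UTF-8 encoded source file holds for the literal "Ä",
# written as escapes so this sketch stays pure ASCII.
my $without_pragma = "\xC3\x84";

# Roughly what "use utf8" arranges for a literal: the same bytes,
# decoded into a single character.
my $with_pragma = $without_pragma;
utf8::decode($with_pragma) or die "not valid UTF-8";

printf "without use utf8: length %d, %s\n",
    length $without_pragma, code_points($without_pragma);
printf "with use utf8:    length %d, %s\n",
    length $with_pragma, code_points($with_pragma);
```

Without the pragma Perl sees two characters (U+00C3 and U+0084), which is why the literal and \x{c4} differ in the previous example; after decoding there is a single U+00C4, so they match.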
    

    -- Ken

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $string = "This is what you have";
      print $string;
      #this part does not print:
      substr($string, 5, 2) = "wasn't";   #change "is" to "wasn't"
      substr($string, -12) = "ondrous";   #"this wasn't wondrous"
      substr($string, 0, 1) = "";         #delete first character
      substr($string, -10) = "";          #delete last 10 characters
      #printing problem end here
        Uh... well, yeah, it is a new example of something.

        Please share the 'of what' as I can't see the relevance in a thread devoted to encoding/decoding utf8.

        Oh, yes. When I print the part you've labeled as non-printing, I see this:

        #!/usr/bin/perl
        use strict;
        use warnings;
        # 1046930

        my $string = "This is what you have" . "\n";;
        print $string;
        #this part does not print:
        substr($string, 5, 2) = "wasn't";   #change "is" to "wasn't"
        print $string . "\n";
        substr($string, -12) = "ondrous";   #"this wasn't wondrous"
        print $string . "\n";
        substr($string, 0, 1) = "";         #delete first character
        print $string . "\n";
        substr($string, -10) = "";          #delete last 10 characters
        print $string . "\n";
        #printing problem end here

        =head out:
        This is what you have
        This wasn't what you have
        This wasn't whondrous
        his wasn't whondrous
        his wasn't
        =cut

        ...which is at some variance with what your comments suggest you expected.
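
The variance appears to come from the "\n" appended to the string in the modified script: negative offsets to substr count from the end, so one extra trailing character shifts every negative position by one. A minimal sketch of the comparison follows; apply_edits is a hypothetical helper that replays the four edits from the post on a copy of the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replay the four substr edits from the post and
# return the string as it looks after each step.
sub apply_edits {
    my ($string) = @_;
    my @steps;
    substr($string, 5, 2) = "wasn't";  push @steps, $string;
    substr($string, -12)  = "ondrous"; push @steps, $string;
    substr($string, 0, 1) = "";        push @steps, $string;
    substr($string, -10)  = "";        push @steps, $string;
    return @steps;
}

# Run once on the original string, once with a trailing newline added.
for my $start ("This is what you have", "This is what you have\n") {
    my @steps = apply_edits($start);
    chomp @steps;    # drop the trailing newline where it survived
    print join(' | ', @steps), "\n";
}
```

Without the newline the second step yields "This wasn't wondrous", as the original comments say; with it, the replaced span starts one character later and the result is "This wasn't whondrous".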

        If I've misconstrued your question or the logic needed to answer it, I offer my apologies to all those electrons which were inconvenienced by the creation of this post.

        This makes no sense as a reply to what I wrote, nor does it make any sense in the context of this thread.

        If you'd care to clarify your intent, that would be good. :-)

        -- Ken

Re: a new example
by Loops (Curate) on Jul 28, 2013 at 22:48 UTC
    You'll want to add:
    use feature 'unicode_strings';
    use open qw(:std :utf8);
    use utf8;

    To the top of your script to enable Unicode support.

    Alternatively, you could use utf8::all from CPAN, which handles even more Unicode edge cases and can be included in your script in a single line.

      Only the open pragma fixes the problem. There's no literal (high bit) UTF-8 character in the source code, so the utf8 pragma does nothing here, and character interpolation is, to my knowledge, unaffected by the Unicode bug, so the unicode_strings feature does nothing to fix the problem here either.

      The problem is solely the missing encoding discipline on the output filehandle.
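
One way to check this claim is to print the same string to two in-memory filehandles, one opened without an encoding layer and one with :encoding(UTF-8), and compare the bytes written and the warnings raised. This is only a sketch: print_via and hex_bytes are hypothetical helpers written for this illustration, and in-memory handles stand in for STDOUT.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: print $text to an in-memory filehandle opened with
# $mode; return the bytes written, followed by any warnings raised.
sub print_via {
    my ($mode, $text) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    my $buffer = '';
    open my $fh, $mode, \$buffer or die "Cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh or die "Cannot close in-memory handle: $!";

    return ($buffer, @warnings);
}

# Hypothetical helper: show a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my $text = "\xC4 and \x{0394}";

my ($raw, @raw_warnings) = print_via('>', $text);
my ($enc, @enc_warnings) = print_via('>:encoding(UTF-8)', $text);

printf "no layer:        %d warning(s), bytes %s\n",
    scalar @raw_warnings, hex_bytes($raw);
printf ":encoding(UTF-8): %d warning(s), bytes %s\n",
    scalar @enc_warnings, hex_bytes($enc);
```

For this particular string both handles receive the same bytes; the only difference is the "Wide character in print" warning on the handle that has no layer. No pragma other than strict and warnings is involved.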

Re: a new example
by Khen1950fx (Canon) on Jul 29, 2013 at 01:34 UTC
    The simplest way to fix the problem is to use utf8::all:
    #!/usr/bin/perl -l
    use strict;
    use warnings;
    use utf8::all;

    print "\xC4 and \x{0394} look different";

    Returns:

    Ä and Δ look different
