PerlMonks  

a new example

by Raymond (Novice)
on Jul 28, 2013 at 22:40 UTC (#1046754)
Raymond has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl

use strict;
use warnings;
print "\xC4 and \x{0394} look different\n";

Output (a warning, then garbled text):

Wide character in print at 4.pl line 5.

+ and + look different

Re: a new example (Wide character in print ) (diagnostics)
by Anonymous Monk on Jul 28, 2013 at 22:59 UTC
Re: a new example
by kcott (Abbot) on Jul 29, 2013 at 04:27 UTC

    G'day Raymond,

    Here are a few examples that may clarify the differences between handling UTF-8 in your output and in your source code. Here's how those characters should render: &#xc4; = Ä and &#x394; = Δ. I've used <pre>...</pre> tags so that the characters (e.g. Δ), and not the entities (e.g. &#x394;), are displayed.

    Baseline code generating "Wide character" warning:

    $ perl -Mstrict -Mwarnings -E '
        say "\xC4 and \x{0394} look different";
    '
    Wide character in say at -e line 2.
    Ä and Δ look different
    
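To see what perl actually writes in this baseline case, here is a minimal sketch (not from the thread). It prints the question's string to an in-memory filehandle standing in for STDOUT and captures the warning; the assumption is that an in-memory handle with no encoding layer behaves like an unlayered STDOUT here, and the variable names are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same string as in the question: U+00C4 fits in one byte, U+0394 does not.
my $str = "\xC4 and \x{0394} look different\n";

# Collect warnings instead of letting them go to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

# In-memory handle with no encoding layer, standing in for STDOUT.
open my $fh, '>', \my $buf or die "Cannot open in-memory handle: $!";
print {$fh} $str;
close $fh;

# What perl falls back to: the UTF-8 encoding of the *whole* string.
my $expected = $str;
utf8::encode($expected);

print "warned: ", ((grep { /^Wide character in print/ } @warnings) ? "yes" : "no"), "\n";
print "bytes match UTF-8 encoding: ", ($buf eq $expected ? "yes" : "no"), "\n";
```

It should report both the warning and a byte-for-byte match: perl cannot fit U+0394 into a single byte, so it warns and writes its UTF-8 form for the entire string.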

    Using binmode function to specify UTF-8 output:

    $ perl -Mstrict -Mwarnings -E '
        binmode STDOUT => ":utf8";             
        say "\xC4 and \x{0394} look different";
    '
    Ä and Δ look different
    
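The same fix as a script, sketched with the `:encoding(UTF-8)` layer instead of `:utf8`. `:encoding(UTF-8)` is the layer usually recommended because it validates data when reading; for output like this the two write the same bytes. An in-memory handle again stands in for STDOUT so the result can be checked; in a real script the call would be `binmode STDOUT, ':encoding(UTF-8)'`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

my $str = "\xC4 and \x{0394} look different\n";

# In-memory handle standing in for STDOUT, so the written bytes can be inspected.
open my $fh, '>', \my $buf or die "Cannot open in-memory handle: $!";
binmode $fh, ':encoding(UTF-8)' or die "binmode failed: $!";
print {$fh} $str;
close $fh;    # flushes the encoding layer

my $expected = $str;
utf8::encode($expected);

printf "warnings: %d\n", scalar @warnings;
print "bytes match UTF-8 encoding: ", ($buf eq $expected ? "yes" : "no"), "\n";
```

With the layer in place there should be no warning, and the bytes are the same UTF-8 that the unlayered case produced, now written on purpose.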

    Using the open pragma to specify UTF-8 output:

    $ perl -Mstrict -Mwarnings -E '
        use open qw{:std :utf8};
        say "\xC4 and \x{0394} look different";
    '
    Ä and Δ look different
    
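A script-file version of the open-pragma approach, as a sketch: `:std` applies the layer to STDIN, STDOUT and STDERR, and the pragma also sets the default layer for handles opened later in the same lexical scope. The core function `PerlIO::get_layers` is used here only to confirm that STDOUT ended up in UTF-8 mode.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# Layer for the standard handles and for later open() calls in this scope.
use open qw(:std :encoding(UTF-8));

my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

print "\xC4 and \x{0394} look different\n";

# get_layers lists a handle's layers; "utf8" appears when the handle
# is in character (UTF-8) mode.
my @layers = PerlIO::get_layers(*STDOUT);
print "STDOUT layers: @layers\n";
```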

    Attempting to use UTF-8 in the source code without letting Perl know:

    $ perl -Mstrict -Mwarnings -E '
        binmode STDOUT => ":utf8";
        say "\xC4 and \x{0394} look different";
        say "Ä and \x{c4} look different";
        say "Δ and \x{0394} look different";
    '
    Ä and Δ look different
    Ã„ and Ä look different
    Î” and Δ look different
    
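Why the source-code case goes wrong can be sketched without putting any non-ASCII into the file. The escapes `"\xCE\x94"` below stand in for what perl sees when a literal Δ is saved as UTF-8 and `use utf8` is absent; this is an illustration under that assumption, not a reproduction of the exact session above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# No "use utf8" here, and this file is pure ASCII on purpose.

# A literal Delta saved as UTF-8 reaches perl, without "use utf8",
# as its two UTF-8 bytes. These escapes spell out those two bytes.
my $as_perl_sees_it = "\xCE\x94";
my $real_delta      = "\x{0394}";

printf "characters without use utf8: %d\n", length $as_perl_sees_it;
printf "characters in U+0394:        %d\n", length $real_delta;

# A :utf8 output layer encodes each of those two "characters" again,
# giving four bytes: the familiar double-encoding mojibake.
my $double_encoded = $as_perl_sees_it;
utf8::encode($double_encoded);
printf "bytes written: %vX\n", $double_encoded;

# Decoding the bytes (roughly what "use utf8" does for literals)
# recovers the single character.
my $decoded = $as_perl_sees_it;
utf8::decode($decoded) or die "not valid UTF-8";
print "decoded matches U+0394: ", ($decoded eq $real_delta ? "yes" : "no"), "\n";
```

The four bytes printed are `C3.8E.C2.94`: `C3 8E` is "Î" and `C2 94` is an invisible control character, which is why the session above shows a stray glyph in place of Δ.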

    Using the utf8 pragma to tell Perl there's UTF-8 in the source code:

    $ perl -Mstrict -Mwarnings -E '
        use utf8;
        binmode STDOUT => ":utf8";
        say "\xC4 and \x{0394} look different";
        say "Ä and \x{c4} look the same";
        say "Δ and \x{0394} look the same";
    '
    Ä and Δ look different
    Ä and Ä look the same
    Δ and Δ look the same
    
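A minimal script-file counterpart, assuming the file is saved as UTF-8: with `use utf8`, the literal characters and the escaped ones are the same string, and `length` counts characters rather than bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                             # this source file is UTF-8
use open qw(:std :encoding(UTF-8));   # and so is the output

my $literal = "Ä and Δ";
my $escaped = "\x{c4} and \x{0394}";

print "$literal\n";
print "same string: ", ($literal eq $escaped ? "yes" : "no"), "\n";
printf "length: %d\n", length $literal;   # 7 characters (9 bytes in the file)
```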

    -- Ken

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $string = "This is what you have";
      print $string;

      #this part does not print:
      substr($string, 5, 2) = "wasn't";     #change "is" to "wasn't"
      substr($string, -12) = "ondrous";     #"this wasn't wondrous"
      substr($string, 0, 1) = "";           #delete first character
      substr($string, -10) = "";            #delete last 10 characters
      #printing problem end here
        Uh... well, yeah, it is a new example of something.

        Please share the 'of what' as I can't see the relevance in a thread devoted to encoding/decoding utf8.

        Oh, yes. When I print the part you've labeled as non-printing, I see this:

        #!/usr/bin/perl
        use strict;
        use warnings;
        # 1046930

        my $string = "This is what you have" . "\n";
        print $string;

        #this part does not print:
        substr($string, 5, 2) = "wasn't";     #change "is" to "wasn't"
        print $string . "\n";
        substr($string, -12) = "ondrous";     #"this wasn't wondrous"
        print $string . "\n";
        substr($string, 0, 1) = "";           #delete first character
        print $string . "\n";
        substr($string, -10) = "";            #delete last 10 characters
        print $string . "\n";
        #printing problem end here

        =head out:

        This is what you have
        This wasn't what you have

        This wasn't whondrous
        his wasn't whondrous
        his wasn't

        =cut

        ...which is at some variance with what your comments suggest you expected.

        If I've misconstrued your question or the logic needed to answer it, I offer my apologies to all those electrons which were inconvenienced by the creation of this post.
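For the substr question, here is a short sketch (not from the thread) using the four-argument form `substr EXPR, OFFSET, LENGTH, REPLACEMENT`, which edits the string in place just like assigning to `substr()`. It runs on Raymond's original string, without the `"\n"` appended in the reply above; that trailing newline counts as one of the last 12 characters, which is why the reply's output reads "whondrous".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Raymond's original string, with no trailing newline.
my $string = "This is what you have";

# Four-argument substr replaces OFFSET/LENGTH with REPLACEMENT in place.
substr($string, 5, 2, "wasn't");       # replace "is"
print "$string\n";                     # This wasn't what you have
substr($string, -12, 12, "ondrous");   # replace the last 12 characters
print "$string\n";                     # This wasn't wondrous
substr($string, 0, 1, "");             # delete the first character
print "$string\n";                     # his wasn't wondrous
substr($string, -10, 10, "");          # delete the last 10 characters
print "$string\n";                     # his wasn'
```

Printing after each step, as the reply does, makes it easy to see which characters each negative offset actually reaches.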

        This makes no sense as a reply to what I wrote, nor does it make any sense in the context of this thread.

        If you'd care to clarify your intent, that would be good. :-)

        -- Ken

Re: a new example
by Loops (Curate) on Jul 28, 2013 at 22:48 UTC
    You'll want to add:
    use feature 'unicode_strings';
    use open qw(:std :utf8);
    use utf8;

    To the top of your script to enable Unicode support.

    Alternatively, you could use utf8::all from CPAN, which handles even more Unicode edge cases and can be included in your script with a single line.

      Only the open pragma fixes the problem. There's no literal (high bit) UTF-8 character in the source code, so the utf8 pragma does nothing here, and character interpolation is, to my knowledge, unaffected by the Unicode bug, so the unicode_strings feature does nothing to fix the problem here either.

      The problem is solely the missing encoding discipline on the output filehandle.
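The claim that only the output layer matters can be checked with a small sketch: `use utf8` and the `unicode_strings` feature are both enabled, and a hypothetical helper, `wide_warnings`, counts "Wide character" warnings when the question's string is printed to an in-memory handle (standing in for STDOUT) with and without an encoding layer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                        # no effect here: the source is pure ASCII
use feature 'unicode_strings';   # does not affect this warning either

my $str = "\xC4 and \x{0394} look different\n";

# Hypothetical helper: print $str to an in-memory handle, optionally with
# an encoding layer, and count the "Wide character" warnings produced.
sub wide_warnings {
    my ($layer) = @_;
    my $count = 0;
    local $SIG{__WARN__} = sub { $count++ if $_[0] =~ /^Wide character/ };
    open my $fh, '>', \my $buf or die "Cannot open in-memory handle: $!";
    if (defined $layer) {
        binmode $fh, $layer or die "binmode failed: $!";
    }
    print {$fh} $str;
    close $fh;
    return $count;
}

printf "without a layer:       %d warning(s)\n", wide_warnings(undef);
printf "with :encoding(UTF-8): %d warning(s)\n", wide_warnings(':encoding(UTF-8)');
```

It should report one warning without a layer and none with `:encoding(UTF-8)`, even though both pragmas are in effect throughout.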

Re: a new example
by Khen1950fx (Canon) on Jul 29, 2013 at 01:34 UTC
    The simplest way to fix the problem is to use utf8::all:
    #!/usr/bin/perl -l
    use strict;
    use warnings;
    use utf8::all;
    print "\xC4 and \x{0394} look different";

    Output:

    Ä and Δ look different

Node Type: perlquestion [id://1046754]
Approved by Loops