PerlMonks  

Re^5: Default encoding rules leave me puzzled...

by Jim (Curate)
on Jun 22, 2014 at 19:22 UTC


in reply to Re^4: Default encoding rules leave me puzzled...
in thread Default encoding rules leave me puzzled...

Your Perl script doesn't compile.

C:\>chcp
Active code page: 437

C:\>type 1090732.pl
use utf8;
my $s1 = inet_aton('195.169.195.171');  print($s1);
my $s2 = encode_utf8("├⌐├");             print($s2);
my $s3 = "├┬⌐├┬";                        print($s3);
my $s4 = "\xC3\xA9\xC3\xAB";            print($s4);

C:\>cat 1090732.pl
use utf8;
my $s1 = inet_aton('195.169.195.171');  print($s1);
my $s2 = encode_utf8("");             print($s2);
my $s3 = "éë";                        print($s3);
my $s4 = "\xC3\xA9\xC3\xAB";            print($s4);

C:\>perl 1090732.pl
Undefined subroutine &main::inet_aton called at 1090732.pl line 2.

C:\>


Re^6: Default encoding rules leave me puzzled...
by ikegami (Pope) on Jun 23, 2014 at 01:53 UTC

    inet_aton is provided by Socket, and encode_utf8 is provided by Encode. I left a few obvious headers out since they weren't relevant.

    In all four cases, print outputs the four bytes C3 A9 C3 AB because in all four cases, the string passed to print was "\xC3\xA9\xC3\xAB".
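The claim above can be checked with a complete version of the snippet. This is only a sketch, not the original script: it adds the `use Socket` and `use Encode` lines the reply names, writes the non-ASCII literals as `\x{..}` escapes so the file is encoding-proof (an assumption about what the original literals contained), and introduces a hypothetical helper, `printed_bytes`, that prints to an in-memory filehandle and hex-dumps whatever `print` wrote.

```perl
use strict;
use warnings;
use utf8;                        # kept from the original snippet; this source is pure ASCII
use Socket qw(inet_aton);        # provides inet_aton
use Encode qw(encode_utf8);      # provides encode_utf8

# Four ways of building a string; the escapes stand in for the original literals.
my @strings = (
    inet_aton('195.169.195.171'),    # packed IPv4 address: bytes 195, 169, 195, 171
    encode_utf8("\x{E9}\x{EB}"),     # UTF-8 encoding of U+00E9 U+00EB
    "\x{C3}\x{A9}\x{C3}\x{AB}",      # four characters U+00C3 U+00A9 U+00C3 U+00AB
    "\xC3\xA9\xC3\xAB",              # the same four code points, two-digit escapes
);

# Hypothetical helper: print $s to an in-memory handle (no I/O layers)
# and return the bytes that print actually wrote, as hex.
sub printed_bytes {
    my ($s) = @_;
    open my $fh, '>', \my $buf or die "Cannot open in-memory handle: $!";
    print {$fh} $s;
    close $fh;
    return join ' ', map { sprintf '%02X', ord } split //, $buf;
}

print printed_bytes($_), "\n" for @strings;
```

Each of the four lines printed should read `C3 A9 C3 AB`, since every string holds the same four code points and `print` has nothing else to go on.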

      inet_aton is provided by Socket, and encode_utf8 is provided by Encode. I left a few obvious headers out since they weren't relevant.

      The headers can't possibly be irrelevant if the Perl script doesn't compile without them. And there's nothing intrinsically obvious about them either. If there were, then perl wouldn't need the programmer to use them. As it happens, I knew that encode_utf8() is from the Encode module because I'd used it before, but I didn't recognize inet_aton() because I'd never used the Socket module before.

      If you post something on PerlMonks to make a point, you can't neglect to make the point. Otherwise, you're just arguing obscurely and unhelpfully.

      In all four cases, print outputs the four bytes C3 A9 C3 AB because in all four cases, the string passed to print was "\xC3\xA9\xC3\xAB".

      This is the point you neglected to make. In your post, you didn't state it explicitly, and you also didn't include the output of the Perl script that was meant to demonstrate it. You left it as an exercise for the reader to run your script, which we've established doesn't compile as posted.

      PerlMonks is now littered with threads much like this one. A monk comes to the Monastery seeking clarification about how character encodings and Unicode work in Perl, and particularly help understanding their myriad subtleties. Instead of getting clarification, the monk just gets more confusing details, oftentimes within a torrent of rhetorical arguments and even flame wars. You're involved in many of these discussions, and I think your explanations usually lead to more confusion rather than to greater clarity. I don't doubt that you're 100% correct in every gory technical detail. I just don't think you do an effective job of translating the technical facts from your complex mental model of them into clear information about the topic that ordinary Perl hackers can use to help them write Perl scripts.

        The output, and thus whether the program runs, is irrelevant to the post. The four statements could even output different things, and it wouldn't have changed anything. I did state the point explicitly, and I'll repeat it for you: The only two possible answers are "all of them" or "none of them", since print can't tell the difference between those strings. The only thing I implied is that it's ludicrous to say iso-latin-1 is utilized for all of them.

        I wasn't posting a tutorial or documentation; I was addressing someone's post, and a snippet is not necessarily a script.
