PerlMonks  

Re^5: treat files with umlauts (utf)

by kcott (Archbishop)
on Apr 01, 2014 at 13:25 UTC ( [id://1080561]=note )


in reply to Re^4: treat files with umlauts (utf)
in thread treat files with umlauts (utf)

I really don't think you understand what the utf8 pragma does. Here's another quote from the documentation (first line of the description):

"The use utf8 pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope ..."

It has nothing to do with the:

  • flagging of variables ("getting $scandir flagged as UTF-8")
  • encoding of STDIN, STDOUT or STDERR ("perl -CS ...")
  • encoding of @ARGV elements ("perl -CA ...")
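For comparison, here is a rough in-script sketch of what the -CS and -CA switches arrange. It is only an approximation (the switches use Perl's own :utf8 layer rather than :encoding(UTF-8)), and decode_args is a hypothetical helper name introduced for this example, not something from the thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Roughly what -CS arranges: UTF-8 layers on the three standard handles.
binmode $_, ':encoding(UTF-8)' for \*STDIN, \*STDOUT, \*STDERR;

# Roughly what -CA arranges: command-line arguments decoded from UTF-8 bytes.
# decode_args is a hypothetical helper used only in this sketch.
sub decode_args { return map { decode('UTF-8', $_) } @_ }

my @args = decode_args(@ARGV);
print "$_\n" for @args;
```

Neither step involves use utf8; both act on data crossing the program's boundary, not on the program text.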

Let me reiterate the quote I provided in my earlier post from the utf8 documentation:

"Do not use this pragma for anything else than telling Perl that your script is written in UTF-8."

It has nothing to do with data read into the script, data processed by the script, data generated by the script or data output by the script. It's only about the text used to write the script and how Perl should parse that source text.
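The distinction between source text and data can be sketched as follows. This assumes the script file itself is saved as UTF-8; the byte string in $raw merely simulates data arriving from outside (a file, readdir, @ARGV), and the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Source text: under 'use utf8' the parser reads this literal as characters.
my $literal = do { use utf8; 'für' };

# External data, simulated as raw UTF-8 bytes with no decoding applied.
my $raw = "f\xC3\xBCr";

# 'use utf8' never touches $raw; decoding is an explicit, separate step.
my $decoded = decode('UTF-8', $raw);

printf "literal: %d chars, raw: %d bytes, decoded: %d chars\n",
    length($literal), length($raw), length($decoded);
```

Under those assumptions this should report 3 characters for the literal, 4 bytes for the raw data, and 3 characters once the data has been decoded.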

-- Ken

Replies are listed 'Best First'.
Re^6: treat files with umlauts (utf)
by hazylife (Monk) on Apr 01, 2014 at 14:00 UTC
    It has nothing to do with the: encoding of STDIN ... encoding of @ARGV elements
    Correct.
    It's only about the text used to write the script and how Perl should parse that source text.
    Yes, so...
    use utf8;
    my $scandir = 'something with umlauts it it';
    #              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    # ...is this string literal not part of the source?
    It has nothing to do with flagging variables
    #!/usr/bin/perl
    
    use strict;
    use Devel::Peek;
    
    {
        use utf8;
        my $var = 'für';
        print Dump \$var;
    }
    
    my $var = 'für';
    print Dump \$var;
    
      use utf8; my $scandir = 'something with umlauts it it';

      That's exactly the same code you invented, five nodes back, in your original post in this thread: "Re^2: treat files with umlauts (utf)". It is not code the OP posted (or even described in his narrative). My response is unchanged.

      # ...is this string literal not part of the source?

      That string literal is only part of the source you've invented.

      ... use Devel::Peek; ...

      Posting code without explaining why you're doing so is not particularly helpful.

      If you're referring to the part of that output containing:

      FLAGS = (PADMY,POK,pPOK,UTF8)

      Then the UTF8 part of that is caused by the umlaut in 'für'. But the OP's posted code contains no umlauts. Only your invented code contains umlauts.

      Change 'für' to 'fur', and you'll get:

      FLAGS = (PADMY,POK,pPOK)

      Just like the OP's posted code, this does not contain any umlauts and there's no UTF8 in the output.
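The same comparison can be made without Devel::Peek. The following is a minimal sketch, assuming the file is saved as UTF-8; utf8::is_utf8 reports the internal flag that appears as UTF8 in the FLAGS line, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use utf8;    # in effect for both literals below

my $with_umlaut = 'für';
my $ascii_only  = 'fur';

# utf8::is_utf8 reports the internal UTF8 flag shown by Devel::Peek in FLAGS.
printf "with umlaut: %s\n",
    utf8::is_utf8($with_umlaut) ? 'UTF8 flag set' : 'no UTF8 flag';
printf "ASCII only:  %s\n",
    utf8::is_utf8($ascii_only) ? 'UTF8 flag set' : 'no UTF8 flag';
```

With use utf8 in effect for both, only the literal containing the umlaut should come out flagged.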

      You can keep inventing code that requires use utf8 all you want but the OP's posted code contains no umlauts (or any other characters) that require use utf8.

      Please be very clear on these points:

      • The OP's posted code does not contain umlauts.
      • The OP's posted code does not include an assignment to $scandir.
      • The OP's posted code does not require use utf8;.

      -- Ken

        The OP's posted code does not include an assignment to $scandir
        It does not, but the OP does mention UTF-8, so $scandir being UTF-8 is a possibility, to say the least.
        The OP's posted code does not contain umlauts
        It doesn't have to be umlauts; what matters is whether $scandir is UTF-8. Make this change to your code and see what happens:
        use utf8; # just for utf8::upgrade
        # bytewise, this is already UTF-8...
        my $scandir = './pm_1080490_utf8_readdir';
        #... but we need to flag it as such for
        # the problem to manifest itself:
        utf8::upgrade $scandir;
        # now on to readdir
        the OP's posted code does not require use utf8
        Right, it does not. And use utf8 is not absolutely necessary in the above test code - use -CS/binmode or -CA to initialize $scandir.
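A small sketch of what utf8::upgrade does to an ASCII-only string such as that path. As far as the utf8 documentation goes, the utf8:: functions are always available, so this version deliberately omits use utf8; $before is just an illustrative copy kept for comparison.

```perl
use strict;
use warnings;
# No 'use utf8' here: the utf8:: functions are available without it.

my $scandir = './pm_1080490_utf8_readdir';    # ASCII-only path from the post
my $before  = $scandir;                        # copy kept for comparison

utf8::upgrade($scandir);    # changes the internal representation only

printf "flag before: %d, flag after: %d, still equal: %d\n",
    utf8::is_utf8($before)  ? 1 : 0,
    utf8::is_utf8($scandir) ? 1 : 0,
    $before eq $scandir     ? 1 : 0;
```

This should print "flag before: 0, flag after: 1, still equal: 1": the string compares equal before and after, and only the internal flag differs.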
        If you're referring to the output from that containing: FLAGS = (PADMY,POK,pPOK,UTF8) Then the UTF8 part of that is caused by the umlaut in 'für'.
        Does that code not answer your:
        [use utf8] has nothing to do with the:... flagging variables
        # under 'use utf8'
        FLAGS = (PADMY,POK,pPOK,UTF8)
        ...
        "f\303\274r"\0 [UTF8 "f\x{fc}r"]
        # no utf8
        FLAGS = (PADMY,POK,pPOK)
        ...
        "f\303\274r"\0
        Does the first variable have the UTF8 flag or does it not? What about the second variable? Aren't those two strings exactly the same?
        I'm out of this thread.
