PerlMonks  

Re^8: treat files with umlauts (utf)

by hazylife (Monk)
on Apr 02, 2014 at 10:54 UTC ( [id://1080748]=note )


in reply to Re^7: treat files with umlauts (utf)
in thread treat files with umlauts (utf)

> The OP's posted code does not include an assignment to $scandir
It does not, but the OP does mention UTF-8, so $scandir being UTF-8 is a possibility, to say the least.
> The OP's posted code does not contain umlauts
It doesn't have to be umlauts; what matters is whether $scandir is UTF-8. Make this change to your code and see what happens:
    use utf8; # just for utf8::upgrade

    # bytewise, this is already UTF-8...
    my $scandir = './pm_1080490_utf8_readdir';

    #... but we need to flag it as such for
    # the problem to manifest itself:
    utf8::upgrade $scandir;

    # now on to readdir
> the OP's posted code does not require use utf8
Right, it does not. And use utf8 is not strictly necessary in the test code above: use -CS/binmode or -CA to initialize $scandir.
> If you're referring to the output from that containing: FLAGS = (PADMY,POK,pPOK,UTF8) Then the UTF8 part of that is caused by the umlaut in 'für'.
Does that code not answer your:
> [use utf8] has nothing to do with the:... flagging variables
    # under 'use utf8'
    FLAGS = (PADMY,POK,pPOK,UTF8)
    ... "f\303\274r"\0 [UTF8 "f\x{fc}r"]

    # no utf8
    FLAGS = (PADMY,POK,pPOK)
    ... "f\303\274r"\0
Does the first variable have the UTF8 flag or not? What about the second variable? Aren't the bytes of those two strings exactly the same?
I'm out of this thread.
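
To make the flag question concrete, here is a minimal sketch (not taken from the thread) that builds two such strings without relying on use utf8. It assumes only the core utf8:: functions; the variable names are illustrative.

```perl
use strict;
use warnings;

# The UTF-8 encoding of 'für' as four raw bytes; no UTF8 flag is set.
my $bytes = "f\303\274r";

# Decode a copy in place: the buffer is unchanged, but the string is now
# flagged and Perl sees three characters instead of four bytes.
my $chars = $bytes;
utf8::decode($chars);

for my $pair ([bytes => $bytes], [chars => $chars]) {
    my ($name, $str) = @$pair;
    printf "%-5s UTF8 flag: %-3s length: %d\n",
        $name, (utf8::is_utf8($str) ? 'on' : 'off'), length $str;
}

# Encoding the flagged copy gives back the original bytes, so the
# underlying buffers match even though Perl treats the strings differently.
my $round_trip = $chars;
utf8::encode($round_trip);
print "same buffer: ", ($round_trip eq $bytes ? "yes" : "no"), "\n";
print "eq as Perl strings: ", ($bytes eq $chars ? "yes" : "no"), "\n";
```

The buffers are byte-for-byte identical, yet the flag decides whether Perl reads them as four bytes or three characters, which is why the two dumps differ only in the UTF8 entry.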

Replies are listed 'Best First'.
Re^9: treat files with umlauts (utf)
by mike.scharnow (Initiate) on Apr 07, 2014 at 08:25 UTC

    Hello all!

    Thank you all for your ideas and for your discussion, which taught me some more internals about UTF handling. I hope I will be able to work in the umlaut field without further problems.

    As to the cause of the problem (as I understand it now): you gave me the correct hints. The problem was not readdir but $scandir.

    I have a configuration xml file, which I read in using XML::Simple. $scandir is read from this file using something like my $scandir = $config->{external_systems}->{filesIN}.

    Now, the config file is stored in ISO-8859-1. It seems that in this construction, $scandir is not stored as UTF, but as ISO-8859-1, although there are no umlauts in the directory name!

    Now, when I concatenate $scandir with the result of readdir, it seems that a non-UTF value (from the XML file) is concatenated with a UTF value (from readdir). And as soon as there is an umlaut in the filename, the resulting string is invalid, causing "-f" to say "this is not a file".

    I solved it by writing

    my $scandir = …; utf8::downgrade($scandir);
    Then I could successfully read, copy and move the files.

    Hoping that this is the "correct" way of dealing with the problem and again thanks very much

    Mike
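
The effect of mixing a flagged and an unflagged string can be sketched without touching the filesystem. This is an illustration under stated assumptions, not the OP's code: it assumes readdir returns raw UTF-8 bytes, that $scandir arrives with the UTF8 flag set (which is the case in which utf8::downgrade changes anything), and that Perl hands a flagged string's internal UTF-8 buffer to the operating system. The directory name and the internal_bytes helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the bytes held in Perl's internal buffer, which is
# what a file test such as -f is assumed to pass on to the OS.
sub internal_bytes {
    my ($s) = @_;
    utf8::encode($s) if utf8::is_utf8($s);
    return $s;
}

# Hex dump for display purposes.
sub hex_dump { join ' ', map { sprintf '%02x', ord } split //, $_[0] }

my $entry = "f\303\274r.txt";   # as assumed from readdir: bytes, no flag

my $scandir = './in';           # hypothetical directory, pure ASCII
utf8::upgrade($scandir);        # stands in for a value that arrives flagged

# Concatenation upgrades $entry as if its bytes were Latin-1 characters,
# so each byte of the umlaut is encoded a second time.
my $bad = "$scandir/$entry";
print "flagged dir:    ", hex_dump(internal_bytes($bad)), "\n";

# After the downgrade both operands are byte strings and nothing is re-encoded.
utf8::downgrade($scandir);
my $good = "$scandir/$entry";
print "downgraded dir: ", hex_dump(internal_bytes($good)), "\n";
```

In the first path the umlaut occupies four bytes instead of two, so it no longer names the file on disk; after the downgrade the path is the directory bytes followed by exactly what readdir returned.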
