Re: UTF-8 and readdir, etc.
by kcott (Chancellor) on Feb 01, 2018 at 02:28 UTC
Welcome to the Monastery.
"Maybe I've overlooked the obvious (if so I apologise)."
Showing us your "simple perl program" and describing your actual problem with example input, output, error messages, and so on, would probably result in a better answer. As it is, we need to fall back to guesswork. I appreciate this is your first post, and I'm not trying to beat you over the head with the rule book, but please read "How do I post a question effectively?" and "Short, Self-Contained, Correct Example" to find out what sort of information to post in order to get the best answers.
One minor deviation from the information given in that first link: use '<pre>' blocks, instead of '<code>' blocks, for presenting Unicode data outside the 7-bit ASCII range. With '<code>' blocks, your Unicode characters will typically end up being shown as entity references, i.e. something like '&#NNNNNN;'. This won't happen with '<pre>' blocks; however, the drawbacks are that there's no "[download]" link, and you have to manually change special characters in your code and data (e.g. '<' and '&') to their entities (e.g. '&lt;' and '&amp;'). There's a list of these after the textarea where you write your post. For inline Unicode characters, e.g. inside a '<p>' or '<li>' block, I typically use '<tt>' instead of '<pre>': this is to avoid '<pre>' being forced into a block format by, for instance, a style sheet.
Other than the actual tree walking, which could be part of your problem, the following script ("pm_1208191_read_utf8_filenames.pl") performs the reading and writing tasks you specify.
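A minimal sketch of such a script, assuming a plain opendir/readdir loop with no encoding layers. The sub name 'write_listing' and the two command-line arguments are my own illustrative choices, and this is not necessarily identical to the original script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: write every entry of $dir (one per line) to $out.
# No encoding layers are used: readdir returns the names as raw bytes
# and print writes those bytes back out unchanged.
sub write_listing {
    my ($dir, $out) = @_;

    opendir(my $dh, $dir) or die "Can't open directory '$dir': $!";
    my @names = sort readdir $dh;    # readdir order is unspecified, so sort
    closedir $dh;

    open(my $fh, '>', $out) or die "Can't write '$out': $!";
    print {$fh} "$_\n" for @names;
    close $fh or die "Can't close '$out': $!";

    return scalar @names;
}

# Only run when both arguments are supplied on the command line.
write_listing(@ARGV) if @ARGV == 2;
```

It would be invoked as, say, 'pm_1208191_read_utf8_filenames.pl pm_1208191_utf8_filenames pm_1208191_utf8_filenames_listing.txt'. Because the bytes are never decoded, the sort here is a byte-wise sort, not a linguistic one.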
Given this test directory I set up:
$ ls -al pm_1208191_utf8_filenames
total 0
drwxr-xr-x   7 ken  staff  238 Feb  1 11:45 .
drwxr-xr-x  18 ken  staff  612 Feb  1 11:34 ..
-rw-r--r--   1 ken  staff    0 Feb  1 11:34 abc
-rw-r--r--   1 ken  staff    0 Feb  1 11:36 åßç
-rw-r--r--   1 ken  staff    0 Feb  1 11:38 αβγ
-rw-r--r--   1 ken  staff    0 Feb  1 11:41 абг
-rw-r--r--   1 ken  staff    0 Feb  1 11:45 ☿♃♄
Here's a sample run:
$ cat pm_1208191_utf8_filenames_listing.txt
cat: pm_1208191_utf8_filenames_listing.txt: No such file or directory
$ pm_1208191_read_utf8_filenames.pl
$ cat pm_1208191_utf8_filenames_listing.txt
.
..
abc
åßç
αβγ
абг
☿♃♄
As you can see, I didn't need any special encoding-type directives. I'm using Perl 5.26.0 on macOS 10.12.5, and I have 'LANG=en_AU.UTF-8' (my normal setting).
In case you can't actually see some of those characters, here's a table of the filenames, the three codepoints used for each, and a link to the Unicode PDF code chart so you can see what they look like.
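If you'd like to check the codepoints yourself, here's a minimal sketch that prints them for the five test filenames. The sub name 'codepoints' is purely illustrative, and the names are taken from string literals in the source (hence 'use utf8'), not read back from the filesystem.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;    # the string literals below are UTF-8 in the source

# Hypothetical helper: return the U+XXXX form of each character in $name.
sub codepoints {
    my ($name) = @_;
    return map { sprintf('U+%04X', ord($_)) } split //, $name;
}

# Encode output as UTF-8 so the names print without "Wide character" warnings.
binmode STDOUT, ':encoding(UTF-8)';

for my $name ('abc', 'åßç', 'αβγ', 'абг', '☿♃♄') {
    # e.g. the αβγ line shows: U+03B1 U+03B2 U+03B3
    printf "%s\t%s\n", $name, join(' ', codepoints($name));
}
```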
Take a look at "Re: printing Unicode works for some characters but not all", which I wrote some months ago. This may shed some light on whatever problems you're encountering — clearly, this is one of those guesswork answers I mentioned earlier.
The open pragma statement you're looking for might be something like:
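One commonly used form, offered as an assumption about what would suit your case, sets UTF-8 as the default layer for open() and for the standard handles. The sub name 'roundtrip' is hypothetical and is only there to show the effect.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                              # string literals in the source are UTF-8
use open qw{:std :encoding(UTF-8)};    # default layer for open() in this scope,
                                       # plus STDIN, STDOUT and STDERR

# Hypothetical helper: write $text to $path, then read it back.
# Neither open() names a layer; both pick up the default set by the pragma.
sub roundtrip {
    my ($path, $text) = @_;

    open(my $out, '>', $path) or die "Can't write '$path': $!";
    print {$out} $text;
    close $out or die "Can't close '$path': $!";

    open(my $in, '<', $path) or die "Can't read '$path': $!";
    my $got = do { local $/; <$in> };    # slurp the whole file
    close $in;

    return $got;
}
```

The pragma is lexically scoped, so it only affects open() calls in the file or block where it appears.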
Again, that's more guesswork as you haven't shown your script or adequately described your problem.