in reply to UTF-8 and readdir, etc.
G'day John,
Welcome to the Monastery.
"Maybe I've overlooked the obvious (if so I apologise)."
Showing us your "simple perl program" and describing your actual problem with example input, output, error messages, and so on, would probably result in a better answer. As it is, we need to fall back to guesswork. I appreciate this is your first post, and I'm not trying to beat you over the head with the rule book, but please read "How do I post a question effectively?" and "Short, Self-Contained, Correct Example" to find out what sort of information to post in order to get the best answers.
One minor deviation from the information given in that first link: use '<pre>' blocks, instead of '<code>' blocks, for presenting Unicode data outside the 7-bit ASCII range. With '<code>' blocks, your Unicode characters will typically end up being shown as entity references, i.e. something like '&#NNNNNN;'. This won't happen with '<pre>' blocks; however, the drawbacks are that there's no "[download]" link, and you have to manually change special characters in your code and data (e.g. '<' and '&') to their entities (e.g. '&lt;' and '&amp;'). There's a list of these after the textarea where you write your post. For inline Unicode characters, e.g. inside a '<p>' or '<li>' block, I typically use '<tt>' instead of '<pre>': this is to avoid '<pre>' being forced into a block format by, for instance, a style sheet.
Other than the actual tree walking, which could be part of your problem, the following script ("pm_1208191_read_utf8_filenames.pl") performs the reading and writing tasks you specify.
    #!/usr/bin/env perl -l

    use strict;
    use warnings;
    use autodie;

    my $dir = 'pm_1208191_utf8_filenames';
    my $out = 'pm_1208191_utf8_filenames_listing.txt';

    open my $fh, '>', $out;
    opendir(my $dh, $dir);

    print $fh $_ while readdir $dh;
Given this test directory I set up:
    $ ls -al pm_1208191_utf8_filenames
    total 0
    drwxr-xr-x   7 ken  staff  238 Feb  1 11:45 .
    drwxr-xr-x  18 ken  staff  612 Feb  1 11:34 ..
    -rw-r--r--   1 ken  staff    0 Feb  1 11:34 abc
    -rw-r--r--   1 ken  staff    0 Feb  1 11:36 åßç
    -rw-r--r--   1 ken  staff    0 Feb  1 11:38 αβγ
    -rw-r--r--   1 ken  staff    0 Feb  1 11:41 абг
    -rw-r--r--   1 ken  staff    0 Feb  1 11:45 ☿♃♄
Here's a sample run:
    $ cat pm_1208191_utf8_filenames_listing.txt
    cat: pm_1208191_utf8_filenames_listing.txt: No such file or directory
    $ pm_1208191_read_utf8_filenames.pl
    $ cat pm_1208191_utf8_filenames_listing.txt
    .
    ..
    abc
    åßç
    αβγ
    абг
    ☿♃♄
As you can see, I didn't need any special encoding-type directives. I'm using Perl 5.26.0 on macOS 10.12.5, with 'LANG=en_AU.UTF-8' (my normal setting).
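No directives are needed here because readdir returns each name as raw bytes and print writes those bytes back out unchanged. The same should hold for the tree walking I left out. What follows is only a sketch, assuming the core File::Find module is acceptable for the walk; the helper names 'list_tree' and 'write_listing', and the output filename, are made up for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return every path under $top, as the raw bytes
# the filesystem hands back (no decoding, no encoding layers).
sub list_tree {
    my ($top) = @_;
    my @paths;
    find({ wanted => sub { push @paths, $File::Find::name }, no_chdir => 1 }, $top);
    my @sorted = sort @paths;
    return @sorted;
}

# Hypothetical helper: write one path per line; bytes in, bytes out.
sub write_listing {
    my ($top, $out) = @_;
    open(my $fh, '>', $out) or die "Can't write '$out': $!";
    print {$fh} "$_\n" for list_tree($top);
    close($fh) or die "Can't close '$out': $!";
    return;
}

# Directory name taken from the script above; the output name is an assumption.
my $dir = 'pm_1208191_utf8_filenames';
write_listing($dir, 'pm_1208191_utf8_tree_listing.txt') if -d $dir;
```

Unlike the single-directory script, this lists full paths (including the top directory itself) and descends into subdirectories.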
In case you can't actually see some of those characters, here's a table of the filenames, the three codepoints used for each, and a link to the Unicode PDF code chart so you can see what they look like.
Filename | Codepoints | Code Chart (PDF link) |
---|---|---|
abc | U+0061, U+0062, U+0063 | C0 Controls and Basic Latin |
åßç | U+00E5, U+00DF, U+00E7 | C1 Controls and Latin-1 Supplement |
αβγ | U+03B1, U+03B2, U+03B3 | Greek and Coptic |
абг | U+0430, U+0431, U+0433 | Cyrillic |
☿♃♄ | U+263F, U+2643, U+2644 | Miscellaneous Symbols |
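If you want to check codepoints like these for your own filenames, something along the following lines should do it. This is a sketch, assuming the names are valid UTF-8; 'codepoints' is a made-up helper name, and the sample name is written as hex escapes so the code itself stays 7-bit ASCII.

```perl
use strict;
use warnings;
use Encode qw{decode};

# Hypothetical helper: decode a UTF-8 byte string (such as a name returned
# by readdir) and list its codepoints in U+XXXX notation.
sub codepoints {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);
    return join ', ', map { sprintf 'U+%04X', ord($_) } split //, $chars;
}

# The raw UTF-8 bytes of the Cyrillic filename from the table.
my $name = "\xD0\xB0\xD0\xB1\xD0\xB3";

print length($name), " bytes\n";                         # 6 bytes
print length(decode('UTF-8', $name)), " characters\n";   # 3 characters
print codepoints($name), "\n";                           # U+0430, U+0431, U+0433
```

The difference between the two lengths is the thing to watch: until a name is decoded, Perl treats it as a string of bytes, not characters.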
Take a look at "Re: printing Unicode works for some characters but not all", which I wrote some months ago. This may shed some light on whatever problems you're encountering — clearly, this is one of those guesswork answers I mentioned earlier.
The open pragma statement you're looking for might be something like:
use open IO => qw{:encoding(UTF-8) :std};
Again, that's more guesswork as you haven't shown your script or adequately described your problem.
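One thing to watch if you do add that pragma: readdir still returns undecoded bytes, so printing them to a handle that has a UTF-8 layer would encode them a second time. Here's a sketch of one way around that, assuming the filenames are valid UTF-8. The helper name 'write_decoded_listing' and the throwaway test directory are made up for illustration, and '.' and '..' are skipped to keep the check simple.

```perl
use strict;
use warnings;
use Encode qw{decode};
use File::Temp qw{tempdir};

# The pragma from above: UTF-8 layers on handles opened in this scope,
# plus STDIN, STDOUT and STDERR.
use open IO => qw{:encoding(UTF-8) :std};

# Hypothetical helper: list $dir into $out, decoding each name first so
# the output layer encodes it exactly once.
sub write_decoded_listing {
    my ($dir, $out) = @_;
    opendir(my $dh, $dir) or die "Can't read '$dir': $!";
    my @names = map { decode('UTF-8', $_) } grep { !/\A\.\.?\z/ } readdir $dh;
    closedir $dh;
    @names = sort @names;
    open(my $fh, '>', $out) or die "Can't write '$out': $!";
    print {$fh} "$_\n" for @names;
    close($fh) or die "Can't close '$out': $!";
    return @names;
}

# Throwaway directory holding one file whose name is the UTF-8 bytes
# of the Greek filename (U+03B1, U+03B2, U+03B3).
my $tmp      = tempdir(CLEANUP => 1);
my $out_dir  = tempdir(CLEANUP => 1);
my $raw_name = "\xCE\xB1\xCE\xB2\xCE\xB3";
open(my $touch, '>', "$tmp/$raw_name") or die "Can't create test file: $!";
close $touch;

my $listing = "$out_dir/listing.txt";
write_decoded_listing($tmp, $listing);

# Read the listing back without any layer and compare byte for byte.
open(my $in, '<:raw', $listing) or die "Can't read '$listing': $!";
my $written = do { local $/; <$in> };
close $in;

print "listing holds the original bytes\n" if $written eq "$raw_name\n";
```

The explicit '<:raw' on the final open overrides the pragma's default layer, which is what lets the comparison be done on bytes.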
— Ken