PerlMonks
Re: UTF-8 and readdir, etc.
by kcott (Archbishop) on Feb 01, 2018 at 02:28 UTC ( [id://1208224]=note )
G'day John,

Welcome to the Monastery.

"Maybe I've overlooked the obvious (if so I apologise)."

Showing us your "simple perl program" and describing your actual problem with example input, output, error messages, and so on, would probably result in a better answer. As it is, we need to fall back to guesswork.

I appreciate this is your first post, and I'm not trying to beat you over the head with the rule book, but please read "How do I post a question effectively?" and "Short, Self-Contained, Correct Example" to find out what sort of information to post in order to get the best answers.

One minor deviation from the information given in that first link: use '<pre>' blocks, instead of '<code>' blocks, for presenting Unicode data outside the 7-bit ASCII range. With '<code>' blocks, your Unicode characters will typically end up being shown as entity references, i.e. something like '&#NNNNNN;'. This won't happen with '<pre>' blocks; however, the drawbacks are that there's no "[download]" link, and you have to manually change special characters in your code and data (e.g. '<' and '&') to their entities (e.g. '&lt;' and '&amp;'); there's a list of these after the textarea where you write your post. For inline Unicode characters, e.g. inside a '<p>' or '<li>' block, I typically use '<tt>' instead of '<pre>': this is to avoid '<pre>' being forced into a block format by, for instance, a style sheet.

Other than the actual tree walking, which could be part of your problem, the following script ("pm_1208191_read_utf8_filenames.pl") performs the reading and writing tasks you specify.
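A minimal sketch of a script of this kind, assuming only core Perl and no encoding layers at all. The helper name write_listing and the choice to take both paths from the command line are hypothetical; this is an illustration, not necessarily the script as posted:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;    # opendir, open, close, etc. die with a message on failure

# Hypothetical helper: write every entry of $dir, one per line, to $listing.
# No encoding layers are applied, so the filename bytes returned by readdir
# are written to the listing file unchanged.
sub write_listing {
    my ($dir, $listing) = @_;

    opendir(my $dh, $dir);
    my @names = sort readdir $dh;    # plain byte-wise sort
    closedir $dh;

    open(my $out, '>', $listing);
    print {$out} "$_\n" for @names;
    close $out;

    return scalar @names;
}

# Usage (assumed): script.pl DIRECTORY LISTING_FILE
write_listing(@ARGV) if @ARGV == 2;
```

Because the bytes are never decoded, whatever encoding the filesystem hands back is what ends up in the listing; on a UTF-8 system that is already valid UTF-8.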
Given this test directory I set up:

    $ ls -al pm_1208191_utf8_filenames
    total 0
    drwxr-xr-x   7 ken  staff  238 Feb  1 11:45 .
    drwxr-xr-x  18 ken  staff  612 Feb  1 11:34 ..
    -rw-r--r--   1 ken  staff    0 Feb  1 11:34 abc
    -rw-r--r--   1 ken  staff    0 Feb  1 11:36 åßç
    -rw-r--r--   1 ken  staff    0 Feb  1 11:38 αβγ
    -rw-r--r--   1 ken  staff    0 Feb  1 11:41 абг
    -rw-r--r--   1 ken  staff    0 Feb  1 11:45 ☿♃♄

Here's a sample run:

    $ cat pm_1208191_utf8_filenames_listing.txt
    cat: pm_1208191_utf8_filenames_listing.txt: No such file or directory
    $ pm_1208191_read_utf8_filenames.pl
    $ cat pm_1208191_utf8_filenames_listing.txt
    .
    ..
    abc
    åßç
    αβγ
    абг
    ☿♃♄

As you can see, I didn't need any special encoding-type directives. I'm using Perl 5.26.0; MacOS 10.12.5; and I have 'LANG=en_AU.UTF-8' (normal setting).

In case you can't actually see some of those characters, here's a table of the filenames, the three codepoints used for each, and a link to the Unicode PDF code chart so you can see what they look like.
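The code points behind those names can also be listed directly in Perl. A small sketch, where codepoints is a hypothetical helper and the names are copied from the directory listing above:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                              # the filenames below are literal UTF-8 in this source

binmode STDOUT, ':encoding(UTF-8)';    # print characters, not raw bytes

# Hypothetical helper: return the code points of a character string
# in U+XXXX notation, one element per character.
sub codepoints {
    my ($name) = @_;
    return map { sprintf 'U+%04X', ord($_) } split //, $name;
}

# Names copied from the test directory listing.
for my $name (qw{abc åßç αβγ абг ☿♃♄}) {
    printf "%s\t%s\n", $name, join ' ', codepoints($name);
}
```

Note that this works on character strings; a name read with readdir would need decoding first, since it arrives as bytes.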
Take a look at "Re: printing Unicode works for some characters but not all", which I wrote some months ago. This may shed some light on whatever problems you're encountering — clearly, this is one of those guesswork answers I mentioned earlier. The open pragma statement you're looking for might be something like:
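One plausible form, offered as a guess rather than a known fix: the open pragma with ':std' sets a default UTF-8 layer on the standard handles and on handles opened in its lexical scope. It does not touch readdir, which still returns bytes, so this sketch also decodes the names with the core Encode module before printing them; decoded_names is a hypothetical helper:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use open qw{:std :encoding(UTF-8)};    # UTF-8 layer for STDIN/STDOUT/STDERR and new handles
use Encode qw{decode};

# Hypothetical helper: readdir returns raw bytes, so decode them into
# character strings before printing through a UTF-8 output layer
# (printing undecoded bytes through that layer would double-encode them).
sub decoded_names {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open directory '$dir': $!";
    my @names = map { decode('UTF-8', $_) } sort readdir $dh;
    closedir $dh;
    return @names;
}

# Usage (assumed): script.pl DIRECTORY
if (@ARGV) {
    print "$_\n" for decoded_names($ARGV[0]);
}
```

Whether you need the pragma at all depends on what your script does with the names; if the bytes are only passed straight through, as in the sample run above, no layers are required.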
Again, that's more guesswork, as you haven't shown your script or adequately described your problem.

-- Ken