unicode version of readdir
by dk (Chaplain) on Sep 14, 2007 at 08:55 UTC
dk has asked for the wisdom of the Perl Monks concerning the following question:
I was researching how to get the results of readdir in utf8, and didn't find anything useful. The problem is that I want to read file names on win32 that contain non-Latin characters, which are mapped to '?' under my codepage.
I found that the problem has been discussed before, but couldn't find any suitable solution: the jperl hacks are discontinued, and Win32API::File has no FindFirst/FindNext entries.
If it is indeed not possible, I was thinking of introducing a switch in the perl core that would toggle readdir between returning bytes and utf8. The next step would probably be for open to recognize utf8 file names as well, but that's for later.
Another aspect is that the problem is wider than win32: it is perfectly legal to create utf8 file names on unix file systems (of course one can always treat them as non-unicode names there, which is not possible on win32), and gnome utilities use this feature when run under UTF8 locales. The point is that if someone a) explicitly knows that his files have utf8 names and b) wants to access them with perl utf8 semantics and little hassle (and irrespective of the locale!), there's no way to do that except to mess with Encode.
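To make "messing with Encode" concrete, here is a rough sketch of the workaround I mean for the unix case. The name readdir_utf8 is made up for illustration and is not a core function, and the code assumes every name in the directory really is valid UTF-8. It does not help on win32, where the '?' substitution has already happened before perl sees the bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of perl core): list a directory and decode
# each entry from UTF-8 bytes into a perl character string.
# Assumption: the file names on disk are valid UTF-8.
sub readdir_utf8 {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names;
    for my $bytes (readdir($dh)) {
        next if $bytes eq '.' || $bytes eq '..';
        # With no CHECK argument, malformed bytes become U+FFFD
        # instead of raising an error.
        push @names, decode('UTF-8', $bytes);
    }
    closedir($dh);
    return @names;
}

# Usage: the names are character strings now, so they need an encoding
# layer on output.
binmode(STDOUT, ':encoding(UTF-8)');
print "$_\n" for readdir_utf8('.');
```

Going the other way is just as manual: a name obtained like this has to be run through encode('UTF-8', ...) again before it is handed to open, which is exactly the hassle I would like to avoid.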
So my questions are: