http://www.perlmonks.org?node_id=651467


in reply to Re^4: Writing UTF8 Filename (Win32)
in thread Writing UTF8 Filename

One problem, even with Win32, is that you can have multiple filesystems on a single system, even within a single tree. Not every filesystem handles filenames the same way. Any solution for Perl would be incomplete without the possibility to override the encoding decision per path.

I'm hoping for a solution that is sufficiently abstracted that all platforms can use it. Win32's implementation would probably be a bit easier than one for, say, Linux, but even if you have to set things explicitly per path, it's better than what we have now. The following is copied from a post to p5p a while ago.

I tend to agree, however pragmas tend to be global, program- or packagewise, and what suits best here is individual, perl-call flag.

Global is a problem in most cases, but I feel it would be perfect here, simply because the filesystem is equally global. In fact, it's even longer lived than your Perl program :)

Better yet, global variables can be localized to dynamic scope. This is good, because when you set the encoding for /foo, it should work for encoding-unaware modules too.
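A minimal sketch of the dynamic-scope point, assuming an ordinary package variable `$FS_ENCODING` as a stand-in for the proposed special variable (the name and both helper subs are hypothetical, not existing Perl features). It only shows that code which never mentions encodings still sees a value set with `local` by its caller.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed filesystem-encoding variable.
our $FS_ENCODING = 'auto';

# An encoding-unaware routine: it sets nothing itself and simply
# observes whatever value is in effect when it is called.
sub encoding_seen_by_legacy_code {
    return $FS_ENCODING;
}

sub with_latin1 {
    # local gives the variable a temporary value for this dynamic scope;
    # the previous value is restored when the sub returns.
    local $FS_ENCODING = 'iso-8859-1';
    return encoding_seen_by_legacy_code();
}

my $inside  = with_latin1();                   # value seen inside the scope
my $outside = encoding_seen_by_legacy_code();  # value seen after it ended
```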

Maybe a hash would be nice:

${^FS_ENCODING}{foo} = 'A';
${^FS_ENCODING}{foo}{bar} = 'B';
${^FS_ENCODING}{foo}{bar}{baz}{quux} = 'auto';
open my $fh, ">", "/foo/bar/baz/quux/blah/hello.txt";
Which then actually does:
open my $fh, ">", join("/",
    "",
    encode(detect_encoding("/"), "foo"),
    encode("A", "bar"),
    encode("B", "baz"),
    encode("B", "quux"),
    encode(detect_encoding("/foo/bar/baz/quux"), "blah"),
    encode(detect_encoding("/foo/bar/baz/quux/blah"), "hello.txt"),
);
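The per-component resolution above could be sketched roughly as follows. Everything here is an assumption for illustration: `%fs_encoding` is a flat hash keyed by directory (instead of the nested form in the post), `iso-8859-1` and `cp1252` stand in for the placeholder encodings 'A' and 'B', and `detect_encoding` is a stub that always answers UTF-8, since real autodetection would need OS support.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical flat stand-in for the proposed hash: the key is a directory,
# the value is the encoding for names stored below that directory.
my %fs_encoding = (
    '/foo'              => 'iso-8859-1',   # 'A' in the post
    '/foo/bar'          => 'cp1252',       # 'B' in the post
    '/foo/bar/baz/quux' => 'auto',
);

# Stub: a real implementation would ask the OS or the filesystem.
sub detect_encoding { return 'UTF-8' }

# Encode each path component with the encoding in effect for its parent
# directory; a setting is inherited until a deeper directory overrides it.
sub encode_path {
    my ($path) = @_;
    my ($dir, $enc, @out) = ('/', 'auto');
    my @parts = split m{/}, $path;
    shift @parts;    # drop the empty string before the leading slash
    for my $part (@parts) {
        my $use = $enc eq 'auto' ? detect_encoding($dir) : $enc;
        push @out, encode($use, $part);
        $dir = $dir eq '/' ? "/$part" : "$dir/$part";
        $enc = $fs_encoding{$dir} if exists $fs_encoding{$dir};
    }
    return join '/', '', @out;
}

my $bytes = encode_path("/foo/bar/baz/quux/blah/hello.txt");
```

With this stub the walk picks the same encoding for each component as the expanded `join` above: autodetection for `foo`, the `/foo` setting for `bar`, the `/foo/bar` setting for `baz` and `quux`, and autodetection again below `quux`.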

Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

Re^6: Writing UTF8 Filename (Win32)
by tye (Sage) on Nov 18, 2007 at 00:52 UTC

    Wow. That would suck, IMHO. Talk about a complicated mess of an over-designed system.

    Simply supporting Unicode strings as file names/paths is what should be done and is what was done in Win32. Perl doesn't support strings in multiple encodings: they are either Unicode in UTF-8 or they aren't, in which case they are composed of 8-bit characters. Similarly, Win32 strings are either Unicode in UTF-16 (or so) or they aren't, in which case they are composed of 8-bit characters. Win32 at least makes clear what the "aren't" case means: the string is in the encoding of the process's current locale (not in some encoding based on what part of the file system it is referring to, which would be an unholy mess).

    The support for Win32 would be fairly simple: instead of always converting to 8-bit character strings before calling a Windows *A() function (which then converts them to UTF-16), we should always convert to UTF-16 strings before calling a Windows *W() function.
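As a rough illustration of the conversion step only, here is how a Perl character string can be turned into the NUL-terminated UTF-16LE buffer that a *W() function such as CreateFileW takes. The helper name `to_wide` is hypothetical, and this says nothing about how perl itself would wire such a call up internally.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a Perl character string as UTF-16LE
# and append the terminating NUL that wide-character APIs expect.
sub to_wide {
    my ($name) = @_;
    return encode('UTF-16LE', $name . "\0");
}

# Eight characters plus the terminator, two bytes each.
my $wide = to_wide("caf\x{e9}.txt");
```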

    If Unix support for Unicode filenames is going a route similar to what you outlined, then I won't hold my breath for that being stable and don't think Perl should try to implement support for it, because I predict that route would be doomed to be abandoned anyway.

    - tye

      Wow. That would suck, IMHO. Talk about a complicated mess of an over-designed system.

      I don't think it would suck or is over-designed. In the common case, you would use file functions like you do now, and Perl handles everything transparently. ${^FS_ENCODING} would default to auto, resulting in autodetection for the entire system. When you want to port your latin1 mp3 collection to utf8 (to name one real-world case), it would be exceptionally easy to do so: given proper OS support it would detect the encodings automatically, and without the OS support you can still override them with two lines of code.
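For comparison, the manual version of the latin1-to-utf8 conversion, as it can be written today with the core Encode module, might look like the sketch below. The sub name is hypothetical, and it assumes a filesystem that stores filenames as plain bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: reinterpret a filename's bytes as Latin-1
# and return the same characters encoded as UTF-8.
sub latin1_name_to_utf8 {
    my ($bytes) = @_;
    my $chars = decode('ISO-8859-1', $bytes);
    return encode('UTF-8', $chars);
}

my $new_name = latin1_name_to_utf8("caf\xE9.mp3");
```

Each directory entry returned by `readdir` could then be passed through `rename $old, latin1_name_to_utf8($old)`, skipping names that come back unchanged.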

      The problem with ANY win32-only code, or any-platform-only code, is that you put the burden of writing portable applications on the programmer. Hence, some abstraction would be nice. If only perl provided useful hooks for encoding filenames in general, that would be a great start, and also provide nice ways of dealing with existing systems. Program written to support only absolute filenames? Hack hack, and it does what you want.

      Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

        Please put your complex plan into a module and not into Perl itself.

        The problem with ANY win32-only code, or any-platform-only code, is that you put the burden of writing portable applications on the programmer. Hence, some abstraction would be nice.

        The abstraction is that file names/paths that are Unicode (UTF-8) strings in Perl should be supported. That'll also be the abstraction that gets supported at the low level when Unix catches up to Win32, surely. That's the only abstraction that makes much sense. Sure, in the short term, there will be awkward steps to try to bridge between 8-bit chars and Unicode, but those are going to be awkward and non-portable and best kept out of the way of those who don't get stuck having to use them.

        - tye