Handling accented characters in filenames on Win32

by Schlika (Initiate)
on Apr 15, 2004 at 20:50 UTC

I am struggling with handling long filenames that contain international characters such as accented vowels.
Here is a code example:

    $path = 'c:\temp';
    @list = `dir /a:d /b /s $path`;

If a directory in c:\temp contains an international character, then when I try to use the corresponding entry in @list (such as when doing a "new Win32::Perms"), it fails because it cannot find the path specified (and yes, it DOES exist).
Any suggestions on how I could get Perl to handle those characters correctly?
So far I have tried "use utf8" and several "use encoding" but to no avail.

Any help would be appreciated.


Re: Handling accented characters in filenames on Win32
by kvale (Monsignor) on Apr 15, 2004 at 21:16 UTC
    If the offending characters are not Unicode, just 8-bit, you could try the bytes pragma (use bytes;). It turns off Unicode processing so that strings are handled as sequences of raw bytes. From the pod:

        $x = chr(400);
        print "Length is ", length $x, "\n";     # "Length is 1"
        printf "Contents are %vd\n", $x;         # "Contents are 400"
        {
            use bytes;
            print "Length is ", length $x, "\n"; # "Length is 2"
            printf "Contents are %vd\n", $x;     # "Contents are 198.144"
        }


Re: Handling accented characters in filenames on Win32
by hardburn (Abbot) on Apr 15, 2004 at 20:55 UTC

    What version of perl are you running? 5.8 should automatically handle UTF stuff. If it doesn't work there, you probably found a bug.

      Thanks for the feedback.
      I'm running ActiveState Perl v5.8.3.
      glob also returns rubbish when looking at the filename (prints ÜtÜ instead of été for example), but somehow if I use the paths returned by glob, the handles work...

      Me again,

      It looks like what is coming out of `dir /a:d /s /b $path` is encoded in cp437.
      So with use encoding 'cp437'; in place, I can see that the pathname is stored correctly with the accented characters.
      Yet as soon as I pass this pathname to any function (even a print), it becomes rubbish again...
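A minimal sketch of decoding such output explicitly with the core Encode module instead of the encoding pragma. The byte string is hard-coded so the example runs anywhere. That the console really uses cp437 and that the ANSI code page is cp1252 are assumptions here; both depend on the machine's locale settings.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes as `dir` might emit them for the name "été" under code page 437
# (0x82 is "é" in cp437). Hard-coded here instead of calling `dir`.
my $raw = "\x82t\x82";

# Console bytes -> Perl character string.
my $name = decode('cp437', $raw);

# Character string -> bytes in the ANSI code page (cp1252 is an assumption).
my $ansi = encode('cp1252', $name);

# Show the character count and the re-encoded bytes in hex.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $ansi;
print "characters: ", length($name), ", ANSI bytes: $hex\n";
```

If the re-encoded bytes match what the file functions expect, that would explain why the same path fails when taken straight from `dir`, but that is a guess rather than something confirmed in this thread.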


      In the end, the solution to my problem was to use an internal Perl command rather than a system call to a shell command as suggested by someone else in this thread.
      File::Find replaced 'dir' nicely.

      However, as with glob, File::Find would print funny characters instead of the proper accented characters...
      The trick is then to use Win32::Console and add Win32::Console::OutputCP(1252); somewhere in your code.
      This sets the output code page of the Win32 console to 1252, which seems to be the correct one.
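For anyone landing here later, a rough sketch of that approach: File::Find collects the directories (roughly what dir /a:d /b /s did), and the console code page is switched only when running on Windows. The helper name list_dirs and the default root of '.' are made up for illustration; code page 1252 is the value reported above and may differ on other locales.

```perl
use strict;
use warnings;
use File::Find;

# Collect every directory below $root, similar to `dir /a:d /b /s`.
# list_dirs is a hypothetical helper name, not part of File::Find.
sub list_dirs {
    my ($root) = @_;
    my @dirs;
    find(sub {
        return unless -d $_;                      # directories only
        return if $File::Find::name eq $root;     # skip the root itself
        push @dirs, $File::Find::name;
    }, $root);
    return @dirs;
}

# Only on Windows: set the console output code page so that
# accented characters print correctly (1252 assumed, as in the post above).
if ($^O eq 'MSWin32') {
    require Win32::Console;
    Win32::Console::OutputCP(1252);
}

my $root = @ARGV ? $ARGV[0] : '.';
print "$_\n" for list_dirs($root);
```

Run it as perl script.pl c:/temp, or with no argument to walk the current directory.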

      Thanks for the helpful suggestions.

Re: Handling accented characters in filenames on Win32
by Vautrin (Hermit) on Apr 15, 2004 at 21:10 UTC
    Out of curiosity, are the filenames you get from glob the same as the ones you get from the `dir` call? I would replace the `dir` with a call to glob. If glob works, you're not getting the dir output properly. If it doesn't, you may get some clues about what is going wrong.

