
Handling accented characters in filenames on Win32

by Schlika (Initiate)
on Apr 15, 2004 at 20:50 UTC (#345539)

Schlika has asked for the wisdom of the Perl Monks concerning the following question:


I am struggling with handling long filenames that contain international characters such as accented vowels.
Here is a code example:

$path = 'c:\temp';
@list = `dir /a:d /b /s $path`;

If a directory in c:\temp contains an international character, then any attempt to use the corresponding entry in @list (for example, calling new Win32::Perms on it) fails because the path cannot be found (and yes, it DOES exist).
Any suggestions on how I could get Perl to handle those characters correctly?
So far I have tried "use utf8" and several variants of "use encoding", but to no avail.

Any help would be appreciated.


Re: Handling accented characters in filenames on Win32
by kvale (Monsignor) on Apr 15, 2004 at 21:16 UTC
    If the offending characters are not Unicode, just 8-bit, you could try the bytes pragma (use bytes;). It disables character semantics for the rest of the enclosing lexical scope, so strings are processed as sequences of raw bytes. From the pod:
        $x = chr(400);
        print "Length is ", length $x, "\n";     # "Length is 1"
        printf "Contents are %vd\n", $x;         # "Contents are 400"
        {
            use bytes;
            print "Length is ", length $x, "\n"; # "Length is 2"
            printf "Contents are %vd\n", $x;     # "Contents are 198.144"
        }


Re: Handling accented characters in filenames on Win32
by hardburn (Abbot) on Apr 15, 2004 at 20:55 UTC

    What version of perl are you running? Perl 5.8 should handle Unicode (UTF-8) strings automatically. If it doesn't work there, you have probably found a bug.



      Thanks for the feedback.
      I'm running ActiveState Perl v5.8.3.
      glob also returns rubbish when I print the filename (it prints ÜtÜ instead of été, for example), but somehow, if I use the paths returned by glob, the handles work...

      Me again,

      It looks like the output of `dir /a:d /s /b $path` is encoded in cp437.
      If I add use encoding 'cp437';, I can see that the pathname is stored correctly, accented characters included.
      Yet as soon as I pass this pathname to other functions (print, for example), it turns into rubbish again...
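One way to act on the cp437 finding, sketched here under assumptions rather than as the poster's solution: instead of the use encoding pragma, convert each line of dir's output explicitly with the core Encode module, from the console's OEM code page to the ANSI code page that Perl's file functions use on Win32. The code pages cp437 and cp1252 are assumptions (both depend on the machine's locale), and oem_to_ansi is a hypothetical helper name. The dir call only runs on Windows.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: cmd.exe writes names in the console's OEM code page (cp437
# here; cp850 is common in Western Europe), while Perl's file functions on
# Win32 expect the ANSI code page (cp1252 here). Adjust both to your locale.
sub oem_to_ansi {
    my ($oem_bytes) = @_;
    my $chars = decode('cp437', $oem_bytes);   # OEM bytes -> Perl characters
    return encode('cp1252', $chars);           # characters -> ANSI bytes
}

if ($^O eq 'MSWin32') {
    my $path = 'c:\temp';
    # Strip the trailing newline first, then convert the code page.
    my @dirs = map { chomp(my $line = $_); oem_to_ansi($line) }
               `dir /a:d /b /s "$path"`;
    for my $dir (@dirs) {
        print -d $dir ? "exists:  $dir\n" : "missing: $dir\n";
    }
}
```

Printing $dir may still look garbled on the console, since the console itself displays in the OEM code page; the -d test is the part that shows whether the converted names are usable.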


      In the end, the solution to my problem was to stay inside Perl rather than shell out to an external command, as someone else in this thread suggested.
      File::Find replaced 'dir' nicely.
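A minimal sketch of a File::Find replacement for `dir /a:d /b /s $path`, assuming only the directories below the starting point are wanted (dir /s does not list the starting directory itself). find_dirs is a hypothetical helper name, and the no_chdir option is a choice made here so that $_ holds the full path.

```perl
use strict;
use warnings;
use File::Find;

# Collect every directory below $root (not $root itself), roughly what
# `dir /a:d /b /s $root` lists. File::Find gets the names from the file
# system API, so the console's OEM code page is never involved.
sub find_dirs {
    my ($root) = @_;
    my @dirs;
    find(
        {
            no_chdir => 1,    # keep $_ as the full path instead of chdir-ing
            wanted   => sub { push @dirs, $_ if -d $_ && $_ ne $root },
        },
        $root,
    );
    return @dirs;
}

my $path = 'c:\temp';
if (-d $path) {
    # Names may come back with '/' separators; Win32 accepts both.
    print "$_\n" for find_dirs($path);
}
```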

      However, as with glob, File::Find printed garbled characters instead of the proper accented ones...
      The trick is to use Win32::Console and call Win32::Console::OutputCP(1252); somewhere in your code.
      This switches the output code page of the Win32 console to 1252, which seems to be the one the file names are encoded in.
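A sketch of that console fix, assuming the ANSI code page is 1252 as the poster found. Win32::Console only exists on Windows, so this version loads it at run time and does nothing elsewhere; set_console_cp1252 is a hypothetical helper name. Depending on the Windows version, the console may also need a TrueType font (such as Lucida Console) before code page 1252 characters display correctly.

```perl
use strict;
use warnings;

# Switch the console's output code page to 1252, as described above.
# Returns 1 if the call was made, 0 when not on Win32 or when
# Win32::Console cannot be loaded.
sub set_console_cp1252 {
    return 0 unless $^O eq 'MSWin32';
    return 0 unless eval { require Win32::Console; 1 };
    Win32::Console::OutputCP(1252);
    return 1;
}

if (set_console_cp1252()) {
    print "Console output code page set to 1252\n";
}
else {
    print "Not a Win32 console; code page left unchanged\n";
}
```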

      Thanks for the helpful suggestions.

Re: Handling accented characters in filenames on Win32
by Vautrin (Hermit) on Apr 15, 2004 at 21:10 UTC
    Out of curiosity, are the filenames you get from glob the same as the ones you get from the `dir` call? I would replace the `dir` with a call to glob. If glob works, the problem is in how you are reading dir's output. If it doesn't, you may get some clues about what is going wrong.
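A sketch of the comparison suggested above, assuming one level of c:\temp is enough to see the problem: it lists directory names from glob and from dir /a:d /b (without /s, dir prints bare names), then prints any name found by only one of them as hex bytes, so the console's code page cannot disguise the difference. hex_bytes and only_in are hypothetical helper names. If the encodings differ, a name like été should show up as E9.74.E9 from glob and 82.74.82 from dir on a cp437 console.

```perl
use strict;
use warnings;

# Show a string's bytes in hex, e.g. "E9.74.E9", independent of the
# console's code page.
sub hex_bytes { return sprintf '%vX', $_[0] }

# Names present in @$mine but not in @$theirs (exact byte comparison).
sub only_in {
    my ($mine, $theirs) = @_;
    my %seen = map { $_ => 1 } @$theirs;
    return grep { !$seen{$_} } @$mine;
}

if ($^O eq 'MSWin32') {
    my $path = 'c:/temp';    # forward slashes: backslash is an escape in glob
    # Directory names (one level, no recursion) from glob ...
    my @from_glob = map { m{([^/]+)\z} ? $1 : $_ } grep { -d } glob("$path/*");
    # ... and from cmd.exe.
    (my $dos_path = $path) =~ tr{/}{\\};
    chomp(my @from_dir = `dir /a:d /b "$dos_path"`);
    print 'glob only: ', hex_bytes($_), "\n" for only_in(\@from_glob, \@from_dir);
    print 'dir only:  ', hex_bytes($_), "\n" for only_in(\@from_dir, \@from_glob);
}
```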


Node Type: perlquestion [id://345539]
Approved by davido