PerlMonks  

How to stat a file with a Unicode (UTF16-LE) filename in Windows?

by alanhaggai (Initiate)
on Feb 06, 2009 at 05:13 UTC (#741797=perlquestion)
alanhaggai has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have been able to create/open files with Unicode (UTF16-LE) filenames under Windows by using Win32API::File. However, I have not yet been able to find a way to stat() such files. How can this be done?

Re: How to stat a file with a Unicode (UTF16-LE) filename in Windows?
by ikegami (Pope) on Feb 06, 2009 at 05:57 UTC
    In chat, you said Win32API::File's CreateFileW + OsFHandleOpen didn't work. ("I have used Win32API::File, but it does not support converting the file handle for wide calls to Perl equivalents.") I downvoted for not showing what you tried because it does work.
    use strict;
    use warnings;

    use Encode         qw( encode );
    use File::stat     qw( stat );
    use Symbol         qw( gensym );
    use Win32API::File qw( CreateFileW OsFHandleOpen
                           GENERIC_READ FILE_SHARE_READ OPEN_EXISTING );

    {
        # The file name consists of a black heart (U+2665).
        # Encode it as the 16-bit little-endian bytes the W API expects.
        my $fn = encode('UCS-2le', "\x{2665}");

        # Open the existing file for reading; [] is passed for the
        # two optional pointer arguments.
        my $handle = CreateFileW(
            $fn, GENERIC_READ, FILE_SHARE_READ, [], OPEN_EXISTING, 0, []
        ) or die("CreateFile: $^E\n");

        # Wrap the native Win32 handle in a Perl file handle.
        my $fh = gensym();
        OsFHandleOpen($fh, $handle, "r")
            or die("OsFHandleOpen: $!\n");

        # stat the handle, so no narrow file name is ever needed.
        my $stat = stat($fh);
        print("atime: ", scalar(localtime($stat->atime())), "\n");
        print("mtime: ", scalar(localtime($stat->mtime())), "\n");
        print("ctime: ", scalar(localtime($stat->ctime())), "\n");
    }
    atime: Fri Feb  6 00:44:39 2009
    mtime: Fri Feb  6 00:44:39 2009
    ctime: Fri Feb  6 00:44:39 2009
Re: How to stat a file with a Unicode (UTF16-LE) filename in Windows?
by ikegami (Pope) on Feb 06, 2009 at 06:12 UTC

    Of course, calling stat (an emulation of a Unix system call) is a very roundabout way of calling GetFileTime. The work has already been done for you in Win32API::File::Time.

    use strict;
    use warnings;

    use Encode               qw( encode );
    use Win32API::File::Time qw( GetFileTime );

    {
        # The file name consists of a black heart (U+2665).
        my $fn = encode('UCS-2le', "\x{2665}");

        # Have the module use the wide-character Win32 calls.
        local ${^WIDE_SYSTEM_CALLS} = 1;

        # On Windows, the third value is the file's creation time.
        my ($atime, $mtime, $ctime) = GetFileTime($fn)
            or die("GetFileTime: $^E\n");

        print("atime: ", scalar(localtime($atime)), "\n");
        print("mtime: ", scalar(localtime($mtime)), "\n");
        print("ctime: ", scalar(localtime($ctime)), "\n");
    }
    atime: Fri Feb  6 00:44:39 2009
    mtime: Fri Feb  6 00:44:39 2009
    ctime: Fri Feb  6 00:44:39 2009
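The localtime() calls above work because Win32API::File::Time hands back epoch seconds. The underlying Win32 GetFileTime call reports each timestamp as a FILETIME instead: a 64-bit count of 100-nanosecond intervals since 1601-01-01 UTC, stored as low and high 32-bit halves. The following is a minimal sketch of that conversion, assuming you already hold the two halves as plain integers; filetime_to_epoch is a hypothetical name, and this is not a claim about how the module does it internally.

```perl
use strict;
use warnings;

# Seconds from the FILETIME epoch (1601-01-01 00:00 UTC)
# to the Unix epoch (1970-01-01 00:00 UTC).
use constant FILETIME_TO_UNIX_SECONDS => 11_644_473_600;

# Hypothetical helper (not part of Win32API::File::Time): turn the two
# 32-bit halves of a FILETIME, which counts 100-nanosecond ticks since
# 1601, into the epoch seconds that localtime() expects.
sub filetime_to_epoch {
    my ($low, $high) = @_;
    # On a Perl without 64-bit integers this is a double; the rounding
    # error is around a microsecond, negligible for these timestamps.
    my $ticks = $high * 2**32 + $low;
    return int($ticks / 10_000_000) - FILETIME_TO_UNIX_SECONDS;
}

# 0x019DB1DE_D53E8000 ticks is exactly 1970-01-01 00:00:00 UTC.
print filetime_to_epoch(0xD53E8000, 0x019DB1DE), "\n";   # prints 0
```

Splitting the argument into two halves mirrors the FILETIME structure itself and keeps the helper usable on 32-bit Perls.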

      It is working well. Now I understand why I was not able to stat(). I did not use Symbol, and gensym(). I will read about them. Also, in Windows, internally, which encoding is used for filenames? UTF16-le or UCS-2le?

      Thanks again for sparing your time and for the great code that you have posted.

        Also, in Windows, internally, which encoding is used for filenames? UTF16-le or UCS-2le?

        It is my tenuous understanding that the difference between UTF-16 and UCS-2 is that UTF-16 can address characters above U+FFFF (by combining two 16-bit surrogates) and UCS-2 cannot. I haven't seen any support for multi-word characters in Windows, so I believe it's UCS-2. In practice, it doesn't matter which one you use: for characters up to U+FFFF, the two encodings produce identical bytes.
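The difference can be seen with Perl's Encode module. This sketch, assuming an Encode that knows the 'UTF-16LE' and 'UCS-2LE' names, shows that a character below U+FFFF such as the black heart encodes identically under both. It then computes by hand the surrogate pair UTF-16LE needs for U+1F600, a character chosen only for illustration, and compares it with Encode's output. It deliberately does not try to encode U+1F600 as UCS-2, which cannot represent it.

```perl
use strict;
use warnings;
use Encode qw( encode );

# U+2665 (black heart) is in the Basic Multilingual Plane, so
# UTF-16LE and UCS-2LE give the same two bytes for it.
my $heart_utf16 = encode('UTF-16LE', "\x{2665}");
my $heart_ucs2  = encode('UCS-2LE',  "\x{2665}");
printf("U+2665  UTF-16LE: %s  UCS-2LE: %s\n",
       unpack('H*', $heart_utf16), unpack('H*', $heart_ucs2));
# prints: U+2665  UTF-16LE: 6526  UCS-2LE: 6526

# U+1F600 is above U+FFFF, so UTF-16 needs a surrogate pair,
# which can be computed by hand:
my $cp   = 0x1F600;
my $v    = $cp - 0x10000;          # 20-bit offset past the BMP
my $high = 0xD800 + ($v >> 10);    # high (lead) surrogate: top 10 bits
my $low  = 0xDC00 + ($v & 0x3FF);  # low (trail) surrogate: bottom 10 bits
my $by_hand   = pack('v v', $high, $low);   # two little-endian 16-bit units
my $by_encode = encode('UTF-16LE', chr($cp));
printf("U+1F600 by hand: %s  Encode: %s\n",
       unpack('H*', $by_hand), unpack('H*', $by_encode));
# prints: U+1F600 by hand: 3dd800de  Encode: 3dd800de
```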

        I did not use Symbol, and gensym()

        The documentation for OsFHandleOpen clearly defines what is acceptable as its first argument, and an undefined lexical isn't one of those.
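To see why an undefined lexical doesn't qualify, compare what each candidate actually is. This small sketch uses only core Perl and Symbol, so it runs anywhere; it does not call OsFHandleOpen itself (that needs Windows). It only shows that gensym() and the \do { local *FH } idiom both yield glob references, while a plain "my $fh;" is still undef when it is passed in.

```perl
use strict;
use warnings;
use Symbol qw( gensym );

my $from_gensym = gensym();            # reference to an anonymous glob
my $from_local  = \do { local *FH };   # same idea without Symbol
my $undefined;                         # what "my $fh;" gives you

for my $case ( [ 'gensym()',          $from_gensym ],
               [ '\do { local *FH }', $from_local  ],
               [ 'undefined lexical', $undefined   ] ) {
    my ($label, $value) = @$case;
    my $kind = defined($value) ? ref($value) : 'undef';
    printf("%-20s %s\n", $label, $kind);   # GLOB, GLOB, undef
}
```

Either of the first two can be handed to OsFHandleOpen the way gensym() is used in the code earlier in the thread.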

        For what it is worth (not much), the MSDN says the following:
        Windows stores the long file names on disk in Unicode. ...The valid character set for these long file names is the NTFS character set, less one character: the colon (':') ...

        I have searched high and low for a definition of "the NTFS character set" but could not find a thing.

        The MSDN also offers conventions (like the Pirate's code) for naming files:
        Use any character in the current code page for a name, except characters in the range 0 through 31 or any character explicitly disallowed by the file system. A name can contain characters in the extended character set (128-255). However, it cannot contain the following reserved characters: < > : " / \ |
        This implies single-byte characters, which contradicts the statement above. The phrase "any character explicitly disallowed by the file system" is wonderful when they do not seem to define what those characters might be.
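To make the quoted rules concrete, here is a minimal sketch of a checker for just the two character-level conditions in the MSDN excerpt: nothing in the range 0 through 31, and none of < > : " / \ |. violates_quoted_rules is a hypothetical name. It ignores the "current code page" and "disallowed by the file system" clauses, which are exactly the parts the quote leaves undefined, and it does not check ? or *, which Microsoft's current naming page also reserves.

```perl
use strict;
use warnings;

# Hypothetical checker for the two character rules in the MSDN quote.
# It does not model the "current code page" clause and does not
# check ? or *.
sub violates_quoted_rules {
    my ($name) = @_;
    return 1 if $name =~ /[\x00-\x1F]/;    # control characters 0-31
    return 1 if $name =~ m{[<>:"/\\|]};    # reserved characters
    return 0;
}

my @cases = (
    [ 'plain ASCII',    "report.txt" ],
    [ 'contains colon', "a:b"        ],
    [ 'contains tab',   "tab\there"  ],
    [ 'black heart',    "\x{2665}"   ],   # a non-ASCII name, as in this thread
);

for my $case (@cases) {
    my ($label, $name) = @$case;
    printf("%-16s %s\n", $label,
           violates_quoted_rules($name) ? 'rejected' : 'accepted');
}
```

The black heart passes both character rules, which fits the Unicode statement but says nothing about the code-page clause; the character-level rules alone cannot settle that contradiction.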

Node Type: perlquestion [id://741797]
Approved by planetscape