I had a Win32 Perl coding challenge to search a directory(WinXP, NTFS) that may contain files with unicode / wide filenames in addition to ANSI filenames.
Through trial, error, and searches against Perl mailing list archives, it seems apparent that Win32 Perl's builtin directory functions do not support return Win32 unicode / wide filenames. More specifically, the builtin
functions return filenames like "??????_HostID_2006-01-19_213218.xls" when the filename contains unicode / wide characters (same behavior as the DOS "dir" command). The problem is that you can't pass these filenames to something like stat or via OLE to ask Excel to open it.
After much trial and error, I found a solution using Win32::API, the FindFirstFileW, FindNextFileW, and FindClose Win32 API calls -- with an "unpack" workaround suggested by $Bill Luebkert.
use strict; use Win32::API; use Unicode::String; use English; $OUTPUT_AUTOFLUSH=1; $Win32::API::DEBUG = 0; binmode(STDOUT, ":utf8"); use constant ERROR_NO_MORE_FILES => 18; use constant INVALID_HANDLE_VALUE => -1; Win32::API::Struct-> typedef('FILETIME', qw( DWORD dwLowDateTime; DWORD dwHighDateTime; )); # 8 bytes use constant FILE_ATTRIBUTE_READONLY => 0x00000001; use constant FILE_ATTRIBUTE_HIDDEN => 0x00000002; use constant FILE_ATTRIBUTE_SYSTEM => 0x00000004; use constant FILE_ATTRIBUTE_DIRECTORY => 0x00000010; use constant FILE_ATTRIBUTE_ARCHIVE => 0x00000020; use constant FILE_ATTRIBUTE_NORMAL => 0x00000080; use constant FILE_ATTRIBUTE_TEMPORARY => 0x00000100; use constant FILE_ATTRIBUTE_COMPRESSED => 0x00000800; use constant MAX_PATH => 260; Win32::API::Struct-> typedef('WIN32_FIND_DATAW', qw( DWORD dwFileAttributes; FILETIME ftCreationTime; FILETIME ftLastAccessTime; FILETIME ftLastWriteTime; DWORD nFileSizeHigh; DWORD nFileSizeLow; DWORD dwReserved0; DWORD dwReserved1; WCHAR cFileName[520]; WCHAR cAlternateFileName[28]; )); # 4 + 8 x 3 + 4 x 4 + 520 + 28 = 592 bytes # Note: Win32::API::Struct->Typedef should seemingly # allocate 2 bytes per WCHAR, the sizeof('WCHAR'), but it # actually allocates only 1 byte -- same as TCHAR my $FindFirstFile = Win32::API->new('kernel32.dll', 'FindFirstFileW', +'PS', 'N') or die "FindFirstFile: $^E"; my $FindNextFile = Win32::API->new('kernel32.dll', 'FindNextFileW', ' +NS', 'I') or die "FindNextFile $^E"; my $FindClose = Win32::API->new('kernel32.dll', 'FindClose', 'N', +'I') or die "FileClose $^E"; # set your own value here... my $FileSpec = "//?/C:/My Documents/Tool/*.xls"; my $FileInfo = Win32::API::Struct-> new('WIN32_FIND_DATAW'); my $uFileSpec = Unicode::String->new; $uFileSpec->utf8($FileSpec); print "FileSpec = ", $uFileSpec->as_string, "\n"; my $handle = $FindFirstFile-> Call($uFileSpec->utf16le, $FileInfo); if ($handle == INVALID_HANDLE_VALUE) { printf "Error is %d - %s\n", Win32::GetLastError (), Win32::FormatMessage (Win32::GetLastError ()); exit(1); } else { print "FindFirstFile worked\n"; my $count = 1; my $ufn = Unicode::String->new; my $ualtfn = Unicode::String->new; # to get the filename in unicode UTF-16LE format, must # unpack the $FileInfo 'buffer' hash element because # Win32 API doesn't recognize the cFileName hash # element as an UTF-16LE format character string # It treats it as an ASCII Z / null terminated string my ($cFileName, $cAlternateFileName) = unpack( "x44A520A28", $File +Info->{buffer} ); # if length is odd, pad filenames with null byte as utf16le char s +tring byte length must be even # occurs when last UTF-16LE char of filename is ANSI (e.g., "S") a +nd unpack A template strips off following null byte # e.g., "S\0" if (length($cFileName) & 1) { $cFileName .= "\x00"; } if (length($cAlternateFileName) & 1) { $cAlternateFileName .= "\x00"; } $ufn->utf16le($cFileName); $ualtfn->utf16le($cAlternateFileName); print "($count) filename = ", $ufn->as_string, "\n"; print "\talt = ", $ualtfn->as_string, "\n"; while (my $result = $FindNextFile->Call($handle,$FileInfo)) { $count++; ($cFileName, $cAlternateFileName) = unpack( "x44A520A28", $Fil +eInfo->{buffer} ); if (length($cFileName) & 1) { $cFileName .= "\x00"; } if (length($cAlternateFileName) & 1) { $cAlternateFileName .= "\x00"; } $ufn->utf16le($cFileName); $ualtfn->utf16le($cAlternateFileName); print "($count) filename = ", $ufn->as_string, "\n"; print "\talt = ", $ualtfn->as_string, "\n"; } } $FindClose->Call($handle) or die "FindClose $^E"; exit(0);
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: Win32 API directory searches that return wide / unicode filenames
by rhesa (Vicar) on Jan 28, 2006 at 15:50 UTC | |
Re: Win32 API directory searches that return wide / unicode filenames
by dsully (Acolyte) on Mar 03, 2006 at 17:01 UTC | |
by BrowserUk (Patriarch) on Mar 03, 2006 at 17:35 UTC | |
by Anonymous Monk on Mar 04, 2006 at 19:26 UTC | |
by BrowserUk (Patriarch) on Mar 04, 2006 at 19:48 UTC | |
Re: Win32 API directory searches that return wide / unicode filenames
by perlslicker (Initiate) on Apr 01, 2006 at 03:46 UTC |
Back to
Cool Uses for Perl