PerlMonks  

Windows NTFS UTF-16LE File-Operations

by mido (Initiate)
on Feb 20, 2012 at 14:51 UTC (#955080=perlquestion)
mido has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks,

A Perl noob is seeking your wisdom.

I've tried to write a script that does a directory traversal (using File::Find), gets the mtime of each file, and moves the files.
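A minimal sketch of the kind of script described above, assuming the goal is to file each file into a per-year subdirectory chosen from its mtime. The helper name `move_by_mtime` and the year-based layout are hypothetical, not the original script. It uses only core modules, and it works on names exactly as File::Find returns them, so on Windows it inherits the very problem discussed in this thread: names containing characters outside the ANSI code page can come back mangled.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Copy qw(move);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: move every plain file below $src_dir into
# $dest_dir/<year>, where <year> comes from the file's mtime.
sub move_by_mtime {
    my ($src_dir, $dest_dir) = @_;
    my @files;

    # Collect first and move afterwards, so no file is moved while
    # find() is still walking the tree.
    find(sub {
        return unless -f $_;        # $_ is the basename; find() has chdir'ed
        my $mtime = (stat _)[9];    # reuse the stat buffer from -f; index 9 is mtime
        push @files, [ $File::Find::name, $mtime ];
    }, $src_dir);

    my @moved;
    for my $entry (@files) {
        my ($path, $mtime) = @$entry;
        my $year   = (localtime $mtime)[5] + 1900;
        my $target = File::Spec->catdir($dest_dir, $year);
        make_path($target);
        my (undef, undef, $base) = File::Spec->splitpath($path);
        my $to = File::Spec->catfile($target, $base);
        move($path, $to) or die "Cannot move '$path' to '$to': $!";
        push @moved, $to;
    }
    return @moved;
}
```

Called as, e.g., `move_by_mtime('my_dir', 'archive')`, it returns the new paths. Collecting the list before moving anything keeps the traversal simple even if the destination lies inside the source tree.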

But I've run into the problem that some of the UTF-16LE-encoded filenames contain wide characters, which cannot be interpreted correctly as UTF-8.
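To make the encoding mismatch concrete, here is a small illustration using the core Encode module. The file name is made up; U+2603 (SNOWMAN) is the same character used in a later reply. It dumps the bytes of one name in UTF-16LE, the encoding NTFS uses for names, and in UTF-8, and shows that reading the UTF-16LE bytes as UTF-8 does not give the name back.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Render a byte string as space-separated hex, e.g. "E2 98 83".
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my $name = "snowman \x{2603}";              # made-up file name; U+2603 is SNOWMAN

my $utf16le = encode('UTF-16LE', $name);    # the encoding NTFS uses for names
my $utf8    = encode('UTF-8',    $name);    # what UTF-8-expecting code wants

print 'UTF-16LE: ', hex_bytes($utf16le), "\n";
print 'UTF-8:    ', hex_bytes($utf8),    "\n";

# Decoding the UTF-16LE bytes as if they were UTF-8 yields a different
# string (letters interleaved with NUL characters), not the original name.
my $misread = decode('UTF-8', $utf16le);
print $misread eq $name ? "round trip ok\n" : "misread as UTF-8: names differ\n";
```

The snowman shows up as `03 26` in the UTF-16LE dump and as `E2 98 83` in the UTF-8 dump, so the two byte sequences cannot simply be swapped for each other.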

I've found some interesting discussions on this problem but no workaround.

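I've read that perl before 5.8.1 (AFAIR) had a -C switch that told perl to use the Windows wide-character system calls for filesystem operations (like FindNextFileW or CreateFileW). The switch is still accepted, but it now controls Unicode settings for the standard I/O streams instead, so that behaviour no longer exists.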

Is there anything new on this situation?

Is there an (easy) workaround?

Or should I give up on trying to do this with Perl?

Thanks,
mido

Re: Windows NTFS UTF-16LE File-Operations
by Anonymous Monk on Feb 20, 2012 at 15:03 UTC
    Alas, it's still a ridiculous situation. Not counting the various work-arounds using modules from the Win32/Win32API namespace (tye and ikegami usually give answers involving Windows-specific code, see Super Search), you can use Path::Class::Unicode and PerlIO::fse for reasonably portable code.

      Looking at the code of Path::Class::Unicode, it appears to be broken on at least some non-Windows operating systems (or rather, on some file systems). It seems to assume that every file system exposes its entries as UTF-8, which is a fairly broad assumption given VFAT, and NFS, which only makes claims about the encoding since v4.
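To illustrate why assuming UTF-8 is fragile, here is a sketch of a hypothetical helper (`decode_fs_name` is not part of Path::Class::Unicode or any other module) that tries a strict UTF-8 decode and falls back to an assumed legacy encoding. The same four-character name "café" arrives as five bytes when stored as UTF-8 and as four bytes when stored as ISO-8859-1, and only the first form is valid UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode a raw file-name byte string as UTF-8,
# falling back to an assumed legacy encoding when the bytes are not
# valid UTF-8 (as can happen on VFAT or older NFS mounts).
sub decode_fs_name {
    my ($bytes, $fallback) = @_;
    # FB_CROAK makes invalid UTF-8 die; LEAVE_SRC keeps $bytes intact.
    my $name = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return defined $name ? $name : decode($fallback, $bytes);
}

my $utf8_name   = "caf\xC3\xA9";   # "café" stored as UTF-8 bytes
my $latin1_name = "caf\xE9";       # "café" stored as ISO-8859-1 bytes

for my $raw ($utf8_name, $latin1_name) {
    my $name = decode_fs_name($raw, 'ISO-8859-1');
    printf "%d bytes -> %d characters\n", length $raw, length $name;
}
```

Both names decode to four characters. A decoder that only assumes UTF-8 would reject (or garble) the second one; the fallback encoding itself is still a guess the caller must supply.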

Re: Windows NTFS UTF-16LE File-Operations
by nikosv (Hermit) on Feb 20, 2012 at 18:59 UTC
    Are you sure File::Find returns names in UTF-8 at all, rather than in the system's default code page?
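A portable way to see what this question hints at: if names reach Perl through a single-byte code page rather than as UTF-16, characters that code page cannot represent are lost. This sketch only simulates that with Encode, assuming the code page is cp1252; it does not call the Windows API, and the real Windows conversion may also apply "best fit" substitutions rather than always producing '?'.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumption: the system ANSI code page is cp1252 (Western European).
my $codepage = 'cp1252';

# Made-up file name: U+00E9 exists in cp1252, U+2603 (SNOWMAN) does not.
my $name = "caf\x{E9} \x{2603}";

# With the default CHECK, encode() replaces unmappable characters with
# the encoding's substitution character, which is '?' for cp1252.
my $ansi      = encode($codepage, $name);
my $roundtrip = decode($codepage, $ansi);

print length($ansi), " bytes in $codepage\n";
print $roundtrip eq "caf\x{E9} ?"
    ? "lossy: the snowman came back as '?'\n"
    : "unexpected result\n";
```

Once the snowman has become '?', no later decoding step can recover it, which is why a script working from code-page names cannot reliably open or move such files.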
Re: Windows NTFS UTF-16LE File-Operations
by BrowserUk (Pope) on Feb 20, 2012 at 19:47 UTC

    See Win32::FindFile.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

Re: Windows NTFS UTF-16LE File-Operations
by repellent (Priest) on Feb 20, 2012 at 21:08 UTC
    Would Win32::GetLongPathName() help?
    use Win32;
    use File::Find;

    my @paths;
    find sub {
        my $path = Win32::GetLongPathName($File::Find::name);
        # now use Unicode semantics on $path
        push(@paths, $path) if $path =~ /\x{ABCD}/;
    }, "my_dir";

      No. Long path names have nothing to do with Unicode or wide characters.

      They are the opposite of "short path names": the 8.3 FAT compatibility fudge that allows the path 'Program Files' to be accessed as 'PROGRA~1'.

      As such, long path names are actually just the normal path names visible in the NTFS file space.



        Hmm, it's just that Win32::GetLongPathName() returns the Perl string I'd most expect in Win32-land. By "expect", I mean "jibes with what I see in Windows Explorer".

        Using Explorer, I created a file "snowman ☃" in a new folder "my_dir". I did this by renaming an empty text file to "snowman " first, and then copy-pasting the snowman character. Then I ran the following:

        The results tell me that the string returned by Win32::GetLongPathName() is fit for Unicode semantics in Perl. Never mind the underlying filesystem encoding of NTFS (UTF-16LE? I don't know); from then on I can treat the path as characters.

        Sure, long path names are the opposite of short path names. My point is that Win32::GetLongPathName() is a handy way to get characters instead of the octets File::Find gives you.
Re: Windows NTFS UTF-16LE File-Operations
by Anonymous Monk on Feb 21, 2012 at 02:19 UTC

Node Type: perlquestion [id://955080]
Approved by ww
Front-paged by MidLifeXis