PerlMonks  

Re: Windows NTFS UTF-16LE File-Operations

by repellent (Priest)
on Feb 20, 2012 at 21:08 UTC


in reply to Windows NTFS UTF-16LE File-Operations

Would Win32::GetLongPathName() help?

use Win32;
use File::Find;

my @paths;
find sub {
    my $path = Win32::GetLongPathName($File::Find::name);

    # now use Unicode semantics on $path
    push(@paths, $path) if $path =~ /\x{ABCD}/;
}, "my_dir";


Re^2: Windows NTFS UTF-16LE File-Operations
by BrowserUk (Pope) on Feb 20, 2012 at 21:56 UTC

    No. Long path names are nothing to do with unicode or wide characters.

    They are the opposite of "short path names" -- the 8.3 FAT compatibility fudge that allows the path 'Program Files' to be accessed as 'PROGRA~1'.

    As such, long path names are actually just the normal path names visible in the NTFS file space.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Hmm, it's just that Win32::GetLongPathName() returns the Perl string I'd most expect in Win32-land. By "expect", I mean "jibes with what I see in Windows Explorer".

      Using Explorer, I created a file "snowman ☃" in a new folder "my_dir". That file was created by renaming an empty text file with "snowman " first, and then copy+pasting the snowman character. Then I ran the following:

      The results tell me that the string returned by Win32::GetLongPathName() is fit for Unicode semantics in Perl. Never mind the underlying filesystem encoding of NTFS (UTF-16LE? I don't know); I can treat the path as characters from then on.

      Sure, long path names are the opposite of short path names. What I'm saying is that Win32::GetLongPathName() is handy for getting at the characters instead of the octets given by File::Find.
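
To make the characters-versus-octets distinction concrete, here is a minimal sketch using the core Encode module. It assumes, purely for illustration, that the octets are UTF-8; what File::Find actually hands back on Windows depends on the system code page and is not shown here. The helper name `describe` is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: report the string's length and whether it
# contains the snowman code point U+2603.
sub describe {
    my ($str) = @_;
    return {
        length  => length($str),
        snowman => ($str =~ /\x{2603}/ ? 1 : 0),
    };
}

# The file name with the snowman as three UTF-8 octets
# (assumed encoding, for illustration only).
my $octets = "snowman \xE2\x98\x83";

# The same name decoded into Perl characters.
my $chars = decode('UTF-8', $octets);

for my $pair ([octets => $octets], [chars => $chars]) {
    my ($label, $str) = @$pair;
    my $info = describe($str);
    printf "%-6s length %2d, matches \\x{2603}: %d\n",
        $label, $info->{length}, $info->{snowman};
}
```

Only the decoded string matches the `\x{2603}` pattern, which is why a regex like the one in the File::Find callback needs a character string rather than raw octets.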

        Hm. Doesn't seem to work for me:

        C:\test\junk>dir
        12/11/2010 05:58 7 acentó.txt
        20/11/2010 09:46 <DIR> ελληνικά
        1 File(s) 7 bytes
        3 Dir(s) 236,893,585,408 bytes free

        C:\test\junk>perl -E"say for glob '*'"
        acent¾.txt
        DC44~1

        C:\test\junk>perl -E"say Win32::GetLongPathName( $_ ) for glob '*'"
        acent¾.txt
        Wide character in print at -e line 1.
        ╬Á╬╗╬╗╬À╬¢╬╣╬║╬¼

        (Code tags deliberately omitted to ensure that you can see exactly what I see on my console.)

        Conversely, Win32::FindFile does work for me:


        C:\test\junk>perl -E"say for glob '*'"
        acent�.txt
        DC44~1

        C:\test\junk>perl -E"say Win32::GetLongPathName( $_ ) for glob '*'"
        acent�.txt
        Wide character in print at -e line 1.
        ελληνικά
        ικά


        C:\test\junk>perl -C0 -MWin32::FindFile -E"say for FindFile( '*' )"
        .
        ..
        acentó.txt

        ελληνικά
        ικά
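
One plausible reading of the first console session above, offered as a sketch rather than a diagnosis: the Greek name came back as a character string, print wrote it out as UTF-8 octets (hence the "Wide character" warning), and the console rendered each octet as one glyph. The console code page is not stated in the thread; cp850 is an assumption, and the code below only shows that this assumption reproduces the garbled line. `as_seen_on_console` is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: simulate printing a character string with no
# output layer to a console that interprets bytes in $codepage.
sub as_seen_on_console {
    my ($chars, $codepage) = @_;
    my $octets = encode('UTF-8', $chars);   # what print emits for wide characters
    return decode($codepage, $octets);      # how the console would read those octets
}

# The directory name from the listing, written with escapes so the
# sketch does not depend on the source file's encoding.
my $greek = "\x{3B5}\x{3BB}\x{3BB}\x{3B7}\x{3BD}\x{3B9}\x{3BA}\x{3AC}";

# Declare an output encoding so this script itself prints cleanly.
binmode(STDOUT, ':encoding(UTF-8)');
print as_seen_on_console($greek, 'cp850'), "\n";
```

Setting an output layer with binmode, as the sketch does for its own output, silences the "Wide character" warning; whether the console then displays the name correctly still depends on its code page and font.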
