PerlMonks  

Re^2: Seeking help for copying recursive folders having some folder/file names in Chinese or japanese

by salva (Canon)
on Jan 20, 2015 at 11:04 UTC ( [id://1113857]=note )


in reply to Re: Seeking help for copying recursive folders having some folder/file names in Chinese or japanese
in thread Seeking help for copying recursive folders having some folder/file names in Chinese or japanese

At the beginning of the document it says:
Windows stores filenames in Unicode, encoded in UTF16

That's not completely right. NTFS (like most Unix/Linux file systems) is encoding-agnostic. It just sees filenames as arrays of wchar_t integers, which are not required in any way to be valid UTF-16 sequences.

For most C/C++ applications that can handle wchar_t data directly, this is a non-issue. For Perl it is an issue, because file names that are not valid UTF-16 cannot be converted to UTF-8, and modules like Win32::Unicode that do that conversion internally will fail on them.
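To make the failure concrete, here is a minimal sketch using only the core Encode module. It does not show how Win32::Unicode works internally; the helper name utf16le_name_to_utf8 and the sample byte strings are made up for illustration. It only shows that a strict UTF-16 decode rejects a name containing an unpaired surrogate, so there is nothing to re-encode as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of any module): try to turn the raw
# UTF-16LE bytes of a file name into UTF-8 bytes.
# Returns undef when the bytes are not a valid UTF-16 sequence.
sub utf16le_name_to_utf8 {
    my ($raw) = @_;
    my $copy = $raw;    # decode() may modify its input when a CHECK is given
    my $chars = eval { decode('UTF-16LE', $copy, Encode::FB_CROAK) };
    return undef unless defined $chars;
    return encode('UTF-8', $chars);
}

# Illustrative sample data, not taken from a real file system.
my %names = (
    valid   => "\x2D\x4E\x87\x65",    # U+4E2D U+6587, two CJK characters
    invalid => "\x00\xD8\x41\x00",    # lone high surrogate U+D800, then "A"
);

for my $label (sort keys %names) {
    my $utf8 = utf16le_name_to_utf8($names{$label});
    printf "%-7s => %s\n", $label,
        defined $utf8 ? 'converted' : 'not convertible';
}
```

Without a CHECK argument, decode() substitutes a replacement character instead of dying. That hides the problem, but the resulting name no longer matches the file on disk.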

Admittedly, for most scripts this is not a problem, as no sane application creates (or lets the user create) files with names that are not valid UTF-16. Still, malicious or just buggy software may do it.

Update: Well, NTFS is not completely encoding-agnostic, because it is case-insensitive. It has the metadata file $UpCase, which defines how wchar_t characters are converted to uppercase.
