PerlMonks  

Re: Unify windows filenames

by ELISHEVA (Prior)
on Sep 20, 2009 at 18:32 UTC


in reply to Unify windows filenames

At present I don't know of a good out-of-the-box solution to this problem if you really want all platforms and absolute uniqueness.

First, if you really need to verify that two paths point to the same file, there are more factors to consider than just path name syntax. On Win32 systems, every file can have several different names. Depending on the application providing input to your program, a file might be identified by any of the following:

  • a short path name using 8.3 notation
  • a long "case-preserving" path name
  • one or more UNC path names, e.g. \\MyMachine\C$\Public\Foo.txt or \\Public\Foo.txt
  • a device path name, e.g. \\.\HarddiskVolume1\Public\Foo.txt
  • multiple NT path names, e.g. \DosDevices\C:\Public\Foo.txt, \Device\C:\Public\Foo.txt or \??\C:\Public\Foo.txt
  • a pathname beginning with '\\?\', e.g. \\?\UNC\Public\Foo.txt or \\?\C:\Public\Foo.txt

Note that all of the above paths are 'case preserving' but case insensitive. That means you can safely lowercase the entire path name and Win32 will still be able to find the file. In addition, XP path names can pass through "junction points" (roughly equivalent to symbolic links to directories). Starting with Vista, symbolic links to files and directories are also supported. *nix systems don't have the huge range of path name syntax, but the same file can still have a variety of names via hard links and symbolic links. On Cygwin systems, you also have to take into account mounted paths: any Win32 drive or directory can be mounted as a *nix path, e.g. "/cygdrive/c/Public/foo.txt" and "C:\Public\foo.txt" might refer to the same file.
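To make the hard link and symbolic link point concrete, here is a minimal sketch for *nix-like systems. It creates a scratch file, gives it two extra names, and compares the device and inode numbers returned by stat. The helper name same_file and the file names are made up for this example. Comparing device and inode is an assumption that holds on typical *nix file systems; perlport notes that these two fields are not meaningful on Win32, so don't treat this as a portable test.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: true if both paths resolve to the same device and inode.
# stat() follows symbolic links, so a symlink compares equal to its target.
sub same_file {
    my ($path_a, $path_b) = @_;
    my @stat_a = stat($path_a) or return 0;
    my @stat_b = stat($path_b) or return 0;
    return $stat_a[0] == $stat_b[0] && $stat_a[1] == $stat_b[1];
}

# Scratch directory that is removed when the program exits.
my $dir      = tempdir(CLEANUP => 1);
my $original = File::Spec->catfile($dir, 'foo.txt');
my $hard     = File::Spec->catfile($dir, 'hard.txt');
my $soft     = File::Spec->catfile($dir, 'soft.txt');
my $other    = File::Spec->catfile($dir, 'other.txt');

# Create two distinct, empty files.
for my $file ($original, $other) {
    open(my $fh, '>', $file) or die "Cannot create $file: $!";
    close $fh;
}

# Give the first file two more names.
link($original, $hard)    or die "link failed: $!";
symlink($original, $soft) or die "symlink failed: $!";

for my $candidate ($hard, $soft, $other) {
    my $verdict = same_file($original, $candidate) ? 'same file' : 'different file';
    print "$candidate: $verdict\n";
}
```

The three names foo.txt, hard.txt and soft.txt share no path syntax at all, which is why no amount of string normalization can discover that they are one file.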

Some, but not all, of these issues can be handled in a portable way using Cwd::abs_path. It doesn't handle all of the Win32 path variants, and there is a reported bug for mounted Unix drives (see the bugs link on the Cwd page for details). Using the routine on a mounted drive may fail if changing the current directory to a mounted drive changes the effective GID or UID. Additionally, it relies on File::Spec for path normalization.
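A small sketch of what Cwd::abs_path does buy you, assuming a *nix-like system where symlink works. The directory and file names are invented for the example. Both the symbolic link and a path that wanders through ".." come back as the same resolved name.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Temp qw(tempdir);
use File::Spec;

# Scratch layout: <tmp>/Public/foo.txt plus a symlink <tmp>/alias.txt.
my $dir = tempdir(CLEANUP => 1);
my $sub = File::Spec->catdir($dir, 'Public');
mkdir $sub or die "mkdir failed: $!";

my $file = File::Spec->catfile($sub, 'foo.txt');
open(my $fh, '>', $file) or die "Cannot create $file: $!";
close $fh;

my $alias = File::Spec->catfile($dir, 'alias.txt');
symlink($file, $alias) or die "symlink failed: $!";

# A second spelling of the same file that detours through "..".
my $roundabout = File::Spec->catfile($sub, File::Spec->updir, 'Public', 'foo.txt');

# abs_path consults the file system, so every spelling resolves to one name.
my $resolved_file       = abs_path($file);
my $resolved_alias      = abs_path($alias);
my $resolved_roundabout = abs_path($roundabout);

print "file:       $resolved_file\n";
print "alias:      $resolved_alias\n";
print "roundabout: $resolved_roundabout\n";
```

Note that abs_path is applied to $file as well. The temporary directory itself may sit behind a symbolic link, so comparing against the unresolved $file could give a false mismatch.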

If you are only interested in normalizing path names (rather than identifying the "official" name for a file), the solution recommended by most Perl documentation, including perlport, is to use the File::Spec module for portability. For portability between *nix and Win32 it appears to be fairly reliable. However, if you start including platforms like Cygwin, VMS, Darwin, and Mac Classic, some of its decisions may not be entirely portable. Path::Class and Path::Classy are both wrappers around File::Spec, so many of the same issues apply.
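As a rough illustration of how File::Spec hides the syntax differences, the sketch below builds the same path with the Unix and Win32 implementations. The platform classes are loaded by name only so that the example behaves the same wherever it is run; ordinary code would call File::Spec->catfile and let the module pick the right class. The path components are made up.

```perl
use strict;
use warnings;
use File::Spec;
use File::Spec::Unix;
use File::Spec::Win32;

my @parts = ('Public', 'Docs', 'Foo.txt');

# What real code would do: use whatever class matches the current platform.
my $native = File::Spec->catfile(@parts);

# Forcing a specific implementation, for demonstration only.
my $unix  = File::Spec::Unix->catfile(@parts);     # Public/Docs/Foo.txt
my $win32 = File::Spec::Win32->catfile(@parts);    # Public\Docs\Foo.txt

print "native: $native\n";
print "unix:   $unix\n";
print "win32:  $win32\n";

# The rules for "absolute" differ too: a drive letter means nothing on *nix.
my $drive_path   = 'C:\Public\Foo.txt';
my $abs_on_win32 = File::Spec::Win32->file_name_is_absolute($drive_path) ? 1 : 0;
my $abs_on_unix  = File::Spec::Unix->file_name_is_absolute($drive_path)  ? 1 : 0;

print "absolute on Win32: $abs_on_win32\n";
print "absolute on Unix:  $abs_on_unix\n";
```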

If you do use File::Spec, you can use File::Spec->canonpath() to make path name syntax more regular. File::Spec tends to assume that all paths can and should be converted to the syntax of *nix paths. This can sometimes produce incorrect results:

  • Cygwin: Cygwin is a *nix-alike layer that runs on Win32 systems. Cygwin's implementation of canonpath treats '/' as the canonical separator, and this can sometimes produce illegal paths. Cygwin supports both POSIX (*nix) and Win32 style paths. Win32 style paths need to keep at least one backslash ('\') in the path or else Cygwin won't be able to recognize that the path is meant to be a Win32 style path. Without the backslash it reads 'C:/foo' as a relative path starting with a directory that just happens to be named 'C:'. Of course, such a directory is unlikely to exist, so you'll get path-not-found errors if you canonize Win32 style paths using the File::Spec::Cygwin module.

  • VMS: VMS path syntax is very different from either Win32 or Unix. For example, A::B:[C.D.E]F.DAT;32 would mean version 32 of the F.DAT file found in the directory "C.D.E" (which is /C/D/E in *nix paths) on the device B of the node A.

    Modern VMS systems can mount both case sensitive and case insensitive drives. If you look at the bug list for File::Spec you will see a lot of discussion about how to deal with this but no entirely satisfactory solutions. Another problem has arisen with the introduction of the ODS5 file system. The implementation of File::Spec::VMS also tries to unixify paths before canonizing them. On ODS2 (the older native path syntax used on VMS) there were only a limited number of characters and one could reliably convert back and forth between *nix and VMS style paths. However, ODS5 allows for a much wider range of pathname characters and there is no way to do lossless path conversions. "..." on *nix could mean the ODS5 path ^.^.^., ^..^., or ^.^..

    Even in ODS2 there were conversion problems. The code that converts paths to *nix syntax needs to tell the difference between *nix paths and VMS paths, but certain paths are ambiguous: "perl_5.8.10" could mean the *nix executable (or directory) "perl_5.8.10" or it could mean version 10 of the VMS file "perl_5.8". The ambiguity arises because ODS2 uses "." as both a separator between file name and extension and as a separator between file name+ext and file version number.

  • Macs: Mac machines have many of the same problems as Cygwin and VMS. Older versions of Mac (Mac Classic) supported only Apple's native HFS path name syntax, which uses ":" as a path name separator. It also has a few other oddities: you can't represent rooted paths without specifying a disk drive; the equivalent of 'a/..' in *nix is ":a::", among others. The newer version of the Mac operating system (known as Darwin or Mac OS X) supports both the older HFS paths and *nix paths.

    perlport classifies Darwin as a *nix-alike, but both path syntaxes are used. In particular, paths being fed to Perl from a user interface application are likely to be in HFS format. There is no 100% reliable way to tell which path is *nix and which is HFS because the HFS path separator ':' is a valid character in *nix file names and the *nix separator '/' is a valid character in HFS file names. That is, 'HD1:May/2009' could be an absolute HFS path identifying a file named 'May/2009' on drive "HD1" or a relative *nix path identifying a file named "2009" in the directory named "HD1:May".

    Depending on how you configure the system and the file system installed on each of your disk partitions, the *nix paths can be either case sensitive or case insensitive. As discussed above by graff and YourMother, File::Spec doesn't seem to take this into account.
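The pitfalls in the list above can be poked at with a short sketch. The File::Spec::Unix, File::Spec::Win32 and File::Spec::Cygwin classes are loaded by name so that the script does not depend on the platform it runs on. The HFS split and the VMS regex are hand-rolled stand-ins written for this example; they are not what File::Spec::Mac or File::Spec::VMS actually do, and they only serve to show that one string has two defensible readings.

```perl
use strict;
use warnings;
use File::Spec::Unix;
use File::Spec::Win32;
use File::Spec::Cygwin;

# canonpath only tidies syntax; it never looks at the file system.
my $unix   = File::Spec::Unix->canonpath('/Public/./Docs//Foo.txt');
my $win32  = File::Spec::Win32->canonpath('c:/Public//Foo.txt');
my $cygwin = File::Spec::Cygwin->canonpath('C:\Public\Foo.txt');

print "Unix:   $unix\n";
print "Win32:  $win32\n";
print "Cygwin: $cygwin\n";    # every backslash has been turned into '/'

# Mac: the same string read as a *nix path and as an HFS path.
my $mac_path = 'HD1:May/2009';
my @as_unix  = File::Spec::Unix->splitdir($mac_path);    # directory, then file
my @as_hfs   = split /:/, $mac_path;                     # assumed HFS reading: volume, then file

print "as *nix: directory '$as_unix[0]', file '$as_unix[1]'\n";
print "as HFS:  volume '$as_hfs[0]', file '$as_hfs[1]'\n";

# VMS: the same string read as a *nix name and as an ODS2 name with a version.
my $name = 'perl_5.8.10';
my ($vms_file, $vms_version) = $name =~ /\A(.+)\.(\d+)\z/;    # assumed ODS2-style reading

print "as *nix: file '$name'\n";
print "as VMS:  file '$vms_file', version $vms_version\n";
```

The Cygwin line is the case described in the first bullet: once the last backslash is gone, nothing marks the result as a Win32 style path any more.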

Best, beth

Update: added a discussion of Cwd::abs_path.

Re^2: Unify windows filenames
by Anonymous Monk on Sep 21, 2009 at 00:30 UTC
    There is also Win32::AbsPath, in case Cwd::abs_path() doesn't do what you need.
