PerlMonks  

Simple Path Cleanup

by wesley.spikes (Initiate)
on May 10, 2009 at 08:13 UTC ( #763111=perlquestion )
wesley.spikes has asked for the wisdom of the Perl Monks concerning the following question:

I know this is probably fairly simple, but I was unable to find any information about this topic (more likely than not, I just didn't know how to word it such that search engines would share the love).

How can I clean up a path name to sanity-check it? Up front: I know I could probably write some crazy regexp, or I could simply chdir to it and then call Cwd::cwd() to get the path, but these options are likely quite time-consuming, and the second would not work if the directories don't exist or are inaccessible.

The reason for this request is that I must ensure that I'm not using File::Path::remove_tree on "/" or any other major directory. The paths I'm generating are already fully qualified and are arguably safe, but I'd rather be safe than sorry. :)

Thanks in advance!

EDIT: Sorry for the lack of information in the post. It was 2AM and I thought I had put it in. Basically, I'm concerned about the classic security vulnerability where someone can inject a path name containing the up-directory marker (".."), and by using such a hack, climb up to the root of the drive.

/project_dir/various_folders -- the folders i need to delete
/project_dir/build/myscript.pl

It may be possible under certain conditions for project_dir or a folder name to contain "fn/../../../../../../../", causing the script to inadvertently remove the root folder.

Re: Simple Path Cleanup
by CountZero (Bishop) on May 10, 2009 at 08:25 UTC
    You will need the realpath function from Cwd to resolve your path, through all its links, turns, and twists, to its simplest real form.
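    A minimal sketch of that approach, using only the core Cwd module (the messy path below is illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(realpath);

# realpath() collapses ".", "..", and duplicate slashes, and follows
# symlinks; it returns undef if the path does not exist.
my $raw  = '/tmp/./..//tmp';        # illustrative messy path
my $real = realpath($raw);
defined $real or die "no such path: $raw\n";
print "$real\n";                    # the canonical form of /tmp
```

    Note that, unlike a purely textual cleanup, realpath() requires the path to actually exist, which matters given the question above.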

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Simple Path Cleanup
by generator (Monk) on May 10, 2009 at 08:28 UTC
    I can't speak for others but I don't understand what you are asking.

    What do you mean by:

    "clean up a path name to sanity check it"...

    A more specific question might elicit a quicker response.

    <><

    generator

Re: Simple Path Cleanup
by Your Mother (Canon) on May 10, 2009 at 08:28 UTC

    Play around with Path::Class:

        perl -MPath::Class -le 'print file(+shift)->cleanup' [path]

    For example:

        perl -MPath::Class -le 'print file(+shift)->cleanup' /foo//baz/./foo.txt
        /foo/baz/foo.txt

    It has become one of my favorite families of modules.

      It sounds to me (I could be misunderstanding, though) like the OP wants to avoid removing some major directory. So let's say the input path is something like

      /home/whoever/tmp/foo/mydir

      where mydir is a symlink to /usr/local or some such. Path::Class's cleanup would leave it as is, and /usr/local would get removed, even though the OP checks $path !~ m#^/usr/local# (presuming appropriate permissions, of course). AFAIK, Path::Class doesn't handle symlinks, whereas Cwd's realpath would.
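      The pitfall can be sketched with core modules only (the directory names are made up for the demonstration):

```perl
use strict;
use warnings;
use Cwd qw(realpath);
use File::Temp qw(tempdir);

# Build the situation described above: "mydir" is a symlink to another
# directory. A purely lexical cleanup would keep the name "mydir";
# realpath() resolves it to the real target.
my $base = tempdir(CLEANUP => 1);
mkdir "$base/target" or die "mkdir: $!";
symlink "$base/target", "$base/mydir"
    or die "symlink not supported here: $!";

my $resolved = realpath("$base/mydir");
print "$resolved\n";    # ends in /target, not /mydir
```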

        Thank you, that actually jogged me into remembering to check link status before running remove_path. That would've been a pretty big mistake.

        You're right. Good clarification.

Re: Simple Path Cleanup
by ig (Vicar) on May 10, 2009 at 09:22 UTC

    Using cd/cwd will follow soft links in the path and yield a path without soft links. As you say, there are other ways to determine such a path without changing the current working directory, but it is not clear that they will be any faster, as the same underlying operations will be required. You may find that the system functions are as fast as any available.

    Hard links are not so easily dealt with. Any path through hard links is equally valid and there will be no easy way to distinguish between a seemingly innocuous path (e.g. /some/irrelevant/file ) and the path of a critical file (e.g. /etc/passwd or / ). With hard links, you do have the certainty that they do not traverse file systems, so you can at least reliably determine what file system a path refers to. This might be adequate if, for example, all the paths you are checking should be on a file system that contains non-critical data only. Otherwise, it is hard to imagine any certain solution other than comparing the inode number of your subject path with the inode numbers of all critical files and directories on the same file system.

      Hard links are not so easily dealt with...

      Luckily, most OSes don't allow hard links to directories.  I've heard Mac OS X (at least "Leopard") is an exception, though...

        It is a long time since I last deluded myself that a hard link to a directory would be a good thing. Even then it was discouraged due to concerns with infinite loops in directory traversal. IMHO it is a good thing that it is more strongly prohibited in most systems these days.

        I did come across a simple algorithm for resolving paths to a "canonical" path even if hard links to directories are allowed: follow all the links to the ultimate directory, then follow the chain of ".." back to the root directory. This assumes only that ".." in every directory is set to the "canonical" parent of that directory, despite any other links that might exist.
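        That walk can be sketched in Perl with core modules only; canonical_path is a made-up name, and the sketch assumes ".." in every directory points to its canonical parent, as described above:

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Sketch of the ".."-walk: chdir to the target, then climb via ".."
# to the root, recovering each component's name by matching
# device+inode numbers against the parent directory's entries.
sub canonical_path {
    my ($dir) = @_;
    my $saved = getcwd();
    chdir $dir or die "chdir $dir: $!";
    my @parts;
    while (1) {
        my ($dev,  $ino)  = stat('.');
        my ($pdev, $pino) = stat('..');
        last if $dev == $pdev && $ino == $pino;   # reached the root
        opendir my $dh, '..' or die "opendir: $!";
        my $name;
        while (defined(my $entry = readdir $dh)) {
            next if $entry eq '.' or $entry eq '..';
            my ($edev, $eino) = lstat("../$entry");
            next unless defined $eino;
            if ($edev == $dev && $eino == $ino) { $name = $entry; last; }
        }
        closedir $dh;
        defined $name or die "cannot find current directory in parent\n";
        unshift @parts, $name;
        chdir '..' or die "chdir ..: $!";
    }
    chdir $saved or die "chdir $saved: $!";
    return '/' . join('/', @parts);
}
```

        This is essentially what getcwd() did on systems without a getcwd system call, so it is a sketch of known technique rather than anything novel.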

        If you are a bit paranoid (or work with people like I was in years gone by, who like to do things like making hard links to directories and other twisted manipulations of the file system) you might also be concerned about multiple mounts of file systems and directories that are possible on some systems, particularly with some loopback file systems. Usually one would only have one read/write mount at a time, but I understand multiple read/write mounts are possible in some cases.

Re: Simple Path Cleanup
by sgifford (Prior) on May 11, 2009 at 03:57 UTC

    What you will need to do to avoid an unsafe file or path name will depend on what sorts of file names you are expecting, and what areas you consider safe. If your filenames are simple, just requiring that they match ^\w+$ should be OK. If they can contain more characters than that, you will need something more complicated, but the basic things are to exclude / and make sure your filename isn't all dots.

    That's likely to be easier than trying to safely open a pathname and then make sure it is safe.

    Also, consider using taint mode in your script. That would require you to sanitize filenames before opening them, halting your program if you forgot. It's a good safety net for any program manipulating the filesystem on behalf of Internet users.
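    A minimal whitelist check along those lines (sanitize_name is a made-up helper for this sketch):

```perl
use strict;
use warnings;

# Accept only word characters, per the ^\w+$ suggestion above.
# Returning the capture also untaints the value under -T, since
# taint mode trusts data extracted via a regex match.
sub sanitize_name {
    my ($name) = @_;
    if ($name =~ /\A(\w+)\z/) {
        return $1;    # the capture is untainted
    }
    die "unsafe filename: $name\n";
}

print sanitize_name('build_2009'), "\n";   # build_2009
```

    Anything containing a slash or consisting of dots fails the match outright, which covers the "../" concern in the original question.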

    Hope this helps! The alternative, assuming the directory is safe and then figuring out where you really are in the filesystem to confirm it is what you expect, is more error-prone.

Re: Simple Path Cleanup
by ikegami (Pope) on May 11, 2009 at 04:05 UTC

    The reason for this request is that I must ensure that I'm not using File::Path::remove_tree on "/" or any other major directory

    In Unix, you can compare the path's dev+inode (as returned by stat/lstat) against those of your major directories (and their parents) to see whether they refer to the same file. This works for hard links, whereas realpath only handles symlinks.
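    A sketch of that comparison; the protected list and the names same_file and is_protected are made up for the illustration:

```perl
use strict;
use warnings;

# Compare two paths by device and inode number. A string comparison
# can be fooled by hard links and symlinks; stat() cannot.
sub same_file {
    my ($p, $q) = @_;
    my @sp = stat $p or return 0;
    my @sq = stat $q or return 0;
    return $sp[0] == $sq[0] && $sp[1] == $sq[1];   # dev, ino
}

# Made-up guard list for this sketch; a real one would also cover
# the parents of every directory you might remove.
my @protected = ('/', '/etc', '/usr');

sub is_protected {
    my ($path) = @_;
    return grep { same_file($path, $_) } @protected;
}
```

    Checking this immediately before calling File::Path::remove_tree gives a last line of defense that does not depend on how the path string was built.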

Node Type: perlquestion [id://763111]
Approved by almut