http://www.perlmonks.org?node_id=999160

I have written a suite of 18 Perl programs that process data in a folder called 'project/data'. During development I realized that, for each stage of the downstream analysis, it would be better to distribute the intermediate data into separate folders under the same parent directory 'project', so I created folders such as 'project/test', 'project/regions' and 'project/report'. Data is processed sequentially: first in 'test', then in 'regions', then in 'report'.

Now, inside each of those folders there is a Perl program that does something or other with the data. For the sake of traceability I moved all the programs into one location, 'project/PerlCode'. When these programs were written I worked with absolute paths, so every program that reads or writes carries a full path qualifier from the root, all the way down through 'project', to the specific folder where it does its work.

Apparently this is not the nicest of designs to adopt. For better portability I would like this suite of programs to create the downstream folders automatically and cohesively (easily done), but above all, and this is the biggest and most difficult part, I want these programs to be oblivious to the directory structure and contained in their own 'project' folder, so that regardless of which machine they run on they perform out of the box. This requirement strongly suggests moving from absolute paths to relative ones.

A potential candidate I have on the table is FindBin, yet it has issues to do with concurrent calling (potentially solvable according to the docs). Another approach is to have a MasterScript.pl that calls each of these programs from the 'PerlCode' folder; yet each of these programs still needs to know which folder it should read from or write into. How do I achieve this, avoid spelling out the entire path, and make these programs aware of their immediate surrounding structure regardless of where in the directory tree the folder 'project' is placed?
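As a minimal sketch of the FindBin route, under the assumption that the scripts live in 'project/PerlCode' and that sibling folders such as 'data' and 'report' hang off the parent directory (the folder names here are illustrative, not prescribed):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use FindBin qw($Bin);    # $Bin is the absolute directory of the running script
use File::Spec;

# Assumed layout: this script lives in project/PerlCode, so the
# project root is one level up from $Bin.
my $project = File::Spec->catdir($Bin, File::Spec->updir());
my $data    = File::Spec->catdir($project, 'data');
my $report  = File::Spec->catdir($project, 'report');

mkdir $report unless -d $report;    # create a downstream folder if missing
print "project root: $project\n";
```

Because every path is derived from $Bin, the whole 'project' tree can be moved anywhere and the scripts keep working unchanged.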

Any tips, suggestions, alternative approaches and quick-fixes thereof are heartily welcome.


David R. Gergen said "We know that second terms have historically been marred by hubris and by scandal." and I am a two y.o. monk today :D, June,12th, 2011...

Replies are listed 'Best First'.
Re: Directory independent processing
by moritz (Cardinal) on Oct 15, 2012 at 19:53 UTC
and make these programs aware of their immediate surrounding structures regardless of where in the directory tree the folder 'project' is placed?

    "regardless" is a rather strong word; if your scripts should really find their project directory on their own, without any helping assumptions or config files, they'd have to recursively search all directories -- in general that's a very bad idea.

    I know several approaches that some programs out there in the wild take (and often they take several of those in combination), listed in no particular order:

    • Assume that the current working directory is the project dir
    • Search the current directory and all of its parents for special files/directories. For example git, the distributed version control system, does that for finding repositories
    • Search in hard-coded places
    • Search for config files in hard-coded places, and read them to determine where to go
    • Search in directories relative to your home directory
    • Search relatively to the installation path of the binary
    • Use environment variables and/or command line options to find project dirs.
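    The second option, walking up the directory tree looking for a sentinel the way git looks for '.git', can be sketched as follows (the '.project' marker file name here is hypothetical; any sentinel file or folder would do):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;

# Walk from the current directory up to the filesystem root, looking
# for a marker that identifies the project root.
sub find_project_root {
    my ($marker) = @_;
    my $dir = getcwd();
    while (1) {
        return $dir if -e File::Spec->catfile($dir, $marker);
        my $parent = Cwd::abs_path(File::Spec->catdir($dir, File::Spec->updir()));
        last if !defined $parent || $parent eq $dir;   # reached the root
        $dir = $parent;
    }
    return;   # not found
}

my $root = find_project_root('.project');
print defined $root ? "project root: $root\n" : "no project root found\n";
```

    With this in place a script can be run from anywhere inside the project tree and still locate its data folders.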

    In the end, it all depends on what you want, and on the usual workflow. Maybe you like some of the options I've listed.

      The programs are in 'project/PerlCode', alongside the other folders 'test', 'data' and 'report'. All of these are in 'project', so if I place the directory 'project' anywhere, it carries the folder 'PerlCode' and the rest with it.

      Assuming that the current working directory is 'project' is a very plausible approach that I have partly implemented. I could probably hard-code the rest of the folders within CWD as they are.

      (Oct 30th) UPDATE: I have implemented this through Cwd. Trying the other suggestions, such as File::Spec, threw errors that I could not solve. Through Cwd I can create new folders within 'project', move files across these folders, and call other scripts saved in 'PerlCode' within 'project'. It performs the way I expect it to and ports well ...

      use strict;
      use warnings;
      use Cwd;

      my $dir = getcwd;
      mkdir("$dir/output", 0777);

      open(my $fh,  "<", "$dir/data/file.txt")      or die("$!");
      open(my $ofh, ">", "$dir/output/outfile.txt") or die("$!");

      while (my $line = <$fh>) {
          # do stuff such as process, load into a data structure, regex
          # write to $ofh
      }

Re: Directory independent processing
by Tanktalus (Canon) on Oct 16, 2012 at 23:21 UTC

    Generally, I use moritz' second-last listed approach, "Search relatively to the installation path of the binary" (well, of the script).

    use File::Spec;
    my $project_dir = File::Spec->catdir( File::Spec->rel2abs(__FILE__), File::Spec->updir() );
    I put that in my main script, right near the top, maybe in a BEGIN block if I have to:
    use File::Spec;
    my $project_dir;
    BEGIN {
        $project_dir = File::Spec->catdir( File::Spec->rel2abs(__FILE__), File::Spec->updir() );
    }
    and then everything is relative to that. I used to use this to create my @INC path, but have recently been shown rlib which takes care of that for me now :-)
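    For the @INC part specifically, a common idiom (independent of rlib) is FindBin plus the lib pragma; the 'lib' folder name below is an assumed layout, not something fixed by either module:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Script-relative @INC: assumes shared modules live in a 'lib'
# folder one level up from the script's own directory.
use FindBin qw($Bin);
use lib "$Bin/../lib";

print "module search path: $Bin/../lib\n";
```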

Re: Directory independent processing
by Anonymous Monk on Oct 15, 2012 at 19:45 UTC