http://www.perlmonks.org?node_id=899502

dhaneypood has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have an idea in mind, which I am trying to implement in my research project:

I have a web server with two directories whose contents are as follows:


host/foobar/bar/abc.html
host/foobar/bar/images/def.gif
host/foobar/bar/images/ghi.jpg
host/foobar/bar/jkl.php

host/anotherfoobar/foo/bar/abc.html
host/anotherfoobar/foo/bar/images/def.gif
host/anotherfoobar/foo/bar/images/mno.gif
host/anotherfoobar/foo/bar/xyz.php

Essentially, the directory "bar" is present under both foobar and anotherfoobar; the only difference is that a few of its files may have been changed or renamed.

How can I determine that these two directories are similar or the same? There may be other directories like bar, and I want to compare paths only up to 4 levels deep (right to left).

I was thinking of using wget to fetch the entire foobar and anotherfoobar directories and compare their contents, or more specifically the paths of their contents.

I would then analyze the directory names in each path from right to left and try to score how similar they are. How feasible is this?

These folders may also be on different hosts, for example host2/anything/something/.../foobar/bar/all files
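
To make the right-to-left scoring idea more concrete, here is a rough sketch of what I have in mind. It is in Python rather than Perl, purely for illustration. The function names (tail, matching_tail_length, tail_similarity) and the Jaccard-style score are my own assumptions, not an established method. It works on path strings only, so the host and any leading directories drop out once only the last few components are kept.

```python
def tail(path, depth=4):
    """Return the last `depth` components of a slash-separated path."""
    parts = [p for p in path.split("/") if p]
    return tuple(parts[-depth:])


def matching_tail_length(a, b, depth=4):
    """Count components that agree when two paths are read right to left."""
    count = 0
    for x, y in zip(reversed(tail(a, depth)), reversed(tail(b, depth))):
        if x != y:
            break
        count += 1
    return count


def tail_similarity(paths_a, paths_b, depth=4):
    """Jaccard similarity (0.0 to 1.0) of the two sets of path tails."""
    tails_a = {tail(p, depth) for p in paths_a}
    tails_b = {tail(p, depth) for p in paths_b}
    if not tails_a and not tails_b:
        return 1.0  # two empty listings are treated as identical
    return len(tails_a & tails_b) / len(tails_a | tails_b)


# The two listings from the example above.
site_a = [
    "host/foobar/bar/abc.html",
    "host/foobar/bar/images/def.gif",
    "host/foobar/bar/images/ghi.jpg",
    "host/foobar/bar/jkl.php",
]
site_b = [
    "host/anotherfoobar/foo/bar/abc.html",
    "host/anotherfoobar/foo/bar/images/def.gif",
    "host/anotherfoobar/foo/bar/images/mno.gif",
    "host/anotherfoobar/foo/bar/xyz.php",
]

if __name__ == "__main__":
    # How many trailing components two individual paths share.
    print(matching_tail_length(site_a[1], site_b[1]))
    # Overall score for several depths.
    for d in (2, 3, 4):
        print(d, tail_similarity(site_a, site_b, depth=d))
```

With this scoring the result depends heavily on the chosen depth: a shallow depth ignores the differing parent directories, while a depth of 4 already pulls in foobar versus foo and the exact-match score drops to zero. A per-path score such as matching_tail_length might therefore be a better basis than exact tail matches.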

UPDATE: Checking out File::DirCompare. Thanks eff!