PerlMonks  

Similarities in Different Directories

by dhaneypood (Novice)
on Apr 14, 2011 at 18:10 UTC (#899502)
dhaneypood has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have an idea in mind, which I am trying to implement in my research project:

I have a webserver with two directories and their content as follows:


host/foobar/bar/abc.html
host/foobar/bar/images/def.gif
host/foobar/bar/images/ghi.jpg
host/foobar/bar/jkl.php

host/anotherfoobar/foo/bar/abc.html
host/anotherfoobar/foo/bar/images/def.gif
host/anotherfoobar/foo/bar/images/mno.gif
host/anotherfoobar/foo/bar/xyz.php

Essentially, the directory "bar" is present under both foobar and anotherfoobar, except that a few of its files, or their names, may have changed.

How can I determine that these two directories are similar or the same? There may be other directories like bar, and I want to compare only up to 4 levels deep (right to left).

I was thinking of doing a wget to fetch entire directories foobar and anotherfoobar to compare their contents, more specifically the path of their contents.

And then analyze each directory name from Right to Left, and try to score how similar they are. How feasible is this?

These folders may also be on different hosts, like host2/anything/something/.../foobar/bar/all files
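The right-to-left comparison described above can be sketched roughly as follows. This is only an illustration, written in Python for brevity (the logic ports directly to Perl); the function name `trailing_match_depth` and the choice to stop at the first mismatch are assumptions, not something specified in the question.

```python
def trailing_match_depth(dir_a, dir_b, max_depth=4):
    """Count how many trailing directory names two paths share.

    Components are compared right to left, at most max_depth levels,
    stopping at the first mismatch (an assumed scoring rule).
    """
    parts_a = [p for p in dir_a.split("/") if p][::-1][:max_depth]
    parts_b = [p for p in dir_b.split("/") if p][::-1][:max_depth]
    depth = 0
    for name_a, name_b in zip(parts_a, parts_b):
        if name_a != name_b:
            break
        depth += 1
    return depth


# Hypothetical usage with the directories from the question
print(trailing_match_depth("host/foobar/bar/images",
                           "host/anotherfoobar/foo/bar/images"))  # 2
```

A score of 2 here means "images" and "bar" line up before the paths diverge. Dividing by `max_depth` would turn the count into a 0..1 similarity if that is more convenient.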

UPDATE: Checking out File::DirCompare. Thanks eff!

Re: Similarities in Different Directories
by eff_i_g (Curate) on Apr 14, 2011 at 18:38 UTC

    For files on the same server I would use File::DirCompare. Perhaps IPC::PerlSSH will do the trick across servers or you could use regular ol' SSH and gather listings and checksums. Beyond that I'm not sure, other than your approach: transferring the files to one server.

    Update: Or mount a share (if feasible).

Re: Similarities in Different Directories
by jpl (Monk) on Apr 14, 2011 at 21:10 UTC

    If you cannot rely on identical files having identical names, then you'll probably want to use some sort of digest function that gives identical files (by content) identical digest values. Then at least you need to compare only the files with the same digest, since those with differing digests cannot be identical. I haven't used the CPAN digest functions myself, but Gisle Aas is a reliable contributor (to say the least), so Digest::file is a good place to look for something that will do a fine job.
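The digest idea above can be sketched like this. It is a minimal illustration in Python rather than Perl's Digest::file, and it assumes both trees have already been fetched to local disk; the helper names `file_digest` and `group_by_digest`, and the choice of SHA-256, are assumptions made for the example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_digest(*roots):
    """Map digest -> list of files with that content, for digests seen more than once."""
    groups = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                groups[file_digest(full_path)].append(full_path)
    # Keep only content that appears under at least two paths
    return {d: paths for d, paths in groups.items() if len(paths) > 1}
```

Files that land in the same group are candidates for "same file, different name"; files with differing digests cannot be identical, so they never need a byte-by-byte comparison.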

      Yeah, I thought of using a digest for the files. In fact, I am already doing that to find identical files with different names. But I am more interested in the directory structure: can I compare two directory paths and conclude that they are similar?

      And it's not just one directory I am trying to match. I would like to store a kind of map for some directories in a database and search for similar maps.
      Thanks. :)
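One possible reading of the "map" idea above is to store, for each directory, the set of file paths relative to it, and to score two maps by their overlap. The sketch below uses Jaccard similarity for that; this is a swapped-in technique chosen for illustration (in Python), not something proposed in the thread, and `directory_map` and `jaccard` are hypothetical names.

```python
def directory_map(paths, root):
    """Return the set of paths relative to root, for paths that lie under root."""
    prefix = root.rstrip("/") + "/"
    return frozenset(p[len(prefix):] for p in paths if p.startswith(prefix))


def jaccard(map_a, map_b):
    """Shared entries divided by distinct entries; 1.0 means identical maps."""
    if not map_a and not map_b:
        return 1.0
    return len(map_a & map_b) / len(map_a | map_b)


# Listings taken from the question
listing = [
    "host/foobar/bar/abc.html",
    "host/foobar/bar/images/def.gif",
    "host/foobar/bar/images/ghi.jpg",
    "host/foobar/bar/jkl.php",
    "host/anotherfoobar/foo/bar/abc.html",
    "host/anotherfoobar/foo/bar/images/def.gif",
    "host/anotherfoobar/foo/bar/images/mno.gif",
    "host/anotherfoobar/foo/bar/xyz.php",
]
map_a = directory_map(listing, "host/foobar/bar")
map_b = directory_map(listing, "host/anotherfoobar/foo/bar")
print(round(jaccard(map_a, map_b), 2))  # 0.33 (2 shared paths out of 6 distinct)
```

Because each map is just a set of relative paths, it can be serialized and stored in a database row per directory. Replacing the file names with content digests would make the score robust to renamed files as well.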
