PerlMonks  

Comparing files with same name in different directories

by Tuna (Friar)
on Jul 31, 2001 at 22:45 UTC

Tuna has asked for the wisdom of the Perl Monks concerning the following question:

I'm stuck here. What I'm trying to do is:
1. recurse through 5 different directory structures. Each directory structure contains identically named files.
2. compare each "file1" found with each of the 4 other "file1"s, and so on. I am comparing them for equal line count.

What I have doesn't work the way I want. Running this code in my test environment returns the following:

    /home/maleah/sa/current/cluster1
    file: test1: 12
    file: test2: 24
    file: config.hosts: 25
    /home/maleah/sa/current/cluster2
    file: test1: 37
    file: test2: 49
    file: config.hosts: 50
    /home/maleah/sa/current/cluster3
    file: test1: 62
    file: test2: 74
    file: config.hosts: 75
    /home/maleah/sa/current/cluster4
    file: test1: 87
    file: test2: 99
    file: config.hosts: 100
    /home/maleah/sa/current/cluster5
    file: test1: 112
    file: test2: 124
    file: config.hosts: 125

    #!/usr/bin/perl -w
    use strict;

    my $prefixDir = "$ENV{HOME}/sa/current";
    my @clusterDirs= ("$prefixDir/cluster1", "$prefixDir/cluster2", "$prefixDir/cluster3",
                      "$prefixDir/cluster4", "$prefixDir/cluster5");
    use vars qw ($configDir $configFile $lineCount $configFile $fileToCompare @configFileList);

    foreach $configDir (@clusterDirs) {
        opendir DIR, "$configDir" or die "Cannot access $configDir: $!\n";
        @configFileList = grep /[^.]/, readdir DIR;
        print "$configDir\n";
        foreach $fileToCompare (@configFileList) {
            open FILE, "$configDir/$fileToCompare" or die "file not available: $!\n";
            $lineCount++ while <FILE>;
            print "file: $fileToCompare: $lineCount\n";
        }
    }

Replies are listed 'Best First'.
Re: Comparing files with same name in different directories
by tachyon (Chancellor) on Jul 31, 2001 at 23:00 UTC

    By "doesn't work" I presume you mean that your line counter keeps incrementing from one file to the next. You just need to zero it before counting each file. This will work:

        #!/usr/bin/perl -w
        use strict;

        my $prefixDir = "$ENV{HOME}/sa/current";
        my @clusterDirs= ("$prefixDir/cluster1", "$prefixDir/cluster2", "$prefixDir/cluster3",
                          "$prefixDir/cluster4", "$prefixDir/cluster5");
        use vars qw ($configDir $configFile $lineCount $configFile $fileToCompare @configFileList);

        foreach $configDir (@clusterDirs) {
            opendir DIR, "$configDir" or die "Cannot access $configDir: $!\n";
            @configFileList = grep /[^.]/, readdir DIR;
            print "$configDir\n";
            foreach $fileToCompare (@configFileList) {
                $lineCount = 0; #zero our line count for each file
                open FILE, "$configDir/$fileToCompare" or die "file not available: $!\n";
                $lineCount++ while <FILE>;
                print "file: $fileToCompare: $lineCount\n";
            }
        }

    I would recommend using my rather than use vars or our: my gives you lexically scoped variables, whereas use vars and our create package globals. MJD has a great tutorial at http://perl.plover.com/FAQs/Namespaces.html
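To make the scoping difference concrete, here is a minimal sketch. The helper names `count_with_global` and `count_with_lexical` and the stand-in "files" are made up for illustration; they are not from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $global_count = 0;    # package global: one variable shared by every call

# Hypothetical helper: counts into the package global and never resets it.
sub count_with_global {
    my @lines = @_;
    $global_count++ for @lines;
    return $global_count;
}

# Hypothetical helper: counts into a lexical that is created fresh on each call.
sub count_with_lexical {
    my @lines = @_;
    my $count = 0;
    $count++ for @lines;
    return $count;
}

my @file_a = ("a\n") x 12;    # stand-in for a 12-line file
my @file_b = ("b\n") x 12;    # stand-in for another 12-line file

my $g1 = count_with_global(@file_a);
my $g2 = count_with_global(@file_b);
my $l1 = count_with_lexical(@file_a);
my $l2 = count_with_lexical(@file_b);

print "global:  $g1 then $g2\n";    # the second figure includes the first file
print "lexical: $l1 then $l2\n";    # each figure stands on its own
```

The global version carries the first count into the second, which is the same symptom as in the question; the lexical version cannot, because `$count` goes out of scope when the sub returns.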

    Here is a quick recode using my. No globals are required. I've also snipped a bit of redundant code.

        my $prefixDir = "$ENV{HOME}/sa/current";
        my @clusterDirs= qw(cluster1 cluster2 cluster3 cluster4 cluster5);

        foreach my $configDir (@clusterDirs) {
            opendir DIR, "$prefixDir/$configDir" or die "Cannot access $prefixDir/$configDir: $!\n";
            my @configFileList = grep /[^.]/, readdir DIR;
            print "$configDir\n";
            foreach my $fileToCompare (@configFileList) {
                my $lineCount = 0;
                open FILE, "$prefixDir/$configDir/$fileToCompare" or die "file not available: $!\n";
                $lineCount++ while <FILE>;
                close FILE;
                print "file: $fileToCompare: $lineCount\n";
            }
        }

    In my opinion, once variable names get too long they start to obfuscate the meaning of the code and are also more prone to typos. I try for 7 or fewer chars in a var name. I got a bit carried away and generated a 2D hash to store the results, then printed them in a formatted HTML table for easy comparison:

        #!/usr/bin/perl -w
        use strict;

        my $base_dir = "$ENV{HOME}/sa/current";
        my @sub_dir = qw(cluster1 cluster2 cluster3 cluster4 cluster5);
        my (%all_files, %file);

        foreach my $sub_dir (@sub_dir) {
            opendir DIR, "$base_dir/$sub_dir" or die "Cannot access $base_dir/$sub_dir: $!\n";
            my @files = grep /[^.]/, readdir DIR;
            foreach my $file (@files) {
                my $line_count = 0;
                open FILE, "$base_dir/$sub_dir/$file" or die "file not available: $!\n";
                $line_count++ while <FILE>;
                close FILE;
                $file{$sub_dir}{$file} = $line_count;
                $all_files{$file}++;
            }
        }

        # print out a nice HTML table
        print "<table border='2'>\n";
        print "<tr>\n  <td>File name</td>\n";
        print "  <td>$_</td>\n" for @sub_dir;
        print "</tr>\n";
        for my $file (keys %all_files) {
            print "<tr>\n";
            print "  <td>$file</td>\n";
            for my $sub_dir (@sub_dir) {
                # avoid undefined warnings if file does not exist
                # in one or more of our subdirs; test with defined
                # so that an empty file shows 0 rather than "undef"
                my $count = $file{$sub_dir}{$file};
                my $file_size = defined $count ? $count : "undef";
                print "  <td>$file_size</td>\n";
            }
            print "</tr>\n";
        }
        print "</table>\n";
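Since the stated goal was to check for equal line counts, a hash of hashes shaped like %file in the table code above can also be scanned for disagreements instead of being eyeballed. The following is only a sketch: `mismatched_files` is a hypothetical helper, and the sample data is made up.

```perl
use strict;
use warnings;

# Hypothetical helper. $counts is a hash of hashes:
# $counts->{$sub_dir}{$file} = line count. Returns the sorted names of files
# whose count is missing or different in at least one directory.
sub mismatched_files {
    my ($counts, @dirs) = @_;

    # Collect every file name seen in any directory.
    my %seen;
    for my $dir (@dirs) {
        $seen{$_} = 1 for keys %{ $counts->{$dir} || {} };
    }

    my @bad;
    for my $file (sort keys %seen) {
        # Gather the distinct values for this file across all directories.
        my %distinct;
        for my $dir (@dirs) {
            my $n = $counts->{$dir}{$file};
            $distinct{ defined $n ? $n : 'missing' } = 1;
        }
        push @bad, $file if keys(%distinct) > 1;
    }
    return @bad;
}

# Made-up sample data, for illustration only.
my %sample = (
    cluster1 => { test1 => 12, test2 => 12, 'config.hosts' => 1 },
    cluster2 => { test1 => 12, test2 => 13, 'config.hosts' => 1 },
    cluster3 => { test1 => 12, test2 => 12 },
);

print "differs: $_\n" for mismatched_files(\%sample, qw(cluster1 cluster2 cluster3));
```

With this sample, test2 is flagged because one count differs, and config.hosts is flagged because it is absent from cluster3.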

    Hope this helps. cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Comparing files with same name in different directories
by scain (Curate) on Jul 31, 2001 at 22:54 UTC
    So, are the same-name files in the different directories the same, and do you therefore expect the numbers reported for each to be the same? You were not clear on what was in your test set.

    Anyway, a few points:

    • I tend to explicitly close filehandles when I am done with them, so there is no question as to their state.
    • Would a file test operation, a la -s, work better? Is it really line counts you are interested in, or whether the files are identical?
    • And the biggie: $lineCount is not getting reset!
    Scott

    Update to fix typo and add perlfunc link--Ugh, it took about 10 iterations to figure out that %3A is :
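If identical content is what matters rather than line count, the -s idea can be combined with the core File::Compare module. This is a sketch under that assumption; `compare_pair` is a hypothetical helper name, and the two paths come from the command line.

```perl
use strict;
use warnings;
use File::Compare;    # core module; exports compare()

# Hypothetical helper: returns 'same', 'different' or 'error' for two paths.
sub compare_pair {
    my ($first, $second) = @_;

    # Cheap check first: files with different byte sizes cannot be identical.
    # -s yields undef when the file cannot be stat'ed.
    my $size1 = -s $first;
    my $size2 = -s $second;
    return 'error' unless defined $size1 and defined $size2;
    return 'different' if $size1 != $size2;

    # compare() returns 0 if the files are equal, 1 if they differ, -1 on error.
    my $result = compare($first, $second);
    return $result == 0 ? 'same'
         : $result == 1 ? 'different'
         :                'error';
}

# Usage: perl compare_pair.pl cluster1/test1 cluster2/test1
if (@ARGV == 2) {
    print "$ARGV[0] vs $ARGV[1]: ", compare_pair(@ARGV), "\n";
}
```

Equal sizes alone do not prove the files match, which is why the sketch still falls through to a full comparison.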

Re: Comparing files with same name in different directories
by Cubes (Pilgrim) on Jul 31, 2001 at 22:56 UTC
    You're using $lineCount as a global and not resetting it between files, so it just keeps going up and up and up.
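Because the counter only ever accumulates, the real per-file counts can be recovered from the output in the question by subtracting each running total from the next. A small sketch (the helper name `per_file_counts` is made up):

```perl
use strict;
use warnings;

# Running totals copied from the output in the question, in the order printed.
my @running = (12, 24, 25, 37, 49, 50, 62, 74, 75, 87, 99, 100, 112, 124, 125);

# Hypothetical helper: turns running totals back into per-file counts.
sub per_file_counts {
    my @totals   = @_;
    my $previous = 0;
    my @counts;
    for my $total (@totals) {
        push @counts, $total - $previous;
        $previous = $total;
    }
    return @counts;
}

my @counts = per_file_counts(@running);
print "@counts\n";    # prints: 12 12 1 12 12 1 12 12 1 12 12 1 12 12 1
```

So in the test set every cluster has test1 and test2 at 12 lines and config.hosts at 1 line; the files agree, and only the missing reset made the numbers look different.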
Re: Comparing files with same name in different directories
by Beatnik (Parson) on Aug 01, 2001 at 00:03 UTC
    I hate to be all 'nouveau' with this stuff but... Algorithm::Diff and its granddaddy diff (as in shell) come to mind.

    Greetz
    Beatnik
    ... Quidquid perl dictum sit, altum viditur.
Re: Comparing files with same name in different directories
by Rudif (Hermit) on Aug 01, 2001 at 21:00 UTC
    Tuna, you said

    What I'm trying to do is:
    1. recurse through 5 different directory structures. Each directory structure contains identically named files.

    'Recurse' seems to imply that your directory structures might have subdirectories that you wish to recurse into.

    If this is the case, or even if not, a solution based on File::Find might fit your needs. For example:

        #!perl -w
        use strict;
        use File::Find;

        my $prefixDir = "$ENV{HOME}/sa/current";
        my @clusterDirs= ("$prefixDir/cluster1", "$prefixDir/cluster2", "$prefixDir/cluster3",
                          "$prefixDir/cluster4", "$prefixDir/cluster5");

        for my $dir ( @clusterDirs ) {
            print "$dir\n";
            find( sub {
                return unless -T;
                printf "file: $_: %d\n", -s; # file name only
                # printf "file: $File::Find::name: %d\n", -s; # full path
            }, $dir);
        }
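Note that -s reports size in bytes. If line counts are still wanted while recursing, a variation along these lines might do; it is a sketch, `line_counts_under` is a hypothetical helper, and keying the results by path relative to the top directory is my assumption about how same-named files in subdirectories should be matched up across clusters.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;

# Hypothetical helper: walks $top recursively and returns a hash reference
# mapping each text file's path (relative to $top) to its line count.
sub line_counts_under {
    my ($top) = @_;
    my %counts;
    find(
        {
            no_chdir => 1,    # stay in the current directory so full paths can be opened
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path && -T $path;    # plain text files only
                open my $fh, '<', $path or die "Cannot open $path: $!\n";
                my $lines = 0;
                while (defined(my $line = <$fh>)) {
                    $lines++;
                }
                close $fh;
                $counts{ File::Spec->abs2rel($path, $top) } = $lines;
            },
        },
        $top
    );
    return \%counts;
}

# Usage: perl line_counts.pl ~/sa/current/cluster1 ~/sa/current/cluster2 ...
for my $top (@ARGV) {
    my $counts = line_counts_under($top);
    print "$top\n";
    print "file: $_: $counts->{$_}\n" for sort keys %$counts;
}
```

The explicit `while` loop reads into a lexical rather than `$_`, so the value File::Find places in `$_` for the callback is left alone.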

    Cheers, Rudif
