PerlMonks  

Re^8: Comparing Values PER Sub-folder

by omegaweaponZ (Beadle)
on Sep 05, 2012 at 02:49 UTC (#991728)


in reply to Re^7: Comparing Values PER Sub-folder
in thread Comparing Values PER Sub-folder

One more question! For the final piece of this, I'm looking for the best way to compare the line counts within each directory. The print output would be something like this:

Cur dir: /folder/blahA/blah1; Cur file: file.txt; Num Lines: 8
Cur dir: /folder/blahA/blah2; Cur file: file.txt; Num Lines: 8
Cur dir: /folder/blahA/blah3; Cur file: file.txt; Num Lines: 8
Cur dir: /folder/blahA/blah4; Cur file: file.txt; Num Lines: 8
Cur dir: /folder/blahA/blah5; Cur file: file.txt; Num Lines: 8
Cur dir: /folder/blahA/blah6; Cur file: file.txt; Num Lines: 8

Cur dir: /folder/blahB/blah1; Cur file: file.txt; Num Lines: 10
Cur dir: /folder/blahB/blah2; Cur file: file.txt; Num Lines: 10
Cur dir: /folder/blahB/blah3; Cur file: file.txt; Num Lines: 10
Cur dir: /folder/blahB/blah4; Cur file: file.txt; Num Lines: 12
Cur dir: /folder/blahB/blah5; Cur file: file.txt; Num Lines: 10
Cur dir: /folder/blahB/blah6; Cur file: file.txt; Num Lines: 10

So the offending folder, blahB, has one text file whose line count doesn't match the rest. How can I detect and alert on that? Basically, I want to set True if all the line counts within a directory are equal, and False if they aren't.


Re^9: Comparing Values PER Sub-folder
by Kenosis (Priest) on Sep 05, 2012 at 04:43 UTC

    Well, I don't know about "the best method," but here's one method:

    use Modern::Perl;

    my @same    = (8) x 8;    # an array of eight 8s
    my @notSame = qw/ 10 10 10 12 10 10 /;

    say "The array's elements are"
      . ( sameArrayElements(@same) ? '' : ' not' )
      . ' the same.';

    sub sameArrayElements {
        my %hash = map { $_ => 1 } @_;
        keys %hash == 1 ? 1 : 0;
    }

    If you send the subroutine sameArrayElements an array, it'll return 1 (true) if the array's elements are identical, and 0 (false) otherwise (even if an empty array is sent). As shown above, the output is:

    The array's elements are the same.

    Try sameArrayElements with @notSame.

    The subroutine uses map to iterate through all elements of the sent array, populating %hash with key/value pairs (the $_ => 1 notation). If the array elements are identical, there should be only one key--and the last line in the subroutine tests for that.
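
    To see why counting keys works, here is a minimal sketch that exposes the intermediate result. The helper name distinctValues is hypothetical (it is not part of the post above); it returns the distinct values of a list, so a one-element result means every input was the same.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the distinct
# values of a list, relying on hash keys being unique.
sub distinctValues {
    my %seen = map { $_ => 1 } @_;    # duplicate values collapse into one key
    my @values = sort { $a <=> $b } keys %seen;
    return @values;
}

my @same    = (8) x 8;
my @notSame = qw/ 10 10 10 12 10 10 /;

my @distinctSame    = distinctValues(@same);       # one distinct value
my @distinctNotSame = distinctValues(@notSame);    # two distinct values

print "same:    @distinctSame\n";
print "notSame: @distinctNotSame\n";
```

    Because @notSame yields two keys (10 and 12), a check for exactly one key fails for it, which is what sameArrayElements relies on.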

    Thus, if you push each file's numLines from a directory onto an array and then send that array to sameArrayElements, it'll tell you whether all files in that directory have the same number of lines. (Be sure the array's empty, e.g., by declaring it with my @array;, before pushing values onto it.)

      Makes sense, but I have absolutely no idea how to do that! :) (newbie Perl writer here...) So my code as it stands now is:
      find(\&countLines, $dir);

      sub countLines {
          /\.txt$/ or return;
          my $completePath = $File::Find::name;
          my $curDir       = $File::Find::dir;
          my $curFile      = $_;
          my @lines        = read_file( $curFile ) ;
          my $numLines     = @lines;
          print "Cur dir: $curDir; Cur file: $curFile; Num Lines: $numLines\n";
          print "The array's elements are"
            . ( sameArrayElements(@lines) ? '' : ' not' )
            . ' the same.';

          sub sameArrayElements {
              my %hash = map { $_ => 1 } @_;
              keys %hash == 1 ? 1 : 0;
          }
      }
      Obviously @lines won't work, since it only holds the lines of a single file... so how would you collect each file's line count into a fresh array PER directory?

        The following pieces together the script segments:

        use strict;
        use warnings;
        use File::Find;
        use File::Slurp qw/read_file/;

        my $startDir = '.';
        my %dirLines;

        find( { wanted => \&countLines, }, $startDir );

        for my $dir ( sort keys %dirLines ) {
            my $sameResults = sameArrayElements( @{ $dirLines{$dir} } );
            print "The files in directory '$dir' do"
              . ( $sameResults ? '' : ' not' )
              . " have the same number of lines.\n";
        }

        sub countLines {
            /\.txt$/ or return;

            #my $completePath = $File::Find::name;
            my $curDir    = $File::Find::dir;
            my $curFile   = $_;
            my @fileLines = read_file $curFile;
            my $numLines  = @fileLines;

            push @{ $dirLines{$curDir} }, $numLines;

            #say "Cur dir: $curDir; Cur file: $curFile; Num Lines: $numLines";
        }

        sub sameArrayElements {
            my %hash = map { $_ => 1 } @_;
            keys %hash == 1 ? 1 : 0;
        }

        Notice that we now have my %dirLines; at the top. We'll use this hash as a hash of arrays (HoA), where each key is a directory path and the associated value is an array holding the line count of each file in that directory.

        The following was added to the countLines subroutine:

        push @{ $dirLines{$curDir} }, $numLines;

        $dirLines{$curDir} looks up an entry in our hash, using the value of $curDir as the key. The enclosing @{ } notation says to treat that entry as an array (we'll see this later, too), and then we push the value of $numLines onto that array. If the key doesn't exist yet, Perl creates the array automatically on the first push.
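
        A minimal sketch of that push in isolation, using hand-written (directory, line count) pairs instead of a real File::Find traversal. The sample pairs are illustrative only; the point is that no array has to be created before the first push for a given key.

```perl
use strict;
use warnings;

my %dirLines;

# Illustrative (directory, line count) pairs standing in for what
# countLines would see during a real traversal.
my @seen = (
    [ './test',          6 ],
    [ './test/test bbb', 6 ],
    [ './test',          2 ],
    [ './test/test bbb', 6 ],
);

for my $pair (@seen) {
    my ( $curDir, $numLines ) = @$pair;

    # Dereferencing a missing hash value with @{ } makes Perl create
    # the array on the fly (autovivification).
    push @{ $dirLines{$curDir} }, $numLines;
}

for my $dir ( sort keys %dirLines ) {
    print "$dir => @{ $dirLines{$dir} }\n";
}
```

        Each directory ends up with its own array, in the order the counts were pushed.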

        Data::Dumper was used to help visualize our hash's data structure after traversing the directories:

        $VAR1 = {
                  './test/test bbb' => [ 6, 6, 6, 6 ],
                  './test' => [ 6, 2, 6, 1 ],
                  '.' => [ 6, 2, 6, 10000, 1, 63, 10, 15, 6, 21, 647, 10,
                           5, 28, 407, 2390, 11, 6, 513, 181, 1, 2, 360, 3 ]
                };

        Curly braces {} mean a hash; square brackets [] mean an array. From the output above, you can see the association between a key (directory path) and an array (the line counts of that directory's files) as a value.
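
        The Dumper call itself isn't shown in the thread. One common way to produce such a dump is sketched below on a small hand-built hash (the data is illustrative). Setting $Data::Dumper::Sortkeys and $Data::Dumper::Indent is optional; they only make the output stable and more compact.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # print hash keys in sorted order
$Data::Dumper::Indent   = 1;    # simpler, fixed-width indentation

# Illustrative hash of arrays: directory path => list of line counts.
my %dirLines = (
    './test'          => [ 6, 2, 6, 1 ],
    './test/test bbb' => [ 6, 6, 6, 6 ],
);

# Pass a reference so the whole hash is dumped as a single structure.
my $dump = Dumper( \%dirLines );
print $dump;
```

        In the full script, a call like this would go after find() returns, once %dirLines has been populated.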

        The next step is to process our hash, iterating through its keys--one at a time--and that starts as follows:

        for my $dir ( sort keys %dirLines ) {
            my $sameResults = sameArrayElements( @{ $dirLines{$dir} } );
            ...

        Each sorted key of %dirLines is assigned to $dir. Next is the same notation seen earlier, viz., @{ $dirLines{$dir} }. The directory's associated array of numLines is sent to sameArrayElements to check for element sameness. The print statement then reports the result of this check, producing output such as:

        The files in directory '.' do not have the same number of lines.
        The files in directory './test' do not have the same number of lines.
        The files in directory './test/test bbb' do have the same number of lines.
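
        As a rough sketch of the True/False flag asked for in the question, the same check can be run on the line counts from the question's sample output. The counts are hard-coded here, and grouping them by parent folder (blahA, blahB) is an assumption about the desired grouping; the %allSame hash is a hypothetical name for storing the flags.

```perl
use strict;
use warnings;

# Same subroutine as in the thread.
sub sameArrayElements {
    my %hash = map { $_ => 1 } @_;
    keys %hash == 1 ? 1 : 0;
}

# Line counts copied from the question's sample output, grouped by
# parent folder (an assumption, not something the thread's script does).
my %dirLines = (
    '/folder/blahA' => [ 8,  8,  8,  8,  8,  8 ],
    '/folder/blahB' => [ 10, 10, 10, 12, 10, 10 ],
);

# Hypothetical result hash: folder => 1 (all equal) or 0 (mismatch).
my %allSame;
for my $dir ( sort keys %dirLines ) {
    $allSame{$dir} = sameArrayElements( @{ $dirLines{$dir} } );
    print "$dir: ", ( $allSame{$dir} ? 'True' : 'False' ), "\n";
}
```

        Note that the script above groups by $File::Find::dir, the directory that directly contains each file. If every file.txt sits in its own leaf folder, as in the question's sample output, the grouping key would likely need to be the parent folder instead.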
