PerlMonks  

Re^10: Comparing Values PER Sub-folder

by omegaweaponZ (Beadle)
on Sep 05, 2012 at 20:20 UTC (node #991937)


in reply to Re^9: Comparing Values PER Sub-folder
in thread Comparing Values PER Sub-folder

Makes sense, but I have absolutely no idea how to do that! :) (newbie Perl writer here...) So my code as it stands now is:

    use strict;
    use warnings;
    use File::Find;
    use File::Slurp qw(read_file);

    my $dir = '.';    # starting directory (set as appropriate)

    find( \&countLines, $dir );

    sub countLines {
        /\.txt$/ or return;    # only look at .txt files
        my $completePath = $File::Find::name;
        my $curDir       = $File::Find::dir;
        my $curFile      = $_;
        my @lines        = read_file($curFile);
        my $numLines     = @lines;    # array in scalar context = line count
        print "Cur dir: $curDir; Cur file: $curFile; Num Lines: $numLines\n";
        print "The array's elements are"
            . ( sameArrayElements(@lines) ? '' : ' not' )
            . " the same.\n";
    }

    sub sameArrayElements {
        my %hash = map { $_ => 1 } @_;
        keys %hash == 1 ? 1 : 0;
    }
Obviously @lines won't work, since it only holds the lines of one file and gets compared against itself... so how would I collect each file's line count into an array PER directory?


Re^11: Comparing Values PER Sub-folder
by Kenosis (Priest) on Sep 05, 2012 at 21:34 UTC

    The following pieces together the script segments:

    use strict;
    use warnings;
    use File::Find;
    use File::Slurp qw/read_file/;

    my $startDir = '.';
    my %dirLines;

    find( { wanted => \&countLines, }, $startDir );

    for my $dir ( sort keys %dirLines ) {
        my $sameResults = sameArrayElements( @{ $dirLines{$dir} } );
        print "The files in directory '$dir' do"
          . ( $sameResults ? '' : ' not' )
          . " have the same number of lines.\n";
    }

    sub countLines {
        /\.txt$/ or return;

        #my $completePath = $File::Find::name;
        my $curDir    = $File::Find::dir;
        my $curFile   = $_;
        my @fileLines = read_file $curFile;
        my $numLines  = @fileLines;

        push @{ $dirLines{$curDir} }, $numLines;

        #say "Cur dir: $curDir; Cur file: $curFile; Num Lines: $numLines";
    }

    sub sameArrayElements {
        my %hash = map { $_ => 1 } @_;
        keys %hash == 1 ? 1 : 0;
    }

    Notice that we now have my %dirLines; at the top. We'll use this hash as a hash of arrays (HoA): each key is a directory path, and the associated value is a reference to an array whose elements are the line counts of that directory's .txt files.

    The following was added to the countLines subroutine:

    push @{ $dirLines{$curDir} }, $numLines;

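    $dirLines{$curDir} is the element of %dirLines whose key is the value of $curDir. The enclosing @{ } notation says to treat that element as an array (we'll see this notation again later), and then we push the value of $numLines onto that array. The first time a directory is seen, the element doesn't exist yet; Perl automatically creates an empty array for it (autovivification), so no initialization is needed.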
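    To see that autovivification in isolation, here is a minimal standalone sketch. The (directory, line count) pairs are made up for illustration; in the real script they come from File::Find inside countLines.

```perl
use strict;
use warnings;

# Made-up (directory, line count) pairs standing in for what countLines
# records while File::Find walks the tree (illustration only).
my @records = (
    [ './test',          6 ],
    [ './test',          2 ],
    [ './test/test bbb', 6 ],
    [ './test/test bbb', 6 ],
);

my %dirLines;
for my $rec (@records) {
    my ( $curDir, $numLines ) = @$rec;

    # The first push for a new $curDir finds no element there; Perl
    # autovivifies an empty array ref before pushing onto it.
    push @{ $dirLines{$curDir} }, $numLines;
}

for my $dir ( sort keys %dirLines ) {
    my $counts = $dirLines{$dir};    # the stored value is an array reference
    printf "%s => [ %s ] (%d files)\n",
        $dir, join( ', ', @$counts ), scalar @$counts;
}
```

    This prints "./test => [ 6, 2 ] (2 files)" followed by "./test/test bbb => [ 6, 6 ] (2 files)".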

    Data::Dumper was used to help visualize our hash's data structure after traversing the directories:

    $VAR1 = {
              './test/test bbb' => [ 6, 6, 6, 6 ],
              './test' => [ 6, 2, 6, 1 ],
              '.' => [ 6, 2, 6, 10000, 1, 63, 10, 15, 6, 21, 647, 10,
                       5, 28, 407, 2390, 11, 6, 513, 181, 1, 2, 360, 3 ]
            };

    Curly braces {} denote a hash (reference); square brackets [] denote an array (reference). From the output above, you can see the association between each key (a directory path) and its value (an array of file line counts).
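    If you want to reproduce this kind of dump yourself, a minimal sketch with the core Data::Dumper module follows. Setting $Data::Dumper::Sortkeys makes the key order predictable, which Perl's hash ordering otherwise isn't. The data is a trimmed-down copy of two entries from the dump above, not new output.

```perl
use strict;
use warnings;
use Data::Dumper;

# A small hash of arrays shaped like %dirLines (values copied from the
# dump above for two of the directories).
my %dirLines = (
    './test'          => [ 6, 2, 6, 1 ],
    './test/test bbb' => [ 6, 6, 6, 6 ],
);

$Data::Dumper::Sortkeys = 1;    # emit hash keys in sorted order
$Data::Dumper::Indent   = 1;    # fixed, compact indentation

# Passing a reference (\%dirLines) is what makes Dumper print the outer { }.
my $dump = Dumper( \%dirLines );
print $dump;
```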

    The next step is to process our hash, iterating through its keys one at a time. That starts as follows:

    for my $dir ( sort keys %dirLines ) {
        my $sameResults = sameArrayElements( @{ $dirLines{$dir} } );
        ...

    Each sorted key of %dirLines is assigned to $dir in turn. Next comes the same notation seen earlier, @{ $dirLines{$dir} }: the directory's array of line counts is dereferenced and passed to sameArrayElements, which checks whether all of its elements are the same. The print statement that follows reports the result of this check, producing:

    The files in directory '.' do not have the same number of lines.
    The files in directory './test' do not have the same number of lines.
    The files in directory './test/test bbb' do have the same number of lines.
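    The sameness check works by turning the list into hash keys: duplicates collapse, so exactly one remaining key means every element was equal. The minimal sketch below runs the same sameArrayElements logic on sample data (the './one_file' and './empty' entries are hypothetical) to show two edge cases: a directory holding a single file always reports "the same", and an empty list reports "not the same". In the real script an empty entry can't actually occur, because a key only appears once something has been pushed onto it.

```perl
use strict;
use warnings;

# Same logic as sameArrayElements in the script: duplicates collapse into
# one hash key, so exactly one distinct key means all elements are equal.
sub sameArrayElements {
    my %hash = map { $_ => 1 } @_;
    return scalar( keys %hash ) == 1 ? 1 : 0;
}

my %dirLines = (
    '.'               => [ 6, 2, 6, 10000 ],
    './one_file'      => [42],    # hypothetical: a single .txt file
    './empty'         => [],      # hypothetical: no line counts recorded
    './test/test bbb' => [ 6, 6, 6, 6 ],
);

for my $dir ( sort keys %dirLines ) {
    my $same = sameArrayElements( @{ $dirLines{$dir} } );
    print "The files in directory '$dir' do"
        . ( $same ? '' : ' not' )
        . " have the same number of lines.\n";
}
```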
      This is fantastic! However, my output seems to show only the final sub-folders' line counts. I actually don't need that, since each final sub-folder contains just one .txt file. What I want is for the directory one level up to check the files across all of its sub-directories.

      But I'm not seeing that. What am I doing wrong here? $curDir shows me the final /test/test/BBB directory when I just want /test/test... but I need EVERY sub-folder under the root of test. Could a regular expression help here, to strip everything after the last / ?

      So I'm seeing this:

      /test/test1/bb1' do have the same number of lines
      /test/test1/bb2' do have the same number of lines
      /test/test1/bb3' do have the same number of lines
      /test/test1/bb4' do have the same number of lines
      /test/test1/bb5' do have the same number of lines
      /test/test1/cc1' do have the same number of lines
      /test/test2/cc2' do have the same number of lines
      /test/test2/cc3' do have the same number of lines
      /test/test2/cc4' do have the same number of lines
      /test/test2/cc5' do have the same number of lines

      So each folder here has 1 txt file. But I want to see this output instead:

      /test/test1' do have the same number of lines
      /test/test2' do not have the same number of lines

      Or something like that. I think I see what's happening: $curDir is what's messing this up. If it were just one level up, this would work beautifully.
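      The regex idea can work. Here is a minimal sketch, assuming (as in the sample output) that each .txt file sits in a leaf directory exactly one level below the directory you want reported: strip the last path component from the directory path, either with a substitution or with dirname() from the core File::Basename module. The directory names are sample values taken from the output above.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Sample leaf directories, like the $File::Find::dir values in the
# output above (each one holding a single .txt file).
my @dirs = qw(/test/test1/bb1 /test/test1/bb2 /test/test2/cc2);

for my $curDir (@dirs) {
    # Option 1: a regex that deletes the final "/component".
    ( my $byRegex = $curDir ) =~ s{/[^/]+$}{};

    # Option 2: dirname() from the core File::Basename module.
    my $byDirname = dirname($curDir);

    print "$curDir -> $byRegex (dirname: $byDirname)\n";
}
```

      In countLines this would amount to pushing onto $dirLines{ dirname($File::Find::dir) } instead of $dirLines{$curDir} (again, only a sketch). If files can sit at varying depths, grouping by a fixed "one level up" will no longer match what you want reported.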

        You showed the following example output:

        /test/test1' do have the same number of lines
        /test/test2' do not have the same number of lines

        Were you looking to have the script count file lines only in those two directories, without descending into any enclosed directories, or did you want it to produce output like the above even after descending into all enclosed directories?
