http://www.perlmonks.org?node_id=968177


in reply to Re^2: Reading in two text files
in thread Reading in two text files

consume() slurps up the whole file and builds a hash out of it.

Each entry in the hash is an independent, nameless array that contains the appropriate data for each line.
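
For reference, here is a minimal sketch of that hash-of-anonymous-arrays layout. The original consume() isn't reproduced in this post, so the field order (total, used, free) and the sample numbers are assumptions made up purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical data: each key is a filesystem name, each value is a
# reference to an anonymous array of [total, used, free].
my %machine = (
    '/dev/hd4' => [ 1000, 400,  600 ],
    '/dev/hd2' => [ 5000, 4500, 500 ],
);

# Fields are reached by position, so you have to remember that
# index 0 is total, 1 is used, and 2 is free.
my $free_on_hd4 = $machine{'/dev/hd4'}[2];
print "free on /dev/hd4: $free_on_hd4\n";
```

The positional indexes are what make this form harder to read at a glance.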

Now you have two "phone books" of filesystem names, and both use the same names (such as '/dev/hd4', etc.). That means you can use a name to pull out the relevant statistics from each of the two files. Let me see if I can make this simpler: instead of using the anonymous arrays, let's use a nested hash. If you were writing all this down on paper, you might make a table for each machine that had the filesystem names as rows and the fields (total, used, and free) as the columns.
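
In Perl terms, one machine's "paper table" might look like the sketch below. The numbers are invented for illustration only; the point is just the shape of the data:

```perl
use strict;
use warnings;

# Hypothetical table for one machine: row = filesystem, column = field.
my %machine = (
    '/dev/hd4' => { total => 1000, used => 400,  free => 600 },
    '/dev/hd2' => { total => 5000, used => 4500, free => 500 },
);

# Look up one cell by row name and column name.
my $used_on_hd2 = $machine{'/dev/hd2'}{used};
print "used on /dev/hd2: $used_on_hd2\n";

# Walk the whole table, one row per filesystem.
foreach my $fs (sort keys %machine) {
    printf "%-10s total=%d used=%d free=%d\n",
        $fs, @{ $machine{$fs} }{qw(total used free)};
}
```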

To do something like that in Perl, we'd rewrite consume() to do this instead:

    sub consume {
        my ($filehandle) = @_;
        my %result;
        while ( defined($_ = <$filehandle>) ) {
            chomp;
            next unless /dev/;    # skip headers and anything that isn't a device line
            # Split on whitespace; the first four fields are the ones we keep.
            my ($mount_point, $total_space, $used_space, $free_space) = split ' ';
            $result{$mount_point}{total} = $total_space;
            $result{$mount_point}{used}  = $used_space;
            $result{$mount_point}{free}  = $free_space;
        }
        return %result;
    }
See how we set that up? The mount point looks up a place in the hash that contains another hash nested inside it, and we use the words 'total', 'used', and 'free' to store the relevant numbers in that nested hash. So if (say) you wanted to list the differences, your calculations would look like this:

    # Assume the first machine is the more important one and we want to be
    # sure we check all its filesystems. (We can't guarantee we looked at all
    # of the second machine's filesystems, because this just uses the same
    # keys as machine 1 to look at machine 2. There might be more filesystems
    # with different names.)
    my @unmatched;

    # Section 1: matched on both.
    foreach my $filesystem_name (sort keys %first_machine) {
        if (exists $second_machine{$filesystem_name}) {
            # Filesystem mounted on both machines
            print "$filesystem_name: ";
            my $has_differences;
            foreach my $type (qw(total free used)) {
                my $difference = $first_machine{$filesystem_name}{$type}
                               - $second_machine{$filesystem_name}{$type};
                if ($difference) {
                    print "$type: $difference ";
                    $has_differences = 1;
                }
            }
            print "no differences" unless $has_differences;
            print "\n";    # finish the line and output it
            delete $second_machine{$filesystem_name};
        }
        else {
            # Only on machine 1: rebuild the record for printing later.
            push @unmatched, "$filesystem_name: ";
            foreach my $kind (qw(total free used)) {
                $unmatched[-1] .= $first_machine{$filesystem_name}{$kind} . " ";
            }
            $unmatched[-1] .= "\n";
        }
    }

    # Second section: unmatched on first machine.
    if (@unmatched) {
        print @unmatched;
    }

    # Third section: unmatched on second machine.
    if (keys %second_machine) {
        # unprocessed filesystems on 2 not on 1.
        print "Unmatched filesystems on machine 2:\n";
        foreach my $unmatched (sort keys %second_machine) {
            print "$unmatched ";
            foreach my $kind (qw(total free used)) {
                print $second_machine{$unmatched}{$kind}, " ";
            }
            print "\n";
        }
    }

The first section looks for items in the second table that match the ones in the first, and prints the comparison between the two. Note the delete() in there: it throws away items in the second hash that we've already processed (we could add a 'matched' field to the hash instead if it was particularly expensive to re-create the items, but that's not the case here). If we don't find a match, we concatenate the record back together and add it to the @unmatched array, all ready to print.
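
If you'd rather not modify the second hash, the 'matched' field idea could look something like this. It's only a sketch: the sample data and the field name matched are assumptions for illustration, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical sample data in the same nested-hash shape.
my %first_machine = (
    '/dev/hd4' => { total => 1000, used => 400, free => 600 },
);
my %second_machine = (
    '/dev/hd4' => { total => 1000, used => 450, free => 550 },
    '/dev/hd9' => { total => 2000, used => 100, free => 1900 },
);

# Mark matches instead of deleting them.
foreach my $name (keys %first_machine) {
    $second_machine{$name}{matched} = 1 if exists $second_machine{$name};
}

# Anything without the flag exists only on machine 2.
my @only_on_second =
    grep { !$second_machine{$_}{matched} } sort keys %second_machine;
print "Only on machine 2: @only_on_second\n";
```

The trade-off is that every later loop over the nested hashes has to skip the extra matched key, which is why delete() is the simpler choice here.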

We check that array after we finish the pass over the first machine's filesystems to see if we had any unmatched machine 1 filesystems, and just print them all if there are any.

When we get to the third section, everything in the second system's table that matched the first system has been deleted, so anything left is a filesystem that exists only on the second machine. We format and print those as well.
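
To try consume() without real files, you can open filehandles on in-memory strings. This is a sketch under assumptions: the sample lines are made up, the variable names are mine, and the four-column layout (mount point, total, used, free) simply mirrors what consume() expects, which may not match your actual df output.

```perl
use strict;
use warnings;

sub consume {
    my ($filehandle) = @_;
    my %result;
    while ( defined($_ = <$filehandle>) ) {
        chomp;
        next unless /dev/;
        my ($mount_point, $total_space, $used_space, $free_space) = split ' ';
        $result{$mount_point}{total} = $total_space;
        $result{$mount_point}{used}  = $used_space;
        $result{$mount_point}{free}  = $free_space;
    }
    return %result;
}

# Invented sample input; for real files use open(my $fh, '<', $path).
my $text1 = "Filesystem total used free\n/dev/hd4 1000 400 600\n";
my $text2 = "Filesystem total used free\n/dev/hd4 1000 450 550\n";

open my $fh1, '<', \$text1 or die "Can't open in-memory file: $!";
open my $fh2, '<', \$text2 or die "Can't open in-memory file: $!";

my %first_machine  = consume($fh1);
my %second_machine = consume($fh2);

my $used_diff = $first_machine{'/dev/hd4'}{used}
              - $second_machine{'/dev/hd4'}{used};
print "used difference on /dev/hd4: $used_diff\n";
```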

Any other kinds of analysis fall into your bailiwick rather than mine, but that should provide you with a starting point. I switched the implementation because the anonymous arrays are a little harder to understand if you're just getting started.