http://www.perlmonks.org?node_id=966934

Spartan4ever has asked for the wisdom of the Perl Monks concerning the following question:

I am new to Perl and have a question about handling the output for reading and writing two separate files. I am reading in Unix/AIX/Linux dflog files and then performing calculations on the space utilization of the individual file systems. The section of my code reads as follows:
open( firstfile, "DFLOG1.txt");
open( secondfile, "DFLOG2.txt");
while (<firstfile>) {
    chomp;
    @blines=grep(/dev/, $_);
    foreach $bline (@blines) {
        @BRead = split(" ", $bline);
        $bmount = $BRead[0];
        $btotdsk = $BRead[1];
        $buseddsk = $BRead[2];
        $bfreedsk = $BRead[3];
    }
    while (<secondfile>) {
        chomp;
        @elines=grep(/dev/, $_);
        foreach $eline (@elines) {
            @ERead = split(" ", $eline);
            $emount = $ERead[0];
            $etotdsk = $ERead[1];
            $euseddsk = $ERead[2];
            $efreedsk = $ERead[3];
        }
        print (OUTFILE " ,$bmount, $btotdsk, $busedsk, $bfreedsk, $etotdsk, $euseddsk, $efreedsk\n");
Where I am having difficulty is getting the lines of the two files to sync. Depending on where I place my print statement, I either get the last filesystem entry from the first file being compared to all the filesystems in the second, or vice versa. Any insight as to how to correct my issue is greatly appreciated.

Replies are listed 'Best First'.
Re: Reading in two text files
by stevieb (Canon) on Apr 24, 2012 at 21:55 UTC

    I'm not trying to be sarcastic or ignorant, but to get effective feedback from the PerlMonks community, please peruse this link.

    I've quickly reformatted your post within <P> and <code> tags so that it is easier to read:

    OP:

    I am new to Perl and have a question about handling the output for reading and writing two separate files. I am reading in Unix/AIX/Linux dflog files and then performing calculations on the space utilization of the individual file systems. The section of my code reads as follows:

    open( firstfile, "DFLOG1.txt");
    open( secondfile, "DFLOG2.txt");
    while (<firstfile>) {
        chomp;
        @blines=grep(/dev/, $_);
        foreach $bline (@blines) {
            @BRead = split(" ", $bline);
            $bmount = $BRead[0];
            $btotdsk = $BRead[1];
            $buseddsk = $BRead[2];
            $bfreedsk = $BRead[3];
        }
        while (<secondfile>) {
            chomp;
            @elines=grep(/dev/, $_);
            foreach $eline (@elines) {
                @ERead = split(" ", $eline);
                $emount = $ERead[0];
                $etotdsk = $ERead[1];
                $euseddsk = $ERead[2];
                $efreedsk = $ERead[3];
            }
            print (OUTFILE " ,$bmount, $btotdsk, $busedsk, $bfreedsk, $etotdsk, $euseddsk, $efreedsk\n");

    Where I am having difficulty is getting the lines of the two files to sync. Depending on where I place my print statement, I either get the last filesystem entry from the first file being compared to all the filesystems in the second, or vice versa. Any insight as to how to correct my issue is greatly appreciated.

    /OP

      Provide (within <code> </code> tags) some of your input, and let us know what you would like the result to be. To me, the word 'sync' can mean a few things. An example of what you have to work with and an example of what you expect would be beneficial.

Re: Reading in two text files
by pemungkah (Priest) on Apr 25, 2012 at 01:34 UTC
    I'm totally guessing, because I don't have sample data, but I think you want to do something like this:
    1. Open both files.
    2. Read each line from the first file.
    3. Split it into mount point, total space, used space, free space.
    4. Do the same for the second file.
    5. Compare matching mount points to see the differences.
    So here's a skeleton program. Notice that the difference between your program and mine is that I'm recording the data in a pair of hashes so I can cross-check them later. I also switched the open() to use the three-arg form to make it clear you mean to read these files. The code to read them is identical, so I pushed it down into a subroutine that creates the hash from the file and then gives it back to the caller.
    open( my $firstfile, '<', "DFLOG1.txt");
    open( my $secondfile, '<', "DFLOG2.txt");

    my %first_machine = consume($firstfile);
    my %second_machine = consume($secondfile);

    foreach my $mount_point (keys %first_machine) {
        if ( exists $second_machine{$mount_point} ) {
            # Perform calculations here.
            # $first_machine{$mount_point}->[0] is the total
            # $first_machine{$mount_point}->[1] is the used
            # $first_machine{$mount_point}->[2] is the free
            # Similar for $second_machine.
            # Print here after doing calculation.
        }
        else {
            print "No mount point corresponding to $mount_point on the second machine.\n";
        }
    }

    sub consume {
        my ($filehandle) = @_;
        my %result;
        while ( defined($_ = <$filehandle>) ) {
            chomp;
            next unless /dev/;
            # Only the first four whitespace-separated fields are kept.
            my ($mount_point, $total_space, $used_space, $free_space) = split;
            $result{$mount_point} = [$total_space, $used_space, $free_space];
        }
        return %result;
    }

      It is recommended that we use the three-argument form of open, but we should also check for failure:

      open my $firstfile, '<', 'DFLOG1.txt' or die "Can't open file DFLOG1.txt: $!";
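A minimal sketch of that failure check, assuming a hypothetical helper named open_for_reading and a file name that does not exist (neither comes from the thread). It reports the file name rather than the filehandle, because the handle holds nothing useful when open fails. Note the low-precedence `or`: with `||` and no parentheses, the test would bind to the file name instead of to open's return value.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): open a file for reading,
# or die with the file name and the OS error text held in $!.
sub open_for_reading {
    my ($path) = @_;
    open( my $fh, '<', $path )
        or die "Can't open file $path: $!";
    return $fh;
}

# eval traps the die so the message can be inspected instead of
# ending the program. The path below is assumed not to exist.
my $fh    = eval { open_for_reading('no/such/dir/DFLOG1.txt') };
my $error = $@;
print "open failed: $error" unless $fh;
```

In a real script you would normally let the die propagate, so a missing DFLOG file stops the run before any calculations happen.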

      Nice work on the code beyond my nit :)

      Thank you for your reply and code modification. As I said I am new to Perl and am not familiar with some of the code you provided.

      I am including a sample interval of the files I am evaluating.

      First file:

      00:00:01
      Filesystem 1024-blocks Used Available Capacity Mounted on
      /dev/hd4 262144 102228 159916 39% /
      /dev/hd2 2359296 2243384 115912 96% /usr
      /dev/hd9var 1048576 199528 849048 20% /var
      /dev/hd3 1048576 8240 1040336 1% /tmp
      /dev/hd1 2097152 41140 2056012 2% /home
      /proc - - - - /proc
      /dev/hd10opt 524288 206640 317648 40% /opt
      /dev/ts1000 262144 716 261428 1% /usr/local
      /dev/ts1001 6291456 5967256 324200 95% /banktools
      /dev/ts1002 786432 448 785984 1% /stage
      /dev/ap1001 20971520 11231912 9739608 54% /oracle
      /dev/ap1002 36700160 11310284 25389876 31% /ora01/oradata
      /dev/ap1003 36700160 13372456 23327704 37% /ora02/oradata
      /dev/ap1004 31457280 5995712 25461568 20% /ora03/oradata
      /dev/ap1005 20971520 11067000 9904520 53% /ora04/oradata
      /dev/ap1006 26214400 23716476 2497924 91% /ora05/oradata
      /dev/ap1007 26214400 15031656 11182744 58% /ora06/oradata
      /dev/ap1008 20971520 17307236 3664284 83% /ora01/orabkup
      /dev/ap1009 209715200 35552472 174162728 17% /ora01/oraflash

      Second file:

      23:00:00
      Filesystem 1024-blocks Used Available Capacity Mounted on
      /dev/hd4 262144 102484 159660 40% /
      /dev/hd2 2359296 2243384 115912 96% /usr
      /dev/hd9var 1048576 197148 851428 19% /var
      /dev/hd3 1048576 8928 1039648 1% /tmp
      /dev/hd1 2097152 40956 2056196 2% /home
      /proc - - - - /proc
      /dev/hd10opt 524288 206820 317468 40% /opt
      /dev/ts1000 262144 716 261428 1% /usr/local
      /dev/ts1001 6291456 6093220 198236 97% /banktools
      /dev/ts1002 786432 448 785984 1% /stage
      /dev/ap1001 20971520 11312864 9658656 54% /oracle
      /dev/ap1002 36700160 11310284 25389876 31% /ora01/oradata
      /dev/ap1003 36700160 13372456 23327704 37% /ora02/oradata
      /dev/ap1004 31457280 5995712 25461568 20% /ora03/oradata
      /dev/ap1005 20971520 11067000 9904520 53% /ora04/oradata
      /dev/ap1006 26214400 23716476 2497924 91% /ora05/oradata
      /dev/ap1007 26214400 15031656 11182744 58% /ora06/oradata
      /dev/ap1008 20971520 17307236 3664284 83% /ora01/orabkup
      /dev/ap1009 209715200 35552472 174162728 17% /ora01/oraflash

      You guessed correctly: I want to read each line of both files, compare the matching mount points, and write one line per mount point to a .csv file. Each line should include the total, used and free disk space from both the beginning file and the end file.

      I have a question regarding the subroutine consume that you created. Does it read in a single line of the first file and perform the formatting before reading in a single line of the second file, or are all lines of the first file read and formatted before the second file is called?

      The reason I ask is due to the fact that my desired output is a single line of data for each mount point listed that would include the total, used and free space of each file.

      Sorry if my question is confusing. I have only been using Perl for two weeks and have been self taught.

      Thanks for your cooperation.

        consume() shlorps up the whole file and builds a hash out of it.

        Each entry in the hash is an independent, nameless array that contains the appropriate data for each line.
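To make that concrete, here is a small sketch that runs the array-based consume() over two rows from the first sample file and prints what ends up in the hash. The in-memory filehandle and the variable names in the final loop are assumptions made for illustration, so the sketch does not depend on DFLOG1.txt existing.

```perl
use strict;
use warnings;

# Same logic as the array-based consume() earlier in the thread,
# written with lexical variables so it runs under strict.
sub consume {
    my ($filehandle) = @_;
    my %result;
    while ( defined( my $line = <$filehandle> ) ) {
        chomp $line;
        next unless $line =~ /dev/;
        my ( $mount_point, $total_space, $used_space, $free_space ) =
            split ' ', $line;
        $result{$mount_point} = [ $total_space, $used_space, $free_space ];
    }
    return %result;
}

# Header plus two rows copied from the first sample file.
my $sample = <<'END';
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/hd4 262144 102228 159916 39% /
/dev/hd2 2359296 2243384 115912 96% /usr
END

# Opening a reference to a scalar gives an in-memory filehandle.
open( my $fh, '<', \$sample ) or die "Can't open in-memory file: $!";
my %first_machine = consume($fh);    # the whole "file" is read here
close $fh;

# Each value is a reference to an anonymous array: [total, used, free].
foreach my $name ( sort keys %first_machine ) {
    my ( $total, $used, $free ) = @{ $first_machine{$name} };
    print "$name total=$total used=$used free=$free\n";
}
```

Because consume() returns only after its while loop has reached end of file, every line of the first file is stored before the second file is touched.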

        Now you have two "phone books" of names of filesystems, each of which uses the same names (such as '/dev/hd4', etc.). So that means you can use that name to pull out the relevant statistics for each of the two files. Let me see if I can make this simpler: instead of using the anonymous arrays, let's use a nested hash. If you were writing all this down on paper, you might make a table for each machine that had the filesystem names as rows, and the fields (total, used, and free) as the columns.

        To do something like that in Perl, we'd rewrite consume() to do this instead:

        sub consume {
            my ($filehandle) = @_;
            my %result;
            while ( defined($_ = <$filehandle>) ) {
                chomp;
                next unless /dev/;
                my ($mount_point, $total_space, $used_space, $free_space) = split;
                $result{$mount_point}{total} = $total_space;
                $result{$mount_point}{used} = $used_space;
                $result{$mount_point}{free} = $free_space;
            }
            return %result;
        }
        See how we set that up? The mount point looks up a place in the hash that contains another hash nested inside it, and we use the words 'total', 'used', and 'free' to store the relevant numbers in that nested hash. So now your calculations if (say) you wanted to list the differences would look like this:
        # Assume first machine is the more important one and we want to be sure we check
        # all its filesystems. (We can't guarantee we looked at all of the second machine's
        # filesystems because this just uses the same keys as machine 1 to look at machine 2.
        # There might be more filesystems with different names.)
        my @unmatched;

        # Section 1: matched on both.
        foreach my $filesystem_name (sort keys %first_machine) {
            if (exists $second_machine{$filesystem_name}) {
                # Filesystem mounted on both machines
                print "$filesystem_name: ";
                foreach my $type (qw(total free used)) {
                    my $difference = $first_machine{$filesystem_name}{$type}
                                   - $second_machine{$filesystem_name}{$type};
                    if ($difference) {
                        print "$type: $difference ";
                    }
                }
                print "\n"; # finish the line and output it
                # Delete only after all three fields have been compared.
                delete $second_machine{$filesystem_name};
            }
            else {
                push @unmatched, "$filesystem_name: ";
                foreach my $kind (qw(total free used)) {
                    $unmatched[-1] .= $first_machine{$filesystem_name}{$kind} . " ";
                }
                $unmatched[-1] .= "\n";
            }
        }

        # Second section: unmatched on first machine.
        if (@unmatched) {
            print @unmatched;
        }

        # Third section: unmatched on second machine.
        if (keys %second_machine) {
            # unprocessed filesystems on 2 not on 1.
            print "Unmatched filesystems on machine 2:\n";
            foreach my $unmatched (sort keys %second_machine) {
                print "$unmatched ";
                foreach my $kind (qw(total free used)) {
                    print $second_machine{$unmatched}{$kind}, " ";
                }
                print "\n";
            }
        }
        The first section looks for items in the second table that match the ones in the first, and prints the comparison between the two. Note that delete() in there: that throws away items in the second hash that we've already processed (we could add a 'matched' field to the hash if it was particularly expensive to re-create the items, but that's not the case here). If we don't find a match, we concatenate the record back together and add it to the @unmatched array, all ready to print.

        We check that array after we finish the pass over the first machine's filesystems to see if we had any unmatched machine 1 filesystems, and just print them all if there are any.

        When we get to the third loop, every entry in the second system's table that matched the first system has been dropped, so if there's anything left, it's a filesystem on the second machine with no match on the first. We format and print those as well.

        Any other kinds of analysis fall into your bailiwick rather than mine, but that should provide you with a starting point. I switched the implementation because the anonymous arrays are a little harder to understand if you're just getting started.
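For the one-line-per-mount-point CSV output asked about earlier in the thread, a rough sketch along the same lines might look like this. The helper name csv_lines, the column order, and the in-memory sample data (two rows taken from each sample file) are all assumptions for illustration, not something the thread specifies.

```perl
use strict;
use warnings;

# Nested-hash version of consume(), as described above.
sub consume {
    my ($filehandle) = @_;
    my %result;
    while ( defined( my $line = <$filehandle> ) ) {
        chomp $line;
        next unless $line =~ /dev/;
        my ( $name, $total, $used, $free ) = split ' ', $line;
        $result{$name} = { total => $total, used => $used, free => $free };
    }
    return %result;
}

# Hypothetical helper: one CSV line per filesystem found in both hashes,
# laid out as name, then total/used/free from the first file, then
# total/used/free from the second file.
sub csv_lines {
    my ( $first, $second ) = @_;
    my @lines;
    foreach my $name ( sort keys %$first ) {
        next unless exists $second->{$name};
        push @lines, join ',', $name,
            @{ $first->{$name} }{qw(total used free)},
            @{ $second->{$name} }{qw(total used free)};
    }
    return @lines;
}

# Two rows from each sample file, held in memory for the sketch.
my $begin = "/dev/hd4 262144 102228 159916 39% /\n"
          . "/dev/hd3 1048576 8240 1040336 1% /tmp\n";
my $end   = "/dev/hd4 262144 102484 159660 40% /\n"
          . "/dev/hd3 1048576 8928 1039648 1% /tmp\n";

open( my $first_fh,  '<', \$begin ) or die "Can't open in-memory file: $!";
open( my $second_fh, '<', \$end )   or die "Can't open in-memory file: $!";
my %first_machine  = consume($first_fh);
my %second_machine = consume($second_fh);

my @lines = csv_lines( \%first_machine, \%second_machine );
print "$_\n" for @lines;
```

In a real script the two handles would come from DFLOG1.txt and DFLOG2.txt, and the print would go to the output filehandle. A plain join is enough here only because none of the fields contain commas.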