Re: Performing Mathematical Operation on Specific Column of text File

The first thing to recognize is that your program needs to see every line of the input before it writes any output: it can't calculate the average of the frequencies until it has seen all of them, and it needs that average to adjust each line. So you'll need to either:

1. Loop through the input file twice, accumulating numbers on the first time through, then making the edits and printing the output on the second time through.

2. Loop through the input file once, but save each line in an array while you do the calculations, so that your second loop can go through the array instead of hitting the filesystem again.
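
For comparison, the first option (two passes over the file) might be sketched roughly like this. It is only an illustration, not the approach used in the rest of this post: the sub names `is_station_line`, `mean_of_column` and `print_adjusted` are made up for the example, the input is assumed to be a seekable file named on the command line, and the line test is a simplified guess at what counts as a data line.

```perl
#!/usr/bin/env perl
use strict; use warnings;

# Hypothetical helper: true when the fields look like a station line
# (first field such as "ZJ.GRAW", ninth field a decimal number).
sub is_station_line {
    my @f = @_;
    return @f >= 9
        && $f[0] =~ /\A[A-Z]{2}\.[A-Z]{4}\z/
        && $f[8] =~ /\A-?\d+\.\d+\z/;
}

# Pass 1: read the whole handle and return the mean of the ninth field,
# or undef when no station line was found.
sub mean_of_column {
    my ($fh) = @_;
    my ($total, $count) = (0, 0);
    while (my $line = <$fh>) {
        my @f = split ' ', $line;
        next unless is_station_line(@f);
        $total += $f[8];
        $count++;
    }
    return $count ? $total / $count : undef;
}

# Pass 2: rewind the same handle and print every line, subtracting the
# mean from the ninth field of station lines only.
sub print_adjusted {
    my ($fh, $mean, $out) = @_;
    seek $fh, 0, 0 or die "Cannot rewind: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my @f = split ' ', $line;
        if (is_station_line(@f)) {
            $f[8] = sprintf "%.4f", $f[8] - $mean;
            $line = join "\t", @f;
        }
        print {$out} "$line\n";
    }
}

if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $mean = mean_of_column($in);
    die "No station lines found\n" unless defined $mean;
    print_adjusted($in, $mean, \*STDOUT);
}
```

This keeps memory use flat at the cost of reading the file twice, and it won't work on input that can't be rewound, such as a pipe.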

I'd say the second solution will generally be the best unless the file is so large that putting it all in an array will cause memory problems. So let's do that. The code below loops through the file once, adding each frequency to an accumulator ($total) when a line matches the pattern and counting how many ($howmany) it has added. Dividing the first by the second gives the mean. Each line is also saved in an array along with a flag ($fixlater) showing whether it contains a frequency. That way, when I loop through the lines again, I only have to split the flagged ones; the other lines can just be printed out unchanged.

This does literally what you said: subtracts the mean from each frequency. Since the mean is negative, subtracting it actually adds a positive, which may or may not be what you really want. If it's not, try to describe what you really want in more detail, and give us a couple examples of input and output frequencies.
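
To make the sign issue concrete, here is a small sketch using the six values from the last column of the sample data. The `mean` sub is just a helper written for this illustration. With these inputs the mean comes out near -1.196, so -1.8041 becomes roughly -0.608.

```perl
use strict; use warnings;

# Values copied from the last column of the sample data in this post.
my @delay = (-1.8041, -1.9249, -0.1749, -1.4374, -1.1791, -0.6541);

# Arithmetic mean of a non-empty list of numbers.
sub mean {
    my $sum = 0;
    $sum += $_ for @_;
    return $sum / @_;
}

my $mean = mean(@delay);    # negative, because every input is negative

# Subtracting a negative mean moves every value in the positive direction.
for my $d (@delay) {
    printf "%8.4f - (%8.4f) = %8.4f\n", $d, $mean, $d - $mean;
}
```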

#!/usr/bin/env perl
use 5.010; use strict; use warnings;

my @lines;
my $total = 0;
my $howmany = 0;
my $output_separator = "\t";  # your choice

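while(<DATA>){
    chomp;
    my($n, $f) = (split)[0,8];   # station name and 9th field
    my $fixlater;
    # as written, only values of the form -d.dddd count toward the mean
    if (defined $n and defined $f and
            $n =~ /\A[A-Z]{2}\.[A-Z]{4}\Z/ and
            $f =~ /\A-\d\.\d{4}\Z/ ){
        $total += $f;
        $howmany++;
        $fixlater = 1;
    }
    push @lines, [$fixlater,$_];   # keep every line, flagged or not
}
die "No matching lines found\n" unless $howmany;   # avoid dividing by zero
my $mean = $total/$howmany;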
for (@lines){
    if($_->[0]){ # fix this line
        my @f = (split ' ', $_->[1]);
        $f[8] = sprintf "%0.4f", $f[8] - $mean;
        say join $output_separator, @f;
    } else {
        say $_->[1];
    }
}

__DATA__
 MCCC processed: unknown event at: Tue, 14 Oct 2014 12:02:26 CST 
 station, mccc delay,    std,    cc coeff,  cc std,   pol   , t0_times  , delay_times
 ZJ.GRAW    -0.7964    0.0051    0.9690    0.0139    0  GRAW.BHZ   301.1263 -1.8041
 ZJ.KNYN     -0.7065    0.0072    0.9760    0.0133    0  KNYN.BHZ 301.3372    -1.9249
 ZJ.LEON      0.9675    0.0072    0.9548    0.0292    0  LEON.BHZ 301.2611    -0.1749
 ZJ.RKST     -0.2061    0.0114    0.9404    0.0383    0  RKST.BHZ 301.3500    -1.4374
 ZJ.SHRD      0.4382    0.0051    0.9542    0.0351    0  SHRD.BHZ 301.7360    -1.1791
 ZJ.SPLN      0.3033    0.0051    0.9785    0.0126    0  SPLN.BHZ 301.0760    -0.6541
Mean_arrival_time:   300.1187 
No weighting of equations. 
Window:   2.23   Inset:   1.17  Shift:   0.25 
Variance: 0.00645   Coefficient: 0.96215  Sample rate:   40.000 
Taper:   0.28 
Phase: P        
PDE    2013  7 15 14  6 58.00   -60.867   -25.143   31.0  0.0  7.3

Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.


Re^2: Performing Mathematical Operation on Specific Column of text File by Bama_Perl (Acolyte) on May 15, 2015 at 19:52 UTC
Hi Aaron, I think this approach may be more complicated than it's worth. I think I am going to try another approach, in which I will loop through a list of files, extract the 9th column of times, sum the ninth column, keep a counter of the lines in the column that match the conditions I need, then find the mean by dividing the total (sum) by that counter. The logic is provided below:

    $total = 0;
    $count = 0;
    for ($j = 2; $j < @tableb; $j++) {
        chomp ($tableb[$j]);
        ($netsta,$delay_time) = (split /\s+/,$tableb[$j])[1,9];
        ($net,$sta) = (split /\./, $netsta)[0,1];
        if ($net eq "ZJ") {
            $count = $count + 1;
            $total = $total + $delay_time;
            $mean = $total/$count;
            print $mean, "\n";
        }
    }

The for loop is looping through the lines of a file held in @tableb, and if $net in the first column equals "ZJ", it adds to the counter and then adds the delay_time. Then I want to get the mean, and I output the mean. When printing out the mean, I get:

    -0.9188
    -1.0063
    -0.585466666666667
    -0.705775
    -0.80838
    -0.80595
    -0.722071428571429
    -0.6714
    -0.773888888888889
    -0.84067
    -0.9097
    -0.958375

    -0.7386
    -0.7877
    -0.784433333333333
    -0.69155
    -0.78836
    -0.779766666666667
    -0.820314285714286
    -0.8476
    -0.802544444444444
    -0.88008
    -0.9104
    -0.916916666666666
    -0.962815384615385
    -1.0093

where the LAST value before each line break (-0.958375 and -1.0093) are the means that I need -- they are the total means. Now the question I have is, how do I extract that last value, set it to a variable, and then later on, subtract it from the $delay_time when I need to print it out (which isn't provided here)?

TLDR: When printing out the means, the means iteratively add up, and then I need to extract the final mean once the for loop is finished looping through each column. That final mean (the total mean) will then be sent to a variable to be used for subtraction purposes later. Does that make sense? I apologize if it's not clear.

One option I found was using the -1 option to extract the last line of each output. Would that work?
Re^3: Performing Mathematical Operation on Specific Column of text File by aaron_baugher (Curate) on May 16, 2015 at 02:07 UTC
I'm not clear on everything you're trying to do. But to your main question, how to save the last calculated mean so it can be used after the loop: just make sure you declare the variable outside the loop before it starts, like this:

    my $mean;
    my $total = 0;
    my $count = 0;
    for ($j = 2; $j < @tableb; $j++) {
        # do other calculations
        if(this_line_matches()){
            $mean = $total/$count;  # use $mean from outside the loop
        }
    }
    print $mean; # now contains the last value calculated inside the loop

If you don't actually need to calculate the mean for each loop, you could move that calculation to after the loop and only do it once.
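
That last suggestion, calculating the mean once after the loop, might look something like the sketch below. It is only a guess at the surrounding program: the sub name `zj_mean` is made up, the sample lines in @tableb are stand-ins (two placeholder header lines plus two rows from the data earlier in the thread), and the split indexes [1,9] are taken from the loop in the question, which assumes each data line starts with whitespace.

```perl
use strict; use warnings;

# Returns the mean delay time over lines whose network code is "ZJ",
# or undef when nothing matches. Skips the first two lines, as the loop
# in the question does.
sub zj_mean {
    my @lines = @_;
    my ($total, $count) = (0, 0);
    for my $j (2 .. $#lines) {
        my ($netsta, $delay_time) = (split /\s+/, $lines[$j])[1, 9];
        next unless defined $netsta and defined $delay_time;
        my ($net) = split /\./, $netsta;
        next unless defined $net and $net eq 'ZJ';
        $total += $delay_time;
        $count++;
    }
    return $count ? $total / $count : undef;   # computed once, after the loop
}

# Stand-in input: two placeholder header lines, then two data rows.
my @tableb = (
    " header line 1",
    " header line 2",
    " ZJ.GRAW    -0.7964    0.0051    0.9690    0.0139    0  GRAW.BHZ   301.1263 -1.8041",
    " ZJ.KNYN     -0.7065    0.0072    0.9760    0.0133    0  KNYN.BHZ 301.3372    -1.9249",
);

my $mean = zj_mean(@tableb);
if (defined $mean) {
    # $mean is now available for a later loop that subtracts it
    # from each $delay_time before printing.
    print "total mean: $mean\n";
}
```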

