PerlMonks  

Re: Performing Mathematical Operation on Specific Column of text File

by aaron_baugher (Curate)
on May 14, 2015 at 23:34 UTC ([id://1126712]=note)


in reply to Performing Mathematical Operation on Specific Column of text File

The first thing to recognize is that your program will need to see every line of the input before it begins writing the output, since it has to see all the frequencies before it can calculate their average so it can be applied to each line. So you'll need to either:

1. Loop through the input file twice, accumulating numbers on the first time through, then making the edits and printing the output on the second time through.

2. Loop through the input file once, but save each line in an array while you do the calculations, so that your second loop can go through the array instead of hitting the filesystem again.
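For comparison, here is a minimal sketch of the first option: two passes over the same filehandle, rewinding with seek in between. The helper names (wanted_fields, demean_two_pass) and the sample data are made up for illustration, and the demo reads from an in-memory string rather than a real file, so treat it as one possible shape rather than a finished program.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the line's fields
# if the first looks like a station (ZJ.GRAW) and the ninth like a negative
# delay (-1.8041); returns an empty list otherwise.
sub wanted_fields {
    my ($line) = @_;
    my @f = split ' ', $line;
    return unless @f >= 9;
    return unless $f[0] =~ /\A[A-Z]{2}\.[A-Z]{4}\z/;
    return unless $f[8] =~ /\A-\d\.\d{4}\z/;
    return @f;
}

# Reads the handle twice and returns the rewritten lines.
sub demean_two_pass {
    my ($fh) = @_;
    my ($total, $howmany) = (0, 0);

    # Pass 1: accumulate only, nothing is stored.
    while (my $line = <$fh>) {
        my @f = wanted_fields($line) or next;
        $total += $f[8];
        $howmany++;
    }
    my $mean = $howmany ? $total / $howmany : 0;    # avoid dividing by zero

    # Pass 2: rewind, then emit every line, editing the matching ones.
    seek $fh, 0, 0 or die "Cannot rewind: $!";
    my @out;
    while (my $line = <$fh>) {
        chomp $line;
        if (my @f = wanted_fields($line)) {
            $f[8] = sprintf '%.4f', $f[8] - $mean;
            $line = join "\t", @f;
        }
        push @out, $line;
    }
    return @out;
}

# Demo on made-up data held in a string; for a real file you would use
# open my $in, '<', $path instead.
my $sample = <<'END';
station, delay columns follow
ZJ.AAAA 0 0 0 0 0 AAAA.BHZ 301.0000 -1.0000
ZJ.BBBB 0 0 0 0 0 BBBB.BHZ 301.0000 -2.0000
ZJ.CCCC 0 0 0 0 0 CCCC.BHZ 301.0000 -3.0000
Mean_arrival_time: 300.0000
END
open my $in, '<', \$sample or die "Cannot open in-memory handle: $!";
print "$_\n" for demean_two_pass($in);
close $in;
```

The trade-off is the one described above: this version keeps only one line in memory at a time, at the cost of reading the input twice.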

I'd say the second solution will generally be the best unless the file is so large that putting it all in an array will cause memory problems. So let's do that. This loops through the file, adding the frequencies to an accumulator ($total) when a line matches the pattern, keeping track of how many ($howmany) frequencies it adds. Those two values will be divided to get the mean. It also saves each line in an array, along with a flag ($fixlater) to show whether that line is one containing a frequency. That way when I loop through the lines again, I don't have to split the ones that don't contain a frequency to change; the other lines can just be printed out.

This does literally what you said: subtracts the mean from each frequency. Since the mean is negative, subtracting it actually adds a positive, which may or may not be what you really want. If it's not, try to describe what you really want in more detail, and give us a couple examples of input and output frequencies.

#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

my @lines;
my $total   = 0;
my $howmany = 0;
my $output_separator = "\t";    # your choice

while(<DATA>){
    chomp;
    my($n, $f) = (split)[0,8];
    my $fixlater;
    if ($n =~ /\A[A-Z]{2}\.[A-Z]{4}\Z/ and $f =~ /\A-\d\.\d{4}\Z/ ){
        $total += $f;
        $howmany++;
        $fixlater = 1;
    }
    push @lines, [$fixlater,$_];
}

my $mean = $total/$howmany;

for (@lines){
    if($_->[0]){    # fix this line
        my @f = (split ' ', $_->[1]);
        $f[8] = sprintf "%0.4f", $f[8] - $mean;
        say join $output_separator, @f;
    } else {
        say $_->[1];
    }
}

__DATA__
MCCC processed: unknown event at: Tue, 14 Oct 2014 12:02:26 CST
station, mccc delay, std, cc coeff, cc std, pol , t0_times , delay_times
ZJ.GRAW -0.7964 0.0051 0.9690 0.0139 0 GRAW.BHZ 301.1263 -1.8041
ZJ.KNYN -0.7065 0.0072 0.9760 0.0133 0 KNYN.BHZ 301.3372 -1.9249
ZJ.LEON 0.9675 0.0072 0.9548 0.0292 0 LEON.BHZ 301.2611 -0.1749
ZJ.RKST -0.2061 0.0114 0.9404 0.0383 0 RKST.BHZ 301.3500 -1.4374
ZJ.SHRD 0.4382 0.0051 0.9542 0.0351 0 SHRD.BHZ 301.7360 -1.1791
ZJ.SPLN 0.3033 0.0051 0.9785 0.0126 0 SPLN.BHZ 301.0760 -0.6541
Mean_arrival_time: 300.1187
No weighting of equations.
Window: 2.23 Inset: 1.17 Shift: 0.25
Variance: 0.00645 Coefficient: 0.96215 Sample rate: 40.000
Taper: 0.28
Phase: P
PDE 2013 7 15 14 6 58.00 -60.867 -25.143 31.0 0.0 7.3
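To make the point about the sign concrete, here is a small sketch with made-up numbers (not taken from your file). The subtract_mean helper is hypothetical and uses sum from the core List::Util module.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(sum);

# Subtract the mean of a list from each of its elements.
sub subtract_mean {
    my @values = @_;
    return () unless @values;
    my $mean = sum(@values) / @values;
    return map { $_ - $mean } @values;
}

my @delays   = (-1.0, -2.0, -3.0);    # made-up values; their mean is -2.0
my @demeaned = subtract_mean(@delays);
printf "%7.4f -> %7.4f\n", $delays[$_], $demeaned[$_] for 0 .. $#delays;
```

Because the mean here is -2.0, subtracting it adds 2.0 to every value: -1.0 becomes 1.0, -2.0 becomes 0.0, and -3.0 becomes -1.0. The results always sum to zero, which is a quick way to check that the subtraction went the way you intended.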

Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.

Re^2: Performing Mathematical Operation on Specific Column of text File
by Bama_Perl (Acolyte) on May 15, 2015 at 19:52 UTC
    Hi Aaron, I think this approach may be more complicated than it's worth. I think I am going to try another approach, in which I will loop through a list of files, extract the 9th column of times, sum the ninth column, provide a counter to count the number of lines in the column that match the conditions I need, then find the mean by taking the total(sum) and dividing by that counter. The logic is provided below:
    $total = 0;
    $count = 0;
    for ($j = 2; $j < @tableb; $j++) {
        chomp ($tableb[$j]);
        ($netsta,$delay_time) = (split /\s+/,$tableb[$j])[1,9];
        ($net,$sta) = (split /\./, $netsta)[0,1];
        if ($net eq "ZJ") {
            $count = $count + 1;
            $total = $total + $delay_time;
            $mean = $total/$count;
            print $mean, "\n";
        }
    }
    The for loop goes through the lines of a file, held in the array @tableb. If $net in the first column equals "ZJ", it increments the counter and adds the delay_time to the total. Then it computes the mean and prints it. When printing out the mean, I get:
    -0.9188 -1.0063 -0.585466666666667 -0.705775 -0.80838 -0.80595 -0.722071428571429 -0.6714 -0.773888888888889 -0.84067 -0.9097 -0.958375
    -0.7386 -0.7877 -0.784433333333333 -0.69155 -0.78836 -0.779766666666667 -0.820314285714286 -0.8476 -0.802544444444444 -0.88008 -0.9104 -0.916916666666666 -0.962815384615385 -1.0093
    where the LAST values before each line break (-0.958375 and -1.0093) are the means that I need (they are the total means). Now the question I have is: how do I extract that last value, store it in a variable, and later subtract it from $delay_time when I print it out (that part isn't shown here)?

    TLDR: The printed means are running means, and I need the final one once the for loop has finished going through every line. That final mean (the total mean) will then be stored in a variable and used for subtraction later. Does that make sense? I apologize if it's not clear. One option I found was using the -1 option to extract the last line of each output. Would that work?

      I'm not clear on everything you're trying to do. But to your main question: how to save the last calculated mean so it can be used after the loop, just make sure you declare the variable outside the loop before it starts, like this:

      my $mean;
      my $total = 0;
      my $count = 0;
      for (my $j = 2; $j < @tableb; $j++) {
          # do other calculations (split the line, update $total and $count)
          if (this_line_matches()) {      # placeholder for your own test
              $mean = $total/$count;      # assigns to the $mean declared outside the loop
          }
      }
      print $mean;    # now contains the last value calculated inside the loop

      If you don't actually need to calculate the mean for each loop, you could move that calculation to after the loop and only do it once.
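A rough sketch of that second idea, computing the mean once after the loop and then using it for the subtraction, might look like this. The mean_delay helper and the contents of @tableb are invented for the example. It also assumes the lines have no leading whitespace, so it uses split ' ' with fields 0 and 8 rather than the [1,9] in your code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: mean of the ninth field over the lines whose
# network code (the part before the dot in the first field) matches.
# Returns undef when no line matches.
sub mean_delay {
    my ($net_wanted, @lines) = @_;
    my ($total, $count) = (0, 0);
    for my $line (@lines) {
        my ($netsta, $delay_time) = (split ' ', $line)[0, 8];
        next unless defined $delay_time;
        my ($net) = split /\./, $netsta;
        next unless defined $net && $net eq $net_wanted;
        $total += $delay_time;
        $count++;
    }
    return $count ? $total / $count : undef;    # one division, after the loop
}

# Made-up stand-in for the contents of one input file.
my @tableb = (
    "header one",
    "header two",
    "ZJ.AAAA 0 0 0 0 0 AAAA.BHZ 301.0000 -1.0000",
    "ZJ.BBBB 0 0 0 0 0 BBBB.BHZ 301.0000 -3.0000",
    "XX.CCCC 0 0 0 0 0 CCCC.BHZ 301.0000 -9.0000",
);

my @data = @tableb[2 .. $#tableb];    # skip the two header lines, like $j = 2
my $mean = mean_delay('ZJ', @data);
die "No ZJ lines found\n" unless defined $mean;

# The mean is now an ordinary variable, available for the later subtraction.
for my $line (@data) {
    my ($netsta, $delay_time) = (split ' ', $line)[0, 8];
    next unless $netsta =~ /\AZJ\./;
    printf "%s\t%.4f\n", $netsta, $delay_time - $mean;
}
```

Returning undef when nothing matches lets the caller decide what to do with a file that has no ZJ lines, instead of dying on a division by zero.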

      Aaron B.
