average a column in tab-delimited file

garyboyd has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I would like to average the data for certain rows in a tab-delimited file, for example the file looks like this:

Name10128    1A    33.2    A    Test    0.71565    -0.011379692
Name4382    1A    34.3    A    Test    0.9043    -0.035366577
Name1635    1A    34.9    A    Test    0.836    -0.053808688
Name10267    1A    34.9    A    Test    0.04575625    0.050878715
Name10039    1A    34.9    A    Test    0.7606    -0.011123925
Name22270    1A    44.7    A    Test    0.047    0.122532651
Name22285    1A    44.7    A    Test    0    0.298871748
Name22701    1A    44.7    A    Test    0.045    0.05038769
Name10054    1A    46.4    A    Test    0    0.641618497
[download]

I want to scan down column3 and where there are identical entries, for example 34.9 and 44.7 I want to average the values in columns 6 and 7. So for the three rows that have 34.9 in column3, I would need to average the column6 values 0.836, 0.04575625 and 0.7606 and then replace those values in column6 with the average value, and do the same in column 7 for the values found there. What would be the most sensible way to proceed with this?

Comment on average a column in tab-delimited file Download Code

Replies are listed 'Best First'.
Re: average a column in tab-delimited file by GrandFather (Saint) on Jan 27, 2012 at 10:47 UTC
What have you tried? How would you do it on paper? When you do it on paper what are the steps you actually follow? Are the rows sorted by column 3, or at least are all the matching rows adjacent? What value does column 1 get for the combined rows? Are the contents of the remaining columns really invariant? True laziness is hard work	[reply]
Re: average a column in tab-delimited file by fisher (Priest) on Jan 27, 2012 at 10:39 UTC
Well, the general algorithm might look like this: First, open your file using open(). Then, in cycle, you read your file line by line, split it using split() or s///, find values of your interest, do calculations and form a content for new file in an array of lines. After cycle is done, just close this file, maybe rename it for backup version, then open it for write and push there your array with newly formed lines.	[reply]
Re^2: average a column in tab-delimited file by naturalsciences (Beadle) on Jan 27, 2012 at 17:20 UTC
Yep, you just can't start reading columns in a file. Cause there ain't any. There ain't no lines in a file actually either. There is no spoooon. Well that is - a text file in your computer should be just a big string of values. Those lines you see in your text editor are there only because once in a while in this big string there are line break characters, so your text editor can know when to make a break. No program can just bam go and start reading "columns" from this. So you have to make those columns yourself. As fisher said - the basic algorithm is to take your file line by line, split those lines(by whatever separates your values(whitespace,comma or tab), then you take each fourth(counting from zero) element from those split lines(you were looking for an average of third column if i remember correctly. So if you append those fourth elements in an array, print out somewhere or whatnot, then you've got your third column.	[reply]
Re^3: average a column in tab-delimited file by naturalsciences (Beadle) on Jan 27, 2012 at 17:28 UTC
To make life easier for you I will post a piece of code in here. Earlier today I had to transpose a large text file so I blurted out this piece of code. (Takes the rows from a file and makes them into a column) It is a rather stupid code. (I'm a noob). For example it would not be prudent to actually load all the file into an array, but better to do it line by line in a while($line=<>) loop. etc. But Being a rather stupid code it should also be quite readily understandable and demonstrates the idea of getting columns out from your lines pretty well. `#!/usr/bin/perl -w use strict; use warnings; my @data=<>; my @column=(); for (my $count=3;$count<=4215;$count++) { foreach (@data) { my @row=split(/\t/, $_); push (@column,$row[$count]); } print "@column\n"; @column=() }` [download] As you can see this piece of code was used to extract columns 3 to 4215. You only need to extract one single column right now. Enjoy!	[reply] [d/l]
Re: average a column in tab-delimited file by Anonymous Monk on Jan 27, 2012 at 10:35 UTC
What would be the most sensible way to proceed with this? Just do it :) any way you know how :) FWIW, you should already know two ways to handle it :) reformatting tab delimited file, Combining 3 files	[reply]
Re: average a column in tab-delimited file by sundialsvc4 (Abbot) on Jan 27, 2012 at 13:57 UTC
Umm, maybe (seriously...) the way to do it is to put it into a database (e.g. an SQLite file), where the problem then becomes: `SELECT COL1, COL2, AVERAGE(COL3), AVERAGE(COL4) FROM MEH GROUP BY COL1, COL2;` I am, once again, being perfectly serious. Consider how many other results you might also need to obtain against this data, either now or in the near future, which would be reduced to “a query, not a program,” if the format of this disk file were, instead of being delimited text, SQLite, with all of the power that the aforesaid public-domain tool brings to bear. Since (IIRC...) SQLite already knows how to suck a delimited-file in, to make a table out of it, you literally might not need to “program” anything at all to accomplish this entire task.	[reply]
Re: average a column in tab-delimited file by Anonymous Monk on Jan 28, 2012 at 00:41 UTC
Should output look like this? `Name10128 1A 33.2 A Test 0.715650000 -0.011 +379692 Name4382 1A 34.3 A Test 0.904300000 -0.035 +366577 Name1635 1A 34.9 A Test 0.547452083 -0.004 +684633 Name10267 1A 34.9 A Test 0.547452083 -0.004 +684633 Name10039 1A 34.9 A Test 0.547452083 -0.004 +684633 Name22270 1A 44.7 A Test 0.030666667 0.1572 +64030 Name22285 1A 44.7 A Test 0.030666667 0.1572 +64030 Name22701 1A 44.7 A Test 0.030666667 0.1572 +64030 Name10054 1A 46.4 A Test 0.000000000 0.6416 +18497` [download]	[reply] [d/l]
Re^2: average a column in tab-delimited file by garyboyd (Acolyte) on Jan 30, 2012 at 10:36 UTC
thanks for all the suggestions, and yes the output should look like the output that anonymous monk posted above. I can read the file into an array, but I wasn't sure how to proceed through the array to average columns 6 and 7. `#!/usr/bin/perl use strict; use warnings; use Getopt::Long; my $infile; my @fields; my $i; my $j; GetOptions ( "infile=s" => \$infile, ); open INFILE, "<$infile" or die $!; #open RES, ">result.txt" or die $!; while (<INFILE>){ @fields = split(/\t+/, $_); my ($name, $chr, $location, $gen, $dom, $pval, $fst) = @fields +[0..6]; }` [download]	[reply] [d/l]
Re^2: average a column in tab-delimited file by garyboyd (Acolyte) on Jan 31, 2012 at 12:10 UTC
once I have my values in an array, how do I find average of all similar values for column 3? eg how do I find all of those that have 34.9 and average the result in column 6?	[reply]
Re^3: average a column in tab-delimited file by Anonymous Monk on Jan 31, 2012 at 14:51 UTC
Please explain in words, how you would do it, using paper and pencil	[reply]
Re^4: average a column in tab-delimited file by garyboyd (Acolyte) on Jan 31, 2012 at 15:36 UTC
Re^5: average a column in tab-delimited file by Cristoforo (Curate) on Feb 02, 2012 at 01:09 UTC
Re^5: average a column in tab-delimited file by Cristoforo (Curate) on Feb 04, 2012 at 20:50 UTC
Re^5: average a column in tab-delimited file by Anonymous Monk on Feb 05, 2012 at 16:22 UTC

Back to Seekers of Perl Wisdom