PerlMonks  

average of column

by linseyr (Acolyte)
on Sep 12, 2012 at 14:51 UTC (#993236)
linseyr has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I would like to calculate the average of column 4 from my input files.
    #!/usr/local/bin/perl
    #OPEN UP THE FILE
    open (MYFILE, '148-N-pvalue0.01_peaks.xls');
    #READ THROUGH THE FILE
    while (my $line = <MYFILE>) {
        @arr1 = split('\t',$line);
        @length = @arr1[3];
        $numbers = join(", ",@length);
        print $numbers;
        #print @length,",";
        #$avg = avg($length);
    }
But the problem is the array. How do I get commas between the numbers? And at the end calculate the average from the array? Thanks!
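
On the comma question: in the posted loop, `@length` holds a single value on each pass, so `join` has nothing to put a separator between. One possible approach, sketched here with made-up sample rows and a hypothetical `@lengths` array (neither is from the original post), is to collect the values with `push` inside the loop and call `join` once afterwards:

```perl
use strict;
use warnings;

# Hypothetical sample rows standing in for lines read from the input file.
my @lines = ("chr1\t10\t50\t40\n", "chr2\t20\t80\t60\n");

my @lengths;
for my $line (@lines) {
    chomp $line;
    my @fields = split /\t/, $line;
    push @lengths, $fields[3];    # column 4 is index 3
}

# Join once, after the loop, so the separator lands between all values.
my $numbers = join ", ", @lengths;
print "$numbers\n";

# Average = sum of the collected values divided by how many there are.
my $sum = 0;
$sum += $_ for @lengths;
my $avg = @lengths ? $sum / @lengths : 0;
print "average: $avg\n";
```

The same array then serves both purposes: the comma-separated printout and the average.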

Re: average of column
by toolic (Chancellor) on Sep 12, 2012 at 15:01 UTC
        use warnings;
        use strict;
        use Acme::Tools qw(avg);

        my @nums;
        while (<DATA>) {
            push @nums, (split)[3];
        }
        print avg(@nums), "\n";

        __DATA__
        a b c 3
        d e f 4
        h j k 5


Re: average of column
by daxim (Chaplain) on Sep 12, 2012 at 15:01 UTC
    Can you please provide some sample input? Your code confuses me.

    Without having seen the input, the code that fits the problem description is:

    perl -F'\t' -lanE'$sum += $F[3]; END { say $sum/$. }'
      The input is a tab-delimited file like:

          chr   start  end  length
          chr1  10     50   40
          chr2  20     80   60

      I want to get the average length of all lines.

        In your OP, you show the following:

            open (MYFILE, '148-N-pvalue0.01_peaks.xls');
            #READ THROUGH THE FILE
            while (my $line = <MYFILE>) {
            ...

        Opening an Excel spreadsheet this way will not give you access to the data you want. If you're trying to parse an Excel spreadsheet, consider using a module like Spreadsheet::ParseExcel for the job. Otherwise, you can use "Save As..." within Excel to save the data as tab-delimited text, e.g., "148-N-pvalue0.01_peaks.txt", which you can then open and process as above.

        toolic shows how to obtain the last column's average using Acme::Tools. Importantly, two pragmas begin the solution:

        use warnings; use strict;

        Consider always beginning your scripts with these. They catch many common mistakes early, such as undeclared or misspelled variables and uses of uninitialized values, and will likely save you many headaches.
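
To tie the advice above together, here is a minimal sketch of reading a tab-delimited text file (for example, the "148-N-pvalue0.01_peaks.txt" saved from Excel) under `use strict` and `use warnings`. It assumes the first line is a header, as in the sample input shown earlier in the thread. The sub name `column_average` and its parameters `$path` and `$col` are illustrative, not something from the original posts:

```perl
use strict;
use warnings;

# Average one column of a tab-delimited text file, skipping the header row.
# column_average, $path and $col are hypothetical names used for illustration.
sub column_average {
    my ($path, $col) = @_;

    # Three-argument open with an error check, so a missing file is reported.
    open my $fh, '<', $path or die "Cannot open '$path': $!";

    my ($sum, $count) = (0, 0);
    while (my $line = <$fh>) {
        next if $. == 1;    # assumption: line 1 is the header
        chomp $line;
        my @fields = split /\t/, $line;
        next unless defined $fields[$col] && length $fields[$col];
        $sum += $fields[$col];
        $count++;
    }
    close $fh;

    # Return undef rather than dividing by zero when there are no data rows.
    return $count ? $sum / $count : undef;
}

# Usage: perl script.pl 148-N-pvalue0.01_peaks.txt
if (defined(my $file = shift @ARGV)) {
    my $avg = column_average($file, 3);    # column 4 is index 3
    print defined $avg ? "$avg\n" : "no data rows\n";
}
```

Skipping the header matters here: if the "length" label were counted as a row, it would add nothing to the sum but still increase the count, pulling the average down.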

Re: average of column
by BillKSmith (Chaplain) on Sep 12, 2012 at 16:24 UTC

    I like toolic's solution, but if you prefer not to use a module:

        use strict;
        use warnings;

        my $sum;
        my $count;
        while (<DATA>) {
            my $length = (split)[3];
            next if $length eq 'length';
            $count++;
            $sum += $length;
        }
        my $average_length = $sum/$count;
        print $average_length, "\n";

        __DATA__
        chr start end length
        chr1 10 50 40
        chr2 20 80 60
    Bill
Re: average of column
by pvaldes (Chaplain) on Sep 12, 2012 at 18:18 UTC

    Argh, this is very similar to the other solutions, but not as elegant. Well... another way to skin this cat:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use List::Util qw(sum);

        my @array = ();
        while (<DATA>) {
            next if /start/;    # skip the header line
            chomp;
            # split on runs of whitespace; /\s*/ would split between every character
            my (undef,undef,undef,$col4,undef) = split (/\s+/, $_, 5);
            push (@array, $col4);
        }
        my $sum = sum(@array);
        print $sum,"\/";
        my $n = scalar(@array);
        print $n,"\n";
        print "average = ", $sum/$n,"\n";

        __DATA__
        chr start end length
        foo e r t 6
        r 4 A 6 4
        f 4 3 4 5
        f L 0 L 2
        f 0 0 0 1
        0 3 4 5 6 7
Re: average of column
by frozenwithjoy (Curate) on Sep 13, 2012 at 05:27 UTC
    The problem has already been solved, but if your lab is on fire and you need a one-liner to get you a quick answer:
    perl -MList::Util=sum -E '<>; my @v = map { (split /\t/)[-1] } <>; say sum(@v) / @v' file.tsv
