http://www.perlmonks.org?node_id=995021


in reply to compare values while value on other column remains the same

In your sample, the first three fields always match whenever the fourth (ID) field matches. If that's always the case, you could concatenate those fields into your hash key. If it's not, you might need to save the lines into a hash of arrays keyed on the ID field, and then print them out. For now, I assumed they'd be identical as in the sample, and just printed out the IDs and averages.
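If the first three fields turn out not to be identical for a given ID, a rough sketch of the "save the lines keyed on the ID" idea might look like the following. The helper name group_lines_by_id and the @order array (used only to remember the order in which IDs first appear) are my own inventions for illustration, not anything from your code:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: group whole input lines by the ID in the fourth field.
# Returns a hashref (ID => arrayref of lines) and an arrayref of IDs in
# first-seen order, so the output order is predictable.
sub group_lines_by_id {
    my @lines = @_;
    my ( %by_id, @order );
    for my $line (@lines) {
        my @w = split ' ', $line;
        next unless @w == 6;    # skip lines without exactly six fields
        push @order, $w[3] unless exists $by_id{ $w[3] };
        push @{ $by_id{ $w[3] } }, $line;
    }
    return ( \%by_id, \@order );
}

# Three lines taken from the sample data.
my ( $by_id, $order ) = group_lines_by_id(
    'chr1 29407 29457 cg12045430 43 70',
    'chr1 29407 29457 cg12045430 43 88',
    'chr1 68849 68899 cg20253340 560 593',
);

# Print each ID followed by every line that was saved under it.
for my $id (@$order) {
    print "$id\n";
    print "    $_\n" for @{ $by_id->{$id} };
}
```

With the lines saved this way, you can still compute the average per ID and then decide which of the original lines to print.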

I also assumed that you wanted to average the absolute differences. In other words, if one line had 50 100 and the next had 100 50, those would average to a difference of 50, not zero (which is what you'd get from averaging -50 and 50). I also assumed from your code that you want to divide the numbers by 1000. When I did that, none of the averages exceeded 0.2, so I printed all averages greater than 0.02 so I'd have some output. Hopefully this will give you some ideas:

#!/usr/bin/env perl
use Modern::Perl;

# Recursively add up a list of numbers.
sub sum {
    my $t = shift;
    return $t unless @_;
    return $t + sum(@_);
}

# ID => list of absolute differences, each divided by 1000
my %k;
while (<DATA>) {
    chomp;
    my @w = split;
    next unless @w == 6;
    push @{ $k{ $w[3] } }, abs( $w[4] - $w[5] ) / 1000;
}

# Print each ID whose average difference exceeds the threshold.
for ( keys %k ) {
    my $avg = sum( @{ $k{$_} } ) / @{ $k{$_} };
    say "$_ $avg" if $avg > 0.02;
}

__DATA__
chr1 15865 15915 cg13869341 908 913
chr1 18827 18877 cg14008030 688 776
chr1 29407 29457 cg12045430 43 70
chr1 29407 29457 cg12045430 43 88
chr1 29407 29457 cg12045430 43 16
chr1 29425 29475 cg20826792 57 70
chr1 29425 29475 cg20826792 57 88
chr1 29425 29475 cg20826792 57 16
chr1 29435 29485 cg00381604 33 70
chr1 29435 29485 cg00381604 33 88
chr1 29435 29485 cg00381604 33 16
chr1 68849 68899 cg20253340 560 593
chr1 69591 69641 cg21870274 791 809
chr1 91550 91600 cg03130891 55 84
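To make the absolute-versus-signed assumption concrete, here is a small sketch using the 50 100 and 100 50 example from above. It swaps in sum from the core List::Util module in place of the hand-rolled recursive sum; the mean helper is a hypothetical name of my own, and it simply returns nothing for an empty list:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: arithmetic mean of a list of numbers.
sub mean {
    return unless @_;          # avoid dividing by zero on an empty list
    return sum(@_) / @_;       # @_ in numeric context is the element count
}

# The two value pairs from the example in the text.
my @pairs = ( [ 50, 100 ], [ 100, 50 ] );

# Signed differences cancel out; absolute differences do not.
my $signed   = mean( map { $_->[0] - $_->[1] } @pairs );
my $absolute = mean( map { abs( $_->[0] - $_->[1] ) } @pairs );

print "signed: $signed, absolute: $absolute\n";    # signed: 0, absolute: 50
```

If you actually want the signed average, drop the abs() call in the main script.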

Aaron B.
Available for small or large Perl jobs; see my home node.