### Re: calculate average in sliding windows

by marfabe (Initiate)
 on May 23, 2012 at 13:58 UTC

in reply to calculate average in sliding windows

Thanks to all, but I still have problems with the sliding-window part.
Maybe I didn't explain it well. What I want is to group the positions in the first column of the file into windows of 10 kbases (DNA positions) and calculate the average of column 2 for each window.

BrowserUK and Eliya: your idea works fine, but it doesn't take into account the positions in column 1, and I'm stuck trying to adapt it.
Any ideas?
Thanks!

### Re^2: calculate average in sliding windows
by Eliya (Vicar) on May 23, 2012 at 14:27 UTC
> but it doesn't take into account the positions in column 1

Sorry, I'm still not sure what you want (and it's a little difficult to infer it from code which, as you state, doesn't work).

Are you saying the positions in column 1 are meant to represent indices in some sparse array, with all unspecified positions (such as 12498249..12512574, etc.) implicitly having a value of zero, or what?

I think it would help if you could come up with some simplified (but sufficient/complete) sample input (say using a window size of 3) together with the desired output values.

Thanks, Eliya.
Here's the simplified sample set:

```
position  value
1         1
10        3
30        1
40        2
60        2
```

And here is the desired output for a window size of 30:

```
1-30   1.666666667
31-60  2
```

Ok, thanks.  So maybe like this?

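```
#!/usr/bin/perl -lw
use strict;
use constant WINDOW => 30;

my $sum = 0;        # running sum of the values in the current window
my $n   = 0;        # number of values in the current window
my $p   = WINDOW;   # upper bound (inclusive) of the current window

while (<DATA>) {
    my ($pos, $val) = split;
    if ($pos > $p) {                    # position lies beyond the current window
        print $sum / $n if $n > 0;      # report the finished window (empty ones are skipped)
        $sum = 0;
        $n   = 0;
        $p += WINDOW while $pos > $p;   # advance to the window containing $pos
    }
    $sum += $val;
    $n++;
}
print $sum / $n if $n > 0;  # corner case: report the last window

__DATA__
1    1
10    3
30    1
40    2
60    2
```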

Output:

```
1.66666666666667
2
```

(Upd: fixed handling of corner case)
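
If the window ranges should also appear in the output, as in the requested format, one possible variant is to bin each position by integer division instead of tracking a moving upper bound. The sketch below is only an illustration: the sub name `window_means`, the `%.10g` output format, and the assumption that positions are 1-based (so windows run 1-30, 31-60, and so on) are not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant WINDOW => 30;   # window size (assumption: use 10_000 for 10 kbase windows)

# Hypothetical helper: takes "position value" lines and returns one
# "start-end<TAB>mean" string per non-empty window, ordered by window.
# Assumes 1-based positions, so window k covers k*WINDOW+1 .. (k+1)*WINDOW.
sub window_means {
    my @lines = @_;
    my (%sum, %count);
    for my $line (@lines) {
        my ($pos, $val) = split ' ', $line;
        # Skip blank lines, header lines, and anything without a numeric position.
        next unless defined $val and $pos =~ /^\d+$/;
        my $k = int(($pos - 1) / WINDOW);   # index of the window holding $pos
        $sum{$k}   += $val;
        $count{$k} += 1;
    }
    my @out;
    for my $k (sort { $a <=> $b } keys %sum) {
        push @out, sprintf "%d-%d\t%.10g",
            $k * WINDOW + 1, ($k + 1) * WINDOW, $sum{$k} / $count{$k};
    }
    return @out;
}

# Sample data from the thread, including the header line.
my @sample = ("position value", "1 1", "10 3", "30 1", "40 2", "60 2");
print "$_\n" for window_means(@sample);
```

Because the values are collected in hashes keyed by window index, the input does not have to be sorted by position, and windows without any data are simply omitted. To process a file instead of the inline sample, the lines could be passed as `window_means(<>)`.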

