Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

calculate average in sliding windows

by marfabe (Initiate)
on May 23, 2012 at 09:35 UTC ( [id://971993]=perlquestion: print w/replies, xml ) Need Help??

marfabe has asked for the wisdom of the Perl Monks concerning the following question:

Hi all!
I have a 2 columns file (position, value) and I want to create a sliding window of 10000 and calculate the average for each window.
My data is

position value 12498248 1.033560 12512575 0.566400 12513687 0.632940 12520106 0.232530 12523984 0.247300 12525383 1.487640

So far, I tried this code...but it doesn't work..

#!usr/bin/perl use warnings; use strict; use List::Util 'sum'; print ' ############### Establish windows of 10kb and calculate the average Recomb Rate ############### '; my $recomb_all = $ARGV[0]; my @recomb_all = @{get_contents ($recomb_all)}; #Testing content print "line 1 column 2 ::: $recomb_all[0][1]\n"; my $length = (scalar @recomb_all) - 1; my @window; my $window_size = 10000; my $window_step = 10000; my @average; for (my $i = 0; ($i + $window_size - $window_step) <= $recomb_all[$len +gth][0]; $i += $window_step) { my $e = $i + $window_size; if ($e > $recomb_all[$length][0]) { $e = $recomb_all[$length][0]; push (@window, $recomb_all[$length][1]); print "@window\n"; push (@average, $recomb_all[$length][0], my $av = average (@wi +ndow)); } } #print "@window\n"; #print "@average\n"; exit; ####################### #SUBROUTINES ####################### sub get_contents { my ( $file ) = @_; my @contents; open (FH, "< $file") or die "I cannot open $file file: $!\n"; my $x = 0; while (<FH>) { next unless /\S/; next if /^\s*[A-Z]+/i; @{$contents[$x]} = split /\s+/; ++$x; } return (\@contents); } # ---------- end of subroutine get_contents ---------- sub average { my @array = @_; return sum(@array) / @array; }

I would really appreciate any help! Thanks!

Replies are listed 'Best First'.
Re: calculate average in sliding windows
by moritz (Cardinal) on May 23, 2012 at 09:45 UTC

    In what way does your code not work? Are blue flames coming out of your computer when you run it?

    I would recommend to use a much smaller sliding window for debugging, so that you can see if the values coming out are correct.

Re: calculate average in sliding windows
by BrowserUk (Patriarch) on May 23, 2012 at 09:51 UTC

    Maybe this will help?

    #! perl -slw use strict; use Data::Dump qw[ pp ]; use List::Util qw[ sum ]; use constant WINDOW => 10; my @data = map int( rand 10 ), 1 .. 30; my @mAves; my @slide; for my $dp ( @data ) { push @slide, $dp; shift @slide if @slide > WINDOW; push @mAves, sum( @slide ) / WINDOW if @slide == WINDOW; } print join ' ', @data; print join ' ', @mAves; __END__ C:\test>junk 3 8 8 2 5 5 0 3 8 3 2 3 2 3 5 4 8 7 5 4 6 3 1 6 3 6 0 7 3 3 4.5 4.4 3.9 3.3 3.4 3.4 3.3 4.1 4.5 4.2 4.3 4.7 + 4.7 4.6 4.9 4.7 4.9 4.1 4.1 3.9 3.8

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      A less computationally intensive variant would be to directly maintain a $sum, to which you add/subtract individual values (instead of push/shift on an array, from which you then recompute the sum every time):

      #!/usr/bin/perl -lw use strict; use constant WINDOW => 10; #my @data = map int( rand 10 ), 1 .. 30; my @data = qw(3 8 8 2 5 5 0 3 8 3 2 3 2 3 5 4 8 7 5 4 6 3 1 6 3 6 0 7 +3 3); my @mAves; my $sum = 0; for my $i (0..$#data) { $sum += $data[$i]; $sum -= $data[$i-WINDOW()] if $i >= WINDOW; push @mAves, $sum / WINDOW if $i >= WINDOW-1; } print join ' ', @data; print join ' ', @mAves; __END__ 3 8 8 2 5 5 0 3 8 3 2 3 2 3 5 4 8 7 5 4 6 3 1 6 3 6 0 7 3 3 4.5 4.4 3.9 3.3 3.4 3.4 3.3 4.1 4.5 4.2 4.3 4.7 + 4.7 4.6 4.9 4.7 4.9 4.1 4.1 3.9 3.8
Re: calculate average in sliding windows
by jwkrahn (Abbot) on May 23, 2012 at 11:48 UTC
    Establish windows of 10kb and calculate the average Recomb Rate

    What does the "kb" refer to in this instance?    10,000 bits?    10,000 bytes?    10,000 bogons?



    my @recomb_all = @{get_contents ($recomb_all)}; ... sub get_contents { ... return (\@contents); } # ---------- end of subroutine get_contents ----------

    Why return a reference to @contents to dereference it to copy to @recomb_all and not just copy it:

    my @recomb_all = get_contents ($recomb_all); ... sub get_contents { ... return @contents; } # ---------- end of subroutine get_contents ----------


    my $length = (scalar @recomb_all) - 1;

    $length is not actually the "length" of any thing, you are using it as the last index of the array @recomb_all, which could more simply be written as $#recomb_all or -1.



    @{$contents[$x]} = split /\s+/;

    There is no need for an $x variable there:

    push @contents, [ split ];
      ...definitely bogons...LOL!
Re: calculate average in sliding windows
by marfabe (Initiate) on May 23, 2012 at 13:58 UTC

    thanks to all...but I still have problems in the sliding window part
    Maybe I didn't explain it ok. What I want is to group the first column of the file in windows of 10 kbases (DNA positions) and calculate the average of column 2 for each window.

    BrowserUK and Eliya: your idea works fine but it doesn't take into account the positions in column 1...and I'm stuck trying to use it.
    Any idea?
    Thanks!

      but it doesn't take into account the positions in column 1

      Sorry, I'm still not sure what you want (and it's a little diffficult to infer it from code which - as you state - doesn't work).

      Are you saying the positions in column 1 are meant to represent indices in some sparse array, with all unspecified positions (such as 12498249..12512574, etc.) implicitly having a value of zero, or what?

      I think it would help if you could come up with some simplified (but sufficient/complete) sample input (say using a window size of 3) together with the desired output values.

        Thanks Eliya
        Here's the simplified sample set

        position value 1 1 10 3 30 1 40 2 60 2

        And here the output for window size of 30

        1-30 1.666666667 31-60 2

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://971993]
Approved by marto
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (4)
As of 2024-03-29 06:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found