Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid

How do I find peaks in noisy data?

by tamaguchi (Pilgrim)
on Apr 26, 2006 at 23:09 UTC ( #545901=perlquestion: print w/ replies, xml ) Need Help??
tamaguchi has asked for the wisdom of the Perl Monks concerning the following question:

I have to make a primitive pattern identifier..
There is a list of values the values are but most of them are insignificant random of noice of various magnitude.
Like this for example 101, 203, 321, 45, 67, 156 etc.
or for example 521, 313, 31, 145, 167, 56 etc.

Among this values are "islands" of values where the values are much higher then the surrounding noice. In this fashion:

101, 203, 321, 45, 67, 156, 203, 502, 899, 2003, 5007, 8020, 7301, 5030, 3045, 1243, 567, 321, 234, 45, 123 453 etc..

I have to make some algorithm to identify this "islands" which stick up from the surrounding noice. The noice could be at different levels so I can not just sum up values over a general noice level.v Do you have any ideas how to do this in the best way? I would be very happy if you knew about a book about how to do similar things?

Than you very much for your help.

2006-04-27 Retitled by GrandFather, as per Monastery guidelines
Original title: 'primitive pattern identifier'

Replies are listed 'Best First'.
Re: How do I find peaks in noisy data?
by kvale (Monsignor) on Apr 26, 2006 at 23:50 UTC
    Statisitcally, your task is to identify a signal in the presence of noise. If your signal is always much larger than the noise, the task is fairly easy.

    One way to handle this is to first calculate the mean and standard deviation (sd) of your values. If the peaks are relatively rare, then the sd will correspond closely to the noise sd. Then pick a threshold, say mean + 3*sd, and flag every sample greater than that as a possible event. From there, you can look at the possible events and decide if they correspond to the signal model you are looking for, e.g., peak height, peak width, shape, etc. If the noise is time-varying, just calculate mean and sd over intervals short enough so that the noise doesn't vary too much, and look for possible events within those intervals using the local mean and sd.

    If you have a really good statistical model of the signal and noise, you could create a Kalman filter or Bayesian inference model to detect these signals automatically. But I would try the simple method above first.


Re: How do I find peaks in noisy data?
by BrowserUk (Pope) on Apr 26, 2006 at 23:52 UTC

    Sounds like you need a high-pass filter. Specifying the cut-off as a percentage and normalising the values should allow for variable s/n ratios.

    #! perl -slw use strict; use List::Util qw[ max ]; sub islands { my( $percent, $dataRef ) = @_; my $max = max @$dataRef; my $filtered = join '', map{ ( $_ / $max ) > $percent ? 1 : 0 } @$dataRef; printf "%4.2f : %s\n", $percent, $filtered; } my @data = split ' ', do{ local $/; <DATA> }; islands $_, \@data for map{ $_ / 20 } 1 .. 20; =comment c:\test>545901 0.05 : 0000000111111111100001 0.10 : 0000000011111111000000 0.15 : 0000000001111111000000 0.20 : 0000000001111110000000 0.25 : 0000000000111110000000 0.30 : 0000000000111110000000 0.35 : 0000000000111110000000 0.40 : 0000000000111100000000 0.45 : 0000000000111100000000 0.50 : 0000000000111100000000 0.55 : 0000000000111100000000 0.60 : 0000000000111100000000 0.65 : 0000000000011000000000 0.70 : 0000000000011000000000 0.75 : 0000000000011000000000 0.80 : 0000000000011000000000 0.85 : 0000000000011000000000 0.90 : 0000000000011000000000 0.95 : 0000000000010000000000 1.00 : 0000000000000000000000 =cut __DATA__ 101 203 321 45 67 156 203 502 899 2003 5007 8020 7301 5030 3045 1243 567 321 234 45 123 453

    You could enhance that by adding a minimum width to exclude transient spikes like the one shown at the end of the 5% line. Turning the 0s & 1s back into either groups of values or index ranges is fairly easy depending upon what you need to do with the islands once isolated.

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: How do I find peaks in noisy data?
by spiritway (Vicar) on Apr 27, 2006 at 02:32 UTC

    OK, so then my question is, how do you know when you have valid data, instead of just noise? That is, you might have input with a fairly small variation, where the peaks aren't all that much higher than the valleys. Would you be able to distinguish whether you had any good data among the noise? If so, how? What makes your values good or bad?

    There needs to be some means of determining what your good input is. It could be the absolute magnitude of your values (a threshold value, say 500 in this case). However, you said that this wouldn't work - the noise levels, and presumable the levels of the good data, would be variable, so 500 might be within the noise range sometimes.

    So another means might be if you knew how many good points you're supposed to have. In that case, simply order the points by value, and grab the top X% of them as "good", and ignore the rest. However, this requires that you know how many good points you're supposed to have.

    If this is something that is somewhat periodic, you could use an FFT to get a set of frequencies out of the data; then apply the Kronecker delta function ($kron= $val>$threshold? $val: 0). That basically means, get rid of the frequencies that are not strongly represented in the data. Then do the inverse FFT to get your data back, minus the noise caused by unwanted frequencies. This would only work if you knew your data should be hovering around some definite set of frequencies, and not be all over the spectrum. Check out MathWorld for much more information on FFT, the Kroneker delta, and all that good stuff.

    Finally, you can only do so much with bad data. You may need to take a look at the data source itself, if possible, and modify it to try to eliminate the noise at the very beginning.

Re: How do I find peaks in noisy data?
by TedPride (Priest) on Apr 27, 2006 at 06:29 UTC
    Not that hard. If you want an absolute cutoff (500) rather than percentage (6% of max), just change the lines assigning values to $cutoff.
    use strict; use warnings; my @n = (101, 203, 321, 45, 67, 156, 203, 502, 899, 2003, 5007, 8020, +7301, 5030, 3045, 1243, 567, 321, 234, 45, 123, 453); my $cutoff = .06; my $max = 0; for (@n) { $max = $_ if $_ > $max; } $cutoff = $max * $cutoff; for (@n) { $_ = 0 if $_ < $cutoff; print $_ ? '-' x ($_ * 50 / $max) : '', "\n"; }
    --- ----- ------------ ------------------------------- -------------------------------------------------- --------------------------------------------- ------------------------------- ------------------ ------- ---
    EDIT: Hmm, you're right, the background noise could theoretically vary, as well as the height of the peaks. I guess a more complicated algorithm is needed. Perhaps the cutoff level could be set to some percentage of the closest peak, then any islands remaining would be removed if they had a bad width-height aggregate (this would be the complicated part). The idea is to set the cutoff low enough so you don't lose much of the edges of the islands, but also to remove any islands which are just background noise.

    Then any parts which have been removed can be analyzed to see if there's a regular pattern. If there is, that pattern is extrapolated and removed from the remaining islands.

    Altogether too much work, imho :) I'm sure there is a utility written in C or C++ somewhere that does this already, probably faster and better than anything we can write on the spur of the moment. Unless what you're trying to do is something simple like automatically trimming sound files, I'd personally just get a third-party utility and write a wrapper for it.

Re: How do I find peaks in noisy data?
by zaken7 (Novice) on Apr 27, 2006 at 08:47 UTC
    Hi, To bring 'islands' out of the noise you need to sum the data repeatedly over itself. The islands will become more prominent and you should be able to clearly see them over a threshold. You should also normalise the data first because the noise may be at different levels, spectrum analysis? thats the direction I would try anyway.
Re: How do I find peaks in noisy data?
by swampyankee (Parson) on Apr 27, 2006 at 15:45 UTC

    This can be considered a signal processing problem; have you tried CPAN? If that doesn't work, you could try ACM Algorithm 379 or from STATLIB or SPIB.


    "Being forced to write comments actually improves code, because it is easier to fix a crock than to explain it. "
    —G. Steele
Re: How do I find peaks in noisy data?
by dokkeldepper (Friar) on Apr 29, 2006 at 12:26 UTC


    one important point is missing so far.

    Most answers more or less assume implicitely that

    1. there is a signal in the data
    2. the data is stationary
    3. the data has a distribution that has a finite variance

    As general as you stated the question none of these assumption needs to be fulfilled. For example a random walk with drift can produce extremely large values very fast. Or if the underlying data generating process has a distribution without a finite variance (f.e. Cauchy-Distribution) then the usual signal extraction mechanisms fail. The most prominent examples are stock returns which possibly have no finite variance.

    To make my complaints operational, you first need some good reason to assume that your data has some particular parametric distribution. Think about it. Then one can try to estimate the parameters and can identify the most unprobable values as "islands".

    Another way would try to do this nonparametrically. Usually one needs much more data for these techniques.

    Check: R

    and the related R-modules on CPAN

    Without some idea about the nature of the data generating process there are to much whens and ifs to consider to give some usable perl code for your question.

Re: How do I find peaks in noisy data?
by nothingmuch (Priest) on May 02, 2006 at 00:21 UTC
    compute the derivative of the wave pattern, and then whenever it is, say, over the last 4 samples by average larger than say .25 then you probably have a candidate for a peak. Keep looking till it stops rising, then try to find something with a proportionate declination in the derivative function.

    If the noise is random you can use something like ent. This way you can look at long sample window and decide if it's got any islands or not in it.

    This can get false islands in, but is also more sensitive than

    There are more complicated algorithms, like the dolby stuff but that's probably overkill.

    zz zZ Z Z #!perl

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://545901]
Approved by sweetblood
Front-paged by broquaint
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (7)
As of 2016-08-27 20:04 GMT
Find Nodes?
    Voting Booth?
    The best thing I ever won in a lottery was:

    Results (387 votes). Check out past polls.