|laziness, impatience, and hubris|
Re: How to compare 2 wav files.by toma (Vicar)
|on May 29, 2002 at 04:59 UTC||Need Help??|
Here is another approach that hasn't been mentioned yet. No AI, no feature extraction, no FFT, and it will work great! It is quite slow, though.
Imagine that you have a bunch of sound samples. These samples are either positive or negative. If you take another copy of the samples and slide them across the original samples, they will line up at one particular instant in time. A nice way to get the computer to "see" this moment is to multiply the samples from each waveform together, sample by sample. Then, add up the products. This works because the lined-up samples will all turn into positive numbers (instead of the random mix of positive and negative numbers when they don't line up.) All these positive numbers will add up to a really big positive number, which is called a correlation spike.
This algorithm of sliding the samples across each other, multiplying them sample-by-sample, and adding them together is called convolution. It works great but requires a large amount of computer power. If you have a really hot machine, or you are patient, it should work fine.
A short-cut for this procedure takes advantage of the Fast Fourier Transform (FFT). This amazing algorithm allows you to transform a convolution in the time domain into a multiplication in the frequency domain. To get the benefits of this algorithm, you will need to learn about window functions, the effect of sample rates, and some other gory details. It will be *much* faster, but also much more work to learn how to use.
To solve your particular problem, you will need to convolve each possibly new sample against each song in your collection, looking for a match. For the FFT algorithm, you can compute the FFT of each song only once, and store it. These stored FFT samples are multiplied by the FFT of the new song. You get the same value for the correlation spike when you multiply the FFTs as when you do the convolution. If you get the huge spike in either the time or the frequency domain, the songs are the same.
The sliding FFT that jcwren mentions solves an important problem with the FFT. Imagine that two songs start with identical notes, but they are played in a different order. An ordinary FFT cannot distinguish between these two songs. The sliding FFT will fix this. The sliding FFT is yet more complicated than the ordinary FFT, so I wouldn't recommend it as a first project in signal processing.
For doing this type of number-crunching in perl I use the PDL modules. They are well worth the trouble to install.
Update: See Analyzing WAV Files with Perl for FFT usage.
It should work perfectly the first time! - toma