Comparing Two Audio Files

by Anonymous Monk
on Jun 11, 2007 at 22:38 UTC (#620590)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

There was much discussion of this a month or so ago, but it didn't quite cover what I'm looking for. I'm looking for a way to compare two audio files (mostly voice data) for a percentage match. The system dials a phone number, gets an audio message, and then compares it to the original audio file. The problem is that the timing will not match up exactly and there will invariably be some noise in the sample over the phone lines.

Replies are listed 'Best First'.
Re: Comparing Two Audio Files
by Zaxo (Archbishop) on Jun 12, 2007 at 00:33 UTC

    There is a mathematical technique called cross-correlation which measures the similarity of two signals. It is closely related to convolution: correlating two signals is the same as convolving one with a time-reversed copy of the other. It's fairly easy to calculate with Perl tools.

    PDL is ideal for this. Zero-pad both sets of audio data to a common length so the result does not wrap around, then take the FFT (Fast Fourier Transform) of each. Multiply one transformed dataset, term by term, by the complex conjugate of the other. Take the inverse FFT of the product dataset and you have the correlation function.

    If the two signals are pretty much identical, the correlation function should have a strong spike at zero lag and be small elsewhere, roughly like a delta function. If one recording is delayed relative to the other, the spike moves to the lag that matches the delay. For real world data, some scaling, filtering, and normalization might be desirable.
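
    As a rough illustration of the spike described above, here is a minimal sketch in plain Perl that computes the correlation directly in the time domain instead of through the FFT. Both routes give the same values when the FFT version is zero-padded; the FFT is simply faster on long recordings. The name best_lag, the mean removal, and the energy normalization are assumptions made for this example, not part of PDL.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Normalized cross-correlation of two sample lists for every lag in
# -$max_lag .. $max_lag. Returns (best_lag, best_score), score in [-1, 1].
sub best_lag {
    my ($x, $y, $max_lag) = @_;

    # Remove the mean so a constant offset does not dominate the score.
    my $mean_x = sum(@$x) / @$x;
    my $mean_y = sum(@$y) / @$y;
    my @u = map { $_ - $mean_x } @$x;
    my @v = map { $_ - $mean_y } @$y;

    # Scale by the energy of both signals so identical inputs score 1.
    my $norm = sqrt(sum(map { $_ * $_ } @u) * sum(map { $_ * $_ } @v));
    return (0, 0) if $norm == 0;

    my ($best_lag, $best_score) = (0, -2);
    for my $lag (-$max_lag .. $max_lag) {
        my $acc = 0;
        for my $i (0 .. $#u) {
            my $j = $i + $lag;
            next if $j < 0 || $j > $#v;    # skip samples with no partner
            $acc += $u[$i] * $v[$j];
        }
        my $score = $acc / $norm;
        ($best_lag, $best_score) = ($lag, $score) if $score > $best_score;
    }
    return ($best_lag, $best_score);
}

# Usage: @heard is @ref delayed by three samples.
my @ref   = (0, 0, 1, 4, 9, 4, 1, 0, 0, 0, 0, 0);
my @heard = (0, 0, 0, 0, 0, 1, 4, 9, 4, 1, 0, 0);
my ($lag, $score) = best_lag(\@ref, \@heard, 5);
printf "best lag %d, score %.3f\n", $lag, $score;
```

    For the toy data the best lag comes out as 3, the delay that was built in. The score at that lag could serve as the "percentage match", though any pass/fail cut-off would have to be tuned on real recordings.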

    For this to work, the audio data should be raw signed numbers. You will likely need to decode to that for most popular audio formats. PDL::Audio may be of help.
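
    To make "raw signed numbers" concrete, here is a small sketch using only core Perl. It assumes the recording has already been converted to headerless 16-bit signed little-endian mono PCM by some external tool; that file layout and the helper names decode_pcm16 and read_pcm16 are assumptions for illustration, not anything PDL::Audio provides.

```perl
use strict;
use warnings;

# Turn a string of raw bytes into a list of signed 16-bit samples.
# 's<' means signed 16-bit little-endian (needs Perl 5.10 or later).
sub decode_pcm16 {
    my ($bytes) = @_;
    return unpack 's<*', $bytes;
}

# Slurp a headerless PCM file in binary mode and decode it.
sub read_pcm16 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                # read the whole file at once
    my $bytes = <$fh>;
    close $fh;
    return decode_pcm16($bytes);
}

# Usage on an in-memory buffer.
my @samples = decode_pcm16(pack 's<*', -2, 0, 300);
print "@samples\n";    # prints "-2 0 300"
```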

    After Compline,

Re: Comparing Two Audio Files
by Joost (Canon) on Jun 11, 2007 at 22:53 UTC
    Are we talking about repeated recorded messages or just recordings of people saying the same thing? In other words, are you looking for something that matches bad recordings of the same source data, or are you trying to do natural spoken language recognition?

    I'm not an expert on either, but I'd probably take a stab at the first problem (matching noisy recordings) by first applying a low-pass filter to get rid of most of the noise, then downsampling to some really low sample rate, then finding the peaks in the recording and seeing if the timing of the peaks matches any of the pre-determined messages.
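
    The steps above might be sketched like this in core Perl. Note the assumptions: the low-pass filter and downsampling are folded into one block average of the absolute sample values (an amplitude envelope, which is a substitution on my part), and peak timing is compared through the gaps between peaks so that a recording starting slightly early or late still matches. All function names, the threshold, and the tolerance are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Crude low-pass plus downsample: one output frame per $factor samples,
# holding the mean absolute amplitude of that block.
sub envelope {
    my ($samples, $factor) = @_;
    my @env;
    for (my $i = 0; $i + $factor <= @$samples; $i += $factor) {
        my @block = @{$samples}[$i .. $i + $factor - 1];
        push @env, sum(map { abs } @block) / $factor;
    }
    return @env;
}

# Frame indices of local maxima that rise above $threshold.
sub peaks {
    my ($env, $threshold) = @_;
    return grep {
        $env->[$_] > $threshold
            && $env->[$_] >  $env->[$_ - 1]
            && $env->[$_] >= $env->[$_ + 1]
    } 1 .. $#{$env} - 1;
}

# Distances between consecutive peaks; unaffected by an overall time shift.
sub gaps {
    my @p = @_;
    return map { $p[$_] - $p[$_ - 1] } 1 .. $#p;
}

# True if both gap lists have equal length and agree within $tol frames.
sub gaps_match {
    my ($g1, $g2, $tol) = @_;
    return 0 unless @$g1 == @$g2;
    for my $i (0 .. $#{$g1}) {
        return 0 if abs($g1->[$i] - $g2->[$i]) > $tol;
    }
    return 1;
}

# Usage: @heard is a noisy copy of @ref that starts four samples late.
my @ref   = (0, 0, 0, 0, 50, -60, 55, -50, 0, 0, 0, 0,
             0, 0, 0, 0, 80, -90, 85, -80, 0, 0, 0, 0);
my @heard = (1, -1, 0, 1, 0, -1, 1, 0, 48, -55, 50, -52, 1, 0, -1, 0,
             0, 1, 0, -1, 75, -88, 80, -79, 0, 1, -1, 0);

my @ref_env   = envelope(\@ref,   4);
my @heard_env = envelope(\@heard, 4);
my @ref_gaps   = gaps(peaks(\@ref_env,   10));
my @heard_gaps = gaps(peaks(\@heard_env, 10));
print gaps_match(\@ref_gaps, \@heard_gaps, 1) ? "match\n" : "no match\n";
```

    On this toy data both recordings yield two peaks three frames apart, so the script prints "match". Real phone audio would need a much larger block size and thresholds chosen by experiment.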

    That would probably only work if you have a fairly limited number of messages, but at least it's reasonably easy to implement using standard command-line driven audio tools for the conversions and then using something like Audio::SndFile (disclaimer: I wrote it) to parse the data and find the peaks.

Re: Comparing Two Audio Files
by Anonymous Monk on Jun 14, 2007 at 00:53 UTC
    Thank you. I will have to read up on both of these methods, and I hope that I can understand them. The system is a phone dialing mechanism that checks for a proper routing message, and there are a limited number of messages. I want to test against each possible message to find which one it is, and also report if it is an unknown message.
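
    The "test against each possible message, otherwise report unknown" step could look roughly like the sketch below. It assumes some comparison method has already produced one similarity score per known message, with higher meaning more alike. The message names, the scores, and the 0.7 cut-off are made up for illustration.

```perl
use strict;
use warnings;

# Given a hash of message name => similarity score, return the name with
# the highest score, or 'unknown' if no score reaches $threshold.
sub classify {
    my ($scores, $threshold) = @_;
    my ($best) = sort { $scores->{$b} <=> $scores->{$a} } keys %$scores;
    return 'unknown' unless defined $best && $scores->{$best} >= $threshold;
    return $best;
}

# Hypothetical scores for three known routing messages.
my %scores = (busy => 0.31, disconnected => 0.87, rerouted => 0.42);
my $label  = classify(\%scores, 0.7);
print "$label\n";    # prints "disconnected"
```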
