Trial and Error

Over the course of the project, we had to try multiple approaches to some of the problems that we encountered. We developed two distinct solutions for pitch detection and for the correlation of the signals. What follows is a description of the methods that we decided to exclude from the final solution because they were difficult to integrate. Even though we found better implementations, these concepts may still be useful in the future.

The initial implementation of the Pitch Detector used the Voice Activity Detection (VAD) object in MATLAB. Although the object handles most of the detection at a lower level of abstraction, we tried to dig a little deeper into the algorithms that it uses.

Pitch Detector Program

The diagram shows the data flow in the VAD object. First, the signal is filtered and some metadata is attached to it. Then the VAD computes the Fourier transform of the signal and its power in the frequency domain.

The prior and posterior signal-to-noise ratios (SNR) are calculated by constantly updating the probabilities of speech and of the absence of speech. The DFT coefficients are modeled as independent Gaussian random variables.

Something else that we really wanted to do was use the matrix form of a spectrogram to identify notes. Ideally, this would have cut down on a lot of the code we ended up writing. However, the output had too many overlapping notes for us to remake it into our own. Eventually, we settled on an average pitch detector that simply selects the primary frequency within a window whose length is supplied by the user.
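The windowed "primary frequency" idea can be sketched as follows. This is an illustrative Python sketch, not the project code: the function names `dominant_frequency` and `primary_frequencies` are hypothetical, the windows are assumed to be back to back with no overlap, and a naive DFT is used only to keep the example free of dependencies (an FFT routine would be the practical choice).

```python
import cmath
import math


def dominant_frequency(window, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC DFT bin in one window."""
    n = len(window)
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(window))
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate / n


def primary_frequencies(signal, sample_rate, window_length):
    """Split the signal into back-to-back windows and pick one frequency per window."""
    return [dominant_frequency(signal[start:start + window_length], sample_rate)
            for start in range(0, len(signal) - window_length + 1, window_length)]


def tone(frequency, n_samples, sample_rate):
    """Generate a unit-amplitude sine tone (helper for the example only)."""
    return [math.sin(2 * math.pi * frequency * i / sample_rate)
            for i in range(n_samples)]


# Example: 0.1 s of a 100 Hz tone followed by 0.1 s of a 200 Hz tone.
sample_rate = 800
signal = tone(100, 80, sample_rate) + tone(200, 80, sample_rate)
pitches = primary_frequencies(signal, sample_rate, window_length=80)
# pitches == [100.0, 200.0]
```

The window length is the trade-off here: the frequency resolution is `sample_rate / window_length`, so longer windows separate nearby notes better but locate them less precisely in time.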

Furthermore, the Gaussian distributions of the DFT coefficients are fed into a recursive algorithm that updates the probability of the present state of the system, using the previous state and the discrepancy between the predicted and the observed SNR.
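A rough sketch of this kind of recursion is given below, in Python so that it is self-contained. It assumes a two-state (speech / no speech) Markov model and the Gaussian likelihood ratio for each DFT bin. The function name, the smoothing factor `alpha`, the transition probabilities, and the simplified prior-SNR update are all illustrative assumptions, not details taken from the MATLAB object.

```python
import math


def track_speech_probability(posterior_snr_frames, alpha=0.98,
                             p_stay_speech=0.9, p_stay_noise=0.9):
    """Return P(speech) per frame from per-bin posterior SNRs (|X_k|^2 / noise power).

    alpha and the transition probabilities are illustrative values only.
    """
    n_bins = len(posterior_snr_frames[0])
    prior_snr = [0.0] * n_bins   # running a priori SNR estimate per bin
    prob = 0.5                   # start undecided
    history = []
    for gammas in posterior_snr_frames:
        log_lr = 0.0
        for k, gamma in enumerate(gammas):
            # Simplified recursive prior-SNR update: smooth the instantaneous
            # estimate max(gamma - 1, 0) with the previous value.
            prior_snr[k] = alpha * prior_snr[k] + (1 - alpha) * max(gamma - 1.0, 0.0)
            xi = prior_snr[k]
            # Log likelihood ratio (speech vs. noise) for a Gaussian DFT coefficient:
            # log(1 / (1 + xi)) + gamma * xi / (1 + xi).
            log_lr += gamma * xi / (1.0 + xi) - math.log1p(xi)
        frame_lr = math.exp(log_lr / n_bins)   # geometric mean over bins
        # Predict from the previous state, then correct with the new evidence.
        predicted = p_stay_speech * prob + (1 - p_stay_noise) * (1 - prob)
        prob = frame_lr * predicted / (frame_lr * predicted + (1 - predicted))
        history.append(prob)
    return history


# Example: five loud frames (posterior SNR 10) followed by five quiet ones (0.5).
frames = [[10.0] * 4] * 5 + [[0.5] * 4] * 5
speech_prob = track_speech_probability(frames)
```

In this sketch the probability rises quickly during the loud frames and decays gradually afterwards, because the Markov prediction step carries part of the previous state into each new frame.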

The image on the right shows the signal in the time domain (yellow) and the probability of voice (blue). The accuracy of the VAD was satisfactory, but it did not output the times at which the pitches occurred, so we could not use it for cropping the audio files.

Cross Correlator

The other module that we had to exclude from the final solution was the Cross Correlator. Although we used some of its algorithms, the part where we calculated the magnitude-squared coherence did not help us match the signals, because it could only output the similarities in the frequency domain. As with the Pitch Detector, we were getting useful output, but it was impossible to incorporate because there was no time associated with the matching frequencies. The picture below shows a sample graph of the magnitude-squared coherence of a song and a dog bark.
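To illustrate why the coherence carries no timing information, here is a minimal Python sketch of a magnitude-squared coherence estimate, |Pxy(f)|^2 / (Pxx(f) Pyy(f)), where the spectra are averaged over segments. It is an assumption-laden illustration rather than the project code: the function names are hypothetical, the segments are non-overlapping and unwindowed, and a naive DFT stands in for an FFT.

```python
import cmath
import math


def half_dft(frame):
    """Naive DFT of a real frame, bins 0..N/2 (an FFT would be used in practice)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
            for k in range(n // 2 + 1)]


def magnitude_squared_coherence(x, y, segment_length):
    """Estimate |Pxy|^2 / (Pxx * Pyy) per frequency bin by averaging over segments."""
    n_bins = segment_length // 2 + 1
    pxx = [0.0] * n_bins
    pyy = [0.0] * n_bins
    pxy = [0j] * n_bins
    usable = min(len(x), len(y))
    for start in range(0, usable - segment_length + 1, segment_length):
        fx = half_dft(x[start:start + segment_length])
        fy = half_dft(y[start:start + segment_length])
        for k in range(n_bins):
            pxx[k] += abs(fx[k]) ** 2
            pyy[k] += abs(fy[k]) ** 2
            pxy[k] += fx[k] * fy[k].conjugate()
    # The result is indexed by frequency bin only: the segment (time) axis
    # has been summed away.
    return [abs(pxy[k]) ** 2 / (pxx[k] * pyy[k]) if pxx[k] * pyy[k] > 0 else 0.0
            for k in range(n_bins)]


# Example: two signals that share a tone in bin 2 but carry different interference.
n, seg = 128, 16
shared = [math.sin(2 * math.pi * 2 * i / seg) for i in range(n)]
x = [s + 0.3 * math.sin(2 * math.pi * 0.19 * i) for i, s in enumerate(shared)]
y = [s + 0.3 * math.sin(2 * math.pi * 0.31 * i + 1.0) for i, s in enumerate(shared)]
coherence = magnitude_squared_coherence(x, y, seg)
```

Each output value lies between 0 and 1 and describes one frequency bin over the whole recording. Averaging over several segments is essential, since a single segment always yields a coherence of 1, and it is exactly this averaging that discards the information about when a match occurred.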

Denoising - Custom Filters

Earlier in our project, we were playing around with denoising techniques to produce cleaner signals. Our first attempt at denoising was to custom design filters for each animal video using MATLAB’s FilterDesigner. The problem here was that we had a large number of files, and designing a custom filter for each one would have taken too much time. Filtering this way is also hard to automate.

To the left is an example of denoised bark audio we produced with this method.

Denoising - FFT Spectral Subtraction

Our second attempt at denoising signals focused on ease of automation. We tried out the method of FFT spectral subtraction. Essentially, it involves taking a frequency-domain representation of the noise within a signal and subtracting it from the frequency-domain representation of the original signal. A diagram of this process is shown below. With our algorithm, we tried to detect the longest quiet period within an audio section and use that as our noise model. However, we chose not to use this method, because it gave inconsistent results when non-noise sounds made it into the supposedly pure-noise sections of our signals.
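The two steps described above, finding the longest quiet stretch and subtracting its spectrum, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the project code: the function names and the RMS threshold are hypothetical, frames are non-overlapping and unwindowed, magnitudes are floored at zero, the noisy phase is reused, and a naive DFT stands in for an FFT.

```python
import cmath
import math


def dft(frame):
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(c * cmath.exp(2j * math.pi * k * i / n)
                 for k, c in enumerate(spectrum)) / n).real for i in range(n)]


def longest_quiet_run(signal, frame_length, rms_threshold):
    """Return (start, end) sample indices of the longest run of low-RMS frames."""
    best_start, best_end = 0, 0
    run_start = None
    n_frames = len(signal) // frame_length
    for f in range(n_frames + 1):   # the extra pass closes a run that reaches the end
        quiet = False
        if f < n_frames:
            frame = signal[f * frame_length:(f + 1) * frame_length]
            quiet = math.sqrt(sum(s * s for s in frame) / frame_length) < rms_threshold
        if quiet:
            if run_start is None:
                run_start = f * frame_length
        elif run_start is not None:
            run_end = f * frame_length
            if run_end - run_start > best_end - best_start:
                best_start, best_end = run_start, run_end
            run_start = None
    return best_start, best_end


def spectral_subtract(signal, noise, frame_length):
    """Subtract the average noise magnitude spectrum from every frame of the signal."""
    noise_frames = [noise[s:s + frame_length]
                    for s in range(0, len(noise) - frame_length + 1, frame_length)]
    noise_mag = [0.0] * frame_length
    for frame in noise_frames:
        for k, c in enumerate(dft(frame)):
            noise_mag[k] += abs(c) / len(noise_frames)
    cleaned = []
    for s in range(0, len(signal) - frame_length + 1, frame_length):
        spectrum = dft(signal[s:s + frame_length])
        # Keep the noisy phase, reduce the magnitude, and never go below zero.
        reduced = [cmath.rect(max(abs(c) - noise_mag[k], 0.0), cmath.phase(c))
                   for k, c in enumerate(spectrum)]
        cleaned.extend(idft(reduced))
    return cleaned


# Example: a constant hum (bin 4) throughout, plus a tone (bin 8) in frames 1 and 2.
frame_length = 32
hum = [0.1 * math.sin(2 * math.pi * 4 * i / frame_length)
       for i in range(5 * frame_length)]
noisy = [h + (math.sin(2 * math.pi * 8 * i / frame_length)
              if frame_length <= i < 3 * frame_length else 0.0)
         for i, h in enumerate(hum)]
quiet_start, quiet_end = longest_quiet_run(noisy, frame_length, rms_threshold=0.2)
cleaned = spectral_subtract(noisy, noisy[quiet_start:quiet_end], frame_length)
```

The example works cleanly only because the hum is perfectly stationary. The noise model is whatever falls below the threshold, so any non-noise sound that ends up in the quiet run is subtracted from every frame as well, which matches the inconsistency we observed.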