Tuesday, November 11, 2014

Spectrogram Interest Points: Shazam

After some time, I decided to write another blog post. This time I want to talk about what one can do with local interest points in a spectrogram.

An interest point in a spectrogram is a point of high magnitude in time and frequency. Normally we use points that are a local maximum in a small region, so these points cluster around interesting audio events in the spectrogram. One use of these features is audio indexing and retrieval. For example, the Shazam app records audio using your phone, extracts local features from the recording, and compares them against a database of songs indexed in advance. The app is capable of naming a large variety of songs from noisy recordings.
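
To make the "local maximum in a small region" idea concrete, here is a minimal sketch in plain Python. The function name, the `radius` and `min_magnitude` parameters, and the `spec[freq][time]` layout are my own assumptions for illustration, not taken from any particular implementation. On a real spectrogram one would more likely use array operations (for example SciPy's `scipy.ndimage.maximum_filter`), but the logic is the same.

```python
def find_interest_points(spec, radius=1, min_magnitude=0.0):
    """Return (time, freq) index pairs that are strict local maxima.

    `spec` is a 2D list of magnitudes indexed as spec[freq][time].
    `radius` and `min_magnitude` are illustrative parameters (assumptions).
    """
    n_freq = len(spec)
    n_time = len(spec[0]) if n_freq else 0
    points = []
    for f in range(n_freq):
        for t in range(n_time):
            value = spec[f][t]
            if value <= min_magnitude:
                continue  # too quiet to be interesting
            # Compare against every neighbor in the (2*radius+1)^2 window,
            # clipped at the borders of the spectrogram.
            neighbors = (
                spec[nf][nt]
                for nf in range(max(0, f - radius), min(n_freq, f + radius + 1))
                for nt in range(max(0, t - radius), min(n_time, t + radius + 1))
                if (nf, nt) != (f, t)
            )
            # Strict maximum: a tie with a neighbor is not counted as a peak.
            if all(value > other for other in neighbors):
                points.append((t, f))
    return points


# Tiny made-up "spectrogram": 3 frequency bins, 4 time frames.
spec = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.8],
]
print(find_interest_points(spec, radius=1, min_magnitude=0.5))
# -> [(1, 1), (3, 2)]
```

The threshold keeps quiet background bins from becoming interest points, and the window size controls how densely the points are spread over the spectrogram.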



The indexing algorithm extracts such interest points from the spectrogram and then hashes combinations of them. To do so, it uses one of the points in a region as an anchor point and measures the offset from the anchor point to the other points in its neighborhood. These combinations are used to index the audio file. For an unseen song, the hashes are extracted using a sliding window and the app searches for matches among the previously indexed hashes. The image above shows local interest points in red and some combinatorial hashes in yellow, grouping around a dolphin whistle. We used the same features to build a dolphin whistle detector running on an underwater wearable computer.
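
The anchor-and-offset hashing and the lookup step can be sketched roughly as follows. This is only an illustration under my own assumptions: the hash layout `(anchor frequency, frequency offset, time offset)`, the neighborhood limits `max_dt` and `max_df`, and all function names are hypothetical. Voting on a consistent time offset between query and track is one common way to pick the winner, and I am not claiming it is exactly what the app or our detector does.

```python
from collections import Counter, defaultdict


def hash_points(points, max_dt=3, max_df=10):
    """Pair each anchor with later nearby points; yield (hash, anchor_time).

    `points` are (time, freq) index pairs. The hash layout and the
    neighborhood limits are assumptions made for this sketch.
    """
    points = sorted(points)
    for i, (t1, f1) in enumerate(points):
        for t2, f2 in points[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # sorted by time, so all remaining points are too far
            if dt > 0 and abs(f2 - f1) <= max_df:
                yield (f1, f2 - f1, dt), t1


def build_index(tracks):
    """Map every hash to the (track name, anchor time) pairs it occurs at."""
    index = defaultdict(list)
    for name, points in tracks.items():
        for h, t in hash_points(points):
            index[h].append((name, t))
    return index


def best_match(index, query_points):
    """Vote for (track, time offset); a true match agrees on one offset."""
    votes = Counter()
    for h, t_query in hash_points(query_points):
        for name, t_track in index.get(h, []):
            votes[(name, t_track - t_query)] += 1
    if not votes:
        return None
    (name, offset), count = votes.most_common(1)[0]
    return name, offset, count


# Made-up interest points for two indexed tracks.
tracks = {
    "song_a": [(40, 10), (41, 12), (42, 15), (45, 11)],
    "song_b": [(0, 20), (2, 18), (3, 25)],
}
index = build_index(tracks)

# Query: a snippet of song_a starting at frame 40, plus one noise point.
query = [(0, 10), (1, 12), (1, 40), (2, 15)]
print(best_match(index, query))
# -> ('song_a', 40, 3)
```

Because each hash only stores offsets relative to its anchor, the query does not need to be aligned with the start of the track, and a spurious point such as the one at `(1, 40)` simply produces no matching hash.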

The algorithm is quite robust to noise and shows good indexing performance on music data.