Friday, August 15, 2014

ICASSP: Pattern Discovery in Dolphin Whistles

Abstract of my ICASSP 2014 paper on Dolphin Communication Mining:

"The study of dolphin cognition involves intensive research of animal vocalizations. Marine mammalogists commonly study a specific sound type known as the whistle found in dolphin communication. However, one of the main problems arises from noisy underwater environments. Often waves and splash noises will partially distort the whistle making analysis or extraction difficult. Another problem is discovering fundamental units that allow research of the composition of whistles. We propose a method for whistle extraction from noisy underwater recordings using a probabilistic approach. Furthermore, we investigate discovery algorithms for fundamental units using a mixture of hidden Markov models. We evaluate our findings with a marine mammalogist on data collected in the field. Furthermore, we have evidence that our algorithms enable researchers to form hypotheses about the composition of whistles."

Symbolic Aggregate approXimation: A symbolic time series representation

I used this time series representation some years ago for a lot for my research. I still think it is an elegant way of representing time series. You can use this easy to use algorithm to convert a one dimensional time series into a string. Given a time series you split it into w equally sized segments and estimate the sample mean in each segment. So we end up with a time series of length w. We then divide the Y axis into k regions using split points or thresholds. Assigning a unique symbol to each of the regions we can
check in which region each sample mean falls into and read of the symbol. So we end with a string of size w.

You can see the performance on multiple time series data sets in the original paper. Furthermore, there is a very efficient way on how to index massive data sets using this representation.