Thursday, February 14, 2013

A nice linear algebra library for C++

Normally when coding in C++ you will use the interface of LAPACK or the boost library uBlas. However these are not really nice to use. I found an alternative and after spending an afternoon with it I must say I am impressed. The library is called Armadillo. The nice things are the initialization methods provided, the element access as well as the general operators. For example if I was to create
a 2 x 2 matrix with some values I can simply do this:

matrix << 1 << 2 << endr
<< 0 << 1 << endr;

Accessing an element is simply:

matrix(i, j)

And operations can be performed as:

X =  A * B
X = A + B

The functionality seems to include everything I needed so far:

det(X)
inv(X)

In the end there is a speed comparison in which armadillo is always the fastest.

Monday, February 11, 2013

Thoughts on EM - Clustering: Growing A Mixture

Since I am still working a lot on my clustering algorithms, especially Expectation Maximization for Gaussian Mixture Models, I will share another thought on how to improve EM's performance. In general the problem is that EM will converge on a local maximum. That menas when we start with initial bad gaussians, we will end up on a local maximum close to where we started. Solutions are run k-means 10 x and use the clusters as the inital means (like weka) or cluster hierarchical first (see older blog post). The problem with k-means is that it is localized, too. The problem with the hierarchical version is that it is slow. In the end the initialization was slower for me then the actual clustering. Something I found today is "growing" the mixture incrementally [1].

Just to remind you EM is a two stage algorithm. First we calculate the probabilities
for each data point using the current model parameters (E -step, expectation), and then use these probability estimates to refine the parameters (M-step, maximization). Now the trick is to split the gaussian with the highest rank (highest weight or prior) after each M-step and delete all components
with a too small prior or a too small variance. We repeat that process until we reach the desired k.

[1] Kevin Murphy: Machine Learning: a Probabilistic Perspective, MIT Press, 2012

Monday, February 4, 2013

Conditional Random Fields: I finally got it!

As a combination of trying again and again and the classes I take I was able to wrap my head around why CRFs are great and how they work. At first I was a bit sceptic since I love my HMMs but maybe in the future I give it a try. I think the main reason I got what these models do is that there is a very deep discussion on Markov Random Fields in our Probabilistic Graphical Models lecture (by Jim Rehg) CS 8803 at Georgia Tech. Since I will go a bit deeper in the model itself in a second here is the important bit. The first difference is that Random Fields are a distribution over potentials which are just any positive function defined over a clique of variables in the model. The joint probability is then defined as a scaled multiplication of these potentials. In Hidden Markov Models we have a graph structure over probability distributions instead. The difference of a Conditional Random Field to a Markov Random Field is that it models the conditional probability instead of the joint probability. The only difference however seems to be the definition of the scaler.

Now to the details. In contrast to a Bayes Network or a Hidden Markov Model a Random field is a undirected model. So instead of talking about a parent child relationship we talk about interaction between potentials. A potential is a function defined over variables of a clique. Just as a reminder,
in Computer Science a clique is a fully connected subgraph. For example if we were to label each frame in a sequence we could use the model below. The Y are the labels over time and X is the whole time series.

The potential function is defined as a three- clique over the last label, the current label and the whole time series.

While the transition probability (label sequence) is the same as in Hidden Markov Models as it models a
first order Markov Chain, the observations are fully accessible. In a Hidden Markov Model the observations have to be independent by definition. So now can define more powerful observations by designing potential functions that take previous values into account. Furthermore, we can use the labels in order to behave differently from label to label. The joint probability is then defined over these potentials. However for labeling the sequence we are not interested in the joint probability but in the conditional as P(Y | X) instead of P(Y, X). The conditional probability is then defined as:

And this is a Conditional Random Field. In my opinion and from what I read, this model is more powerful then Hidden Markov Models but has some drawbacks. The first one is that in order to train it, we need fully labeled data. While in a Hidden Markov Model, we can automatically distribute the data over the states by inferring the label sequence, this is not possible here. The reason is that instead of a Expectation Maximisation training, most people seem to use a Gradient Based optimization on the log-likelihood. Acquiring labels on that level might not be possible in all cases. The other thing I read is that these models need a lot more training data for training and the training is much slower. In the end that means use cases where only a few training instances can be collected this model won't be able to converge.

Magic Summon: Building false positive free Gestures from just negative examples

Thad and I have a new publication in the Journal Of Machine Learning Research. The nugget for me in this paper is that we were able to build a tool that automatically suggests gestures that won't false trigger in everyday life. The paper can be found in Volume 14. I attached the abstract for everyone interested:

"Gestures for interfaces should be short, pleasing, intuitive, and easily recognized by a computer. However, it is a challenge for interface designers to create gestures easily distinguishable from users' normal movements. Our tool MAGIC Summoning addresses this problem. Given a specific platform and task, we gather a large database of unlabeled sensor data captured in the environments in which the system will be used (an "Everyday Gesture Library" or EGL). The EGL is quantized and indexed via multi-dimensional Symbolic Aggregate approXimation (SAX) to enable quick searching. MAGIC exploits the SAX representation of the EGL to suggest gestures with a low likelihood of false triggering. Suggested gestures are ordered according to brevity and simplicity, freeing the interface designer to focus on the user experience. Once a gesture is selected, MAGIC can output synthetic examples of the gesture to train a chosen classifier (for example, with a hidden Markov model). If the interface designer suggests his own gesture and provides several examples, MAGIC estimates how accurately that gesture can be recognized and estimates its false positive rate by comparing it against the natural movements in the EGL. We demonstrate MAGIC's effectiveness in gesture selection and helpfulness in creating accurate gesture recognizers.