Showing posts with label gesture. Show all posts
Showing posts with label gesture. Show all posts

Monday, February 4, 2013

Magic Summon: Building false positive free Gestures from just negative examples

Thad and I have a new publication in the Journal Of Machine Learning Research. The nugget for me in this paper is that we were able to build a tool that automatically suggests gestures that won't false trigger in everyday life. The paper can be found in Volume 14. I attached the abstract for everyone interested:

"Gestures for interfaces should be short, pleasing, intuitive, and easily recognized by a computer. However, it is a challenge for interface designers to create gestures easily distinguishable from users' normal movements. Our tool MAGIC Summoning addresses this problem. Given a specific platform and task, we gather a large database of unlabeled sensor data captured in the environments in which the system will be used (an "Everyday Gesture Library" or EGL). The EGL is quantized and indexed via multi-dimensional Symbolic Aggregate approXimation (SAX) to enable quick searching. MAGIC exploits the SAX representation of the EGL to suggest gestures with a low likelihood of false triggering. Suggested gestures are ordered according to brevity and simplicity, freeing the interface designer to focus on the user experience. Once a gesture is selected, MAGIC can output synthetic examples of the gesture to train a chosen classifier (for example, with a hidden Markov model). If the interface designer suggests his own gesture and provides several examples, MAGIC estimates how accurately that gesture can be recognized and estimates its false positive rate by comparing it against the natural movements in the EGL. We demonstrate MAGIC's effectiveness in gesture selection and helpfulness in creating accurate gesture recognizers.

Wednesday, January 2, 2013

Building a Gesture Recognizer - Part 2 Hidden Markov Models

In the last post on gesture recognition, we used a simple instance based recognizer. The distance measure we used is Dynamic Time Warping. This distance accounts for warping in the gesture.

A more prominent and realistic recognizer is using Hidden Markov Models.
Many Gesture and Speech or Handwritten recognition systems use these models.

A Hidden Markov Model is a probabilistic model. As Dynamic Time Warping, it models
time series while accounting for warping in the gesture.
In order to get an intuition on what a Hidden Markov Model is and how it works I will give
a brief introduction. I will not go over the math behind it since other people did a very good job at it [1].
A Hidden Markov Model is a probabilistic state machine. Each state has two probability distributions.
A continues one that models the data and a discrete one modeling the probability of transitioning
from a state to another. The continues one is called observation probability since it models the observed data. The probability distribution is a gaussian centered around a mean that discribes the data under that state best. For example in the image below we can see a three state Hidden Markov Model.
The time series has clearly three distinct parts. We model each part as one state, while the mean of the
state is the mean estimated from that part. The goal is to find the maximum likelihood state sequence
given the data and the model. We can infer which frame of the time series belongs to which
state using the observation probabilities and the transition probabilities.



How to perform that inference is described in the Rabiner paper on Hidden Markov Models on page 263 "Solution to Problem 2". The procedure's name is "Viterbi decoding". Once we have the state sequence we can calculate the probability of the time series being generated by a particular model.

For a gesture recognizer we build multiple of these models, one for each gesture. Then we collect a training set and use it to estimate the parameters of that model. During recognition we simply pick the model describing the data best.

In most cases Hidden Markov Models outperform Dynamic Time Warping solutions in accuracy and
speed. If you want to build a gesture recognizer using Hidden Markov Models you can use GART or
GT2K built by the research group I am currently working in. Both are build on the Hidden Markov Model Toolkit 3 (HTK3) so you have to install it first. I also recommend reading the HTK3 Book if you are interested in Hidden Markov Models in general. GT2K is a collection of shell scripts wrapping
HTK and it can be downloaded from the Contextual Computing Group. GART is a Java wrapper which is downloadable at the Contextual Computing Group, too.


[1] Lawrence Rabiner: "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Proceedings of the IEEE Vol 77, No 2, 1989

Thursday, November 1, 2012

Building a Gesture Recognizer - Part 1 Dynamic Time Warping

I recently noticed a lot of people want to do "something with gestures" but don't know how to start.
So I thought writing a two part tutorial on how to recognize gestures. The first part will
describe the problems of gesture recognition and will describe a very simple algorithm, while
the second part will give the insights in what most gesture recognition systems currently use.

The goal in this tutorial is to understand a common problem in gesture recognition called warping
and to build a simple, distance based gesture recognizer.

So you want to recognize gestures. For simplicity let us assume you want to recognize gestures
performed with a mouse. If you record the position of your mouse over time, you will
have a two dimensional time series. In the picture below you can see an example.
I recorded a gesture and plotted the x and y position separately.
TOP: a recorded gesture, CENTER: the position in x - axis,  BOTTOM: the position in y - axis.

If we want to build a recognizer we have to collect data first. For example if we have four different
gestures we could record ten users performing each gesture twenty times. So the resulting data set includes 800 recordings. Now one problem is that the examples for a gesture will not
exactly look the same. For example some users will perform the gesture faster and slower resulting
in an effect called "warping". One solution to this problem is to align the time series. An alignment
is the association of each point in one time series to another point in the other time series. 


Two examples of the same gesture (only x - axis). One example
is warped. The purple lines show the alignment of the two time series.

In order to compare two warped time series we will use Dynamic Time Warping.
Dynamic Time Warping compares two time series while accounting for the warp using Dynamic Programming. In general the warping is resolved by accounting for added samples (the second example is performed slower) or deleting samples (the second example is performed faster. 

The distance is calculated by constructing an M x N matrix V, where M is the length of one time series X = x1... xM and N the length of another time series Y = y1 ... yN. In our case xi and yj are two dimensional vectors (the position of the mouse). In each matrix position v(i,j)
we minimize the distance comparing the time series up to position i in X and position j in Y by the following recursion:






while v(0,0) is initialized as 0 and v(i,0) as well as v(0, j) are initialized as infinity.
Now that we have a distance measure we can implement a very simple recognizer.
We use the Nearest Neighbor Classifier using the distance described above.
For a gesture we have recorded we will calculate the Dynamic Time Warping distance
to each example in our pre recorded gesture set. The gesture associated with the example
that has the minimum distance is the recognized gesture. 


Two gestures shown in the distance space from a new example. 
One important question to ask is "What happens when the new example is not a gesture".
In that case we apply a threshold on the gesture. If the distance is too high (above the threshold)
we will not consider it as anything.

So we built a very simple recognizer. The implementation should be straight forward. In on of the next posts I will walk through gesture recognition using Hidden Markov Models. Since Hidden Markov Models are not easy to implement we will use an existing Toolkit called HTK3. I will write a step by step tutorial. 


Tuesday, December 6, 2011

Hey, what have you done in your thesis ?


People keep asking me about what I did in my diploma thesis. So I decided to write a short post about the story behind "Motion Gestures: false positive prediction and prevention". A lot of you will know motion gestures from the Nintendo Wii or the "shake to shuffle" gesture on Apple's iPhone. By shaking the phone user's can switch in the music player's shuffle mode. Those of you that have the "shake to shuffle" functionality on and quickly descended a stair while reading an email once will know this problem: The functionality triggers accidentally and the player will switch to shuffle mode.  This kind of error is called a false positive. When building a motion gesture system for everyday use one will always face the problem of "accidentally triggering functionality".
False positives: left - the intended gesture (moving phone up / down) 
right - gesture triggered from up and down movement while walking


Finding these errors is not an easy task. Neither for interaction designers nor for trained pattern recognition experts. For both it is hard to tell gestures triggering falsely in everyday life from gestures that don't. In most cases these errors will get noticed very late in development which will increase expenses or in the worst case in the final product where it will cause user dissatisfaction. Long story short. In my thesis I describe how to build a fast data base of movements from everyday life. Designers can check their gestures against this database. The result of this search is a number indicating whether to keep or to discard a gesture because it will false trigger in everyday life. Thats basically it! For a more technical description I will cite the abstract to my thesis:

"False positives are a common problem for interfaces that rely on gesture recognition. Often a gesture can seem fine in development but is found to trigger accidentally during an initial deployment of the interface, restarting development and increasing expense. This works discusses fast methods to predict and prevent the false positive rate of a gesture recognition system. Furthermore it introduces MAGIC 2.0, a technique for false positive prediction and prevention that can be used interactively during the interface design process. To ground my research, I implemented MAGIC 2.0 as a web application and developed a gesture interface using sensors on common Android mobile phone platforms. I use iSAX to enable interactive searching (2 sec/example) of a large database (1,500,000 sec) of everyday user movements on a standard workstation to determine if a candidate gesture will trigger accidentally during use of an interface. Furthermore I introduce a novel algorithm called template SAX to speed up Nearest Neighbour search. In order to evaluate those methods I perform a user-independent study that suggests that the number of matches to this Everyday Gesture Library (EGL) database is indeed predictive of a candidate gesture's suitability. I compare iSAX and tSAX to Hidden Markov Models (HMMs) and Nearest Neighbor with respect to accuracy and speed for the EGL search. Using iSAX on the EGL, I also develop a ``garbage'' class and show that including this class in recognition reduces errors. Last but not least a parameter sensibility study for the iSAX search method is performed. From the results I developed an automatic parameter tuning method for SAX based searches ( iSAX, tSAX)."