Sunday, February 28, 2016

Speed Bake-Off: TensorFlow / (NumPy | SciPy | Matplotlib) / OpenCV

I recently got interested in scaling some of the algorithms from my PhD thesis, especially the feature learning part. I used K-Means to learn a convolutional codebook over audio spectrograms.
Why you can use K-Means to learn something like a convolutional neural net is described in Coates, Lee and Ng as well as my thesis; a rough sketch of the codebook idea follows the list below. Anyway, in my thesis experiments the bottleneck was the feature learning, so I need a fast clustering implementation as well as a fast convolution in order to resolve the speed issues. The three main implementations I look at are:
  • Google TensorFlow
  • SciPy / scikit-learn
  • OpenCV
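As a rough sketch of the codebook idea mentioned above (my reconstruction, not the thesis code): cluster small spectrogram patches with scikit-learn's MiniBatchKMeans and treat the centroids as convolutional filters.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.image import extract_patches_2d

# Placeholder spectrogram (frequency bins x time frames); in practice this
# would come from an STFT of the audio recording.
spectrogram = np.random.rand(128, 1000)

# Sample small patches and normalise each one
patches = extract_patches_2d(spectrogram, (8, 8), max_patches=10000)
patches = patches.reshape(len(patches), -1)
patches -= patches.mean(axis=1, keepdims=True)

# The cluster centres form the convolutional codebook
kmeans = MiniBatchKMeans(n_clusters=64).fit(patches)
codebook = kmeans.cluster_centers_.reshape(64, 8, 8)
```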
All three libraries provide a Python interface but use a fast, low-level C or Fortran implementation under the hood. My goal is to evaluate these three libraries for their performance (speed) under the following conditions:

The library should be fastest when tested on a single MacBook, it should be able to use the available parallel hardware such as the GPU and multiple threads, and it should be fastest on medium-sized problems.

I chose these conditions since I plan to design programs that animal communication researchers can use in the field. Meaning, if you are analysing dolphin communication on a boat in the Bahamas or observing birds in the rainforest, the only thing you might have is your laptop with no internet connection.

Convolution


The convolution test is based on the three functions:


  • TensorFlow: tensorflow.nn.conv2d
  • SciPy: scipy.signal.convolve2d
  • OpenCV: cv2.filter2D
My test is to compute the edge image using a Sobel filter in the x-direction and one in the y-direction:
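For reference, a rough sketch of the three calls (my reconstruction, not the original benchmark code; the TensorFlow part assumes the 1.x graph API with a Session, and the image path is a placeholder):

```python
import numpy as np
import scipy.signal
import cv2
import tensorflow as tf

# Sobel kernel in the x-direction; the y-direction kernel is its transpose
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)

image = cv2.imread("test_image.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# SciPy: plain 2D convolution
edges_scipy = scipy.signal.convolve2d(image, sobel_x, mode="same")

# OpenCV: filter2D computes a correlation, which for this kernel only flips
# the sign of the response and does not change the edge magnitude
edges_cv = cv2.filter2D(image, -1, sobel_x)

# TensorFlow: conv2d expects a 4D input [batch, height, width, channels]
# and a kernel shaped [height, width, in_channels, out_channels]
img4d = image.reshape(1, image.shape[0], image.shape[1], 1)
kernel4d = sobel_x.reshape(3, 3, 1, 1)
conv_op = tf.nn.conv2d(img4d, kernel4d, strides=[1, 1, 1, 1], padding="SAME")
with tf.Session() as sess:
    edges_tf = sess.run(conv_op)
```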




I am using the following image and expect the following edge image:





The results in seconds:
  • TensorFlow: 0.125 [s]
  • SciPy:      0.049 [s]
  • OpenCV:     0.019 [s]
Here it looks like the OpenCV implementation is the fastest, so OpenCV it is. For the clustering there are a lot of libraries, and even the scikit-learn implementation is very fast.

Update:

After Himanshu's comment I chose to time TensorFlow without the variable assignment,
and then it took 0.021 seconds. However, copying the image into the variable and the session setup
matter in my use case, too, and it is still slower than OpenCV. It is also interesting that for problems of this size, the speed difference between the libraries is not that big. However, I believe that for larger problems, the TensorFlow version, which can run on a cluster, will show far better performance. Also, I don't know right now whether the current TensorFlow version works with OpenCL.
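For illustration, a rough sketch of the two timing variants (TensorFlow 1.x graph API assumed, with placeholder shapes): measuring only the convolution run versus including the variable assignment and session setup.

```python
import time
import numpy as np
import tensorflow as tf

img4d = np.random.rand(1, 512, 512, 1).astype(np.float32)
kernel4d = np.random.rand(3, 3, 1, 1).astype(np.float32)

start_total = time.time()
img_var = tf.Variable(img4d)                      # variable assignment
conv_op = tf.nn.conv2d(img_var, kernel4d,
                       strides=[1, 1, 1, 1], padding="SAME")
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # session setup
    start_run = time.time()
    sess.run(conv_op)                              # the convolution itself
    print("run only:", time.time() - start_run)
print("including setup:", time.time() - start_total)
```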

Wednesday, February 17, 2016

Word Embeddings

Since my grad school interest was focused on machine learning for perception, I did not notice a class of methods called word embeddings. However, recently I got more interested in text mining, so I started to read up on these methods and implement some.
A word embedding maps words into a multi-dimensional Euclidean space in which semantically similar words are close. In other words, each word in your dictionary is represented by a multi-dimensional vector.

A word embedding can capture many semantics. For example, on the word2vec webpage,
an embedding from Google, it is noted that:

  1. "vector('king') - vector('man') + vector('woman') is close to vector('queen')"
  2. "vector('Paris') - vector('France') + vector('Italy') [...] is very close to vector('Rome')"

In other words, the representation is capturing concepts such as gender and capital city.
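As a toy illustration of that arithmetic, with made-up three-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import numpy as np

# Hypothetical toy vectors chosen so the analogy works out
vectors = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "man":   np.array([0.7, 0.1, 0.1]),
    "woman": np.array([0.7, 0.1, 0.9]),
    "queen": np.array([0.8, 0.9, 0.9]),
}

query = vectors["king"] - vectors["man"] + vectors["woman"]

def cosine(a, b):
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

# the closest remaining word to the query vector should be "queen"
closest = max((w for w in vectors if w != "king"),
              key=lambda w: cosine(query, vectors[w]))
print(closest)
```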
The two most prominent methods so far seem to be word2vec (by Google) and GloVe (by Stanford's NLP group). Furthermore, there is a very recent combination of the two called Swivel (again by Google).

All the methods are based on the idea that the usage of a word gives insight into the word's meaning, or that similar words are used in similar contexts. Here a context can be defined as a small neighbourhood, for example the three words to the left of the target word and the three words to the right.
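A minimal sketch of such a symmetric context window (window size three assumed here):

```python
def contexts(tokens, window=3):
    """Yield each target word together with its surrounding context words."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield target, left + right

sentence = "the quick brown fox jumps over the lazy dog".split()
for target, context in contexts(sentence):
    print(target, "->", context)
```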

Google's word2vec obtains the vectors by using a simple neural net that predicts the target word from its context (Continuous Bag of Words) or vice versa (Skip-Gram). As usual, the neural net can be trained using stochastic gradient descent (backpropagation). The neural net can be seen as learning a representation for words (word vectors) and a representation for contexts (context vectors).
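One readily available implementation is gensim's Word2Vec; this sketch assumes gensim >= 4 (older versions use size instead of vector_size):

```python
from gensim.models import Word2Vec

# Tiny placeholder corpus; real training needs a large text collection
sentences = [
    "the quick brown fox jumps over the lazy dog".split(),
    "dolphins communicate with whistles and clicks".split(),
]

# sg=1 selects the skip-gram model, sg=0 the continuous bag of words
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)

vector = model.wv["dolphins"]                        # the learned word vector
similar = model.wv.most_similar("dolphins", topn=3)  # nearest neighbours
```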
GloVe solves a similar problem. However, instead of predicting the actual context around a word, GloVe learns vectors predictive of a global co-occurrence matrix extracted from the complete corpus. GloVe is trained using AdaGrad. In general, we solve an optimisation problem of the form:
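Written out in my own notation (the original post showed the objective as an image), with $w_i$ a word vector, $\tilde{w}_j$ a context vector, $X_{ij}$ the co-occurrence count, and $C(i,j)$ the target described below:

$$\min_{w,\tilde{w}} \; \sum_{i,j} f(X_{ij}) \left( w_i^\top \tilde{w}_j - C(i,j) \right)^2$$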




Basically, the two methods differ in the way f(.) and C are defined. C is the target function, and we aim to minimise the difference between the prediction and the target, scaled by a function of the co-occurrence count f(.). For word2vec the target function is the pointwise mutual information between two words, and for GloVe it is the log co-occurrence count.
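In symbols (my paraphrase, not the original figure): for word2vec the target is $C(i,j) = \mathrm{PMI}(i,j) = \log \frac{p(i,j)}{p(i)\,p(j)}$, while for GloVe it is $C(i,j) = \log X_{ij}$.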

Both methods already come with several implementations. Semantic similarity can be used for several
NLP tasks, for example sentiment analysis or query expansion.