Sunday, February 28, 2016

Speed Bake-Off Tensorflow / (Numpy | SciPy | Matplotlib) / OpenCV

I recently got interested in scaling some of the algorithms from my PhD thesis, especially the feature learning part. I used K-Means to learn a convolutional codebook over audio spectrograms.
Why you can use K-Means to learn something like a convolutional neural net is described in Coates, Lee and Ng as well as in my thesis. In my thesis experiments the bottleneck was the feature learning, so I need a fast clustering implementation as well as a fast convolution in order to resolve the speed issues. The three main implementations I look at are:
  • Google Tensorflow
  • SciKit Learn
  • OpenCV
All three provide a Python interface but use a low-level, speedy C or Fortran implementation under the hood. My goal is to evaluate these three libraries for their performance (speed) under the following conditions:

The library should be the fastest when run on a single MacBook, where it may use the available parallel hardware such as a GPU and multithreading. Furthermore, the library should be the fastest on medium-sized problems.

I chose these conditions since I plan to design programs that animal communication researchers can use in the field. That means that if you analyse dolphin communication on a boat in the Bahamas or observe birds in the rainforest, the only thing you might have is your laptop with no internet connection.


The convolution test is based on the three functions:

  • Tensorflow: tensorflow.nn.conv2d
  • SciPy: scipy.signal.convolve2d
  • OpenCV: cv2.filter2D
My test is to compute the edge image using a Sobel filter in x-direction and one in y-direction:

I am using the following image and expect the following edge image:

The results in seconds:
  • Tensorflow: 0.125 [s]
  • SciPy:       0.049 [s]
  • OpenCV:     0.019 [s] 
Here it looks like the OpenCV implementation is the fastest. So OpenCV it is. For the clustering there are a lot of libraries, and even the sklearn implementation is very fast.
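The convolution test above can be sketched roughly as follows. This is a minimal sketch and not the original benchmark script: the synthetic test image, the `time_call` helper, the best-of-five timing, and combining the two filter responses into a gradient magnitude are all assumptions made for illustration. Note that `cv2.filter2D` computes a correlation, so the kernels are flipped to match `scipy.signal.convolve2d`. The OpenCV part only runs if `cv2` is installed.

```python
import time

import numpy as np
from scipy.signal import convolve2d

# 3x3 Sobel kernels for the x- and y-direction.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T.copy()


def edges_scipy(image):
    """Edge image via scipy.signal.convolve2d."""
    gx = convolve2d(image, SOBEL_X, mode="same", boundary="symm")
    gy = convolve2d(image, SOBEL_Y, mode="same", boundary="symm")
    # Assumption: the two responses are combined as a gradient magnitude.
    return np.hypot(gx, gy)


def time_call(fn, image, repeats=5):
    """Hypothetical helper: best wall-clock time over `repeats` runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(image)
        best = min(best, time.perf_counter() - start)
    return best


# Assumption: a synthetic stand-in image, a bright square on a dark background.
image = np.zeros((512, 512), dtype=np.float64)
image[128:384, 128:384] = 1.0

edge_image = edges_scipy(image)
print("SciPy:  %.4f [s]" % time_call(edges_scipy, image))

try:
    import cv2
except ImportError:
    cv2 = None

if cv2 is not None:
    # filter2D correlates, so flip the kernels to get a true convolution.
    flipped_x = np.ascontiguousarray(SOBEL_X[::-1, ::-1])
    flipped_y = np.ascontiguousarray(SOBEL_Y[::-1, ::-1])

    def edges_opencv(image):
        """Edge image via cv2.filter2D (ddepth=-1 keeps the input dtype)."""
        gx = cv2.filter2D(image, -1, flipped_x)
        gy = cv2.filter2D(image, -1, flipped_y)
        return np.hypot(gx, gy)

    print("OpenCV: %.4f [s]" % time_call(edges_opencv, image))
```

Absolute timings from such a sketch depend on the image size, the data type and the border handling, so they are not directly comparable to the numbers listed above.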


After Himanshu's comment I chose to time Tensorflow without the variable assignment, and then it took 0.021 seconds. However, copying the image into the variable and setting up the session matter in my use case, too. And it is still slightly slower than OpenCV (0.021 versus 0.019 seconds). It is also interesting that for these problems, the speed difference between the libraries is not that large. However, I believe that for larger problems, the Tensorflow version that can run on a cluster will show much better performance. Also, I don't know right now whether the current Tensorflow version works with OpenCL.


  1. Interesting. Maybe the amortized cost of running a batch of images instead of just one will change the results?

    1. It might. Also, you can run several filters stacked in one convolution by designing your tensor accordingly. It might be faster if you run a lot of filters. Sobel only uses two, so the example might be biased in a similar manner. It might also be that Tensorflow cannot use the MacBook's GPU, while I know that OpenCV does. In general I think Tensorflow is way faster with the server implementation that is not handed out :)