Daniel@World: computer vision

Showing posts with label computer vision. Show all posts

Tuesday, July 9, 2019

WebRTC and Tensorflow.js

WebCam image classification on web page

I will give a quick tutorial on how to connect a webcam in html5/WebRTC to Tensorflow.js for image classification. We will load a pre trained mobile net and then pass it frames from the camera. First, we define our html page and specify a video element in which we later stream the video from the camera,
a label field and also start the WebRtc + neural network code. The code can be found at this gist.

<!DOCTYPE html>
<html>
  <head>
    <title> Hello WebRTC </title>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.0.0/dist/tf.min.js"></script>
  </head>
  <body>
    <video id="cam" width="224" height="224" autoplay playsinline></video> <br/>
    <script src="camnet.js"></script><br/>
    <font color="red" size="5"> LABEL: </font><br/>
    <div id="label"> </div>
  </body>
</html>

Then we download the pretrained mobile net and initialize the webcam.

 async function init() {
    try {
 const net = await tf.loadLayersModel(MOBILENET_MODEL_PATH);
        const constraints = window.constraints = {audio: false, video: true};
     const stream = await navigator.mediaDevices.getUserMedia(constraints);
     onSuccess(stream, net);
    } catch (e) {
     onError(e);
    }
}

The camera stream can be retrieved by the getUserMedia function. The onError method simply writes an error to the console. If we are successful, we get the video element from our dom and bind the stream to the video. We then start the detection loop with a method called onFrame.

function onSuccess(stream, net) {    
    const video = document.querySelector('video');
    const videoTracks = stream.getVideoTracks();
    console.log('Got stream with constraints:', constraints);
    console.log(`Using video device: ${videoTracks[0].label}`);
    window.stream = stream;
    video.srcObject = window.stream;
    onFrame(video, net);
}

onFrame's inner function processFrame is an infinite recursion. We grab the video frame by frame and push it into a classify method along with the neural network, the video element as well as the label element.

function onFrame(video, net) {
    var label_element = document.getElementById('label');
    console.log(net.summary());
    async function processFrame() {
 classify(video, label_element, net)
        requestAnimationFrame(processFrame);          
    }
    processFrame();
}

The last method transforms the camera image into a tensor, normalizes the color and then construct a batch with only one example. Based on the prediction from the mobile net, we extract the best class and write it into the label element.

async function classify(img_element, label_element, net) {
    const img = tf.browser.fromPixels(img_element).toFloat();
    const offset = tf.scalar(127.5);
    const normalized = img.sub(offset).div(offset);
    const batched = normalized.reshape([1, IMAGE_SIZE, IMAGE_SIZE, 3]);
    const prediction = await net.predict(batched).data();
    var max_i = 0;
    var max_v = prediction[0];
    for (let i = 0; i < prediction.length; i++) {
 if(prediction[i] > max_v) {
     max_v = prediction[i];
     max_i = i;
 }
    }
    const label = IMAGENET_CLASSES[max_i];
    if (max_v > 0.5) {
 label_element.innerHTML = label + " [" + max_v + "]";
    }
}

Saturday, November 3, 2018

Exploring Super Resolution With Neural Networks: Blade Runner Style

In Blade Runner there is a scene in which Deckard (the protagonist) inspects one of the replicant's (Loean's) photos. The scene features an insane zoom which this blog post estimates the final magnification at a factor of 667.9

While this is science fiction (obviously) there where some breakthroughs in super resolution recently. Basically we use machine learning models to upscale an image without gaining pixelated artefacts.
I used this code to upscale the same photo from the movie using the code that is around using the Lasagne neural network framework.

The top image shows a reflection in a cabinet that is itself a reflection in the round mirror.
The bottom image shows the first zoom in Deckard performs.

Below is the original image (top) and the result (bottom) I got when running the 4 times upscale neural net on the image from Blade Runner.

If you zoom into the mirror in the reconstructed image (open in new tab to see it in it's original resolution) and you use your imagination, I believe you can see a face in it. I do not believe

it is actually there but that it is a reconstruction artefact, but who knows ...

Sunday, February 28, 2016

Speed Bake-Off Tensorflow / (Numpy | SciPy | Matplotlib) / OpenCV

I recently got interested into scaling some of the algorithms from my PhD thesis. Especially the feature learning part. I used a K-Means to learn a convolutional codebook over audio spectrogram.
Why you can use K-Means to learn something like a convolutional neural net is described in: Coates, Lee and NG as well as my thesis. Anyways in my thesis experiments the bottle neck was the feature learning. So I need a fast clustering implementation as well as a fast convolution in order to resolve the speed issues. The three main implementations I look at are:

Google Tensor Flow
SciKit Learn
OpenCV

All three provide a python interface but use a low level, speedy C or Fortran implementation under the hood. My goal is to evaluate these three libraries for their performance (speed) under the following conditions:

The library is fastest when testing on a single Macbook. It can use the available parallel devices such as a GPU and multi threading. Furthermore, the library should be fastest on medium sized problems.

I choose these conditions since I plan to design programs that animal communication researchers can use in the field. Meaning, if you are to analyse dolphin communication on a boat in the Bahamas or

you observe Birds in the rainforest, the only thing you might have is your laptop with no internet connection.

Convolution

The convolution test is based on the three functions:

Tensorflow: tensorflow.nn.conv2d
SciPy: scipy.signal.convolve2d
OpenCV: cv2.filter2d

My test is to compute the edge image using a sobel filter in x-direction and on in y-direction:

I am using the following image and expect the following edge image:

The results in seconds:

Tensorflow: 0.125 [s]
SKLearn: 0.049 [s]
OpenCV: 0.019 [s]

Here it looks like the OpenCV implementation is the fastest. So openCV it is. For the clustering there are a lot of libraries and even the sklearn implementation is very fast.

Update:

After Himanshu's comment I chose to check Tensorflow not including the variable assignment
and then it took 0.021 seconds. However, the image copying into the variable and the session setup
matter in my use case, too. And it is still lower than Open CV. It is also interesting that for these problems, the speed between the libraries is not that different. However, I belief that for larger problems, the tensor flow version that can run on a cluster will show way better performance. Also I don't know right now if the current tensor flow version works with opencl.

Wednesday, July 16, 2014

SLIC: Simple super pixels

After searching for an easy algorithm to compute super pixels I found a cool method segment images based on color using a modified K-Means version called Simple Linear Iterative Clustering or SLIC. The input to the algorithm is a set of vectors describing each pixel in the images as a vector:

v = [l a b x y]

The first three entries represent the color in LAB space. The last two the position in the image. The goal of the algorithm is to segment the image into k more or less equally sized regions of same color, called super pixels. The algorithm is initialized by sampling the image k times uniformly. Based on these initial centers one runs a local K-Means for each center. So a clusters influence is in a limited region.

The other novelty of the algorithm is the distance measure that scales the color and position.