## Tuesday, July 9, 2019

### WebRTC and Tensorflow.js

 WebCam image classification on web page

I will give a quick tutorial on how to connect a webcam in html5/WebRTC to Tensorflow.js for image classification. We will load a pre trained mobile net and then pass it frames from the camera. First, we define our html page and specify a video element in which we later stream the video from the camera,
a label field and also start the WebRtc + neural network code. The code can be found at this gist.

<!DOCTYPE html>
<html>
<title> Hello WebRTC </title>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.0.0/dist/tf.min.js"></script>
<body>
<video id="cam" width="224" height="224" autoplay playsinline></video> <br/>
<script src="camnet.js"></script><br/>
<font color="red" size="5"> LABEL: </font><br/>
<div id="label"> </div>
</body>
</html>


Then we download the pretrained mobile net and initialize the webcam.
 async function init() {
try {
const net = await tf.loadLayersModel(MOBILENET_MODEL_PATH);
const constraints = window.constraints = {audio: false, video: true};
const stream = await navigator.mediaDevices.getUserMedia(constraints);
onSuccess(stream, net);
} catch (e) {
onError(e);
}
}


The camera stream can be retrieved by the getUserMedia function. The onError method simply writes an error to the console. If we are successful, we get the video element from our dom and bind the stream to the video. We then start the detection loop with a method called onFrame.
function onSuccess(stream, net) {
const video = document.querySelector('video');
const videoTracks = stream.getVideoTracks();
console.log('Got stream with constraints:', constraints);
console.log(Using video device: \${videoTracks[0].label});
window.stream = stream;
video.srcObject = window.stream;
onFrame(video, net);
}


onFrame's inner function processFrame is an infinite recursion. We grab the video frame by frame and push it into a classify method along with the neural network, the video element as well as the label element.
function onFrame(video, net) {
var label_element = document.getElementById('label');
console.log(net.summary());
async function processFrame() {
classify(video, label_element, net)
requestAnimationFrame(processFrame);
}
processFrame();
}

The last method transforms the camera image into a tensor, normalizes the color and then construct a batch with only one example. Based on the prediction from the mobile net, we extract the best class and write it into the label element.
async function classify(img_element, label_element, net) {
const img = tf.browser.fromPixels(img_element).toFloat();
const offset = tf.scalar(127.5);
const normalized = img.sub(offset).div(offset);
const batched = normalized.reshape([1, IMAGE_SIZE, IMAGE_SIZE, 3]);
const prediction = await net.predict(batched).data();
var max_i = 0;
var max_v = prediction[0];
for (let i = 0; i < prediction.length; i++) {
if(prediction[i] > max_v) {
max_v = prediction[i];
max_i = i;
}
}
const label = IMAGENET_CLASSES[max_i];
if (max_v > 0.5) {
label_element.innerHTML = label + " [" + max_v + "]";
}
}