## Saturday, September 28, 2019

### Recursive Neural Networks

 A Recursive Neural Network

Recursively Neural Networks can learn to structure data such as text or images hierarchically. The network can be seen to work similar to a context free grammar and the output is a parse t
The figure below (from this paper) shows examples of parse trees for an image (top) and a sentence (bottom).

A recursive neural network for sequences always merges two consecutive input vectors and predicts if these two vectors can be merged. If so, we replace the two vectors with best merging score with the hidden vector responsible for the prediction. If done in a recursive manner, we can construct a parse tree.

Formally we can construct the network as follows. Given an input sequence $x = x_1 ... x_T, x_i \in \mathbb{R}^D$ then for two neighboring inputs $x_t, x_{t + 1}$ we predict a hidden representation:

$h_t = \sigma(x_t W_l + b_l)$
$h_{t + 1} = \sigma(x_{t + 1} W_r + b_r)$

The hidden layer used for replacement is computed from the concatenation of the two previous hidden layers: $h=\sigma([h_t, h_{t + 1}] W_h + b_h)$. Finally, the prediction is simply another layer predicting the merging score.

For the implementation we define the neural network with the merging decision first. Given a node with the parent representations (for leaf nodes that are the input vectors) we build the hidden representations and then compute the merging score.
class NeuralGrammar(nn.Module):
'''
Given a tree node we project the left child and right child into
a feature space cl and cr. The concatenated [cl, cr] is projected
into the representation of that node. Furthermore, we predict
the label from the hidden representation.
'''

def __init__(self):
super(NeuralGrammar, self).__init__()
self.left       = nn.Linear(IN, H, bias=True)
self.right      = nn.Linear(IN, H, bias=True)
self.parent     = nn.Linear(2 * H, IN, bias=True)
self.projection = nn.Linear(IN, 1, bias=True)

def forward(self, node):
l = node.left_child.representation
r = node.right_child.representation
y = node.label
x = torch.cat([torch.relu(self.left(l)), torch.relu(self.right(r))], 0)
p = torch.tanh(self.parent(x))
score = self.projection(p)
return (p, score)


Now we can implement our node class which also handles the parsing, by greedily merging up to a tree. A node in the tree holds it's representation as well as a left and a right child. In other words, each node represents a merging decision with the two children being the nodes merged and the parent representation being the hidden layer in the merger grammar. Parsing a tree involves merging the vectors with the highest score and replacing the nodes with their parent node.

class TreeNode:
'''
A tree node in a neural grammar.
Represented as a binary tree. Each node is also represented by
an embedding vector.
'''

def __init__(self, representation=None):
self.representation = representation
self.left_child     = None
self.right_child    = None

def greedy_tree(cls, x, grammar):
'''
Greedily merges bottom up using a feature extractor and grammar
'''
T, D   = x.shape
leafs  = [cls(x[i, :]) for i in range(T)]
while len(leafs) >= 2:
max_score       = float('-inf')
max_hypothesis  = None
max_idx         = None
for i in range(1, len(leafs)):
# difference of reconstruction of the center of the span of the subtree in the spectrogram is the score
# inspired by continuous bag of words
hypothesis = cls(leafs[i - 1].start, leafs[i].stop)
hypothesis.left_child     = leafs[i - 1]
hypothesis.right_child    = leafs[i]
representation, score     = grammar(hypothesis)
hypothesis.representation = representation
if score > max_score:
max_score      = score
max_hypothesis = hypothesis
max_idx        = i
leafs[max_idx - 1:max_idx + 1] = [max_hypothesis]
return leafs[0]


During learning, we collect the scores of the subtrees and use labels to decide if a merge was correct or not.

## Thursday, August 1, 2019

### Code Along Books: Bitcoin, Go Game, Ray Tracers

 A scene rendered with your own ray tracer

I like learning by doing. In my opinion the only real self test whether you understood something is a working prototype. For example, when learning more about the details of deep learning or xgboost, I implemented my own auto differentiation and gradient boosted trees.

Recently I found several books that guide you through implementations of fairly complex pieces of software. After finishing a chapter it feels like making progress on a project. So it is easier to follow along and you don't have to search for all the references needed yourself. You still have the benefit from learning by doing.

The book based projects I found are:
1. A ray tracer [1]
2. A go AI [2]
3. A bitcoin implementation [3]
AI in lisp and I wanted to implement my prolog in scala.

The previous projects on neural networks were based on Goodfellow's Deep Learning book [5] and many other references such as online classes so it was not as easy to code along. The same is true for the gradient boosting implementation which was mostly based on papers and books.

However the other projects simply involved writing code and the book was the only thing required to get into the topic. The quality all of these books show is that it is not a collection of separated code pieces that explain a topic but the topic's are explained by a coherent code based that you develop yourself form scratch.

### 1. A Ray Tracer

The book "The Ray Tracer Challenge: A Test Driven Guide to Your First 3D Rendered" [1], is giving you no actual code but behavior specifications (cucumber). The code has to be written by yourself.
Harder parts include pseudo code. The book covers the basic math library and then uses the defined primitives to construct more and more complex features.
The end result is a full ray tracer with shadows reflections and multiple primitives.

### 2. A Go AI

In the next book: "Deep Learning And The Game Of Go" [2], you build a go AI using the code provided by the authors. The books walks you through the implementation of the go board and rules first. Then the authors walk through classic game AI algorithms such as monte carlo tree search. In the end the authors introduce game state evaluation using convolutional neural networks using Keras.
The resulting system resembles an early version of AlphaGO

### 3. Bitcoin

The book "Programming Bitcoin" [3] is a mix between the first two. Lot's of code is provided in the form of skeletons but the reader still has to fill in the blanks for several assignments. The book
covers everything from basic cryptography to scripting bitcoin. The end result is a full bitcoin
implementation.

### Wrap up

The great thing about all the books mentioned is that all these books do cover the primitives needed too. You don't need to rely on a library but the authors walk you front to back.

My code bases are all on github. Some are work in progress [WIP] since
i did not finish the books yet. For example, in books 2 and 3 I only went through the first chapters:

### REFERENCES:

[1] Buck, "The Ray Tracer Challenge: A Test Driven Guide to Your First 3D Rendered", Pragmatic Bookshelf, 2019
[2] Pumperla, Furguson, "Deep Learning And The Game Of Go", Manning, 2019
[3] Song, "Programming Bitcoin: Learn How To Program Bitcoin From Scratch", Oreilly, 2019
[4] Norvig: "Paradigmes of Artificial Intelligence Programming: Case Studies in Lisp", Morgan Kaufmann, 1992
[5] Ian Goodfellow, "Deep Learning", MIT Press, 2016

## Tuesday, July 9, 2019

### WebRTC and Tensorflow.js

 WebCam image classification on web page

I will give a quick tutorial on how to connect a webcam in html5/WebRTC to Tensorflow.js for image classification. We will load a pre trained mobile net and then pass it frames from the camera. First, we define our html page and specify a video element in which we later stream the video from the camera,
a label field and also start the WebRtc + neural network code. The code can be found at this gist.

<!DOCTYPE html>
<html>
<title> Hello WebRTC </title>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.0.0/dist/tf.min.js"></script>
<body>
<video id="cam" width="224" height="224" autoplay playsinline></video> <br/>
<script src="camnet.js"></script><br/>
<font color="red" size="5"> LABEL: </font><br/>
<div id="label"> </div>
</body>
</html>


 async function init() {
try {
const constraints = window.constraints = {audio: false, video: true};
onSuccess(stream, net);
} catch (e) {
onError(e);
}
}


The camera stream can be retrieved by the getUserMedia function. The onError method simply writes an error to the console. If we are successful, we get the video element from our dom and bind the stream to the video. We then start the detection loop with a method called onFrame.
function onSuccess(stream, net) {
const video = document.querySelector('video');
const videoTracks = stream.getVideoTracks();
console.log('Got stream with constraints:', constraints);

### References

[1] GAME ENGINE BLACK BOOK: WOLFENSTEIN 3D, 2ND EDITION, 2018