In this post, I will describe the recursive autoencoder and my implementation in TensorFlow 2.0 using the new eager execution functionality and the gradient tape.

### What is a recursive autoencoder?

Figure from the paper[1] explaining auto encoders |

As with the regular autoencoder we build a neural network that decides if we merge two inputs. Instead of having a target, in the autoencoder case we simply reconstruct the input and merge the nodes with the lowest reconstruction error in a greedy manner.

### Implementation

First, we define the auto encoder:

def merge_encoder(n_in): a = tf.keras.layers.Input(n_in) b = tf.keras.layers.Input(n_in) c = tf.keras.layers.Concatenate()([a,b]) h = tf.keras.layers.Dense(n_in)(c) o = tf.keras.layers.Dense(n_in * 2)(h) merge = tf.keras.models.Model(inputs=[a, b], outputs=[h, c, o]) return mergeThe model has two inputs that should be merged. These will be concatenated and fed through a dense layer resulting in the hidden representation. The hidden representation is directly used to reconstruct the input. In order to keep track of the merges, I define a node class.

class Node: def __init__(self, i, embedding, score, l = None, r = None): self.i = i self.score = score self.embedding = embedding self.left = l self.right = r def print(self, offset=""): print("{} {} {} {}".format(offset, self.i, self.score, np.mean(self.embeding))) if self.left is not None and self.right is not None: self.left.print(offset + "\t") self.right.print(offset + "\t") def merge(self, other, merger): merged = merger([self.embedding, other.embedding]) h = merged[0] c = merged[1] y = merged[2] score = tf.nn.l2_loss(y - c) + self.score + other.score return Node(-1, h, score, self, other) def ts2leafs(ts): return [Node(i,ts[i], tf.constant(0.0)) for i in range(0, len(ts))]A merge holds the embedding, an id for all leaf nodes as well as the merge score. Merging nodes will involve computing the hidden representation for the new node and computing the score from the reconstruction error. At the beginning every vector in a sequence is simply a leaf node. Now we can greedily merge the sequences by finding the minimum possible reconstruction error. We keep merging until only one node is left.

def merge(x, m): if len(x) > 1: min_loss = float('inf') min_node = None min_i = 0 min_j = 0 for i in range(len(x)): for j in range(len(x)): if i < j: node = x[i].merge(x[j], m) if node.score < min_loss: min_node = node min_loss = node.score min_i = i min_j = j next_x = x.copy() next_x[min_i] = min_node next_x = [next_x[idx] for idx in range(0, len(x)) if idx != min_j] return merge(next_x, m) else: return x[0]Now we can learn the model using TensorFlow's gradient tape which keeps track of all computations during our merges, compute the gradient and update the model.

m = merge_encoder(10) x = ts2leafs([np.ones((1, 10)) * i for i in range(0, 10)] + [np.ones((1, 10)) * i for i in range(0, 10)]) optimizer = tf.keras.optimizers.Adam() for i in range(0, 100): with tf.GradientTape(watch_accessed_variables=True) as tape: tape.watch(m.variables) node = merge(x, m) print(node.score) g = tape.gradient(node.score, m.variables) optimizer.apply_gradients(zip(g, m.variables)) m.save('merger.h5')