In my
last post, I described recursive neural networks and how they can be used in PyTorch.
In this post, I will describe the recursive autoencoder and my implementation in TensorFlow 2.0 using the new eager execution functionality and the gradient tape.
What is a recursive autoencoder?
|
Figure from the paper[1] explaining auto encoders |
As with the regular autoencoder we build a neural network that decides if we merge two inputs. Instead of having a target, in the autoencoder case we simply reconstruct the input and merge the nodes with the lowest reconstruction error in a greedy manner.
Implementation
First, we define the auto encoder:
def merge_encoder(n_in):
a = tf.keras.layers.Input(n_in)
b = tf.keras.layers.Input(n_in)
c = tf.keras.layers.Concatenate()([a,b])
h = tf.keras.layers.Dense(n_in)(c)
o = tf.keras.layers.Dense(n_in * 2)(h)
merge = tf.keras.models.Model(inputs=[a, b], outputs=[h, c, o])
return merge
The model has two inputs that should be merged. These will be concatenated and fed through a dense layer resulting in the hidden representation.
The hidden representation is directly used to reconstruct the input.
In order to keep track of the merges, I define a node class.
class Node:
def __init__(self, i, embedding, score, l = None, r = None):
self.i = i
self.score = score
self.embedding = embedding
self.left = l
self.right = r
def print(self, offset=""):
print("{} {} {} {}".format(offset, self.i, self.score, np.mean(self.embeding)))
if self.left is not None and self.right is not None:
self.left.print(offset + "\t")
self.right.print(offset + "\t")
def merge(self, other, merger):
merged = merger([self.embedding, other.embedding])
h = merged[0]
c = merged[1]
y = merged[2]
score = tf.nn.l2_loss(y - c) + self.score + other.score
return Node(-1, h, score, self, other)
def ts2leafs(ts):
return [Node(i,ts[i], tf.constant(0.0)) for i in range(0, len(ts))]
A merge holds the embedding, an id for all leaf nodes as well as the merge score.
Merging nodes will involve computing the hidden representation for the new node and
computing the score from the reconstruction error. At the beginning every vector in a sequence
is simply a leaf node.
Now we can greedily merge the sequences by finding the minimum possible reconstruction error.
We keep merging until only one node is left.
def merge(x, m):
if len(x) > 1:
min_loss = float('inf')
min_node = None
min_i = 0
min_j = 0
for i in range(len(x)):
for j in range(len(x)):
if i < j:
node = x[i].merge(x[j], m)
if node.score < min_loss:
min_node = node
min_loss = node.score
min_i = i
min_j = j
next_x = x.copy()
next_x[min_i] = min_node
next_x = [next_x[idx] for idx in range(0, len(x)) if idx != min_j]
return merge(next_x, m)
else:
return x[0]
Now we can learn the model using TensorFlow's gradient tape which keeps track of all
computations during our merges, compute the gradient and update the model.
m = merge_encoder(10)
x = ts2leafs([np.ones((1, 10)) * i for i in range(0, 10)] + [np.ones((1, 10)) * i for i in range(0, 10)])
optimizer = tf.keras.optimizers.Adam()
for i in range(0, 100):
with tf.GradientTape(watch_accessed_variables=True) as tape:
tape.watch(m.variables)
node = merge(x, m)
print(node.score)
g = tape.gradient(node.score, m.variables)
optimizer.apply_gradients(zip(g, m.variables))
m.save('merger.h5')
My code can be found in this gist
REFERENCES