## Monday, June 25, 2018

### Keras: Convolutional LSTM

Stacking recurrent layers on top of convolutional layers can be used to generate sequential output (like text) from structured input (like images or audio) [1].

If we want to stack an LSTM on top of convolutional layers, we can simply do so, but we need to
reshape the convolutional output to match the LSTM's expected input. The code below shows an implementation in Keras:

```python
from keras.layers import Input, Conv2D, MaxPooling2D, Reshape, LSTM
from keras.models import Model

T = 128   # time steps
D = 64    # feature dimension per step

N_FILTERS  = 32
LSTM_T     = T // 8          # time steps after pooling
LSTM_D     = D // 2          # features after pooling
LSTM_STATE = 128
POOL       = (8, 2)

i = Input(shape=(T, D, 1))                         # (None, 128, 64, 1)
x = Conv2D(N_FILTERS, (3, 3), padding='same')(i)   # (None, 128, 64, 32)
x = MaxPooling2D(pool_size=POOL)(x)                # (None, 16, 32, 32)
x = Reshape((LSTM_T, LSTM_D * N_FILTERS))(x)       # (None, 16, 1024)
x = LSTM(LSTM_STATE, return_sequences=True)(x)     # (None, 16, 128)

model = Model(i, x)
model.summary()
```


In this example we learn a convolutional LSTM on sequences of length 128 with 64-dimensional samples. The first layer is a convolution with 32 filters, so its output is 32 sequences, one per filter. We pool the sequences with an (8, 2) window, shrinking each of the 32 feature maps to size (128 / 8 = 16, 64 / 2 = 32). Next we have to merge the feature dimension and the filter responses into a single dimension of size 32 * 32 = 1024, because the LSTM expects a rank-2 tensor (rank 3 with the batch dimension) whose first axis is the time step and whose second axis is the frame. Finally, we add the LSTM layer.