## Monday, June 25, 2018

### Keras: Convolutional LSTM

Stacking recurrent layers on top of convolutional layers can be used to generate sequential output (like text) from structured input (like images or audio) [1].

If we want to stack an LSTM on top of convolutional layers, we can simply do so, but we need to
reshape the convolutional output to match the LSTM's expected input. The code below shows an implementation in Keras:

```python
from keras.layers import Input, Conv2D, MaxPooling2D, Reshape, LSTM
from keras.models import Model

T = 128   # time steps
D = 64    # feature dimension per step

N_FILTERS  = 32
LSTM_T     = T // 8          # time steps after pooling
LSTM_D     = D // 2          # features after pooling
LSTM_STATE = 128
POOL       = (8, 2)

i = Input(shape=(T, D, 1))                         # (None, 128, 64, 1)
x = Conv2D(N_FILTERS, (3, 3), padding='same')(i)   # (None, 128, 64, 32)
x = MaxPooling2D(pool_size=POOL)(x)                # (None, 16, 32, 32)
x = Reshape((LSTM_T, LSTM_D * N_FILTERS))(x)       # (None, 16, 1024)
x = LSTM(LSTM_STATE, return_sequences=True)(x)     # (None, 16, 128)

model = Model(i, x)
model.summary()
```


In this example we learn a convolutional LSTM on sequences of length 128 with 64-dimensional samples. The first layer is a convolution with 32 filters, so its output is 32 sequences, one per filter. We pool the sequences with an (8, 2) window, shrinking each of the 32 feature maps to size (128 / 8 = 16, 64 / 2 = 32). Next we have to merge the feature dimension and the filter responses into a single dimension of size 32 * 32 = 1024, because the LSTM expects a rank-2 tensor (rank 3 with the batch dimension) whose first axis is the time step and whose second axis is the frame. Finally, we add the LSTM layer.