Slides presented during the Datascience Meetup @Sentiance. Based on the following paper:
"Improving Language Modeling using Densely Connected Recurrent Neural Networks".
See http://www.fredericgodin.com/publications/ for more info.
Stacking recurrent neural networks
[Figure: an RNN unrolled over time steps t=1..t=4, one word per step ("deep in time"), with several recurrent layers stacked on top of each other ("deep in height").]
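In code, stacking simply feeds one recurrent layer's output sequence into the next. A minimal PyTorch sketch (the layer sizes and the 4-step input are my own illustration, not from the talk):

    import torch
    import torch.nn as nn

    layer1 = nn.LSTM(input_size=128, hidden_size=128, batch_first=True)
    layer2 = nn.LSTM(input_size=128, hidden_size=128, batch_first=True)

    x = torch.randn(8, 4, 128)   # (batch, t=1..4, features): depth in time
    out1, _ = layer1(x)          # layer 1's output at every time step
    out2, _ = layer2(out1)       # feeding it to layer 2 adds depth in height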
Vanishing gradients
- When updating the weights with backpropagation, the gradient tends to shrink with every neuron it crosses
- Often caused by the activation function, whose derivative is smaller than 1 almost everywhere (e.g. sigmoid, tanh)
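A small numerical illustration (my own toy example, not from the talk): pushing a gradient backwards through a stack of tanh layers multiplies it by the factor (1 - tanh²) ≤ 1 at every layer, so its norm decays with depth.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    x = rng.normal(size=n)
    grad = np.ones(n)                          # gradient arriving at the top layer
    for depth in range(1, 11):
        W = rng.normal(scale=1 / np.sqrt(n), size=(n, n))
        x = np.tanh(W @ x)                     # forward through one tanh layer
        grad = W.T @ ((1.0 - x ** 2) * grad)   # backward: J^T grad, with J = diag(1 - x^2) W
        print(f"{depth} layers deep: |grad| = {np.linalg.norm(grad):.4f}")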
Backpropagating through stacked RNNs
[Figure: the stacked RNN from before, now with gradients flowing backwards both through time (from t=4 back to t=1) and through the stack ("backpropagation in height").]
Mitigating the vanishing gradient problem
In time: Long Short-Term Memory (LSTM)

In height:
- Many techniques exist in convolutional neural networks
- This talk: can we apply them to RNNs?

The key equation to model depth in time is the recurrence $h_t = f(x_t, h_{t-1})$.
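For reference, the LSTM's mitigation in time comes from its additive cell-state update (standard formulation with input, forget and output gates $i_t, f_t, o_t$; not spelled out on the slide):

\[
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t)
\]

The sum in $c_t$ gives gradients a path through time that is not repeatedly squashed by an activation function.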
Skip connection
A direct connection between two non-consecutive layers
- No vanishing gradient along the skip path
- Two main flavors:
  - Concatenative skip connections
  - Additive skip connections

[Figure: a three-layer stack in which Out 1 bypasses Layer 2 and is merged ("Merge 1,2") with Layer 2's output before entering Layer 3.]
(Concatenative) skip connection
Concatenate the output of the previous layer and the skip connection

Advantage: provides the output of the first layer to the third layer without altering it
Disadvantage: doubles the input size of the next layer

[Figure: Out 1 skips past Layer 2 and is concatenated with Out 2; Layer 3 receives [Out 1; Out 2].]
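A minimal PyTorch sketch of the concatenative variant (layer names and sizes are assumptions for illustration): Layer 3's input is the concatenation [Out 1; Out 2], hence the doubled input size.

    import torch
    import torch.nn as nn

    class ConcatSkipRNN(nn.Module):
        def __init__(self, vocab_size, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.layer1 = nn.LSTM(hidden, hidden, batch_first=True)
            self.layer2 = nn.LSTM(hidden, hidden, batch_first=True)
            # The skip connection doubles the input size of layer 3.
            self.layer3 = nn.LSTM(2 * hidden, hidden, batch_first=True)

        def forward(self, tokens):
            x = self.embed(tokens)                     # (batch, time, hidden)
            out1, _ = self.layer1(x)
            out2, _ = self.layer2(out1)
            merged = torch.cat([out1, out2], dim=-1)   # concatenative skip: [out1; out2]
            out3, _ = self.layer3(merged)
            return out3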
Additive skip connection (Residual connection)
Originates from the image classification domain

A residual connection is defined as:

\[
\text{out}_{1+2} = \underbrace{\text{Layer2}(\text{out}_1)}_{\text{the “residue”}} + \text{out}_1
\]

[Figure: Out 1 bypasses Layer 2 and is added to Layer 2's output; the sum Out 1+2 feeds Layer 3.]
Additive skip connection (residual connection) in an RNN

Residual connections do not make sense in RNNs: Layer 2 also depends on $h_{t-1}$, so its output is not a function of $\text{out}_1$ alone:

\[
\text{out}_{1+2,t} = \text{Layer2}(\text{out}_{1,t}, h_{t-1}) + \text{out}_{1,t}
\]

Hence the more neutral name: additive skip connection.

[Figure: the additive skip drawn around a recurrent Layer 2, which receives both its input x and its previous hidden state h(t-1) to produce the new state ht and output y.]
Additive skip connection

Sum the output of the previous layer and the skip connection

Advantage: the input size of the next layer does not increase
Disadvantage: can create a noisy input to the next layer

[Figure: Out 1 is added to Layer 2's output and the sum Out 1+2 feeds Layer 3.]
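The matching sketch for the additive variant (same assumed sizes; all layers must now share one hidden size, since their outputs are summed):

    import torch.nn as nn

    class AddSkipRNN(nn.Module):
        def __init__(self, vocab_size, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.layer1 = nn.LSTM(hidden, hidden, batch_first=True)
            self.layer2 = nn.LSTM(hidden, hidden, batch_first=True)
            self.layer3 = nn.LSTM(hidden, hidden, batch_first=True)  # input size unchanged

        def forward(self, tokens):
            x = self.embed(tokens)
            out1, _ = self.layer1(x)
            out2, _ = self.layer2(out1)
            merged = out1 + out2           # additive skip: sum instead of concatenation
            out3, _ = self.layer3(merged)
            return out3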
Densely connecting layers
Add a skip connection between every output and every input of every layer

Advantages:
- Direct paths between every pair of layers
- A hierarchy of features as input to every layer

Disadvantage: (L-1)*L connections

[Figure: a four-layer stack in which Layer 3 receives [Out 1; Out 2] and Layer 4 receives [Out 1; Out 2; Out 3].]
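A minimal sketch of the densely connected variant (again with assumed sizes): each layer receives the concatenation of the embedding and all earlier layer outputs, so input widths grow with depth.

    import torch
    import torch.nn as nn

    class DenselyConnectedRNN(nn.Module):
        def __init__(self, vocab_size, hidden=128, num_layers=3):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            # Layer l sees the embedding plus the outputs of all l earlier layers.
            self.layers = nn.ModuleList(
                nn.LSTM((l + 1) * hidden, hidden, batch_first=True)
                for l in range(num_layers)
            )

        def forward(self, tokens):
            outputs = [self.embed(tokens)]             # (batch, time, hidden)
            for layer in self.layers:
                dense_in = torch.cat(outputs, dim=-1)  # all previous outputs as input
                out, _ = layer(dense_in)
                outputs.append(out)
            return torch.cat(outputs[1:], dim=-1)      # every layer's output, merged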
Language modeling
Building a model that captures the statistical characteristics of a language

In practice: predicting the next word in a sentence
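Written out (the standard factorization the slide most likely showed): the probability of a sentence decomposes via the chain rule into next-word predictions,

\[
P(w_1, \dots, w_N) = \prod_{t=1}^{N} P(w_t \mid w_1, \dots, w_{t-1}),
\]

and the network is trained to model each factor $P(w_t \mid w_1, \dots, w_{t-1})$.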
Conclusion
Densely connecting all layers improves language modeling performance:
- Avoids vanishing gradients
- Creates a hierarchy of features, available to each layer

We use six times fewer parameters to obtain the same result as a stacked LSTM
Q&A
More details in our publication:
Fréderic Godin, Joni Dambre & Wesley De Neve,
"Improving Language Modeling using Densely Connected Recurrent Neural Networks",
https://arxiv.org/abs/1707.06130
Fréderic Godin
Ph.D. Researcher, Deep Learning
IDLab
Email: frederic.godin@ugent.be
@frederic_godin
www.fredericgodin.com
idlab.technology / idlab.ugent.be