Skip, residual and densely
connected RNN architectures
Fréderic Godin - Ph.D. Researcher
Department of Electronics and Information Systems
IDLab
Fréderic Godin - Skip, residual and densely connected RNN architectures
Who is Fréderic?
Ph.D. Researcher Deep Learning @ IDLab
Main interests:
- Sequence models
- Hybrid RNN/CNN models
Major application domain: Natural Language Processing
- Noisy data (e.g., Twitter data)
- Parsing tasks (e.g., Named Entity Recognition)
Minor application domain: Computer Vision
- Lung cancer detection (Kaggle competition, 7th/1972)
  (http://blog.kaggle.com/2017/05/16/data-science-bowl-2017-predicting-lung-cancer-solution-write-up-team-deep-breath/)
Agenda
1. Recurrent neural networks
2. Skip, residual and dense connections
3. Dense connections in practice
Recurrent neural networks
- Neural network with a cyclic connection
- Has memory
- Models variable-length sequences
Unfolded recurrent neural network
[Figure: an RNN unfolded over time steps t=1 to t=4, e.g. reading word1, word2, word3, word4]
Stacking recurrent neural networks
[Figure: stacked RNN layers unfolded over t=1 to t=4 with inputs word1 to word4; the network is deep in time (across time steps) and deep in height (across stacked layers)]
Vanishing gradients
- When the weights are updated with backpropagation, the gradient tends to shrink with every layer or time step it passes through
- Often caused by the activation function (e.g. saturating functions such as sigmoid or tanh, whose derivatives are at most 1)
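The shrinking effect can be illustrated numerically. The sketch below is a toy model, not the network from the talk: it assumes a chain of scalar tanh units that all share one hypothetical weight and sit at the same pre-activation value, and multiplies the local derivatives as the chain rule does.

```python
import math

def tanh_grad(z):
    """Derivative of tanh at pre-activation z."""
    return 1.0 - math.tanh(z) ** 2

def gradient_through_chain(depth, weight=0.9, z=1.0):
    """Gradient magnitude after backpropagating through `depth` scalar tanh units.

    Each unit computes tanh(weight * input). For simplicity every unit is
    assumed to sit at the same pre-activation z (an illustrative assumption).
    """
    grad = 1.0
    for _ in range(depth):
        grad *= weight * tanh_grad(z)  # chain rule: multiply local derivatives
    return grad

grads = {depth: gradient_through_chain(depth) for depth in (1, 5, 10, 20)}
for depth, g in grads.items():
    print(f"depth={depth:2d}  gradient={g:.3e}")
```

Because every factor is below 1 in this setting, the product decays exponentially with depth, which is the behaviour the bullets above describe.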
Backpropagating through stacked RNNs
[Figure: stacked RNN unfolded over t=1 to t=4 with inputs word1 to word4; gradients flow backwards in time (across time steps) and backwards in height (across stacked layers)]
Mitigating the vanishing gradient problem
In time: Long Short-Term Memory (LSTM)
[Equation: key equation to model depth in time]
In height:
- Many techniques exist in convolutional neural networks
- This talk: can we apply them in RNNs?
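The equation shown next to the LSTM bullet is not part of the extracted text. For reference, and only as an assumption about what the slide showed, the standard LSTM cell-state update is additive in time, which is what allows gradients to flow across many time steps:

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,
\qquad
h_t = o_t \odot \tanh(c_t)
```

Here $f_t$, $i_t$ and $o_t$ are the forget, input and output gates, $\tilde{c}_t$ is the candidate cell state, and $\odot$ denotes element-wise multiplication.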
Skip, residual and dense connections
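Skip connection
A direct connection between 2 non-consecutive layers
- No vanishing gradient along the direct path
- 2 main flavors
  - Concatenative skip connections
  - Additive skip connections
[Figure: Layer 1 -> Layer 2 -> Layer 3; the output of Layer 1 (Out 1) bypasses Layer 2 and is merged with the output of Layer 2 (Merge 1,2) before entering Layer 3]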
(Concatenative) skip connection
Concatenate the output of the previous layer and the skip connection
Advantage:
Provides the output of the first layer to the third layer without altering it
Disadvantage:
Doubles the input size of the third layer (when both outputs have the same size)
[Figure: Layer 1 -> Layer 2 -> Layer 3; Layer 3 receives the concatenation of Out 2 and Out 1]
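A minimal NumPy sketch of a concatenative skip connection. The plain tanh layers, the layer size of 4 and all variable names are assumptions made for illustration; they stand in for "Layer 1" to "Layer 3" and are not the model used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 4  # hypothetical layer size

def dense_layer(x, w):
    """A plain tanh layer standing in for 'Layer k' on the slide."""
    return np.tanh(x @ w)

x = rng.normal(size=(1, hidden))
w1 = rng.normal(size=(hidden, hidden))
w2 = rng.normal(size=(hidden, hidden))
# Layer 3 must accept 2 * hidden inputs because of the concatenation
w3 = rng.normal(size=(2 * hidden, hidden))

out1 = dense_layer(x, w1)
out2 = dense_layer(out1, w2)
merged = np.concatenate([out2, out1], axis=-1)  # concatenative skip connection
out3 = dense_layer(merged, w3)

print(merged.shape)  # twice the width of out2 alone
```

Out 1 arrives at Layer 3 unchanged, at the cost of a weight matrix with twice as many input rows.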
Additive skip connection (Residual connection)
Originates from the image classification domain
A residual connection is defined as:
$$\text{Out}_{1+2} = \underbrace{\text{Layer}_2(\text{Out}_1)}_{\text{“Residue”}} + \text{Out}_1$$
[Figure: Layer 1 -> Layer 2 -> Layer 3; Out 1 is added to the output of Layer 2, and the sum Out 1 + 2 is fed to Layer 3]
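Additive skip connection (Residual connection) in RNN
Residual connections do not make sense in RNNs, because Layer 2 also depends on h(t-1) and not only on Out 1
Additive skip connection:
$$\text{Out}_{1+2} = \text{Layer}_2(\text{Out}_1, h_{t-1}) + \text{Out}_1$$
[Figure: Layer 1 -> Layer 2 -> Layer 3; Layer 2 maps its input x and the previous hidden state h(t-1) to its output y and the new hidden state ht; Out 1 is added to the output of Layer 2 to give Out 1 + 2]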
Additive skip connection
Sum the output of the previous layer and the skip connection
Advantage:
Input size to the next layer does not increase
Disadvantage:
Can create noisy input to the next layer
[Figure: Layer 1 -> Layer 2 -> Layer 3; Layer 3 receives the sum Out 1 + 2]
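A minimal NumPy sketch of an additive skip connection around a recurrent layer. It assumes a simple (Elman) RNN step in place of the LSTM for brevity; the layer size and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 4  # hypothetical size; input and output sizes must match to be summed

def rnn_step(x, h_prev, w_x, w_h):
    """One step of a simple (Elman) RNN layer: h_t = tanh(x W_x + h_{t-1} W_h)."""
    return np.tanh(x @ w_x + h_prev @ w_h)

w_x = rng.normal(size=(hidden, hidden))
w_h = rng.normal(size=(hidden, hidden))

out1 = rng.normal(size=(1, hidden))   # output of Layer 1 at time t
h_prev = np.zeros((1, hidden))        # hidden state of Layer 2 at time t-1

h_t = rnn_step(out1, h_prev, w_x, w_h)  # Layer 2 depends on Out 1 and h(t-1)
out_1_plus_2 = h_t + out1               # additive skip connection

print(out_1_plus_2.shape)  # same shape as out1
```

The sum keeps the input size of the next layer fixed, but the next layer can no longer tell the two summands apart, which is one way to read the "noisy input" disadvantage.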
Densely connecting layers
Add a skip connection between the output of every layer and the input of every later layer
Advantage:
- Direct paths between every pair of layers
- Hierarchy of features as input to every layer
Disadvantage: (L-1)*L/2 connections for L layers, i.e. quadratic in L
[Figure: Layer 1 -> Layer 2 -> Layer 3 -> Layer 4; Layer 2 receives Out 1, Layer 3 receives Out 1 and Out 2, Layer 4 receives Out 1, Out 2 and Out 3]
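Counting each link from a layer's output to a later layer's input once, the four-layer figure has six links. The sketch below reproduces that count and shows how the input size grows under concatenation; it assumes every layer has the same hypothetical output size, and the function names are illustrative.

```python
def dense_connection_count(num_layers):
    """Number of output-to-input links when every layer feeds every later layer."""
    return sum(range(num_layers))  # 0 + 1 + ... + (L-1) = L * (L - 1) / 2

def dense_input_sizes(num_layers, hidden):
    """Input size of layers 2..L when all earlier outputs are concatenated."""
    return [k * hidden for k in range(1, num_layers)]

print(dense_connection_count(4))   # links in the four-layer figure
print(dense_input_sizes(4, 200))   # input sizes of Layer 2, Layer 3, Layer 4
```

The number of links grows quadratically with the number of layers, while the input size of each layer grows linearly with its depth.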
Densely connected layers in practice
Language modeling
Building a model which captures the statistical characteristics of a language
In practice: predicting the next word in a sentence
Example architecture
[Figure: word-level language model unfolded over time; inputs word1 to word4 pass through an embedding layer, stacked LSTM layers and a classification layer, which predicts the next words word2 to word5]
Training details
Stochastic Gradient Descent with a learning rate schedule
Uniform initialization in [-0.05, 0.05]
Dropout with probability 0.6
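The initialization and dropout settings can be sketched in NumPy as follows. This is an illustration only: it assumes that 0.6 is the probability of dropping a unit and that inverted dropout is used, neither of which the slide states, and the weight shape is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_init(shape, scale=0.05):
    """Draw weights uniformly from [-scale, scale)."""
    return rng.uniform(-scale, scale, size=shape)

def dropout(x, drop_prob=0.6, training=True):
    """Inverted dropout: zero each unit with probability drop_prob, rescale the rest."""
    if not training:
        return x
    keep = rng.random(x.shape) >= drop_prob
    return x * keep / (1.0 - drop_prob)

w = uniform_init((200, 200))    # hypothetical 200 x 200 weight matrix
h = dropout(np.ones((1, 200)))  # surviving units are scaled by 1 / (1 - 0.6)
```

Rescaling at training time keeps the expected activation unchanged, so no correction is needed when dropout is switched off for evaluation.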
Experimental results
Model                                  Hidden states   # Layers   # Params   Perplexity
Stacked LSTM (Zaremba et al., 2014)    650             2          20M        82.7
Stacked LSTM (Zaremba et al., 2014)    1500            2          66M        78.4
Stacked LSTM                           200             2          5M         100.9
Stacked LSTM                           200             3          5M         108.8
Stacked LSTM                           350             2          9M         87.9
Densely Connected LSTM                 200             2          9M         80.4
Densely Connected LSTM                 200             3          11M        78.5
Densely Connected LSTM                 200             4          14M        76.9
Lower perplexity is better
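For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-probability the model assigns to the correct next word. A minimal sketch with made-up probabilities (the numbers and the vocabulary size are hypothetical and unrelated to the results above):

```python
import math

def perplexity(word_probs):
    """Perplexity = exp of the average negative log-probability per word."""
    nll = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(nll)

# Hypothetical probabilities a model assigned to the correct next words
print(perplexity([0.1, 0.1, 0.1, 0.1]))

# A model that guesses uniformly over a hypothetical vocabulary
uniform_vocab = 10000
print(perplexity([1.0 / uniform_vocab] * 5))
```

A model that guesses uniformly over V words has perplexity V, so a perplexity near 80 roughly means the model is as uncertain as if it were choosing among 80 equally likely words.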
Character-to-word language modeling
[Figure: language model unfolded over time with inputs word1 to word4 and targets word2 to word5; from bottom to top: embedding layer, ConvNet, highway layer, stacked LSTM layers, classification layer]
Experimental results
Model                                  Hidden states   # Layers   # Params   Perplexity
Stacked LSTM (Zaremba et al., 2014)    650             2          20M        82.7
Stacked LSTM (Zaremba et al., 2014)    1500            2          66M        78.4
CharCNN (Kim et al., 2016)             650             2          19M        78.9
Densely Connected LSTM                 200             3          11M        78.5
Densely Connected LSTM                 200             4          14M        76.9
Densely Connected CharCNN*             200             4          20M        74.6
*Not published
Lower perplexity is better
Conclusion
Densely connecting all layers improves language modeling performance
- Avoids vanishing gradients
- Creates a hierarchy of features, available to each layer
We use six times fewer parameters to obtain the same result as a stacked LSTM (11M parameters at perplexity 78.5 vs. 66M parameters at perplexity 78.4)
Q&A
Also more details in our publication:
Fréderic Godin, Joni Dambre & Wesley De Neve
“Improving Language Modeling using Densely Connected
Recurrent Neural Networks”
https://arxiv.org/abs/1707.06130
Fréderic Godin
Ph.D. Researcher Deep Learning
IDLab
E-mail: frederic.godin@ugent.be
@frederic_godin
www.fredericgodin.com
idlab.technology / idlab.ugent.be

Skip, residual and densely connected RNN architectures

  • 1. Skip, residual and densely connected RNN architectures Frederic Godin - Ph.D. Researcher Department of Electronics and Information Systems IDLab
  • 2. Fréderic Godin - Skip, residual and densely connected RNN architectures Who is Fréderic? Ph.D. Reseacher Deep Learning @ IDLab Main interests: ̶ Sequence models ̶ Hybrid RNN/CNN models Major application domain: Natural Language Processing ̶ Noisy data (E.g., Twitter data) ̶ Parsing tasks (E.g., Named Entity Recognition) Minor application domain: Computer Vision ̶ Lung cancer detection (Kaggle competition 7th/1972) (http://blog.kaggle.com/2017/05/16/data-science-bowl-2017-predicting-lung-cancer-solution-write-up-team-deep-breath/) 2
  • 3. Fréderic Godin - Skip, residual and densely connected RNN architectures Agenda 1. Recurrent neural networks 2. Skip, residual and dense connections 3. Dense connections in practice 3
  • 5. Fréderic Godin - Skip, residual and densely connected RNN architectures Recurrent neural networks ̶ Neural network with a cyclic connection ̶ Has memory ̶ Models variable-length sequences 5
  • 6. Fréderic Godin - Skip, residual and densely connected RNN architectures 6 t=1 t=2 t=3 t=4 word1 word2 word3 word4E.g.: Unfolded recurrent neural network
  • 7. Fréderic Godin - Skip, residual and densely connected RNN architectures Stacking recurrent neural networks 7 t=1 t=2 t=3 t=4 word1 word2 word3 word4 Deep in time ...Deep in height
  • 8. Fréderic Godin - Skip, residual and densely connected RNN architectures Vanishing gradients - When updating the weights using backpropagation, the gradient tends to vanish with every neuron it crosses - Often caused by the activation function 8
  • 9. Fréderic Godin - Skip, residual and densely connected RNN architectures Backpropagating through stacked RNNs 9 t=1 t=2 t=3 t=4 word1 word2 word3 word4 Backpropagation in time ... Back- propagation in height
  • 10. Mitigating the vanishing gradient problem. In time: Long Short-Term Memory (LSTM) [equation image: key equation to model depth in time]. In height: many techniques exist in convolutional neural networks; this talk: can we apply them in RNNs?
  • 11. Skip, residual and dense connections
  • 12. Skip connection: a direct connection between 2 non-consecutive layers. No vanishing gradient. 2 main flavors: concatenative skip connections and additive skip connections. [figure: Layer 1, Layer 2, Layer 3; Out 1 bypasses Layer 2 and is merged with its output ("Merge 1,2") as input to Layer 3]
  • 13. (Concatenative) skip connection: concatenate the output of the previous layer and the skip connection. Advantage: provides the output of the first layer to the third layer without altering it. Disadvantage: doubles the input size. [figure: Layer 1, Layer 2, Layer 3; Layer 3 receives both Out 2 and Out 1]
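A minimal sketch of the concatenative variant, assuming toy feed-forward tanh layers in place of recurrent layers. The helpers `dense_layer` and `constant_weights`, the layer width and the weight values are all hypothetical and chosen only for illustration.

```python
import math


def dense_layer(inputs, weights):
    """Toy fully connected tanh layer: one output per weight row."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def constant_weights(n_out, n_in, value=0.1):
    # Hypothetical fixed weights, used only to keep the example deterministic.
    return [[value] * n_in for _ in range(n_out)]


hidden = 3
x = [1.0, -0.5, 0.25]

out1 = dense_layer(x, constant_weights(hidden, len(x)))
out2 = dense_layer(out1, constant_weights(hidden, len(out1)))

# Concatenative skip connection: Layer 3 sees Out 2 followed by the unaltered Out 1.
layer3_input = out2 + out1
out3 = dense_layer(layer3_input, constant_weights(hidden, len(layer3_input)))

print(len(out2), len(layer3_input))  # input to Layer 3 is twice as wide as Out 2
```

The weight matrix of Layer 3 has to grow with the wider input, which is the cost the slide lists as the disadvantage.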
  • 14. Additive skip connection (residual connection): originates from the image classification domain. Residual connection is defined as: [equation image]. [figure: Layer 1, Layer 2, Layer 3; Out 1 is added to the output of Layer 2 ("Out 1 + 2"), with Layer 2 labeled "Residue"]
  • 15. Additive skip connection (residual connection) in RNN: residual connections do not make sense in RNNs, because Layer 2 also depends on h(t-1). [figure: Layer 1, Layer 2, Layer 3 with additive skip connection "Out 1 + 2"; Layer 2 shown with labels x, y, h(t-1) and ht]
  • 16. Additive skip connection: sum the output of the previous layer and the skip connection. Advantage: input size to the next layer does not increase. Disadvantage: can create noisy input to the next layer. [figure: Layer 1, Layer 2, Layer 3; Layer 3 receives "Out 1 + 2"]
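The additive merge can be sketched as an element-wise sum. The vectors below are made-up values and `additive_skip` is a hypothetical helper; the point is only that both outputs must have the same size and that the merged input is no wider than either of them.

```python
def additive_skip(out_prev, out_skip):
    """Element-wise sum of the previous layer's output and the skip connection."""
    if len(out_prev) != len(out_skip):
        # Summation only works when both outputs have the same size.
        raise ValueError("additive skip connections need equally sized outputs")
    return [a + b for a, b in zip(out_prev, out_skip)]


out1 = [0.2, -0.4, 0.1]    # made-up output of Layer 1 (skip connection)
out2 = [0.05, 0.3, -0.6]   # made-up output of Layer 2 (previous layer)

layer3_input = additive_skip(out2, out1)
print(len(layer3_input))  # same width as Out 2, unlike concatenation
```

After the sum, Layer 3 can no longer tell which part of its input came from which layer, which is one way to read the "noisy input" disadvantage on the slide.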
  • 17. Densely connecting layers: add a skip connection between every output and every input of every layer. Advantages: direct paths between every layer; hierarchy of features as input to every layer. Disadvantage: (L-1)*L connections. [figure: Layer 1 to Layer 4; each layer receives the outputs of all preceding layers (Out 1, Out 2, Out 3)]
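To make the growth in input size concrete, the sketch below compares per-layer input widths for a plain stack and a densely connected stack. It assumes the dense connections are concatenative and that the first layer reads an embedding; the function names and the sizes (200 for both embedding and hidden state) are illustrative assumptions.

```python
def stacked_input_sizes(num_layers, embedding_size, hidden_size):
    """Input width per layer when each layer only sees the layer directly below."""
    return [embedding_size] + [hidden_size] * (num_layers - 1)


def dense_input_sizes(num_layers, embedding_size, hidden_size):
    """Input width per layer when each layer sees the embedding and the
    concatenated outputs of all layers below it (assumed concatenative merge)."""
    return [embedding_size + i * hidden_size for i in range(num_layers)]


stacked = stacked_input_sizes(4, embedding_size=200, hidden_size=200)
dense = dense_input_sizes(4, embedding_size=200, hidden_size=200)

print(stacked)
print(dense)
```

Under these assumptions, the input to each additional layer grows by one hidden-state width, so every layer has direct access to all lower-level features at the price of larger input weight matrices.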
  • 19. Language modeling: building a model which captures the statistical characteristics of a language. In practice: predicting the next word in a sentence.
  • 20. Example architecture [figure: inputs word1 to word4, then Embedding layer, LSTM, LSTM and Classification layer, predicting word2 to word5]
  • 21. Training details: Stochastic Gradient Descent with a learning rate scheme; uniform initialization in [-0.05, 0.05]; dropout with probability 0.6
  • 22. Experimental results
       Model                                  Hidden states   # Layers   # Params   Perplexity
       Stacked LSTM (Zaremba et al., 2014)    650             2          20M        82.7
       Stacked LSTM (Zaremba et al., 2014)    1500            2          66M        78.4
       Stacked LSTM                           200             2          5M         100.9
       Stacked LSTM                           200             3          5M         108.8
       Stacked LSTM                           350             2          9M         87.9
       Densely Connected LSTM                 200             2          9M         80.4
       Densely Connected LSTM                 200             3          11M        78.5
       Densely Connected LSTM                 200             4          14M        76.9
       Lower perplexity is better
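For readers unfamiliar with the metric, perplexity is commonly computed as the exponential of the average negative log-probability that the model assigns to each actual next word. The sketch below follows that standard definition; the probabilities are made-up values and `perplexity` is a hypothetical helper, not code from the talk.

```python
import math


def perplexity(probabilities):
    """exp of the mean negative log-probability assigned to the correct next words."""
    if not probabilities:
        raise ValueError("need at least one predicted probability")
    mean_nll = -sum(math.log(p) for p in probabilities) / len(probabilities)
    return math.exp(mean_nll)


# Made-up probabilities that two models assign to the same four target words.
confident_model = [0.5, 0.4, 0.6, 0.5]
uncertain_model = [0.1, 0.05, 0.2, 0.1]

print(perplexity(confident_model))
print(perplexity(uncertain_model))
```

A model that always spreads its probability uniformly over k words has perplexity k, which gives an intuition for the scale of the numbers in the table.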
  • 23. Character-to-word language modeling [figure: inputs word1 to word4, then Embedding layer, ConvNet, Highway layer, LSTM, LSTM and Classification layer, predicting word2 to word5]
  • 24. Experimental results
       Model                                  Hidden states   # Layers   # Params   Perplexity
       Stacked LSTM (Zaremba et al., 2014)    650             2          20M        82.7
       Stacked LSTM (Zaremba et al., 2014)    1500            2          66M        78.4
       CharCNN (Kim et al. 2016)              650             2          19M        78.9
       Densely Connected LSTM                 200             3          11M        78.5
       Densely Connected LSTM                 200             4          14M        76.9
       Densely Connected CharCNN*             200             4          20M        74.6
       *Not published
       Lower perplexity is better
  • 26. Conclusion: densely connecting all layers improves language modeling performance. It avoids vanishing gradients and creates a hierarchy of features, available to each layer. We use six times fewer parameters to obtain the same result as a stacked LSTM.
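The "six times fewer parameters" claim can be checked against the experimental results slides. The sketch below assumes the comparison is between the 1500-unit stacked LSTM of Zaremba et al. (66M parameters, perplexity 78.4) and the 3-layer densely connected LSTM (11M parameters, perplexity 78.5); that pairing is a reading of the tables, not something the slide states explicitly.

```python
# Values copied from the experimental results slides (parameters in millions).
stacked_lstm = {"hidden": 1500, "layers": 2, "params_m": 66, "perplexity": 78.4}
dense_lstm = {"hidden": 200, "layers": 3, "params_m": 11, "perplexity": 78.5}

# How many times smaller the densely connected model is.
param_ratio = stacked_lstm["params_m"] / dense_lstm["params_m"]

# How far apart the two perplexities are.
perplexity_gap = dense_lstm["perplexity"] - stacked_lstm["perplexity"]

print(f"parameter ratio: {param_ratio:.1f}x")
print(f"perplexity gap:  {perplexity_gap:.1f}")
```

Under that reading, the two models are within 0.1 perplexity of each other while differing by a factor of six in size.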
  • 27. Q&A. More details in our publication: Fréderic Godin, Joni Dambre & Wesley De Neve, “Improving Language Modeling using Densely Connected Recurrent Neural Networks”, https://arxiv.org/abs/1707.06130
  • 28. Fréderic Godin, Ph.D. Researcher Deep Learning, IDLab. E: frederic.godin@ugent.be, @frederic_godin, www.fredericgodin.com, idlab.technology / idlab.ugent.be