Detecting Misleading Headlines in Online News
Hands-on Experiences on Attention-based RNN
Kunwoo Park
24th June 2019
IBS deep learning summer school
Who am I
• Kunwoo Park (박건우)
• Post doc, Data Analytics, QCRI (2018 - present)
• PhD, School of Computing, KAIST (2018), with outstanding dissertation award
• Research interest
• Computational social science using machine learning
• Text style transfer using RNN and RL
This talk will...
• Help the audience understand the attention mechanism for text
• Introduce a recent research effort on detecting misleading news headlines using deep neural networks
• Explain the building blocks of the state-of-the-art model and show how they are implemented in TensorFlow (1.x)
• Give hands-on experience in implementing a text classifier using the attention mechanism
Target problem
• Detect incongruity between news headline and body text: a news headline does not correctly represent the story
Overall model architecture
Deep Neural Net for Encoding Headline
Deep Neural Net for Encoding Body Text
Goal: Detecting headline incongruity from the textual relationship between body text and headline
Input data
• Transform words into vocabulary indices
headline:
[1, 30, 5, …, 9951, 2]
body text:
[ 875, 22, 39, …, 2481, 2,
9, 93, 9593, …, 431, 77,
1, 30, 5, …, 9951, 2, … ]
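The word-to-index transformation can be sketched in plain Python. The helper names, the toy corpus, and the `<unk>` token reserved at index 0 are illustrative assumptions; the talk does not show its preprocessing code.

```python
# Hypothetical helpers for illustration only.
def build_vocab(sentences, unk_token="<unk>"):
    """Assign each distinct word an integer index, reserving 0 for unknown words."""
    vocab = {unk_token: 0}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def to_indices(sentence, vocab, unk_token="<unk>"):
    """Map a whitespace-tokenized sentence to a list of vocabulary indices."""
    return [vocab.get(word, vocab[unk_token]) for word in sentence.lower().split()]


corpus = ["the clouds are in the sky", "the sky is blue"]
vocab = build_vocab(corpus)
headline_ids = to_indices("the sky is green", vocab)  # "green" is out of vocabulary
```

Words that never appeared in the corpus fall back to the unknown index, so every sentence still maps to a list of integers.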
Define input layer in TF
• Using tf.placeholder
• Parameters
• data type: tf.int32
• shape: [None, self.max_words]
• name: used for debugging
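A placeholder of shape [None, self.max_words] leaves the batch size open but expects every example to have exactly max_words indices. A minimal sketch of how index lists might be brought to that length, assuming 0 is used as the padding index (the slides do not say which index is used):

```python
def pad_or_truncate(ids, max_words, pad_id=0):
    """Return a list of exactly max_words indices (pad_id is an assumption)."""
    clipped = ids[:max_words]
    return clipped + [pad_id] * (max_words - len(clipped))


batch = [[1, 30, 5, 9951, 2], [875, 22, 39]]
max_words = 4
padded = [pad_or_truncate(ids, max_words) for ids in batch]
```

After this step every row has the same length, so the batch can be fed as one rectangular int32 array.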
Feed data into placeholders
• Feed values when running the last node of the computation graph, usually the optimizer
One-hot encoding
{“believe”: 0, “do”: 1, “you”: 2, “happens”: 3,
“if”: 4,“what”: 5, “wouldn't”: 6, “yoga”: 7}
[[0, 0, 1, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 0, 1, 0, … ],
[1, 0, 0, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 1, 0, 0, … ],
[0, 0, 0, 1, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 1, 0, 0, 0, … ],
[0, 0, 1, 0, 0, 0, 0, 0, … ],
[0, 1, 0, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 0, 0, 1, … ]]
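The matrix above can be reproduced with a few lines of Python. The example sentence is inferred from the row order of the matrix and the vocabulary on the slide; the slide itself does not state it.

```python
vocab = {"believe": 0, "do": 1, "you": 2, "happens": 3,
         "if": 4, "what": 5, "wouldn't": 6, "yoga": 7}


def one_hot(index, size):
    """Return a list of zeros with a single 1 at the given index."""
    row = [0] * size
    row[index] = 1
    return row


# Sentence inferred from the slide's matrix (an assumption).
sentence = "you wouldn't believe what happens if you do yoga"
indices = [vocab[word] for word in sentence.split()]
matrix = [one_hot(i, len(vocab)) for i in indices]
```

Each row has as many columns as the vocabulary has words, which is why the representation grows quickly with vocabulary size.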
Drawbacks of one-hot
[[0, 0, 1, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 0, 1, 0, … ],
[1, 0, 0, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 1, 0, 0, … ],
[0, 0, 0, 1, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 1, 0, 0, 0, … ],
[0, 0, 1, 0, 0, 0, 0, 0, … ],
[0, 1, 0, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 0, 0, 1, … ]]
{“believe”: 0, “do”: 1, “you”: 2, “happens”: 3,“if”: 4,
“what”: 5, “wouldn't”: 6, “yoga”: 7, … “a”:1000000000}
Word embedding
• A mapping from each word, a discrete variable, to a fixed-dimensional vector of continuous numbers
One-hot input (sequence length × vocab size):
[[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1]]
Embedded input (sequence length × embedding size):
[[0.23, 0.51],
[0.72, 0.13],
[0.01, 0.07],
[0.18, 0.77],
[0.04, 0.05],
[0.87, 0.92],
[0.41, 0.38],
[0.33, 0.68],
[0.14, 0.22]]
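One way to read the two matrices: multiplying a one-hot row by an embedding matrix selects one row of that matrix, so in practice the multiplication is replaced by a lookup. The sketch below checks this on a made-up 4-word vocabulary; the numbers are illustrative and not taken from the slides.

```python
# Illustrative embedding matrix: 4 words, 2 dimensions (made-up values).
embedding_matrix = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8],
]


def one_hot(index, size):
    row = [0] * size
    row[index] = 1
    return row


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


indices = [2, 0, 3]
via_lookup = [embedding_matrix[i] for i in indices]
via_matmul = matmul([one_hot(i, 4) for i in indices], embedding_matrix)
```

Both routes give the same embedded input; the lookup simply avoids building the large one-hot matrix.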
Training from scratch
One-hot input (sequence length × vocab size):
[[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1]]
Embedding matrix (vocab size × embedding size):
[[0.01, 0.07],
[0.33, 0.68],
[0.23, 0.51],
[0.41, 0.38],
[0.18, 0.77],
[0.04, 0.05],
[0.87, 0.92],
[0.01, 0.07],
[0.72, 0.13],
[0.14, 0.22]]
Embedded input (sequence length × embedding size):
[[0.23, 0.51],
[0.72, 0.13],
[0.01, 0.07],
[0.18, 0.77],
[0.04, 0.05],
[0.87, 0.92],
[0.41, 0.38],
[0.33, 0.68],
[0.14, 0.22]]
Load pre-trained matrix
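The two options differ only in how the embedding matrix is initialized. A minimal sketch, assuming a hypothetical `init_embeddings` helper and a made-up pre-trained vector; in the first case all rows start random and are learned during training, in the second case known words start from their pre-trained vectors.

```python
import random


def init_embeddings(vocab, dim, pretrained=None, seed=0):
    """Build a [vocab size x dim] matrix: copy pre-trained vectors where
    available, otherwise draw small random values to be learned in training."""
    rng = random.Random(seed)
    pretrained = pretrained or {}
    matrix = [None] * len(vocab)
    for word, index in vocab.items():
        if word in pretrained:
            matrix[index] = list(pretrained[word])
        else:
            matrix[index] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return matrix


vocab = {"believe": 0, "do": 1, "you": 2}

# Training from scratch: every row is random.
from_scratch = init_embeddings(vocab, dim=2)

# Loading a pre-trained matrix: the vector below is a hypothetical example.
pretrained_vectors = {"you": [0.23, 0.51]}
from_pretrained = init_embeddings(vocab, dim=2, pretrained=pretrained_vectors)
```

The sketch assumes the vocabulary indices run from 0 to len(vocab) - 1 without gaps.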
Overall model architecture
Deep Neural Net for Encoding Headline
Deep Neural Net for Encoding Body Text
Deep encoder
Embedded input (sequence length × embedding size):
[[0.23, 0.51],
[0.72, 0.13],
[0.01, 0.07],
[0.18, 0.77],
[0.04, 0.05],
[0.87, 0.92],
[0.41, 0.38],
[0.33, 0.68],
[0.14, 0.22]]
Deep neural network
Hidden representation (sequence length × hidden size):
[[0.752, 0.757, 0.587],
[0.645, 0.397, 0.618],
[0.777, 0.099, 0.938],
[0.367, 0.139, 0.150],
[0.341, 0.069, 0.398],
[0.415, 0.655, 0.467],
[0.935, 0.659, 0.321],
[0.875, 0.699, 0.967],
[0.734, 0.966, 0.205]]
Which neural net can we use?
• Feedforward neural network
• Convolutional network
• Recurrent neural network
Recurrent neural network
• Efficient in modeling inputs with sequential dependencies (e.g., text, time-series, …)
• To make an output for each step, RNNs combine the current input with what they have learned so far
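The recurrence can be sketched with a vanilla tanh RNN in plain Python. The update rule h_t = tanh(W_x x_t + W_h h_{t-1} + b) and the toy weights are assumptions chosen for illustration; the slides do not give a formula for the basic cell.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One vanilla RNN step: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    h_t = []
    for j in range(len(b)):
        total = b[j]
        total += sum(w_x[j][i] * x_t[i] for i in range(len(x_t)))
        total += sum(w_h[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(total))
    return h_t


def run_rnn(inputs, w_x, w_h, b):
    """Return the hidden state at every step and the final state."""
    h = [0.0] * len(b)
    outputs = []
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        outputs.append(h)
    return outputs, h


# Toy weights: hidden size 2, input size 2 (made-up values).
w_x = [[0.5, -0.3], [0.1, 0.8]]
w_h = [[0.2, 0.0], [-0.4, 0.6]]
b = [0.0, 0.1]
sequence = [[0.23, 0.51], [0.72, 0.13], [0.01, 0.07]]
outputs, final_state = run_rnn(sequence, w_x, w_h, b)
```

Each new hidden state depends on the current input and the previous hidden state, which is how earlier words influence later outputs.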
[Figure: unrolled RNN with inputs …, x4, …, xt and hidden states …, h4, …, ht]
Long-term dependencies
• “the clouds are in the sky”
• “I grew up in France … I speak fluent French”
Cell state
• A kind of memory unit that keeps past information
• An LSTM can add information to or remove it from the cell state through special structures called gates
Forget gate layer
• Decide what information we’re going to throw away from the
cell state
• 1: “completely keep this”. 0: “completely get rid of this”
Taking input
• What new information we’re going to store in the cell state
• Input gate layer: sigmoid decides which values we’ll update
• tanh layer: creates a vector of candidate values
Update cell state
• Combine the old cell state with the new candidate value through $f_t$ and $i_t$
Decide output
• Output is the filtered version of cell state $C_t$
Gated recurrent unit (GRU)
• Update gate: combination of forget gate and input gate
• Merge cell state and hidden state
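The LSTM gates described above (forget gate, input gate, candidate values, output) can be put together in one step function. This is a plain-Python sketch of the standard LSTM equations with made-up toy weights, not the cell implementation used in the talk; the parameter names are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vector):
    """Compute W v + b for nested-list weights."""
    return [sum(w * v for w, v in zip(row, vector)) + b_j
            for row, b_j in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(v) for v in affine(params["W_f"], params["b_f"], z)]       # forget gate
    i_t = [sigmoid(v) for v in affine(params["W_i"], params["b_i"], z)]       # input gate
    c_tilde = [math.tanh(v) for v in affine(params["W_c"], params["b_c"], z)]  # candidates
    # Update cell state: keep part of the old state, add part of the candidates.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    o_t = [sigmoid(v) for v in affine(params["W_o"], params["b_o"], z)]       # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # filtered cell state
    return h_t, c_t


# Toy parameters: hidden size 1, input size 2 (made-up values).
params = {
    "W_f": [[0.1, 0.2, -0.1]], "b_f": [1.0],
    "W_i": [[0.3, -0.2, 0.4]], "b_i": [0.0],
    "W_c": [[0.5, 0.1, 0.2]], "b_c": [0.0],
    "W_o": [[-0.1, 0.3, 0.2]], "b_o": [0.0],
}
h, c = [0.0], [0.0]
for x_t in [[0.23, 0.51], [0.72, 0.13]]:
    h, c = lstm_step(x_t, h, c, params)
```

A forget-gate value near 1 keeps the old cell state and a value near 0 discards it, matching the "completely keep" and "completely get rid of" reading.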
Bi-directional RNN
• Combining two RNNs together: one RNN reads inputs from left to right and another RNN reads inputs from right to left
• Able to understand context better
How to build RNN in TF
1. Decide which cell you use for RNN
2. Decide the number of layers in RNN
3. Decide whether RNN is uni- or bi- directional
Stacked RNN
Uni-directional RNN
• tf.nn.dynamic_rnn()
• outputs: the sequence of hidden states, [batch_size, max_sequences, output_size]
• state: the final state, [batch_size, output_size]
Bi-directional RNN
• tf.nn.bidirectional_dynamic_rnn()
• outputs, states = (output_fw, output_bw), (state_fw, state_bw)
Some body text is too long
• The final hidden state $h_t$ should contain all necessary information from over a thousand past steps
A news article is hierarchical
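Hierarchical RNN
• Word-level RNN: $h_t^p = f(h_{t-1}^p, x_t^p; \theta_f)$
• Paragraph-level RNN: $u_p = g(u_{p-1}, h_t^p; \theta_g)$
[Figure: a word-level RNN reads the words $x_1^p, x_2^p, \ldots, x_t^p$ of each paragraph $p$; the final hidden states $h_t^1, h_t^2, \ldots$ feed a paragraph-level RNN that produces $u_1, u_2, \ldots$]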
• The maximum sequence length each RNN has to process can be reduced significantly
• Therefore, we can effectively train models with fewer parameters
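The reduction in sequence length can be illustrated with simple arithmetic. The paragraph and word counts below are made up for the example; the helper names are hypothetical.

```python
def flat_steps(paragraphs):
    """Time steps a single RNN needs to read the whole body text word by word."""
    return sum(len(p) for p in paragraphs)


def hierarchical_steps(paragraphs):
    """Longest sequence any one RNN sees in a two-level encoder: the longest
    paragraph (word level) or the number of paragraphs (paragraph level)."""
    return max(max(len(p) for p in paragraphs), len(paragraphs))


# Toy body text: 20 paragraphs of 60 word indices each (made-up sizes).
body = [[0] * 60 for _ in range(20)]
flat = flat_steps(body)
hierarchical = hierarchical_steps(body)
```

In this toy case the flat RNN has to carry information across 1200 steps, while no RNN in the hierarchical encoder reads more than 60.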
What’s more?
• Across body text, some paragraphs have a strong signal
Neural Machine Translation
• RNN-based encoder-decoder architecture, known as seq2seq
Sutskever et al., 2014; Cho et al., 2014
Attention mechanism in NMT
Attention mechanism
• In detecting incongruity, we can pay a different amount of attention to each paragraph
[Figure: RNN for headline (target) and RNN for body text (source); the paragraph states $u_1^B, u_2^B, \ldots$ built from $h_t^1, h_t^2, \ldots$ are combined by a weighted sum]
Alignment model
• Calculate attention weights between each paragraph (source) and headline (target)
$a^H(s) = \mathrm{align}(u^H, u_s^B) = \frac{\exp(\mathrm{score}(u^H, u_s^B))}{\sum_{s'} \exp(\mathrm{score}(u^H, u_{s'}^B))}$
• Score is a content-based function (Luong et al. 2015)
Context vector
• Represents the body text with different attention weights across paragraphs (a weighted sum of $u_1^B, u_2^B, \ldots$)
Attention in TF
• Using dot-product similarity
• bodytext_outputs: sequence of the hidden states
• headline_states: the last hidden state
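The TensorFlow code from the slide is not part of this transcript. As a rough stand-in, the following plain-Python sketch shows dot-product attention: score each paragraph state against the headline state, normalize with softmax, and take the weighted sum. The variable names mirror the bullets above; the toy vectors and the `attend` helper are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(headline_state, paragraph_states):
    """Return attention weights and the context vector (weighted sum)."""
    scores = [dot(headline_state, u) for u in paragraph_states]
    weights = softmax(scores)
    dim = len(headline_state)
    context = [sum(w * u[d] for w, u in zip(weights, paragraph_states))
               for d in range(dim)]
    return weights, context


# Toy states with hidden size 2 (made-up values).
headline_states = [1.0, 0.0]
bodytext_outputs = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]
weights, context = attend(headline_states, bodytext_outputs)
```

Paragraphs whose states point in the same direction as the headline state receive larger weights and therefore contribute more to the context vector.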
Overall model architecture
Deep Neural Net for Encoding Headline
Deep Neural Net for Encoding Body Text
Measure similarity
• $u^H$: last hidden state of RNN for encoding headline
• $u^B$: context vector that encodes body text
• $M$: learnable similarity matrix, $b$: bias term
• $\sigma$: sigmoid function
$p(\mathrm{label}) = \sigma((u^H)^\top M u^B + b)$
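The similarity score is a bilinear form between the headline vector and the body-text context vector, passed through a sigmoid. A minimal sketch in plain Python, with made-up vectors and a hypothetical function name; in the model the matrix and the bias are learned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def incongruity_probability(u_h, u_b, M, b):
    """p(label) = sigmoid(u_h^T M u_b + b)."""
    m_ub = [sum(m * x for m, x in zip(row, u_b)) for row in M]
    return sigmoid(sum(h * v for h, v in zip(u_h, m_ub)) + b)


u_h = [1.0, 0.0]   # headline representation (toy values)
u_b = [0.0, 1.0]   # body-text context vector (toy values)
M = [[1.0, 0.0],
     [0.0, 1.0]]   # identity matrix: the score reduces to a dot product
p = incongruity_probability(u_h, u_b, M, 0.0)
```

With the identity matrix the two orthogonal toy vectors score zero, so the probability is exactly 0.5; a learned matrix can weight and mix dimensions differently.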
Define loss function
• Cross-entropy: standard loss function for classification
• $y$: ground truth (0/1), $p(y)$: model output
• Gradient clipping to prevent exploding gradients
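Both ideas fit in a few lines of Python. The loss below is the usual binary cross-entropy, and the clipping rescales the gradient when its norm exceeds a threshold. Clipping by the global norm is one common variant and is an assumption here; the slides do not say which variant or threshold is used.

```python
import math


def binary_cross_entropy(y, p, eps=1e-12):
    """Loss for one example: -[y log p + (1 - y) log(1 - p)]."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def clip_by_global_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads), norm
    scale = max_norm / norm
    return [g * scale for g in grads], norm


loss = binary_cross_entropy(1, 0.9)
clipped, norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Clipping keeps the direction of the gradient and only shortens it, which limits the size of a single update when gradients explode.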
Model Complexity
How to prevent overfitting?
• Add more data! (most effective if possible)
• Data augmentation: add noise to the input to generalize better
• Regularization: L1/L2, Dropout, Early stopping
• Reduce architecture complexity
Evaluation results
Credit: Taegyun Kim
Attention for text classification
• Giving different weights over word sequences (Zhou et al., ACL 2016)
$H = [h_1, h_2, \cdots, h_T]$
$M = \tanh(H)$
$\alpha = \mathrm{softmax}(w^\top M)$
$r = H \alpha^\top$
Attention for text classification
• Focusing on important sentence representations, each of which pays a different amount of attention to its words (Yang et al., NAACL 2016)
Attention for text classification
• Transfer learning from a Transformer language model built on multi-head attention (Vaswani et al., NIPS 2017, Devlin et al., NAACL 2019)
Hands-on experience
• Target problem: sentiment analysis on IMDB review dataset
Thank you
Kunwoo Park
@ IBS deep learning summer school


Kürzlich hochgeladen (20)

Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Levelling - Rise and fall - Height of instrument method
Levelling - Rise and fall - Height of instrument methodLevelling - Rise and fall - Height of instrument method
Levelling - Rise and fall - Height of instrument method
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communication
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
List of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfList of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdf
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Course
Novel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsNovel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending Actuators
Immutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfImmutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdf
Forming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).pptForming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).ppt
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptx
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdf
Designing pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptxDesigning pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptx

Detecting Misleading Headlines in Online News: Hands-on Experiences on Attention-based RNN

  • 1. Detecting Misleading Headlines in Online News: Hands-on Experiences on Attention-based RNN • Kunwoo Park • 24th June 2019 • IBS deep learning summer school
  • 2. Who am I • Kunwoo Park (박건우) • Post doc, Data Analytics, QCRI (2018 - present) • PhD, School of Computing, KAIST (2018), with outstanding dissertation award • Research interest • Computational social science using machine learning • Text style transfer using RNN and RL
  • 3. This talk will • Help the audience understand the attention mechanism for text • Introduce a recent research effort on detecting misleading news headlines using deep neural networks • Explain the building blocks of the state-of-the-art model and show how they are implemented in TensorFlow (1.x) • Give hands-on experience in implementing a text classifier using the attention mechanism
  • 5. Target problem • Detect incongruity between news headline and body text: a news headline does not correctly represent the story
  • 6. Overall model architecture • Diagram components: Input Layer, Embedding Layer, Deep Neural Net for Encoding Headline, Deep Neural Net for Encoding Body Text, Output Layer • Goal: detecting headline incongruity from the textual relationship between body text and headline
  • 7. Overall model architecture • Diagram components: Input Layer, Embedding Layer, Deep Neural Net for Encoding Headline, Deep Neural Net for Encoding Body Text, Output Layer
  • 8. Input data • Transform words into vocabulary indices • headline: [1, 30, 5, …, 9951, 2] • body text: [ 875, 22, 39, …, 2481, 2, 9, 93, 9593, …, 431, 77, 1, 30, 5, …, 9951, 2, … ]
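The word-to-index step can be sketched in plain Python. This is only an illustration, not the preprocessing used in the talk: whitespace tokenization, the `<pad>` and `<unk>` special tokens, and the helper names `build_vocab` and `encode` are all assumptions. The `max_words` argument mirrors the fixed input width that the placeholder shape `[None, self.max_words]` implies.

```python
def build_vocab(sentences, specials=("<pad>", "<unk>")):
    """Assign an integer index to every distinct token.

    The special tokens are an assumption of this sketch and take the first indices.
    """
    vocab = {tok: i for i, tok in enumerate(specials)}
    for sentence in sentences:
        for tok in sentence.lower().split():
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(sentence, vocab, max_words):
    """Map tokens to indices, truncate to max_words, and right-pad with <pad>."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in sentence.lower().split()]
    ids = ids[:max_words]
    return ids + [vocab["<pad>"]] * (max_words - len(ids))


vocab = build_vocab(["you wouldn't believe what happens if you do yoga"])
headline_ids = encode("what happens if you do yoga daily", vocab, max_words=10)
print(headline_ids)  # [5, 6, 7, 2, 8, 9, 1, 0, 0, 0]
```

The unseen word "daily" maps to the `<unk>` index and the remaining positions are padded, so every example has the same length.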
  • 9. Define input layer in TF • Using tf.placeholder • Parameters • data type: tf.int32 • shape: [None, self.max_words] • name: used for debugging • headline: [1, 30, 5, …, 9951, 2] • body text: [ 875, 22, 39, …, 2481, 2, 9, 93, 9593, …, 431, 77, 1, 30, 5, …, 9951, 2, … ]
  • 10. Feed data into placeholders • Data is fed when running the last node of the computation graph, usually the optimizer • headline: [1, 30, 5, …, 9951, 2] • body text: [ 875, 22, 39, …, 2481, 2, 9, 93, 9593, …, 431, 77, 1, 30, 5, …, 9951, 2, … ]
  • 11. One-hot encoding • Vocabulary: {“believe”: 0, “do”: 1, “you”: 2, “happens”: 3, “if”: 4, “what”: 5, “wouldn't”: 6, “yoga”: 7} • One-hot input: [[0, 0, 1, 0, 0, 0, 0, 0, … ], [0, 0, 0, 0, 0, 0, 1, 0, … ], [1, 0, 0, 0, 0, 0, 0, 0, … ], [0, 0, 0, 0, 0, 1, 0, 0, … ], [0, 0, 0, 1, 0, 0, 0, 0, … ], [0, 0, 0, 0, 1, 0, 0, 0, … ], [0, 0, 1, 0, 0, 0, 0, 0, … ], [0, 1, 0, 0, 0, 0, 0, 0, … ], [0, 0, 0, 0, 0, 0, 0, 1, … ]]
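Reading the rows of the one-hot matrix against the vocabulary gives the indices 2, 6, 0, 5, 3, 4, 2, 1, 7, that is, "you wouldn't believe what happens if you do yoga". A minimal NumPy sketch of the encoding follows; the helper name `one_hot` is illustrative, and the vocabulary is cut off at the eight words shown on the slide.

```python
import numpy as np

# Vocabulary as shown on the slide (truncated to the eight listed words).
vocab = {"believe": 0, "do": 1, "you": 2, "happens": 3,
         "if": 4, "what": 5, "wouldn't": 6, "yoga": 7}


def one_hot(indices, vocab_size):
    """Return a [sequence length, vocab size] matrix with a single 1 per row."""
    matrix = np.zeros((len(indices), vocab_size), dtype=np.int32)
    matrix[np.arange(len(indices)), indices] = 1
    return matrix


tokens = "you wouldn't believe what happens if you do yoga".split()
indices = [vocab[t] for t in tokens]
encoded = one_hot(indices, len(vocab))

print(indices)        # [2, 6, 0, 5, 3, 4, 2, 1, 7]
print(encoded.shape)  # (9, 8)
```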
  • 12. Drawbacks of one-hot • One-hot input: [[0, 0, 1, 0, 0, 0, 0, 0, … ], [0, 0, 0, 0, 0, 0, 1, 0, … ], [1, 0, 0, 0, 0, 0, 0, 0, … ], [0, 0, 0, 0, 0, 1, 0, 0, … ], [0, 0, 0, 1, 0, 0, 0, 0, … ], [0, 0, 0, 0, 1, 0, 0, 0, … ], [0, 0, 1, 0, 0, 0, 0, 0, … ], [0, 1, 0, 0, 0, 0, 0, 0, … ], [0, 0, 0, 0, 0, 0, 0, 1, … ]] • Vocabulary: {“believe”: 0, “do”: 1, “you”: 2, “happens”: 3, “if”: 4, “what”: 5, “wouldn't”: 6, “yoga”: 7, … “a”: 1000000000} • Each one-hot vector is as long as the vocabulary, so the representation becomes very large and sparse as the vocabulary grows
  • 13. Word embedding • A mapping from the discrete index of each word to a fixed-dimensional vector of continuous numbers • Sequence length × Vocab size: [[0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1]] • Sequence length × Embedding size: [[0.23, 0.51], [0.72, 0.13], [0.01, 0.07], [0.18, 0.77], [0.04, 0.05], [0.87, 0.92], [0.41, 0.38], [0.33, 0.68], [0.14, 0.22]]
  • 14. Word embedding • A mapping from the discrete index of each word to a fixed-dimensional vector of continuous numbers • Sequence length × Vocab size: [[0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1]] • Embedding matrix: ? • Sequence length × Embedding size: [[0.23, 0.51], [0.72, 0.13], [0.01, 0.07], [0.18, 0.77], [0.04, 0.05], [0.87, 0.92], [0.41, 0.38], [0.33, 0.68], [0.14, 0.22]]
  • 15. Training from scratch • One-hot input (Sequence length × Vocab size): [[0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1]] • Embedding matrix (Vocab size × Embedding size): [[0.01, 0.07], [0.33, 0.68], [0.23, 0.51], [0.41, 0.38], [0.18, 0.77], [0.04, 0.05], [0.87, 0.92], [0.01, 0.07], [0.72, 0.13], [0.14, 0.22]] • Embedded input (Sequence length × Embedding size): [[0.23, 0.51], [0.72, 0.13], [0.01, 0.07], [0.18, 0.77], [0.04, 0.05], [0.87, 0.92], [0.41, 0.38], [0.33, 0.68], [0.14, 0.22]]
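The shapes on this slide describe a matrix product: one-hot input (Sequence length × Vocab size) times embedding matrix (Vocab size × Embedding size) gives the embedded input (Sequence length × Embedding size). The NumPy sketch below uses randomly drawn values rather than the slide's illustrative numbers, and shows that the product is equivalent to a plain row lookup. In TensorFlow 1.x the lookup form is usually written with `tf.nn.embedding_lookup`, though the transcript does not show which call the talk used.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_size = 8, 2

# Randomly initialised embedding matrix (illustrative values). When training
# from scratch it is updated by backpropagation like any other weight.
embedding_matrix = rng.uniform(0.0, 1.0, size=(vocab_size, embedding_size))

indices = np.array([2, 6, 0, 5, 3, 4, 2, 1, 7])  # word indices of the input sequence
one_hot_input = np.eye(vocab_size)[indices]        # [sequence length, vocab size]

embedded_by_matmul = one_hot_input @ embedding_matrix  # [sequence length, embedding size]
embedded_by_lookup = embedding_matrix[indices]         # same rows, no one-hot matrix needed

print(np.allclose(embedded_by_matmul, embedded_by_lookup))  # True
print(embedded_by_lookup.shape)                             # (9, 2)
```

The lookup form avoids materialising the one-hot matrix, which matters once the vocabulary is large.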
  • 17. Load pre-trained matrix • One-hot input (Sequence length × Vocab size): [[0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1]] • Embedding matrix (Vocab size × Embedding size): [[0.01, 0.07], [0.33, 0.68], [0.23, 0.51], [0.41, 0.38], [0.18, 0.77], [0.04, 0.05], [0.87, 0.92], [0.01, 0.07], [0.72, 0.13], [0.14, 0.22]] • Pre-trained sources: word2vec, GloVe, BERT, … • Embedded input (Sequence length × Embedding size): [[0.23, 0.51], [0.72, 0.13], [0.01, 0.07], [0.18, 0.77], [0.04, 0.05], [0.87, 0.92], [0.41, 0.38], [0.33, 0.68], [0.14, 0.22]]
  • 19. Overall model architecture • Diagram components: Input Layer, Embedding Layer, Deep Neural Net for Encoding Headline, Deep Neural Net for Encoding Body Text, Output Layer
  • 20. Deep encoder • Embedded input (Sequence length × Embedding size): [[0.23, 0.51], [0.72, 0.13], [0.01, 0.07], [0.18, 0.77], [0.04, 0.05], [0.87, 0.92], [0.41, 0.38], [0.33, 0.68], [0.14, 0.22]] • Deep neural network • Hidden representation (Sequence length × Hidden size): [[0.752, 0.757, 0.587], [0.645, 0.397, 0.618], [0.777, 0.099, 0.938], [0.367, 0.139, 0.150], [0.341, 0.069, 0.398], [0.415, 0.655, 0.467], [0.935, 0.659, 0.321], [0.875, 0.699, 0.967], [0.734, 0.966, 0.205]]
  • 21. Which neural net can we use? • Feedforward neural network • Convolutional network • Recurrent neural network
  • 22. Recurrent neural network • Effective at modeling inputs with sequential dependencies (e.g., text, time series, …) • To produce an output at each step, RNNs combine the current input with what has been learned so far • (Figure: unrolled RNN with inputs $x_1, x_2, x_3, x_4, \dots, x_t$ and hidden states $h_1, h_2, h_3, h_4, \dots, h_t$)
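The recurrence described here can be sketched with a vanilla (Elman) RNN in NumPy. The tanh nonlinearity, the zero initial state, and the parameter names `W_xh`, `W_hh`, `b_h` are assumptions of this sketch, since the slide does not fix a particular cell. The sizes follow the deep-encoder slide: sequence length 9, embedding size 2, hidden size 3.

```python
import numpy as np


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over inputs of shape [sequence length, input size]."""
    hidden = np.zeros(W_hh.shape[0])  # initial state h_0 (assumed to be zero)
    states = []
    for x_t in inputs:
        # The new state combines the current input with the previous state.
        hidden = np.tanh(x_t @ W_xh + hidden @ W_hh + b_h)
        states.append(hidden)
    return np.stack(states)  # [sequence length, hidden size]


rng = np.random.default_rng(0)
seq_len, input_size, hidden_size = 9, 2, 3
inputs = rng.normal(size=(seq_len, input_size))
W_xh = rng.normal(scale=0.5, size=(input_size, hidden_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

states = rnn_forward(inputs, W_xh, W_hh, b_h)
print(states.shape)  # (9, 3)
```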
  • 23. Long-term dependencies • “the clouds are in the sky” • “I grew up in France … I speak fluent French”
  • 25. Cell state • A kind of memory unit that keeps past information • LSTM can add information to, or remove information from, the cell state through special structures called gates
  • 26. Forget gate layer • Decides what information to throw away from the cell state • 1: “completely keep this”, 0: “completely get rid of this”
  • 27. Taking input • Decides what new information to store in the cell state • Input gate layer: a sigmoid decides which values to update • tanh layer: creates a vector of candidate values
  • 28. Update cell state • Combine the old cell state with the new candidate values through $f_t$ and $i_t$
  • 29. Decide output • Output is the filtered version of cell state $C_t$
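The four LSTM slides (forget gate, input gate with candidate values, cell-state update, output) can be put together in one step function. The sketch below follows the standard LSTM formulation; the packing of all four gates into a single weight matrix `W` and the gate order inside it are implementation choices of this sketch, not something taken from the talk.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W has shape [hidden size + input size, 4 * hidden size]."""
    n = h_prev.shape[0]
    z = np.concatenate([h_prev, x_t]) @ W + b
    f_t = sigmoid(z[0 * n:1 * n])      # forget gate: 1 keeps, 0 discards
    i_t = sigmoid(z[1 * n:2 * n])      # input gate: which values to update
    c_tilde = np.tanh(z[2 * n:3 * n])  # candidate values
    o_t = sigmoid(z[3 * n:4 * n])      # output gate
    c_t = f_t * c_prev + i_t * c_tilde  # update the cell state
    h_t = o_t * np.tanh(c_t)            # output: filtered version of the cell state
    return h_t, c_t


rng = np.random.default_rng(0)
input_size, hidden_size = 2, 3
W = rng.normal(scale=0.5, size=(hidden_size + input_size, 4 * hidden_size))
b = np.zeros(4 * hidden_size)
x_t = rng.normal(size=input_size)
h_prev = np.zeros(hidden_size)
c_prev = np.zeros(hidden_size)

h_t, c_t = lstm_step(x_t, h_prev, c_prev, W, b)
print(h_t.shape, c_t.shape)  # (3,) (3,)
```

Because the cell state is updated additively and gated rather than fully rewritten at every step, information can be carried across many steps, which is what the long-term dependency examples call for.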
  • 30. GRU • Update gate: combination of forget gate and input gate • Merges the cell state and hidden state
  • 31. Bi-directional RNN • Combines two RNNs: one RNN reads inputs from left to right and another RNN reads inputs from right to left • Captures context on both sides of each position
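A bi-directional encoder can be sketched by running one RNN forward, a second RNN over the reversed sequence, and joining the two state sequences position by position. Concatenation is a common way to combine them but is an assumption here, as is the vanilla tanh cell. The function names are illustrative.

```python
import numpy as np


def rnn_forward(inputs, W_xh, W_hh):
    """Vanilla RNN over [sequence length, input size]; returns all hidden states."""
    hidden = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:
        hidden = np.tanh(x_t @ W_xh + hidden @ W_hh)
        states.append(hidden)
    return np.stack(states)


def birnn_forward(inputs, fw_params, bw_params):
    """Concatenate left-to-right and right-to-left hidden states per position."""
    out_fw = rnn_forward(inputs, *fw_params)
    # Read the sequence right to left, then flip back so positions line up.
    out_bw = rnn_forward(inputs[::-1], *bw_params)[::-1]
    return np.concatenate([out_fw, out_bw], axis=-1)


rng = np.random.default_rng(0)
seq_len, input_size, hidden_size = 9, 2, 3
inputs = rng.normal(size=(seq_len, input_size))
fw_params = (rng.normal(scale=0.5, size=(input_size, hidden_size)),
             rng.normal(scale=0.5, size=(hidden_size, hidden_size)))
bw_params = (rng.normal(scale=0.5, size=(input_size, hidden_size)),
             rng.normal(scale=0.5, size=(hidden_size, hidden_size)))

outputs = birnn_forward(inputs, fw_params, bw_params)
print(outputs.shape)  # (9, 6)
```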
  • 32. How to build RNN in TF 1. Decide which cell to use for the RNN 2. Decide the number of layers in the RNN 3. Decide whether the RNN is uni- or bi-directional
  • 35. Uni-directional RNN • tf.nn.dynamic_rnn() • outputs: the sequence of hidden states, [batch_size, max_sequences, output_size] • state: the final state, [batch_size, output_size]
  • 36. Bi-directional RNN • outputs, states = (output_fw, output_bw), (state_fw, state_bw)
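  • 37. Some body text is too long • The final hidden state $h_t$ should contain all the necessary information from the past, over a thousand steps • (Figure: unrolled RNN with inputs $x_1, x_2, x_3, x_4, \dots, x_t$ and hidden states $h_1, h_2, h_3, h_4, \dots, h_t$)
  • 38. A news article is hierarchical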
  • 39. Hierarchical RNN • Word-level RNN: $h_t^p = f(h_{t-1}^p, x_t^p; \theta_f)$ • Paragraph-level RNN: $u_p = g(u_{p-1}, h_t^p; \theta_g)$ • (Figure: for each paragraph $p$, a word-level RNN reads the words $x_1^p, x_2^p, x_3^p, \dots, x_t^p$ and produces hidden states $h_1^p, h_2^p, h_3^p, \dots, h_t^p$; the states $h_t^1, h_t^2, \dots, h_t^p$ are the inputs of the paragraph-level RNN, which produces $u_1, u_2, \dots, u_p$)
  • 40. Hierarchical RNN • Word-level RNN: $h_t^p = f(h_{t-1}^p, x_t^p; \theta_f)$ • Paragraph-level RNN: $u_p = g(u_{p-1}, h_t^p; \theta_g)$ • (Figure: same word-level and paragraph-level RNN diagram as on the previous slide) • The maximum sequence length each RNN has to process can be reduced significantly • Therefore, models with fewer parameters can be trained effectively
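The two-level recurrence can be sketched as follows: a word-level RNN summarises each paragraph by its last hidden state $h_t^p$, and a paragraph-level RNN runs over those summaries to produce $u_1, \dots, u_p$. The vanilla tanh cell stands in for $f$ and $g$, and the function names and sizes are assumptions made for illustration.

```python
import numpy as np


def rnn_forward(inputs, W_xh, W_hh):
    """Vanilla RNN over [sequence length, input size]; returns all hidden states."""
    hidden = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:
        hidden = np.tanh(x_t @ W_xh + hidden @ W_hh)
        states.append(hidden)
    return np.stack(states)


def hierarchical_encode(paragraphs, word_params, para_params):
    """paragraphs: list of [num words, embedding size] arrays, one per paragraph."""
    # Word-level RNN: keep only the final state h_t^p of each paragraph.
    paragraph_vectors = np.stack(
        [rnn_forward(words, *word_params)[-1] for words in paragraphs])
    # Paragraph-level RNN: consumes one vector per paragraph, yields u_1 ... u_p.
    return rnn_forward(paragraph_vectors, *para_params)


rng = np.random.default_rng(0)
embedding_size, word_hidden, para_hidden = 2, 3, 4
paragraphs = [rng.normal(size=(n, embedding_size)) for n in (4, 7, 5)]
word_params = (rng.normal(scale=0.5, size=(embedding_size, word_hidden)),
               rng.normal(scale=0.5, size=(word_hidden, word_hidden)))
para_params = (rng.normal(scale=0.5, size=(word_hidden, para_hidden)),
               rng.normal(scale=0.5, size=(para_hidden, para_hidden)))

u = hierarchical_encode(paragraphs, word_params, para_params)
print(u.shape)  # (3, 4)
```

In this toy input a flat RNN would unroll over 16 words, while the hierarchical version never unrolls over more than 7 words or 3 paragraphs at a time.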
  • 43. What’s more? • Across the body text, some paragraphs carry a strong signal
  • 44. Neural Machine Translation • RNN-based encoder-decoder architecture, known as seq2seq (Sutskever et al., 2014; Cho et al., 2014)
  • 45. Attention mechanism in NMT • (Figure labels: Source (German), Target (English))
  • 46. Attention mechanism in NMT • (Figure labels: Source (German), Target (English))
  • 47. Attention mechanism • In detecting incongruity, we can pay a different amount of attention to each paragraph
  • 48. Attention mechanism • In detecting incongruity, we can pay a different amount of attention to each paragraph • (Figure: the RNN for the headline (target) produces $u^H$; the RNN for the body text (source) takes $h_t^1, h_t^2, \dots, h_t^p$ and produces $u^B_1, u^B_2, \dots, u^B_p$; an alignment model followed by a weighted sum yields $u^B$)
  • 49. Alignment model • Calculate attention weights between each paragraph (source) and headline (target) • $a^H(s) = \mathrm{align}(u^H, u^B_s) = \frac{\exp(\mathrm{score}(u^H, u^B_s))}{\sum_{s'} \exp(\mathrm{score}(u^H, u^B_{s'}))}$ • (Figure: alignment models compare $u^H$ from the headline RNN (target) with each of $u^B_1, u^B_2, \dots, u^B_p$ from the body-text RNN (source); a weighted sum yields $u^B$)
  • 50. Alignment model • Score is a content-based function (Luong et al. 2015) • (Figure: alignment models compare $u^H$ from the headline RNN (target) with each of $u^B_1, u^B_2, \dots, u^B_p$ from the body-text RNN (source); a weighted sum yields $u^B$)
  • 51. Context vector • Represents the body text with different attention weights across paragraphs • $u^B = \sum_{s} a^H(s)\, u^B_s$ • (Figure: the alignment model weights $u^B_1, u^B_2, \dots, u^B_p$ from the body-text RNN (source) against the headline RNN (target); the weighted sum is the context vector $u^B$)
  • 52. Attention in TF • Using dot-product similarity • bodytext_outputs: sequence of the hidden states • headline_states: the last hidden state
  • 53. Overall model architecture • Diagram components: Input Layer, Embedding Layer, Deep Neural Net for Encoding Headline, Deep Neural Net for Encoding Body Text, Output Layer
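The dot-product attention described here can be sketched in NumPy for a single example (no batch dimension). The variable names echo `bodytext_outputs` and `headline_states` from the slide; the function `attend` and the toy numbers are illustrative, and this is not the talk's TensorFlow code, which the transcript does not include.

```python
import numpy as np


def softmax(scores):
    shifted = scores - scores.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def attend(headline_state, bodytext_outputs):
    """headline_state: [hidden size]; bodytext_outputs: [num paragraphs, hidden size]."""
    scores = bodytext_outputs @ headline_state  # dot-product score per paragraph
    weights = softmax(scores)                   # attention weights a^H(s)
    context = weights @ bodytext_outputs        # context vector u^B (weighted sum)
    return context, weights


headline_state = np.array([1.0, 0.0])
bodytext_outputs = np.array([[2.0, 0.0],   # paragraph aligned with the headline
                             [0.0, 2.0],
                             [0.0, 0.0]])

context, weights = attend(headline_state, bodytext_outputs)
print(weights.round(3))
print(context.round(3))
```

The first paragraph has the largest dot product with the headline state, so it receives the largest weight and dominates the context vector.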
  • 54. Measure similarity • $u^H$: last hidden state of RNN for encoding headline • $u^B$: context vector that encodes body text • $M$: learnable similarity matrix, $b$: bias term • $\sigma$: sigmoid function • $p(\text{label}) = \sigma((u^H)^\top M u^B + b)$
  • 55. Measure similarity • $p(\text{label}) = \sigma((u^H)^\top M u^B + b)$
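The bilinear similarity $p(\text{label}) = \sigma((u^H)^\top M u^B + b)$ is a one-liner in NumPy. The sketch assumes $\sigma$ is the logistic sigmoid; the function name `label_probability` and the example values are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def label_probability(u_h, u_b, M, b):
    """sigma(u_h^T M u_b + b) for a headline vector u_h and a body context vector u_b."""
    return sigmoid(u_h @ M @ u_b + b)


u_h = np.array([1.0, 0.0])  # headline representation
u_b = np.array([1.0, 0.0])  # body-text context vector

# With M set to the identity the score reduces to a plain dot product.
print(label_probability(u_h, u_b, np.eye(2), 0.0))         # sigmoid(1), about 0.731
# With M set to zero the vectors are ignored and the output is sigmoid(b).
print(label_probability(u_h, u_b, np.zeros((2, 2)), 0.0))  # 0.5
```

Because $M$ is learned, the model can weight interactions between any pair of headline and body dimensions, and the two vectors do not need to have the same size.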
  • 56. Define loss function • cross-entropy: standard loss function for classification • $y$: ground truth (0/1) • $p(y)$: model output
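The loss formula itself is not preserved in the transcript. For a 0/1 ground truth and a probability output, the standard binary cross-entropy is the mean of $-[y \log p + (1 - y)\log(1 - p)]$, sketched below. The clipping constant `eps` is an implementation detail added here to avoid `log(0)`.

```python
import numpy as np


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # keep log() finite
    losses = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return float(np.mean(losses))


y = np.array([1.0, 0.0])
uncertain = binary_cross_entropy(y, np.array([0.5, 0.5]))  # equals log(2)
confident = binary_cross_entropy(y, np.array([0.9, 0.1]))
wrong = binary_cross_entropy(y, np.array([0.1, 0.9]))
print(uncertain, confident, wrong)
```

Confident correct predictions give a loss near zero, while confident wrong predictions are penalised heavily.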
  • 57. Optimizer • Gradient clipping to prevent exploding gradients
  • 59. How to prevent overfitting? • Add more data! (most effective if possible) • Data augmentation: add noise to the input so that the model generalizes better • Regularization: L1/L2, Dropout, Early stopping • Reduce architecture complexity
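The transcript does not say which clipping variant the talk used. One common choice is clipping by global norm, which rescales all gradients together when their joint L2 norm exceeds a threshold; TensorFlow 1.x provides this as `tf.clip_by_global_norm`. A minimal NumPy sketch of the same rule, with illustrative values:

```python
import numpy as np


def clip_by_global_norm(grads, clip_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is at most clip_norm."""
    global_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    # The scale is 1 when the norm is already within the threshold.
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm


grads = [np.array([3.0, 0.0]), np.array([4.0])]  # joint norm is 5
clipped, norm_before = clip_by_global_norm(grads, clip_norm=1.0)
print(norm_before)  # 5.0
print(clipped)      # [array([0.6, 0. ]), array([0.8])]
```

Scaling every gradient by the same factor keeps the update direction unchanged and only limits its size.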
  • 63. Attention for text classification • Giving different weights over word sequences (Zhou et al., ACL 2016) • $H = [h_1, h_2, \cdots, h_T]$ • $M = \tanh(H)$ • $\alpha = \mathrm{softmax}(w^\top M)$ • $r = H\alpha^\top$
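The four equations translate directly to NumPy if $H$ is stored with one hidden state per column, so that $H$ has shape [hidden size, T] and $w$ has shape [hidden size]. The function name `attention_pool` and the random inputs are illustrative; in a classifier, $w$ would be a trained parameter.

```python
import numpy as np


def softmax(scores):
    shifted = scores - scores.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def attention_pool(H, w):
    """H: [hidden size, T] with columns h_1 ... h_T; w: [hidden size]."""
    M = np.tanh(H)            # M = tanh(H)
    alpha = softmax(w @ M)    # alpha = softmax(w^T M), one weight per time step
    r = H @ alpha             # r = H alpha^T, a weighted sum of the columns
    return r, alpha


rng = np.random.default_rng(0)
hidden_size, T = 3, 5
H = rng.normal(size=(hidden_size, T))
w = rng.normal(size=hidden_size)

r, alpha = attention_pool(H, w)
print(alpha.shape, r.shape)  # (5,) (3,)
```

With `w` set to zero all scores are equal, so the weights are uniform and `r` reduces to the average of the hidden states.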
  • 64. Attention for text classification • Focusing on important sentence representations, each of which pays a different amount of attention to its words (Yang et al., NAACL 2016)
  • 65. Attention for text classification • Transfer learning on a Transformer language model, trained with multi-head attention (Vaswani et al., NIPS 2017; Devlin et al., NAACL 2019)
  • 66. Hands-on experience • Target problem: sentiment analysis on IMDB review dataset • Link:
  • 67. Thank you Kunwoo Park @ IBS deep learning summer school