Detecting Misleading Headlines in Online News
Hands-on Experiences on Attention-based RNN
Kunwoo Park
24th June 2019
IBS deep learning summer school
Who am I
• Kunwoo Park (박건우)
• Post doc, Data Analytics, QCRI (2018 - present)
• PhD, School of Computing, KAIST (2018), with outstanding dissertation award
• Research interest
• Computational social science using machine learning
• Text style transfer using RNN and RL
2
This talk will..
• Help the audience understand the attention mechanism for text
• Introduce a recent research effort on detecting misleading news headlines using deep neural networks
• Explain the building blocks of the state-of-the-art model and show how they are implemented in TensorFlow (1.x)
• Give a hands-on experience in implementing a text classifier using the attention mechanism
3
clickbait
4
Target problem
• Detect incongruity between news headline and body text: a news headline that does not correctly represent the story
5
Overall model architecture
[Architecture diagram: Input Layer → Embedding Layer → Deep Neural Net for Encoding Headline / Deep Neural Net for Encoding Body Text → Output Layer]
Goal: Detecting headline incongruity from the textual relationship between body text and headline
6
Overall model architecture
[Architecture diagram as before; next: the Input Layer]
7
Input data
• Transform words into vocabulary indices
headline:
[1, 30, 5, …, 9951, 2]
body text:
[ 875, 22, 39, …, 2481, 2,
9, 93, 9593, …, 431, 77,
1, 30, 5, …, 9951, 2, … ]
8
Define input layer in TF
• Using tf.placeholder
• Parameters
• data type: tf.int32
• shape: [None, self.max_words]
• name: used for debugging
headline:
[1, 30, 5, …, 9951, 2]
body text:
[ 875, 22, 39, …, 2481, 2,
9, 93, 9593, …, 431, 77,
1, 30, 5, …, 9951, 2, … ]
9
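The slide's code image isn't preserved in this transcript; a minimal TF 1.x sketch matching the bullets (variable names and the sequence length are illustrative):

import tensorflow as tf

max_words = 200  # self.max_words in the slide; the value here is illustrative

# Integer word indices, one row per example; None keeps the batch size flexible
headline_ids = tf.placeholder(tf.int32, shape=[None, max_words], name="headline_ids")
bodytext_ids = tf.placeholder(tf.int32, shape=[None, max_words], name="bodytext_ids")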
Feed data into placeholders
• At the far end of the computation graph: usually when running the optimizer
headline:
[1, 30, 5, …, 9951, 2]
body text:
[ 875, 22, 39, …, 2481, 2,
9, 93, 9593, …, 431, 77,
1, 30, 5, …, 9951, 2, … ]
10
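A hedged sketch of feeding the placeholders when the graph is run; train_op, loss, and the NumPy batch arrays are assumed to be defined elsewhere:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    _, batch_loss = sess.run(
        [train_op, loss],  # fetching the optimizer op pulls data through the whole graph
        feed_dict={headline_ids: headline_batch,   # int32 array, [batch, max_words]
                   bodytext_ids: bodytext_batch})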
One-hot encoding
{“believe”: 0, “do”: 1, “you”: 2, “happens”: 3,
“if”: 4, “what”: 5, “wouldn't”: 6, “yoga”: 7}
Vocabulary
11
[[0, 0, 1, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 0, 1, 0, … ],
[1, 0, 0, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 1, 0, 0, … ],
[0, 0, 0, 1, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 1, 0, 0, 0, … ],
[0, 0, 1, 0, 0, 0, 0, 0, … ],
[0, 1, 0, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 0, 0, 1, … ]]
Drawbacks of one-hot
• Dimensionality grows with the vocabulary; vectors are extremely sparse and encode no similarity between words
[[0, 0, 1, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 0, 1, 0, … ],
[1, 0, 0, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 1, 0, 0, … ],
[0, 0, 0, 1, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 1, 0, 0, 0, … ],
[0, 0, 1, 0, 0, 0, 0, 0, … ],
[0, 1, 0, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 0, 0, 1, … ]]
{“believe”: 0, “do”: 1, “you”: 2, “happens”: 3, “if”: 4,
“what”: 5, “wouldn't”: 6, “yoga”: 7, … “a”: 1000000000}
Vocabulary
12
Word embedding
• A mapping of a discrete variable for each word to a fixed
dimensional vector of continuous numbers
[[0.23, 0.51],
[0.72, 0.13],
[0.01, 0.07],
[0.18, 0.77],
[0.04, 0.05],
[0.87, 0.92],
[0.41, 0.38],
[0.33, 0.68],
[0.14, 0.22]]
[[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1]]
13
(one-hot: Sequence length × Vocab size → embedded: Sequence length × Embedding size)
Word embedding
• The same mapping, viewed as multiplying the one-hot input by an embedding matrix: where does this matrix come from?
14
Training from scratch
[[0.01, 0.07],
[0.33, 0.68],
[0.23, 0.51],
[0.41, 0.38],
[0.18, 0.77],
[0.04, 0.05],
[0.87, 0.92],
[0.01, 0.07],
[0.72, 0.13],
[0.14, 0.22]]
[[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1]]
[[0.23, 0.51],
[0.72, 0.13],
[0.01, 0.07],
[0.18, 0.77],
[0.04, 0.05],
[0.87, 0.92],
[0.41, 0.38],
[0.33, 0.68],
[0.14, 0.22]]
One-hot input (Sequence length × Vocab size) × Embedding matrix (Vocab size × Embedding size) = Embedded input (Sequence length × Embedding size)
15-16
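In TF 1.x the one-hot multiplication is usually implemented as a lookup into a trainable embedding variable; a minimal sketch continuing the earlier snippet (sizes are illustrative):

vocab_size, embed_dim = 10000, 300  # illustrative sizes

# Embedding matrix learned from scratch, jointly with the rest of the model
embedding = tf.get_variable(
    "embedding", shape=[vocab_size, embed_dim],
    initializer=tf.random_uniform_initializer(-0.1, 0.1))

# [batch, max_words] int32 indices -> [batch, max_words, embed_dim] float vectors
embedded_headline = tf.nn.embedding_lookup(embedding, headline_ids)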
Load pre-trained matrix
• Initialize the embedding matrix from pre-trained vectors instead of learning it from scratch: word2vec, GloVe, BERT, ….
• One-hot input (Sequence length × Vocab size) × Embedding matrix (Vocab size × Embedding size) = Embedded input (Sequence length × Embedding size)
17-18
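A common TF 1.x pattern for this, sketched under the assumption that pretrained_matrix is a NumPy array of shape [vocab_size, embed_dim] aligned with the vocabulary (the file name is hypothetical):

import numpy as np

pretrained_matrix = np.load("embeddings.npy")  # hypothetical file of pre-trained vectors

embedding = tf.get_variable("embedding", shape=[vocab_size, embed_dim],
                            trainable=False)   # set trainable=True to fine-tune
embedding_ph = tf.placeholder(tf.float32, shape=[vocab_size, embed_dim])
embedding_init = embedding.assign(embedding_ph)

with tf.Session() as sess:
    sess.run(embedding_init, feed_dict={embedding_ph: pretrained_matrix})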
Overall model architecture
[Architecture diagram as before; next: the deep encoders]
19
Deep encoder
Deep neural network
20
[[0.23, 0.51],
[0.72, 0.13],
[0.01, 0.07],
[0.18, 0.77],
[0.04, 0.05],
[0.87, 0.92],
[0.41, 0.38],
[0.33, 0.68],
[0.14, 0.22]]
Embedded input
Sequence length
×
Embedding size
[[0.752, 0.757, 0.587],
[0.645, 0.397, 0.618],
[0.777, 0.099, 0.938],
[0.367, 0.139, 0.150],
[0.341, 0.069, 0.398],
[0.415, 0.655, 0.467],
[0.935, 0.659, 0.321],
[0.875, 0.699, 0.967],
[0.734, 0.966, 0.205]]
Hidden representation
Sequence length
×
Hidden size
Which neural net can we use?
• Feedforward neural network
• Convolutional network
• Recurrent neural network
21
Recurrent neural network
• Efficient in modeling inputs with sequential dependencies (e.g., text, time-series, …)
• To produce an output at each step, RNNs combine the current input with what they have learned so far
[Diagram: unrolled RNN with inputs x_1 … x_t and hidden states h_1 … h_t]
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
22
Long-term dependencies
• “the clouds are in the sky“
• “I grew up in France … I speak fluent French”
23
LSTM
[Diagram: vanilla recurrent unit vs. LSTM cell]
24
Cell state
• A kind of memory unit that keeps past information
• The LSTM can add or remove information to the state through special structures called gates
25
Forget gate layer
• Decide what information we’re going to throw away from the
cell state
• 1: “completely keep this”. 0: “completely get rid of this”
26
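For reference, the forget gate in the standard LSTM formulation (from the Colah post cited earlier):

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)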
Taking input
• What new information we’re going to store in the cell state
• Input gate layer: sigmoid decides which values we’ll update
• tanh layer: creates a vector of candidate values
27
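The corresponding standard equations:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)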
Update cell state
• Combine the old cell state with the new candidate values, scaled by f_t and i_t
28
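In equation form (element-wise products):

C_t = f_t * C_{t-1} + i_t * C̃_t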
Decide output
• Output is a filtered version of the cell state C_t
29
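The standard equations:

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)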
GRU
• Update gate: combines the roles of the forget gate and the input gate
• Merges the cell state and the hidden state
30
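For comparison, the standard GRU equations (Cho et al., 2014):

z_t = σ(W_z · [h_{t-1}, x_t])
r_t = σ(W_r · [h_{t-1}, x_t])
h̃_t = tanh(W · [r_t * h_{t-1}, x_t])
h_t = (1 − z_t) * h_{t-1} + z_t * h̃_t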
Bi-directional RNN
• Combining two RNNs together: one RNN reads the input from left to right, and another reads it from right to left
• Able to understand context better
https://towardsdatascience.com/understanding-bidirectional-rnn-in-pytorch-5bd25a5dd66
31
How to build RNN in TF
1. Decide which cell you use for RNN
2. Decide the number of layers in RNN
3. Decide whether RNN is uni- or bi- directional
32
LSTM or GRU
33
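The slide's code image isn't preserved; step 1 in TF 1.x might look like this (the hidden size is illustrative):

hidden_size = 300  # illustrative

# Choose one cell type for the recurrent layer
lstm_cell = tf.nn.rnn_cell.LSTMCell(hidden_size)
gru_cell = tf.nn.rnn_cell.GRUCell(hidden_size)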
Stacked RNN
34
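Step 2, stacking cells into a multi-layer RNN; a minimal sketch:

num_layers = 2  # illustrative
cells = [tf.nn.rnn_cell.GRUCell(hidden_size) for _ in range(num_layers)]
stacked_cell = tf.nn.rnn_cell.MultiRNNCell(cells)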
Uni-directional RNN
• tf.nn.dynamic_rnn()
• outputs: the sequence of hidden states, [batch_size, max_sequences, output_size]
• state: the final state, [batch_size, output_size]
35
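A sketch of the call, reusing names from the earlier snippets; headline_lengths is a hypothetical placeholder of true sequence lengths:

headline_lengths = tf.placeholder(tf.int32, shape=[None])  # true length of each example
outputs, state = tf.nn.dynamic_rnn(
    gru_cell, embedded_headline,
    sequence_length=headline_lengths,  # stop unrolling at each example's true length
    dtype=tf.float32)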
Bi-directional RNN
• outputs, states = (output_fw, output_bw), (state_fw, state_bw)
36
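A sketch matching the bullet's return structure:

(output_fw, output_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=tf.nn.rnn_cell.GRUCell(hidden_size),
    cell_bw=tf.nn.rnn_cell.GRUCell(hidden_size),
    inputs=embedded_headline, dtype=tf.float32)
bi_outputs = tf.concat([output_fw, output_bw], axis=-1)  # [batch, time, 2 * hidden_size]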
Some body text is too long..
• The final state h_t would have to contain all necessary information from the past, over a thousand steps back
[Diagram: long unrolled RNN, x_1 … x_t → h_1 … h_t]
37
A news article is hierarchical
38
Hierarchical RNN
• Word-level RNN: h^p_t = f(h^p_{t−1}, x^p_t; θ_f)
• Paragraph-level RNN: u_p = g(u_{p−1}, h^p_t; θ_g)
[Diagram: one word-level RNN per paragraph produces final states h^1_t, h^2_t, …, h^p_t, which feed a paragraph-level RNN producing u_1, u_2, …, u_p]
39
Hierarchical RNN
[Same diagram and equations as the previous slide]
• The maximum length of each RNN can be reduced significantly
• Therefore, models with fewer parameters can be trained effectively
40
Word-level RNN
41
Paragraph-level RNN
42
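The code images for slides 41-42 aren't preserved; a minimal sketch of how the two levels might be wired in TF 1.x, assuming the body text arrives as bodytext_ids of shape [batch, num_paragraphs, max_words] and reusing embedding, embed_dim, and hidden_size from the earlier snippets:

num_paragraphs = 30  # illustrative

# Word level: run one RNN over the words of every paragraph
embedded_body = tf.nn.embedding_lookup(embedding, bodytext_ids)
flat = tf.reshape(embedded_body, [-1, max_words, embed_dim])  # merge batch & paragraph dims
with tf.variable_scope("word_rnn"):
    _, word_state = tf.nn.dynamic_rnn(
        tf.nn.rnn_cell.GRUCell(hidden_size), flat, dtype=tf.float32)

# Paragraph level: one step per paragraph's final word-level state
para_inputs = tf.reshape(word_state, [-1, num_paragraphs, hidden_size])
with tf.variable_scope("para_rnn"):
    para_outputs, body_state = tf.nn.dynamic_rnn(
        tf.nn.rnn_cell.GRUCell(hidden_size), para_inputs, dtype=tf.float32)
# para_outputs holds u_1 … u_p, one vector per paragraph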
What’s more?
• Across the body text, some paragraphs carry a stronger signal than others
43
Neural Machine Translation
• RNN-based encoder-decoder architecture, known as seq2seq
Sutskever et al., 2014; Cho et al., 2014
44
Attention mechanism in NMT
[Figure: attention weights aligning source (German) and target (English) words]
https://aws.amazon.com/ko/blogs/machine-learning/train-neural-machine-translation-models-with-sockeye/
45-46
Attention mechanism
• In detecting incongruity, we can pay a different amount of attention to each paragraph
[Diagram: the headline RNN (target) yields u^H; the body-text RNN (source) yields u^B_1 … u^B_p; an alignment model scores each pair, and a weighted sum produces u^B]
47-48
Alignment model
• Calculate attention weights between each paragraph (source) and the headline (target):
a^H(s) = align(u^H, u^B_s) = exp(score(u^H, u^B_s)) / ∑_{s'} exp(score(u^H, u^B_{s'}))
[Diagram as before, with the alignment model scoring every paragraph state against u^H]
49
Alignment model
• score is a content-based function (Luong et al., 2015)
[Diagram as before]
50
Context vector
• Represents the body text with different attention weights across paragraphs:
u^B = ∑_{s'} a^H(s') u^B_{s'}
[Diagram as before: the weighted sum of the paragraph states u^B_1 … u^B_p gives u^B]
51
Attention in TF
• Using dot-product similarity
• bodytext_outputs: sequence of the hidden states
• headline_states: the last hidden state
52
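The slide's code image isn't preserved; a dot-product attention sketch consistent with the bullets (names follow the slide; the shape comments are assumptions):

# bodytext_outputs: [batch, num_paragraphs, hidden_size]  -- paragraph-level hidden states
# headline_states:  [batch, hidden_size]                  -- last hidden state of headline RNN
scores = tf.matmul(bodytext_outputs,
                   tf.expand_dims(headline_states, axis=2))  # [batch, num_paragraphs, 1]
alpha = tf.nn.softmax(scores, axis=1)                        # attention weights a^H(s)
context = tf.reduce_sum(alpha * bodytext_outputs, axis=1)    # u^B: [batch, hidden_size]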
Overall model architecture
[Architecture diagram as before; next: the Output Layer]
53
Measure similarity
• u^H: last hidden state of the RNN encoding the headline
• u^B: context vector that encodes the body text
• M: learnable similarity matrix; b: bias term
• σ: sigmoid function
p(label) = σ((u^H)^T M u^B + b)
54
Measure similarity
p(label) = σ((u^H)^T M u^B + b)
55
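A minimal sketch of this output layer in TF 1.x, using the names from the attention snippet (headline_states as u^H, context as u^B):

M = tf.get_variable("M", shape=[hidden_size, hidden_size])
b = tf.get_variable("bias", shape=[1], initializer=tf.zeros_initializer())
# (u^H)^T M u^B, computed per example in the batch
logits = tf.reduce_sum(tf.matmul(headline_states, M) * context, axis=1) + b  # [batch]
prob = tf.sigmoid(logits)  # p(label)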
Define loss function
• Cross-entropy: the standard loss function for classification
• y: ground truth (0/1); p(y): model output
56
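A sketch using TF's numerically stable helper, applied to the logits from the previous snippet; labels is an assumed placeholder of 0/1 targets:

labels = tf.placeholder(tf.float32, shape=[None])  # ground truth y
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))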
Optimizer
• Gradient clipping to prevent exploding gradients
57
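A common TF 1.x pattern combining an optimizer with global-norm gradient clipping (the learning rate and clip threshold are illustrative):

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
grads, variables = zip(*optimizer.compute_gradients(loss))
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)  # cap the global gradient norm
train_op = optimizer.apply_gradients(zip(clipped_grads, variables))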
Overfitting
[Plot: error vs. model complexity, showing underfitting and overfitting regions]
58
How to prevent overfitting?
• Add more data! (most effective if possible)
• Data augmentation: add noise to inputs for better generalization
• Regularization: L1/L2, Dropout, Early stopping
• Reduce architecture complexity
59
Evaluation results
60
Demo
Credit: Taegyun Kim
61
Dataset/code/paper
• https://github.com/david-yoon/detecting-incongruity
62
Attention for text classification
• Giving different weights over word sequences (Zhou et al., ACL 2016)
H = [h_1, h_2, ⋯, h_T]
M = tanh(H)
α = softmax(w^T M)
r = Hα^T
63
Attention for text classification
• Focusing on important sentence representations, each of which pays a different amount of attention to words (Yang et al., NAACL 2016)
64
Attention for text classification
• Transfer learning on Transformer language models, trained with multi-head attention (Vaswani et al., NIPS 2017; Devlin et al., NAACL 2019)
65
Hands-on experience
• Target problem: sentiment analysis on IMDB review dataset
66
Link: https://bit.ly/2xbelke
Thank you
Kunwoo Park
@ IBS deep learning summer school
