These slides were used for a tutorial at the Deep Learning Summer School held at IBS, Daejeon. Based on a recent effort to detect misleading headlines with deep neural networks (Yoon et al., AAAI 2019), they explain how RNNs and the attention mechanism work for text. Implementations based on TensorFlow 1.x are also introduced.
Detecting Misleading Headlines in Online News: Hands-on Experiences on Attention-based RNN
1. Detecting Misleading Headlines in Online News
Hands-on Experiences on Attention-based RNN
Kunwoo Park
24th June 2019
IBS Deep Learning Summer School
2. Who am I
• Kunwoo Park (박건우)
• Postdoc, Data Analytics, QCRI (2018 - present)
• PhD, School of Computing, KAIST (2018),
with an outstanding dissertation award
• Research interest
• Computational social science using machine learning
• Text style transfer using RNN and RL
3. This talk will..
• Help the audience understand the attention mechanism for text
• Introduce a recent research effort on detecting misleading
news headlines using deep neural networks
• Explain the building blocks of the state-of-the-art model and
show how they are implemented in TensorFlow (1.x)
• Give hands-on experience in implementing a text classifier
using the attention mechanism
5. Target problem
• Detect incongruity between a news headline and its body text:
the headline does not correctly represent the story
6. Overall model architecture
[Architecture diagram: Input Layer → Embedding Layer → Deep Neural Nets for Encoding Headline and Body Text → Output Layer]
Goal: detecting headline incongruity
from the textual relationship between the body text and the headline
7. Overall model architecture
[Architecture diagram revisited: Input Layer → Embedding Layer → Encoders for Headline and Body Text → Output Layer]
8. Input data
• Transform words into vocabulary indices, as sketched below
headline:
[1, 30, 5, …, 9951, 2]
body text:
[ 875, 22, 39, …, 2481, 2,
9, 93, 9593, …, 431, 77,
1, 30, 5, …, 9951, 2, … ]
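As a sketch, the word-to-index transformation might look like this in plain Python (the toy `vocab` dict and special tokens are hypothetical, not from the slides):

```python
# Hypothetical toy vocabulary: word -> integer id, with reserved special tokens.
vocab = {'<pad>': 0, '<s>': 1, '</s>': 2, '<unk>': 3, 'nation': 4, 'shocked': 5}

def to_indices(text, vocab):
    # Map each token to its vocabulary index; unknown words fall back to <unk>.
    return [vocab.get(word, vocab['<unk>']) for word in text.lower().split()]

print(to_indices('Nation shocked by headline', vocab))  # [4, 5, 3, 3]
```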
9. Define input layer in TF
• Using tf.placeholder
• Parameters (see the sketch below)
• dtype: tf.int32
• shape: [None, self.max_words]
• name: used for debugging
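A minimal sketch of how these placeholders might be declared; the variable names and maximum lengths are assumptions, while the parameters follow the slide:

```python
import tensorflow as tf  # TensorFlow 1.x

max_headline_words = 50   # hypothetical maximum lengths
max_body_words = 1000

# Placeholders hold vocabulary indices; None leaves the batch size flexible,
# and `name` makes the ops easy to find when debugging the graph.
headline_ph = tf.placeholder(tf.int32, [None, max_headline_words],
                             name='headline')
bodytext_ph = tf.placeholder(tf.int32, [None, max_body_words],
                             name='bodytext')
```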
10. Feed data into placeholders
• At the very end of the computation graph: usually when running the optimizer (sketched below)
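In TF 1.x, values reach the placeholders only at session run time; a hedged sketch, assuming a `train_op` built elsewhere and the placeholders above:

```python
# `headline_batch` / `bodytext_batch` are hypothetical NumPy int arrays.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Feeding usually happens when running the optimizer op.
    sess.run(train_op, feed_dict={headline_ph: headline_batch,
                                  bodytext_ph: bodytext_batch})
```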
19. Overall model architecture
[Architecture diagram revisited: Input Layer → Embedding Layer → Encoders for Headline and Body Text → Output Layer]
21. Which neural net can we use?
• Feedforward neural network
• Convolutional network
• Recurrent neural network
22. Recurrent neural network
• Efficient at modeling inputs with sequential dependencies
(e.g., text, time series, …)
• To produce an output at each step, an RNN combines the current
input with what it has learned so far (see the recurrence below)
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
[Diagram: an unrolled RNN reading inputs x_1, x_2, …, x_t and producing hidden states h_1, h_2, …, h_t]
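In symbols, the standard vanilla-RNN recurrence (stated here for reference, not taken from the slide's figure):
h_t = tanh(W_x x_t + W_h h_{t−1} + b)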
25. Cell state
• A kind of memory unit that keeps past information
• An LSTM can add or remove information from the cell state
through special structures called gates
26. Forget gate layer
• Decides what information to throw away from the cell state
• 1 means “completely keep this”; 0 means “completely get rid of this”
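In the standard LSTM formulation (given here since the slide's figure did not survive extraction):
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)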
27. Taking input
• Decide what new information to store in the cell state
• Input gate layer: a sigmoid decides which values to update
• tanh layer: creates a vector of candidate values
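In the same standard notation:
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)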
28. Update cell state
• Combine the old cell state with the new candidate values,
weighted by f_t and i_t
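Written out (⊙ is element-wise multiplication):
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t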
30. GRU
• Update gate: a combination of the forget gate and the input gate
• Merges the cell state and the hidden state
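For reference, the standard GRU equations (not recovered from the slide's figure):
z_t = σ(W_z · [h_{t−1}, x_t])
r_t = σ(W_r · [h_{t−1}, x_t])
h̃_t = tanh(W · [r_t ⊙ h_{t−1}, x_t])
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t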
31. Bi-directional RNN
• Combines two RNNs:
one RNN reads the input from left to right and
the other reads it from right to left
• Able to understand context better
https://towardsdatascience.com/understanding-bidirectional-rnn-in-pytorch-5bd25a5dd66
32. How to build RNN in TF
1. Decide which cell to use in the RNN
2. Decide the number of layers in the RNN
3. Decide whether the RNN is uni- or bi-directional
(all three choices are sketched below)
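A hedged sketch of those three decisions with the TF 1.x cell API; the sizes are arbitrary:

```python
import tensorflow as tf  # TensorFlow 1.x

# 1. Choose a cell type, e.g. GRU or LSTM.
def make_cell():
    return tf.nn.rnn_cell.GRUCell(num_units=128)

# 2. Choose the number of layers by stacking cells.
cell = tf.nn.rnn_cell.MultiRNNCell([make_cell() for _ in range(2)])

# 3. Choose the direction: tf.nn.dynamic_rnn runs a uni-directional RNN;
#    tf.nn.bidirectional_dynamic_rnn runs forward and backward RNNs.
```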
35. Uni-directional RNN
• tf.nn.dynamic_rnn()
• outputs: the sequence of hidden states,
[batch_size, max_time, output_size]
• state: the final state,
[batch_size, output_size]
(see the sketch below)
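A minimal sketch; `embedded_inputs` stands in for the embedding layer's output, with hypothetical sizes:

```python
import tensorflow as tf  # TensorFlow 1.x

# Hypothetical embedded input: [batch_size, max_time, embed_dim].
embedded_inputs = tf.placeholder(tf.float32, [None, 200, 300])

cell = tf.nn.rnn_cell.GRUCell(num_units=128)
outputs, state = tf.nn.dynamic_rnn(cell, embedded_inputs, dtype=tf.float32)
# outputs: hidden state at every time step, [batch_size, max_time, 128]
# state:   the final hidden state,          [batch_size, 128]
```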
39. Hierarchical RNN
Word-level RNN (one per paragraph p):
h_t^p = f(h_{t−1}^p, x_t^p; θ_f)
Paragraph-level RNN over the last word-level states:
u_p = g(u_{p−1}, h_t^p; θ_g)
[Diagram: a word-level RNN encodes each paragraph's words x_1^p, …, x_t^p into h_t^p; a paragraph-level RNN then reads h_t^1, h_t^2, …, h_t^p and produces u_1, u_2, …, u_p]
40. Hierarchical RNN
[Same hierarchical-RNN diagram and equations as the previous slide]
• The maximum unrolled length of each RNN is reduced significantly
• Therefore, models can be trained effectively with fewer parameters
(a TF sketch follows)
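A hedged sketch of such a two-level hierarchy in TF 1.x; all shapes and names here are assumptions. The word-level RNN is folded into the batch dimension so one `dynamic_rnn` call encodes every paragraph:

```python
import tensorflow as tf  # TensorFlow 1.x

num_paragraphs, max_words, embed_dim = 30, 50, 300  # hypothetical sizes
# Embedded body text: [batch_size, num_paragraphs, max_words, embed_dim].
body_embedded = tf.placeholder(
    tf.float32, [None, num_paragraphs, max_words, embed_dim])

# Word-level RNN: fold paragraphs into the batch dimension.
words = tf.reshape(body_embedded, [-1, max_words, embed_dim])
with tf.variable_scope('word_rnn'):
    _, h_last = tf.nn.dynamic_rnn(
        tf.nn.rnn_cell.GRUCell(128), words, dtype=tf.float32)
# h_last holds h_t^p for every paragraph: [batch * num_paragraphs, 128].

# Paragraph-level RNN over the per-paragraph vectors h_t^p.
paragraphs = tf.reshape(h_last, [-1, num_paragraphs, 128])
with tf.variable_scope('paragraph_rnn'):
    u, u_last = tf.nn.dynamic_rnn(
        tf.nn.rnn_cell.GRUCell(128), paragraphs, dtype=tf.float32)
# u holds u_1 ... u_p: [batch_size, num_paragraphs, 128].
```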
46. Attention mechanism in NMT
https://aws.amazon.com/ko/blogs/machine-learning/train-neural-machine-translation-models-with-sockeye/
[Diagram: attention matrix aligning source (German) words with target (English) words]
47. Attention mechanism
• In detecting incongruity, we can pay a different amount of
attention to each paragraph
48. Attention mechanism
[Diagram: the body-text RNN (source) encodes hidden states h_t^1, …, h_t^p into paragraph vectors u_1^B, u_2^B, …, u_p^B; an alignment model scores each of them against the headline RNN's (target) vector u^H, and their weighted sum forms u^B]
49. Alignment model
• Calculate attention weights between each paragraph (source)
and the headline (target)
a^H(s) = align(u^H, u_s^B) = exp(score(u^H, u_s^B)) / Σ_{s′} exp(score(u^H, u_{s′}^B))
[Diagram: the alignment model compares u^H with each of u_1^B, u_2^B, …, u_p^B]
50. Alignment model
• score is a content-based function (Luong et al., 2015);
the dot-product form appears below
[Diagram as before: the alignment model scores u^H against each of u_1^B, u_2^B, …, u_p^B]
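Among the content-based scores in Luong et al. (2015), the simplest, matching the dot-product similarity used in the TF code later, is:
score(u^H, u_s^B) = (u^H)^⊤ u_s^B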
51. Context vector
• Represents the body text with different attention weights
across paragraphs
u^B = Σ_s a^H(s) u_s^B
[Diagram: the attention weights turn u_1^B, u_2^B, …, u_p^B into the weighted sum u^B]
52. Attention in TF
• Using dot-product similarity
• bodytext_outputs: the sequence of hidden states
• headline_states: the last hidden state
(see the sketch below)
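A minimal sketch of the dot-product attention; the tensor names follow the slide, while the shapes are assumptions:

```python
import tensorflow as tf  # TensorFlow 1.x

# Hypothetical encoder outputs (see the sketches above):
bodytext_outputs = tf.placeholder(tf.float32, [None, 30, 128])  # the u_s^B
headline_states = tf.placeholder(tf.float32, [None, 128])       # u^H

# Dot-product score between the headline vector and every paragraph vector.
scores = tf.reduce_sum(
    bodytext_outputs * tf.expand_dims(headline_states, 1), axis=2)
attention = tf.nn.softmax(scores)        # a^H(s): [batch_size, 30]
# Context vector u^B: attention-weighted sum of the paragraph vectors.
context = tf.reduce_sum(
    bodytext_outputs * tf.expand_dims(attention, 2), axis=1)
```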
53. Overall model architecture
[Architecture diagram revisited: Input Layer → Embedding Layer → Encoders for Headline and Body Text → Output Layer]
54. Measure similarity
p(label) = σ((u^H)^⊤ M u^B + b)
• u^H: last hidden state of the RNN encoding the headline
• u^B: context vector that encodes the body text
• M: learnable similarity matrix, b: bias term
• σ: sigmoid function
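A hedged sketch of this output layer, reusing `headline_states` (u^H) and `context` (u^B) from the attention sketch above:

```python
hidden = 128
M = tf.get_variable('M', shape=[hidden, hidden])  # learnable similarity matrix
b = tf.get_variable('b', shape=[])                # bias term
# (u^H)^T M u^B for each example in the batch, followed by a sigmoid.
logits = tf.reduce_sum(tf.matmul(headline_states, M) * context, axis=1) + b
p_label = tf.sigmoid(logits)   # p(label) = σ((u^H)^⊤ M u^B + b)
```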
59. How to prevent overfitting?
• Add more data! (most effective if possible)
• Data augmentation: add noise to the input so the model generalizes better
• Regularization: L1/L2, dropout, early stopping (dropout sketched below)
• Reduce architecture complexity
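As one example, dropout can be applied to RNN cells in TF 1.x via a wrapper; a sketch, where `keep_prob` should be left at 1.0 at test time:

```python
import tensorflow as tf  # TensorFlow 1.x

keep_prob = tf.placeholder_with_default(1.0, shape=[])  # 1.0 disables dropout
cell = tf.nn.rnn_cell.DropoutWrapper(
    tf.nn.rnn_cell.GRUCell(128), output_keep_prob=keep_prob)
```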
63. Attention for text classification
• Giving different weights over word sequences (Zhou et al., ACL 2016)
H = [h_1, h_2, ⋯, h_T]
M = tanh(H)
α = softmax(w^⊤ M)
r = H α^⊤
64. Attention for text classification
• Focusing on important sentence representations, each of which
pays a different amount of attention to words (Yang et al., NAACL 2016)
65. Attention for text classification
• Transfer learning with Transformer language models, trained with
multi-head attention (Vaswani et al., NIPS 2017; Devlin et al., NAACL 2019)