1
@graphific
Roelof Pieters
Deep Learning for Natural Language Processing: Word Embeddings
3 December 2015

KTH
www.csc.kth.se/~roelof/
roelof@kth.se
Language Understanding
2
Can we understand Language ?
1. Language is ambiguous:

Every sentence has many possible interpretations.
2. Language is productive:

We will always encounter new words or new
constructions
3. Language is culturally specific
Some of the challenges in Language Understanding:
3
Can we understand Language ?
• plays well with others
VB ADV P NN
NN NN P DT
• fruit flies like a banana
NN NN VB DT NN
NN VB P DT NN
NN NN P DT NN
NN VB VB DT NN
• the students went to class
DT NN VB P NN
4
Some of the challenges in Language Understanding:
[Karlgren 2014, NLP Sthlm Meetup]
6
ML: Traditional Approach
1. Gather as much LABELED data as you can get
2. Throw some algorithms at it (mainly put in an SVM and
keep it at that)
3. If you actually have tried more algos: Pick the best
4. Spend hours hand engineering some features / feature
selection / dimensionality reduction (PCA, SVD, etc)
5. Repeat…
For each new problem/question:
8
Machine Learning for NLP
Data
Classic Approach: Data is fed into a learning algorithm:
Learning Algorithm
9
Machine Learning for NLP
some of the (many) treebank datasets
source: http://www-nlp.stanford.edu/links/statnlp.html#Treebanks
10
Penn Treebank
That’s a lot of “manual” work:
11
• the students went to class
DT NN VB P NN
• plays well with others
VB ADV P NN
NN NN P DT
• fruit flies like a banana
NN NN VB DT NN
NN VB P DT NN
NN NN P DT NN
NN VB VB DT NN
With a lot of issues:
Penn Treebank
12
Machine Learning for NLP
Learning Algorithm
Data
“Features”
Prediction
Prediction/Classifier
train set
test set
13
Machine Learning for NLP
Learning Algorithm
“Features”
Prediction
Prediction/Classifier
train set
test set
14
One Model rules them all ?



DL approaches have been successfully applied to:
Deep Learning: Why for NLP ?
Automatic summarization
Coreference resolution
Discourse analysis
Machine translation
Morphological segmentation
Named entity recognition (NER)
Natural language generation
Natural language understanding
Optical character recognition (OCR)
Part-of-speech tagging
Parsing
Question answering
Relationship extraction
Sentence boundary disambiguation
Sentiment analysis
Speech recognition
Speech segmentation
Topic segmentation and recognition
Word segmentation
Word sense disambiguation
Information retrieval (IR)
Information extraction (IE)
Speech processing
15
Deep Learning: Why for NLP ?
16
• What is the meaning of a word?

(Lexical semantics)
• What is the meaning of a sentence?

([Compositional] semantics)
• What is the meaning of a longer piece of text?
(Discourse semantics)
Semantics: Meaning
18
• NLP (at least in rule-based/statistical
approaches) treats words mainly as atomic symbols:

• or in vector space:

• also known as “one hot” representation.
• Its problem ?
Word Representation
Love Candy Store
[0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …]
Candy [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …] AND
Store [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 …] = 0 !
19
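The problem, in code: a minimal sketch with a toy three-word vocabulary (the vocabulary is an assumption for illustration), showing that any two distinct one-hot vectors have zero dot product, i.e. no notion of similarity.

import numpy as np

vocab = ["love", "candy", "store"]          # toy vocabulary (assumption)

def one_hot(word):
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

candy, store = one_hot("candy"), one_hot("store")
print(np.dot(candy, store))                 # 0.0: one-hot vectors carry no similarity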
Word Representation
20
• Structure corresponds to meaning:
Structure and Meaning
21
• Semantics
• Syntax
22
NLP: what can we work with?
• Language models define probability distributions
over (natural language) strings or sentences
• Joint and Conditional Probability
Language Model
23
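A hedged sketch of what "a probability distribution over sentences" means in practice: a maximum-likelihood bigram model over a toy corpus (the corpus and the lack of smoothing are illustrative assumptions), with the joint probability of a sentence obtained from conditional probabilities via the chain rule.

from collections import Counter

# Toy corpus (assumption); real LMs are estimated on millions of sentences.
corpus = "the students went to class . the students like candy .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_cond(w, prev):
    # Conditional probability P(w | prev), maximum-likelihood estimate
    return bigrams[(prev, w)] / unigrams[prev]

def p_sentence(words):
    # Joint probability via the chain rule, under a bigram approximation
    p = 1.0
    for prev, w in zip(words, words[1:]):
        p *= p_cond(w, prev)
    return p

print(p_sentence("the students went to class".split()))  # 0.5 on this toy corpus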
Word senses
What is the meaning of words?
• Most words have many different senses:

dog = animal or sausage?
How are the meanings of different words related?
• Specific relations between senses:

Animal is more general than dog.
• Semantic fields:

money is related to bank
26
Word senses
Polysemy:
• A lexeme is polysemous if it has different related
senses
• bank = financial institution or building
Homonyms:
• Two lexemes are homonyms if their senses are
unrelated, but they happen to have the same spelling
and pronunciation
• bank = (financial) bank or (river) bank
27
Word senses: relations
Symmetric relations:
• Synonyms: couch/sofa

Two lemmas with the same sense
• Antonyms: cold/hot, rise/fall, in/out

Two lemmas with the opposite sense
Hierarchical relations:
• Hypernyms and hyponyms: pet/dog

The hyponym (dog) is more specific than the hypernym
(pet)
• Holonyms and meronyms: car/wheel

The meronym (wheel) is a part of the holonym (car)
28
Distributional representations
“You shall know a word by the company it keeps”

(J. R. Firth 1957)
One of the most successful ideas of modern
statistical NLP!
these words represent banking
• Hard (class based) clustering models
• Soft clustering models
29
Distributional hypothesis
He filled the wampimuk, passed it
around and we all drank some
We found a little, hairy wampimuk
sleeping behind the tree
(McDonald & Ramscar 2001)
30
Distributional semantics
Landauer and Dumais (1997), Turney and Pantel (2010), …
31
Distributional semantics
Distributional meaning as co-occurrence vector:
32
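A minimal sketch of building such co-occurrence vectors, assuming a toy corpus and a symmetric context window of 2 (both illustrative assumptions):

from collections import Counter, defaultdict

sentences = [["he", "filled", "the", "wampimuk"],
             ["the", "hairy", "wampimuk", "slept"]]   # toy corpus (assumption)
window = 2

cooc = defaultdict(Counter)
for sent in sentences:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                cooc[w][sent[j]] += 1

# The counter for a word is its (sparse) co-occurrence vector:
print(cooc["wampimuk"])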
Distributional representations
• Taking it further:
• Continuous word embeddings
• Combine vector space semantics with the
prediction of probabilistic models
• Words are represented as a dense vector:
Candy =
33
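Contrast with the one-hot case above: dense vectors support graded similarity. The numbers below are made up for illustration; real embeddings are learned from data.

import numpy as np

candy = np.array([0.12, -0.48, 0.91, 0.05])   # hypothetical learned vector
store = np.array([0.10, -0.40, 0.87, 0.12])   # hypothetical learned vector

cos = candy @ store / (np.linalg.norm(candy) * np.linalg.norm(store))
print(cos)   # close to 1: related words end up near each other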
Word Embeddings: Socher
Vector Space Model
adapted from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
34
Word Embeddings: Socher
Vector Space Model
adapted from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
the country of my birth
the place where I was born
35
• Can theoretically (given enough units) approximate
“any” function
• and fit to “any” kind of data
• Efficient for NLP: hidden layers can be used as word
lookup tables
• Dense distributed word vectors + efficient NN
training algorithms:
• Can scale to billions of words !
Why Neural Networks for NLP?
36
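The "word lookup table" point in code: with one-hot inputs, the first hidden layer of a neural LM reduces to indexing a row of its weight matrix (sizes below are arbitrary assumptions).

import numpy as np

vocab_size, dim = 10_000, 100                   # arbitrary sizes (assumption)
E = np.random.randn(vocab_size, dim) * 0.01     # hidden-layer weights = embedding matrix

word_id = 42                                    # index of some word in the vocabulary
word_vector = E[word_id]                        # one-hot matrix product = row lookup
print(word_vector.shape)                        # (100,)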
• Representation of words as continuous vectors has a
long history (Hinton et al. 1986; Rumelhart et al. 1986;
Elman 1990)
• First neural network language model: NNLM (Bengio
et al. 2001; Bengio et al. 2003) based on earlier ideas of
distributed representations for symbols (Hinton 1986)
How?
37
Word Embeddings: Socher
Vector Space Model
Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
the country of my birth
the place where I was born ?
…
38
Compositionality
Principle of compositionality:
the “meaning (vector) of a
complex expression (sentence)
is determined by:
— Gottlob Frege 

(1848 - 1925)
- the meanings of its constituent
expressions (words) and
- the rules (grammar) used to
combine them”
39
• How do we handle the compositionality of language in
our models?
40
Compositionality
• How do we handle the compositionality of language in
our models?
• Recursion :

the same operator (same parameters) is
applied repeatedly on different components
41
Compositionality
• How do we handle the compositionality of language in
our models?
• Option 1: Recurrent Neural Networks (RNN)
42
RNN 1: Recurrent Neural Networks
• How do we handle the compositionality of language in
our models?
• Option 2: Recursive Neural Networks (also
sometimes called RNN)
43
RNN 2: Recursive Neural Networks
• achieved SOTA in 2011 on
Language Modeling (WSJ AR
task) (Mikolov et al.,
INTERSPEECH 2011):
• and again at ASRU 2011:
44
Recurrent Neural Networks
“Comparison to other LMs shows that RNN
LMs are state of the art by a large margin.
Improvements increase with more training data.”
“[ RNN LM trained on a] single core on 400M words in a few days,
with 1% absolute improvement in WER on state of the art setup”
Mikolov, T., Karafiat, M., Burget, L., Cernocky, J.H., Khudanpur, S. (2011)

Recurrent neural network based language model
45
Recurrent Neural Networks
(simple recurrent 

neural network for LM)
input
hidden layer(s)
output layer
+ sigmoid activation function
+ softmax function:
Mikolov, T., Karafiat, M., Burget, L., Cernocky, J.H., Khudanpur, S. (2011)

Recurrent neural network based language model
46
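Written out, the slide's architecture corresponds to the following update; this is one common way to write Mikolov et al.'s model, with symbol names assumed here: w(t) the one-hot input word, s(t) the hidden state, y(t) the next-word distribution.

s(t) = \sigma\big(U\,w(t) + W\,s(t-1)\big)
y(t) = \mathrm{softmax}\big(V\,s(t)\big)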
Recurrent Neural Networks
backpropagation through time
47
Recurrent Neural Networks
backpropagation through time
class based recurrent NN
[code (Mikolov’s RNNLM Toolkit) and more info: http://rnnlm.org/ ]
• Recursive Neural
Network for LM (Socher
et al. 2011; Socher
2014)
• achieved SOTA on new
Stanford Sentiment
Treebank dataset (but
comparing it to many
other models):
Recursive Neural Network
48
Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
info & code: http://nlp.stanford.edu/sentiment/
Recursive Neural Tensor Network
49
code & info: http://www.socher.org/index.php/Main/ParsingNaturalScenesAndNaturalLanguageWithRecursiveNeuralNetworks
Socher, R., Liu, C.C., Ng, A.Y., Manning, C.D. (2011) 

Parsing Natural Scenes and Natural Language with Recursive Neural Networks
Recursive Neural Tensor Network
50
• RNN (Socher et al.
2011a)
• Matrix-Vector RNN
(MV-RNN) (Socher et
al., 2012)
• Recursive Neural
Tensor Network (RNTN)
(Socher et al. 2013)
Recursive Neural Network
53
• negation detection:
Recursive Neural Network
54
Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
info & code: http://nlp.stanford.edu/sentiment/
(parse-tree diagram: NP, PP/IN, NP over DT NN PRP$ NN)
Parse Tree
Recurrent NN for Vector Space
55
(parse-tree diagram: NP, PP/IN, NP over IN DT NN PRP NN)
Parse Tree
Compositionality
56
Recurrent NN: Compositionality
(parse-tree diagram: NP, IN, NP over DT NN PRP NN)
Parse Tree
Compositionality
57
Recurrent NN: Compositionality
(parse-tree diagram: PP and NP combining into S / ROOT)
“rules” “meanings”
Compositionality
58
Recurrent NN: Compositionality
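A minimal sketch of the recursive idea from these slides: one shared set of parameters (the "rules") composes two child vectors (the "meanings") into a parent vector at every tree node. Dimensions and initialization below are illustrative assumptions, not Socher et al.'s trained weights.

import numpy as np

d = 4                                         # embedding size (assumption)
rng = np.random.default_rng(0)
W = rng.normal(size=(d, 2 * d)) * 0.1         # shared composition matrix ("rules")
b = np.zeros(d)

def compose(c1, c2):
    # The same operator is applied at every node of the parse tree.
    return np.tanh(W @ np.concatenate([c1, c2]) + b)

the, movie = rng.normal(size=d), rng.normal(size=d)   # child word vectors ("meanings")
phrase = compose(the, movie)                          # vector for the two-word phrase
print(phrase)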
Vector Space + Word Embeddings: Socher
59
Recurrent NN: Compositionality
Vector Space + Word Embeddings: Socher
60
Recurrent NN for Vector Space
Word Embeddings: Turian (2010)
Turian, J., Ratinov, L., Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning
code & info: http://metaoptimize.com/projects/wordreprs/
61
Word Embeddings: Collobert & Weston (2011)
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P. (2011) .
Natural Language Processing (almost) from Scratch
63
Multi-embeddings: Stanford (2012)
Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng (2012)

Improving Word Representations via Global Context and Multiple Word Prototypes
64
Linguistic Regularities: Mikolov (2013)
code & info: https://code.google.com/p/word2vec/
Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations
65
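These regularities can be probed directly with vector arithmetic. A sketch assuming the pretrained Google News vectors and a recent gensim; the local file path is an assumption and API details vary across gensim versions.

from gensim.models import KeyedVectors

# Pretrained vectors from the word2vec page; the local path is hypothetical.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# king - man + woman ~= queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))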
Word Embeddings for MT: Mikolov (2013)
Mikolov, T., Le, Q.V., Sutskever, I. (2013). 

Exploiting Similarities among Languages for Machine Translation
66
Word Embeddings for MT: Kiros (2014)
67
Recursive Deep Models & Sentiment: Socher (2013)
Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C., Ng, A., Potts, C. (2013) 

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank.
code & demo: http://nlp.stanford.edu/sentiment/index.html
68
Paragraph Vectors: Le & Mikolov (2014)
Le, Q., Mikolov, T. (2014). Distributed Representations of Sentences and Documents
69
• add context (sentence, paragraph, document) to word
vectors during training
Results on Stanford Sentiment 

Treebank dataset:
Paragraph Vectors: Dai et al. (2014)
70
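Paragraph vectors are available as Doc2Vec in gensim; a minimal sketch with toy documents (the data, tags, and hyperparameters are illustrative assumptions, and attribute names vary across gensim versions).

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(words=["great", "movie"], tags=["d0"]),     # toy data (assumption)
        TaggedDocument(words=["terrible", "plot"], tags=["d1"])]

model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=20)
print(model.dv["d0"])        # the learned paragraph vector for document d0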
Paragraph Vectors: Dai et al. (2014)
71
Paragraph Vectors: Dai et al. (2014)
72
Global Vectors, GloVe: Stanford (2014)
Pennington, J., Socher, R., Manning, C.D. (2014). 

GloVe: Global Vectors for Word Representation
code & demo: http://nlp.stanford.edu/projects/glove/
(comparison plot: results on the word analogy task show “similar accuracy”)
73
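For reference, GloVe fits word vectors to the co-occurrence matrix X with the weighted least-squares objective from Pennington et al. (2014):

J = \sum_{i,j=1}^{V} f(X_{ij})\,\big(w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\big)^2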
Dependency-based Embeddings: Levy & Goldberg (2014)
Levy, O., Goldberg, Y. (2014). Dependency-Based Word Embeddings
code & demo: https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/
- Syntactic Dependency Context
Australian scientist discovers star with telescope
- Bag of Words (BoW) Context
(precision/recall plot comparing bag-of-words and dependency-based contexts)
“Dependency-based embeddings have more functional similarities”
74
• LSTMs
• Attention
Wanna Play ?
Recent breakthroughs
75
Wanna Play ?
LSTM
77
Attention
Gregor et al (2015) DRAW: A Recurrent Neural Network For Image
Generation (arxiv) (code)
• Question-Answering Systems (&Memory)
• Summarization
• Text Generation
• Dialogue Systems
• Image Captioning & other multimodal tasks
Wanna Play ?
Recent breakthroughs
80
Wanna Play ?
QA & Memory
82
• Memory Networks (Weston et al 2015) - Facebook
• Dynamic Memory Network (Kumar et al 2015) - MetaMind
• Neural Turing Machine (Graves et al 2014) - DeepMind
Weston et al (2015) Memory Networks (arxiv)
QA & Memory
83
Iyyer et al. (2014) A Neural Network for Factoid Question Answering over
Paragraphs (paper)
Wanna Play ?
QA & Memory
84
Zaremba & Sutskever (2015) Learning to Execute (arxiv)
Wanna Play ?
QA & Memory
85
bAbI Dataset
Wanna Play ?
Text generation
87
Karpathy (2015), The Unreasonable Effectiveness of Recurrent Neural
Networks (blog)
Image-Text Embeddings
92
Socher et al (2013) Zero Shot Learning Through Cross-Modal Transfer (info)
Image-Captioning
• Andrej Karpathy, Li Fei-Fei, 2015. 

Deep Visual-Semantic Alignments for Generating Image Descriptions (pdf) (info) (code)
• Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan , 2015. Show and Tell: A
Neural Image Caption Generator (arxiv)
• Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan
Salakhutdinov, Richard Zemel, Yoshua Bengio, Show, Attend and Tell: Neural Image
Caption Generation with Visual Attention (arxiv) (info) (code)
“A person riding a motorcycle on a dirt road.”???
Image-Captioning
“Two hockey players are fighting over the puck.”???
Image-Captioning
“A stop sign is flying in blue skies.”
“A herd of elephants flying in the blue skies.”
Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan
Salakhutdinov, 2015. Generating Images from Captions
with Attention (arxiv) (examples)
Image-Captioning
• TensorFlow: Recently released library by Google. 

http://tensorflow.org
• Theano - CPU/GPU symbolic expression compiler in python (from
LISA lab at University of Montreal). http://deeplearning.net/software/theano/
• Caffe - Computer Vision oriented Deep Learning framework:
caffe.berkeleyvision.org
• Torch - Matlab-like environment for state-of-the-art machine learning
algorithms in lua (from Ronan Collobert, Clement Farabet and Koray
Kavukcuoglu) http://torch.ch/
• more info: http://deeplearning.net/software_links/
Wanna Play ? General Deep Learning
97
• RNNLM (Mikolov)

http://rnnlm.org
• NB-SVM

https://github.com/mesnilgr/nbsvm
• Word2Vec (skipgrams/cbow)

https://code.google.com/p/word2vec/ (original)

http://radimrehurek.com/gensim/models/word2vec.html (python)
• GloVe

http://nlp.stanford.edu/projects/glove/ (original)

https://github.com/maciejkula/glove-python (python)
• Socher et al / Stanford RNN Sentiment code:

http://nlp.stanford.edu/sentiment/code.html
• Deep Learning without Magic Tutorial:

http://nlp.stanford.edu/courses/NAACL2013/
Wanna Play ? NLP
98
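To get started with the gensim implementation listed above, a minimal training sketch; the toy sentences and hyperparameters are assumptions, and parameter names differ slightly across gensim versions.

from gensim.models import Word2Vec

sentences = [["the", "students", "went", "to", "class"],
             ["the", "students", "like", "candy"]]     # toy corpus (assumption)

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)  # sg=1: skip-gram
print(model.wv.most_similar("students", topn=3))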
Questions?
roelof@kth.se
www.csc.kth.se/~roelof/
99
Code & Papers:
Collaborative Open Computer Science
@graphific

More Related Content

What's hot

An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTSuman Debnath
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNHye-min Ahn
 
Notes on attention mechanism
Notes on attention mechanismNotes on attention mechanism
Notes on attention mechanismKhang Pham
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers Arvind Devaraj
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Yuta Niki
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processinggulshan kumar
 
Word2Vec: Vector presentation of words - Mohammad Mahdavi
Word2Vec: Vector presentation of words - Mohammad MahdaviWord2Vec: Vector presentation of words - Mohammad Mahdavi
Word2Vec: Vector presentation of words - Mohammad Mahdaviirpycon
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingYasir Khan
 
Abstractive Text Summarization
Abstractive Text SummarizationAbstractive Text Summarization
Abstractive Text SummarizationTho Phan
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Fwdays
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processingsaurabhnarhe
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsOVHcloud
 
Module 8: Natural language processing Pt 1
Module 8:  Natural language processing Pt 1Module 8:  Natural language processing Pt 1
Module 8: Natural language processing Pt 1Sara Hooker
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Saeedeh Shekarpour
 

What's hot (20)

An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
 
Word2Vec
Word2VecWord2Vec
Word2Vec
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
 
Notes on attention mechanism
Notes on attention mechanismNotes on attention mechanism
Notes on attention mechanism
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
 
gpt3_presentation.pdf
gpt3_presentation.pdfgpt3_presentation.pdf
gpt3_presentation.pdf
 
[Paper review] BERT
[Paper review] BERT[Paper review] BERT
[Paper review] BERT
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processing
 
Word2Vec: Vector presentation of words - Mohammad Mahdavi
Word2Vec: Vector presentation of words - Mohammad MahdaviWord2Vec: Vector presentation of words - Mohammad Mahdavi
Word2Vec: Vector presentation of words - Mohammad Mahdavi
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Abstractive Text Summarization
Abstractive Text SummarizationAbstractive Text Summarization
Abstractive Text Summarization
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
 
NLP
NLPNLP
NLP
 
Module 8: Natural language processing Pt 1
Module 8:  Natural language processing Pt 1Module 8:  Natural language processing Pt 1
Module 8: Natural language processing Pt 1
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems
 

Similar to Deep Learning for Natural Language Processing: Word Embeddings

Visual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on LanguageVisual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on LanguageRoelof Pieters
 
Deep learning for natural language embeddings
Deep learning for natural language embeddingsDeep learning for natural language embeddings
Deep learning for natural language embeddingsRoelof Pieters
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Saurabh Kaushik
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingToine Bogers
 
CNN for NLP using text analysis by using deep learning
CNN for NLP using text analysis by using deep learningCNN for NLP using text analysis by using deep learning
CNN for NLP using text analysis by using deep learningKv Sagar
 
MixedLanguageProcessingTutorialEMNLP2019.pptx
MixedLanguageProcessingTutorialEMNLP2019.pptxMixedLanguageProcessingTutorialEMNLP2019.pptx
MixedLanguageProcessingTutorialEMNLP2019.pptxMariYam371004
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingTed Xiao
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processingpunedevscom
 
[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...
[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...
[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...Digital Classicist Seminar Berlin
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptOlusolaTop
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Mustafa Jarrar
 
Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information RetrievalRoelof Pieters
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2Karthik Murugesan
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingParrotAI
 
Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...Ana Marasović
 
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA DATASCIENCE
 
A Simple Explanation of XLNet
A Simple Explanation of XLNetA Simple Explanation of XLNet
A Simple Explanation of XLNetDomyoung Lee
 

Similar to Deep Learning for Natural Language Processing: Word Embeddings (20)

Visual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on LanguageVisual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on Language
 
Deep learning for natural language embeddings
Deep learning for natural language embeddingsDeep learning for natural language embeddings
Deep learning for natural language embeddings
 
Deep learning for nlp
Deep learning for nlpDeep learning for nlp
Deep learning for nlp
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
CNN for NLP using text analysis by using deep learning
CNN for NLP using text analysis by using deep learningCNN for NLP using text analysis by using deep learning
CNN for NLP using text analysis by using deep learning
 
MixedLanguageProcessingTutorialEMNLP2019.pptx
MixedLanguageProcessingTutorialEMNLP2019.pptxMixedLanguageProcessingTutorialEMNLP2019.pptx
MixedLanguageProcessingTutorialEMNLP2019.pptx
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...
[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...
[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.ppt
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
1 Introduction.ppt
1 Introduction.ppt1 Introduction.ppt
1 Introduction.ppt
 
Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information Retrieval
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
L1 nlp intro
L1 nlp introL1 nlp intro
L1 nlp intro
 
Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...
 
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2
 
A Simple Explanation of XLNet
A Simple Explanation of XLNetA Simple Explanation of XLNet
A Simple Explanation of XLNet
 

More from Roelof Pieters

Speculations in anthropology and tech for an uncertain future
Speculations in anthropology and tech for an uncertain futureSpeculations in anthropology and tech for an uncertain future
Speculations in anthropology and tech for an uncertain futureRoelof Pieters
 
AI assisted creativity
AI assisted creativity AI assisted creativity
AI assisted creativity Roelof Pieters
 
Creativity and AI: 
Deep Neural Nets "Going Wild"
Creativity and AI: 
Deep Neural Nets "Going Wild"Creativity and AI: 
Deep Neural Nets "Going Wild"
Creativity and AI: 
Deep Neural Nets "Going Wild"Roelof Pieters
 
Deep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with styleDeep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with styleRoelof Pieters
 
Building a Deep Learning (Dream) Machine
Building a Deep Learning (Dream) MachineBuilding a Deep Learning (Dream) Machine
Building a Deep Learning (Dream) MachineRoelof Pieters
 
Multi-modal embeddings: from discriminative to generative models and creative ai
Multi-modal embeddings: from discriminative to generative models and creative aiMulti-modal embeddings: from discriminative to generative models and creative ai
Multi-modal embeddings: from discriminative to generative models and creative aiRoelof Pieters
 
Multi modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsMulti modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsRoelof Pieters
 
Creative AI & multimodality: looking ahead
Creative AI & multimodality: looking aheadCreative AI & multimodality: looking ahead
Creative AI & multimodality: looking aheadRoelof Pieters
 
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural NetsPython for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural NetsRoelof Pieters
 
Explore Data: Data Science + Visualization
Explore Data: Data Science + VisualizationExplore Data: Data Science + Visualization
Explore Data: Data Science + VisualizationRoelof Pieters
 
Deep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog DetectorDeep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog DetectorRoelof Pieters
 
Graph, Data-science, and Deep Learning
Graph, Data-science, and Deep LearningGraph, Data-science, and Deep Learning
Graph, Data-science, and Deep LearningRoelof Pieters
 
Deep Learning: a birds eye view
Deep Learning: a birds eye viewDeep Learning: a birds eye view
Deep Learning: a birds eye viewRoelof Pieters
 
Learning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionaryLearning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionaryRoelof Pieters
 
Zero shot learning through cross-modal transfer
Zero shot learning through cross-modal transferZero shot learning through cross-modal transfer
Zero shot learning through cross-modal transferRoelof Pieters
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersRoelof Pieters
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsRoelof Pieters
 
Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Roelof Pieters
 
Recommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRecommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRoelof Pieters
 
Hackathon 2014 NLP Hack
Hackathon 2014 NLP HackHackathon 2014 NLP Hack
Hackathon 2014 NLP HackRoelof Pieters
 

More from Roelof Pieters (20)

Speculations in anthropology and tech for an uncertain future
Speculations in anthropology and tech for an uncertain futureSpeculations in anthropology and tech for an uncertain future
Speculations in anthropology and tech for an uncertain future
 
AI assisted creativity
AI assisted creativity AI assisted creativity
AI assisted creativity
 
Creativity and AI: 
Deep Neural Nets "Going Wild"
Creativity and AI: 
Deep Neural Nets "Going Wild"Creativity and AI: 
Deep Neural Nets "Going Wild"
Creativity and AI: 
Deep Neural Nets "Going Wild"
 
Deep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with styleDeep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with style
 
Building a Deep Learning (Dream) Machine
Building a Deep Learning (Dream) MachineBuilding a Deep Learning (Dream) Machine
Building a Deep Learning (Dream) Machine
 
Multi-modal embeddings: from discriminative to generative models and creative ai
Multi-modal embeddings: from discriminative to generative models and creative aiMulti-modal embeddings: from discriminative to generative models and creative ai
Multi-modal embeddings: from discriminative to generative models and creative ai
 
Multi modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsMulti modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed models
 
Creative AI & multimodality: looking ahead
Creative AI & multimodality: looking aheadCreative AI & multimodality: looking ahead
Creative AI & multimodality: looking ahead
 
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural NetsPython for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
 
Explore Data: Data Science + Visualization
Explore Data: Data Science + VisualizationExplore Data: Data Science + Visualization
Explore Data: Data Science + Visualization
 
Deep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog DetectorDeep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog Detector
 
Graph, Data-science, and Deep Learning
Graph, Data-science, and Deep LearningGraph, Data-science, and Deep Learning
Graph, Data-science, and Deep Learning
 
Deep Learning: a birds eye view
Deep Learning: a birds eye viewDeep Learning: a birds eye view
Deep Learning: a birds eye view
 
Learning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionaryLearning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionary
 
Zero shot learning through cross-modal transfer
Zero shot learning through cross-modal transferZero shot learning through cross-modal transfer
Zero shot learning through cross-modal transfer
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ers
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word Embeddings
 
Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!
 
Recommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRecommender Systems, Matrices and Graphs
Recommender Systems, Matrices and Graphs
 
Hackathon 2014 NLP Hack
Hackathon 2014 NLP HackHackathon 2014 NLP Hack
Hackathon 2014 NLP Hack
 

Recently uploaded

RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATIONRACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATIONRachelAnnTenibroAmaz
 
Chizaram's Women Tech Makers Deck. .pptx
Chizaram's Women Tech Makers Deck.  .pptxChizaram's Women Tech Makers Deck.  .pptx
Chizaram's Women Tech Makers Deck. .pptxogubuikealex
 
05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx
05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx
05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptxerickamwana1
 
Internship Presentation | PPT | CSE | SE
Internship Presentation | PPT | CSE | SEInternship Presentation | PPT | CSE | SE
Internship Presentation | PPT | CSE | SESaleh Ibne Omar
 
proposal kumeneger edited.docx A kumeeger
proposal kumeneger edited.docx A kumeegerproposal kumeneger edited.docx A kumeeger
proposal kumeneger edited.docx A kumeegerkumenegertelayegrama
 
INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRRINDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRRsarwankumar4524
 
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...Sebastiano Panichella
 
Engaging Eid Ul Fitr Presentation for Kindergartners.pptx
Engaging Eid Ul Fitr Presentation for Kindergartners.pptxEngaging Eid Ul Fitr Presentation for Kindergartners.pptx
Engaging Eid Ul Fitr Presentation for Kindergartners.pptxAsifArshad8
 
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...漢銘 謝
 
Don't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunity
Don't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunityDon't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunity
Don't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunityApp Ethena
 
Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective ...
Testing with Fewer Resources:  Toward Adaptive Approaches for Cost-effective ...Testing with Fewer Resources:  Toward Adaptive Approaches for Cost-effective ...
Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective ...Sebastiano Panichella
 
GESCO SE Press and Analyst Conference on Financial Results 2024
GESCO SE Press and Analyst Conference on Financial Results 2024GESCO SE Press and Analyst Conference on Financial Results 2024
GESCO SE Press and Analyst Conference on Financial Results 2024GESCO SE
 
Quality by design.. ppt for RA (1ST SEM
Quality by design.. ppt for  RA (1ST SEMQuality by design.. ppt for  RA (1ST SEM
Quality by design.. ppt for RA (1ST SEMCharmi13
 
Application of GIS in Landslide Disaster Response.pptx
Application of GIS in Landslide Disaster Response.pptxApplication of GIS in Landslide Disaster Response.pptx
Application of GIS in Landslide Disaster Response.pptxRoquia Salam
 
cse-csp batch4 review-1.1.pptx cyber security
cse-csp batch4 review-1.1.pptx cyber securitycse-csp batch4 review-1.1.pptx cyber security
cse-csp batch4 review-1.1.pptx cyber securitysandeepnani2260
 
General Elections Final Press Noteas per M
General Elections Final Press Noteas per MGeneral Elections Final Press Noteas per M
General Elections Final Press Noteas per MVidyaAdsule1
 
A Guide to Choosing the Ideal Air Cooler
A Guide to Choosing the Ideal Air CoolerA Guide to Choosing the Ideal Air Cooler
A Guide to Choosing the Ideal Air Coolerenquirieskenstar
 

Recently uploaded (17)

RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATIONRACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
 
Chizaram's Women Tech Makers Deck. .pptx
Chizaram's Women Tech Makers Deck.  .pptxChizaram's Women Tech Makers Deck.  .pptx
Chizaram's Women Tech Makers Deck. .pptx
 
05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx
05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx
05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx
 
Internship Presentation | PPT | CSE | SE
Internship Presentation | PPT | CSE | SEInternship Presentation | PPT | CSE | SE
Internship Presentation | PPT | CSE | SE
 
proposal kumeneger edited.docx A kumeeger
proposal kumeneger edited.docx A kumeegerproposal kumeneger edited.docx A kumeeger
proposal kumeneger edited.docx A kumeeger
 
INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRRINDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRR
 
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
 
Engaging Eid Ul Fitr Presentation for Kindergartners.pptx
Engaging Eid Ul Fitr Presentation for Kindergartners.pptxEngaging Eid Ul Fitr Presentation for Kindergartners.pptx
Engaging Eid Ul Fitr Presentation for Kindergartners.pptx
 
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
 
Don't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunity
Don't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunityDon't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunity
Don't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunity
 
Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective ...
Testing with Fewer Resources:  Toward Adaptive Approaches for Cost-effective ...Testing with Fewer Resources:  Toward Adaptive Approaches for Cost-effective ...
Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective ...
 
GESCO SE Press and Analyst Conference on Financial Results 2024
GESCO SE Press and Analyst Conference on Financial Results 2024GESCO SE Press and Analyst Conference on Financial Results 2024
GESCO SE Press and Analyst Conference on Financial Results 2024
 
Quality by design.. ppt for RA (1ST SEM
Quality by design.. ppt for  RA (1ST SEMQuality by design.. ppt for  RA (1ST SEM
Quality by design.. ppt for RA (1ST SEM
 
Application of GIS in Landslide Disaster Response.pptx
Application of GIS in Landslide Disaster Response.pptxApplication of GIS in Landslide Disaster Response.pptx
Application of GIS in Landslide Disaster Response.pptx
 
cse-csp batch4 review-1.1.pptx cyber security
cse-csp batch4 review-1.1.pptx cyber securitycse-csp batch4 review-1.1.pptx cyber security
cse-csp batch4 review-1.1.pptx cyber security
 
General Elections Final Press Noteas per M
General Elections Final Press Noteas per MGeneral Elections Final Press Noteas per M
General Elections Final Press Noteas per M
 
A Guide to Choosing the Ideal Air Cooler
A Guide to Choosing the Ideal Air CoolerA Guide to Choosing the Ideal Air Cooler
A Guide to Choosing the Ideal Air Cooler
 

Deep Learning for Natural Language Processing: Word Embeddings

  • 1. 1 @graphific Roelof Pieters Deep  Learning  for  Natural   Language  Processing:  Word   Embeddings 3  December  2015  
 KTH www.csc.kth.se/~roelof/ roelof@kth.se
  • 3. Can we understand Language ? 1. Language is ambiguous:
 Every sentence has many possible interpretations. 2. Language is productive:
 We will always encounter new words or new constructions 3. Language is culturally specific Some of the challenges in Language Understanding: 3
  • 4. Can we understand Language ? 1. Language is ambiguous:
 Every sentence has many possible interpretations. 2. Language is productive:
 We will always encounter new words or new constructions • plays well with others VB ADV P NN NN NN P DT • fruit flies like a banana NN NN VB DT NN NN VB P DT NN NN NN P DT NN NN VB VB DT NN • the students went to class DT NN VB P NN 4 Some of the challenges in Language Understanding:
  • 5. Can we understand Language ? 1. Language is ambiguous:
 Every sentence has many possible interpretations. 2. Language is productive:
 We will always encounter new words or new constructions 5 Some of the challenges in Language Understanding:
  • 6. [Karlgren 2014, NLP Sthlm Meetup]6
  • 7. Can we understand Language ? 1. Language is ambiguous:
 Every sentence has many possible interpretations. 2. Language is productive:
 We will always encounter new words or new constructions 3. Language is culturally specific Some of the challenges in Language Understanding: 7
  • 8. ML: Traditional Approach 1. Gather as much LABELED data as you can get 2. Throw some algorithms at it (mainly put in an SVM and keep it at that) 3. If you actually have tried more algos: Pick the best 4. Spend hours hand engineering some features / feature selection / dimensionality reduction (PCA, SVD, etc) 5. Repeat… For each new problem/question:: 8
  • 9. Machine Learning for NLP Data Classic Approach: Data is fed into a learning algorithm: Learning 
 Algorithm 9
  • 10. Machine Learning for NLP some of the (many) treebank datasets source: http://www-nlp.stanford.edu/links/statnlp.html#Treebanks ! 10
  • 11. Penn Treebank That’s a lot of “manual” work: 11
  • 12. • the students went to class DT NN VB P NN • plays well with others VB ADV P NN NN NN P DT • fruit flies like a banana NN NN VB DT NN NN VB P DT NN NN NN P DT NN NN VB VB DT NN With a lot of issues: Penn Treebank 12
  • 13. Machine Learning for NLP Learning 
 Algorithm Data “Features” Prediction Prediction/
 Classifier train set test set 13
  • 14. Machine Learning for NLP Learning 
 Algorithm “Features” Prediction Prediction/
 Classifier train set test set 14
  • 15. One Model rules them all ?
 
 DL approaches have been successfully applied to: Deep Learning: Why for NLP ? Automatic summarization Coreference resolution Discourse analysis Machine translation Morphological segmentation Named entity recognition (NER) Natural language generation Natural language understanding Optical character recognition (OCR) Part-of-speech tagging Parsing Question answering Relationship extraction sentence boundary disambiguation Sentiment analysis Speech recognition Speech segmentation Topic segmentation and recognition Word segmentation Word sense disambiguation Information retrieval (IR) Information extraction (IE) Speech processing 15
  • 16. Deep Learning: Why for NLP ? 16
  • 17.
  • 18. • What is the meaning of a word?
 (Lexical semantics) • What is the meaning of a sentence?
 ([Compositional] semantics) • What is the meaning of a longer piece of text? (Discourse semantics) Semantics: Meaning 18
  • 19. • NLP treats words mainly (rule-based/statistical approaches at least) as atomic symbols:
 • or in vector space:
 • also known as “one hot” representation. • Its problem ? Word Representation Love Candy Store [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …] Candy [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …] AND Store [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 …] = 0 ! 19
  • 21. • Structure corresponds to meaning: Structure and Meaning 21
  • 22. • Semantics • Syntax 22 NLP: what can we work with?
  • 23. • Language models define probability distributions over (natural language) strings or sentences • Joint and Conditional Probability Language Model 23
  • 24. • Language models define probability distributions over (natural language) strings or sentences Language Model 24
  • 25. • Language models define probability distributions over (natural language) strings or sentences Language Model 25
  • 26. Word senses What is the meaning of words? • Most words have many different senses:
 dog = animal or sausage? How are the meanings of different words related? • - Specific relations between senses:
 Animal is more general than dog. • - Semantic fields:
 money is related to bank 26
  • 27. Word senses Polysemy: • A lexeme is polysemous if it has different related senses • bank = financial institution or building Homonyms: • Two lexemes are homonyms if their senses are unrelated, but they happen to have the same spelling and pronunciation • bank = (financial) bank or (river) bank 27
  • 28. Word senses: relations Symmetric relations: • Synonyms: couch/sofa
 Two lemmas with the same sense • Antonyms: cold/hot, rise/fall, in/out
 Two lemmas with the opposite sense Hierarchical relations: • Hypernyms and hyponyms: pet/dog
 The hyponym (dog) is more specific than the hypernym (pet) • Holonyms and meronyms: car/wheel
 The meronym (wheel) is a part of the holonym (car) 28
  • 29. Distributional representations “You shall know a word by the company it keeps”
 (J. R. Firth 1957) One of the most successful ideas of modern statistical NLP! these words represent banking • Hard (class based) clustering models • Soft clustering models 29
  • 30. Distributional hypothesis He filled the wampimuk, passed it around and we all drunk some We found a little, hairy wampimuk sleeping behind the tree (McDonald & Ramscar 2001) 30
  • 31. Distributional semantics Landauer and Dumais (1997), Turney and Pantel (2010), … 31
  • 32. Distributional semantics Distributional meaning as co-occurrence vector: 32
  • 33. Distributional representations • Taking it further: • Continuous word embeddings • Combine vector space semantics with the prediction of probabilistic models • Words are represented as a dense vector: Candy = 33
  • 34. Word Embeddings: SocherVector Space Model adapted rom Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA In a perfect world: 34
  • 35. Word Embeddings: SocherVector Space Model adapted rom Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA In a perfect world: the country of my birth the place where I was born 35
  • 36. • Can theoretically (given enough units) approximate “any” function • and fit to “any” kind of data • Efficient for NLP: hidden layers can be used as word lookup tables • Dense distributed word vectors + efficient NN training algorithms: • Can scale to billions of words ! Why Neural Networks for NLP? 36
  • 37. • Representation of words as continuous vectors has a long history (Hinton et al. 1986; Rumelhart et al. 1986; Elman 1990) • First neural network language model: NNLM (Bengio et al. 2001; Bengio et al. 2003) based on earlier ideas of distributed representations for symbols (Hinton 1986) How? 37
  • 38. Word Embeddings: SocherVector Space Model Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA In a perfect world: the country of my birth the place where I was born ? … 38
  • 39. Compositionality Principle of compositionality: the “meaning (vector) of a complex expression (sentence) is determined by: — Gottlob Frege 
 (1848 - 1925) - the meanings of its constituent expressions (words) and - the rules (grammar) used to combine them” 39
  • 40. • How do we handle the compositionality of language in our models? 40 Compositionality
  • 41. • How do we handle the compositionality of language in our models? • Recursion :
 the same operator (same parameters) is applied repeatedly on different components 41 Compositionality
  • 42. • How do we handle the compositionality of language in our models? • Option 1: Recurrent Neural Networks (RNN) 42 RNN 1: Recurrent Neural Networks
  • 43. • How do we handle the compositionality of language in our models? • Option 2: Recursive Neural Networks (also sometimes called RNN) 43 RNN 2: Recursive Neural Networks
  • 44. • achieved SOTA in 2011 on Language Modeling (WSJ AR task) (Mikolov et al., INTERSPEECH 2011): • and again at ASRU 2011: 44 Recurrent Neural Networks “Comparison to other LMs shows that RNN LMs are state of the art by a large margin. Improvements inrease with more training data.” “[ RNN LM trained on a] single core on 400M words in a few days, with 1% absolute improvement in WER on state of the art setup” Mikolov, T., Karafiat, M., Burget, L., Cernock, J.H., Khudanpur, S. (2011)
 Recurrent neural network based language model
  • 45. 45 Recurrent Neural Networks (simple recurrent 
 neural network for LM) input hidden layer(s) output layer + sigmoid activation function + softmax function: Mikolov, T., Karafiat, M., Burget, L., Cernock, J.H., Khudanpur, S. (2011)
 Recurrent neural network based language model
  • 47. 47 Recurrent Neural Networks backpropagation through time class based recurrent NN [code (Mikolov’s RNNLM Toolkit) and more info: http://rnnlm.org/ ]
  • 48. • Recursive Neural Network for LM (Socher et al. 2011; Socher 2014) • achieved SOTA on new Stanford Sentiment Treebank dataset (but comparing it to many other models): Recursive Neural Network 48 Socher, R., Perelygin,, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)
 Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank info & code: http://nlp.stanford.edu/sentiment/
 • 49. Recursive Neural Tensor Network 49 code & info: http://www.socher.org/index.php/Main/ParsingNaturalScenesAndNaturalLanguageWithRecursiveNeuralNetworks Socher, R., Liu, C.C., Ng, A.Y., Manning, C.D. (2011) Parsing Natural Scenes and Natural Language with Recursive Neural Networks
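The tensor layer that gives the RNTN its name adds a bilinear term to the plain recursive composition: with c = [c1; c2], the parent is p = tanh(c^T V c + W c), where V is a d x 2d x 2d tensor. A small sketch with random weights (shapes follow the paper; everything else is illustrative):

```python
import numpy as np

d = 4
rng = np.random.RandomState(4)
W = rng.randn(d, 2 * d) * 0.1          # standard recursive-NN matrix
V = rng.randn(d, 2 * d, 2 * d) * 0.1   # the extra tensor: d bilinear forms

def rntn_compose(c1, c2):
    c = np.concatenate([c1, c2])                   # (2d,)
    bilinear = np.einsum('i,kij,j->k', c, V, c)    # c^T V[k] c for each slice k
    return np.tanh(bilinear + W @ c)

p = rntn_compose(rng.randn(d), rng.randn(d))
print(p.shape)  # (4,)
```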
 • 51. • RNN (Socher et al. 2011a) Recursive Neural Network 51 Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)
 Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank info & code: http://nlp.stanford.edu/sentiment/
 • 52. • RNN (Socher et al. 2011a) • Matrix-Vector RNN (MV-RNN) (Socher et al., 2012) Recursive Neural Network 52 Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)
 Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank info & code: http://nlp.stanford.edu/sentiment/
  • 53. • RNN (Socher et al. 2011a) • Matrix-Vector RNN (MV-RNN) (Socher et al., 2012) • Recursive Neural Tensor Network (RNTN) (Socher et al. 2013) Recursive Neural Network 53
 • 54. • negation detection: Recursive Neural Network 54 Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)
 Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank info & code: http://nlp.stanford.edu/sentiment/
 • 55. Recurrent NN for Vector Space [figure: parse tree with labels NP, PP/IN, NP, DT, NN, PRP$, NN] 55
 • 56. Recurrent NN: Compositionality [figure: the same parse tree; leaf tags DT, NN, IN, PRP$, NN paired with word vectors] 56
 • 57. Recurrent NN: Compositionality [figure: phrase nodes NP, IN, NP composed bottom-up from DT, NN, PRP$, NN] 57
 • 58. Recurrent NN: Compositionality [figure: full tree up to S/ROOT; the tree supplies the “rules”, the node vectors the “meanings”] 58
 • 59. Recurrent NN for Vector Space: Vector Space + Word Embeddings (Socher) 59
 • 60. Recurrent NN for Vector Space: Vector Space + Word Embeddings (Socher) 60
  • 61. Word Embeddings: Turian (2010) Turian, J., Ratinov, L., Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning code & info: http://metaoptimize.com/projects/wordreprs/61
  • 62. Word Embeddings: Turian (2010) Turian, J., Ratinov, L., Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning code & info: http://metaoptimize.com/projects/wordreprs/ 62
 • 63. Word Embeddings: Collobert & Weston (2011) Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P. (2011). Natural Language Processing (almost) from Scratch 63
  • 64. Multi-embeddings: Stanford (2012) Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng (2012)
 Improving Word Representations via Global Context and Multiple Word Prototypes 64
  • 65. Linguistic Regularities: Mikolov (2013) code & info: https://code.google.com/p/word2vec/ Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations 65
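A hedged usage sketch of the analogy arithmetic (king - man + woman ≈ queen) via gensim, assuming a pretrained word2vec-format file such as GoogleNews-vectors-negative300.bin (the filename is an assumption):

```python
from gensim.models import KeyedVectors

# path is an assumption: any word2vec-format vector file works
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# vector("king") - vector("man") + vector("woman") -> nearest neighbours
print(vectors.most_similar(positive=["king", "woman"],
                           negative=["man"], topn=3))
# expected to rank "queen" near the top
```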
 • 66. Word Embeddings for MT: Mikolov (2013) Mikolov, T., Le, Q.V., Sutskever, I. (2013). Exploiting Similarities among Languages for Machine Translation 66
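The paper’s core trick is compact enough to sketch: learn a linear map W that sends source-language embeddings onto the target-language embeddings of their translations, then translate new words by nearest neighbour. Mikolov et al. fit W with gradient descent; ordinary least squares below optimizes the same objective. Embeddings here are random stand-ins:

```python
import numpy as np

d_src, d_tgt, n_pairs = 6, 5, 50      # toy dimensionalities
rng = np.random.RandomState(5)
X = rng.randn(n_pairs, d_src)         # source vectors of a seed dictionary
Z = rng.randn(n_pairs, d_tgt)         # their translations' target vectors

# solve min_W ||X W - Z||^2 in closed form
W, *_ = np.linalg.lstsq(X, Z, rcond=None)

def translate(x_src, target_vocab_vectors):
    """Map a source vector over and return the nearest target word id."""
    z = x_src @ W
    sims = target_vocab_vectors @ z / (
        np.linalg.norm(target_vocab_vectors, axis=1) * np.linalg.norm(z))
    return int(np.argmax(sims))

print(translate(X[0], Z))
```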
  • 67. Word Embeddings for MT: Kiros (2014) 67
 • 68. Recursive Deep Models & Sentiment: Socher (2013) Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C., Ng, A., Potts, C. (2013)
 Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. code & demo: http://nlp.stanford.edu/sentiment/index.html 68
 • 69. Paragraph Vectors: Le & Mikolov (2014) Le, Q., Mikolov, T. (2014) Distributed Representations of Sentences and Documents 69 • add context (sentence, paragraph, document) to word vectors during training ! Results on Stanford Sentiment Treebank dataset:
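A hedged gensim sketch of paragraph vectors (Doc2Vec): each document gets its own trainable vector that is updated alongside the word vectors. Corpus and parameters are toy assumptions; parameter names follow recent gensim releases:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    TaggedDocument(words="the movie was great fun".split(), tags=["doc0"]),
    TaggedDocument(words="a dull and lifeless film".split(), tags=["doc1"]),
]

# dm=1 -> the "distributed memory" variant from the paper
model = Doc2Vec(corpus, vector_size=20, window=2, min_count=1,
                epochs=50, dm=1)

print(model.dv["doc0"][:5])                                # learned paragraph vector
print(model.infer_vector("great fun movie".split())[:5])   # vector for unseen text
```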
  • 70. Paragraph Vectors: Dai et al. (2014) 70
 • 73. Global Vectors, GloVe: Stanford (2014) Pennington, J., Socher, R., Manning, C.D. (2014). GloVe: Global Vectors for Word Representation code & demo: http://nlp.stanford.edu/projects/glove/ GloVe vs word2vec on the word analogy task: “similar accuracy” 73
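Pretrained GloVe vectors ship as plain text (a word followed by its floats, one entry per line), so loading them needs no special library. A minimal sketch, assuming a file such as glove.6B.50d.txt from the project page above:

```python
import numpy as np

def load_glove(path):
    """Parse a GloVe text file into a {word: vector} dict."""
    vecs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *nums = line.rstrip().split(" ")
            vecs[word] = np.array(nums, dtype=np.float32)
    return vecs

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

glove = load_glove("glove.6B.50d.txt")   # filename is an assumption
print(cosine(glove["king"], glove["queen"]))
```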
 • 74. Dependency-based Embeddings: Levy & Goldberg (2014) Levy, O., Goldberg, Y. (2014). Dependency-Based Word Embeddings code & demo: https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/ - Syntactic Dependency Context vs Bag of Words (BoW) Context: Australian scientist discovers star with telescope [figure: precision-recall curves] “Dependency-based embeddings have more functional similarities” 74
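The difference between the two context types is easy to make concrete. Given a dependency parse of the example sentence (hard-coded below, since it is an assumed parse rather than real parser output), Levy & Goldberg use (head, relation) pairs as contexts instead of the k neighbouring words:

```python
# hand-written dependency triples for the example sentence:
# (dependent, relation, head) -- an assumed parse, not parser output
parse = [
    ("Australian", "amod",      "scientist"),
    ("scientist",  "nsubj",     "discovers"),
    ("star",       "dobj",      "discovers"),
    ("telescope",  "prep_with", "discovers"),
]

def dep_contexts(parse):
    """Each word's contexts: its head (and, inversely, its dependents)."""
    pairs = []
    for dep, rel, head in parse:
        pairs.append((dep, f"{head}/{rel}"))        # word -> head context
        pairs.append((head, f"{dep}/{rel}-inv"))    # head -> dependent context
    return pairs

def bow_contexts(tokens, k=2):
    """Plain bag-of-words contexts: the k neighbours on each side."""
    return [(w, tokens[j]) for i, w in enumerate(tokens)
            for j in range(max(0, i - k), min(len(tokens), i + k + 1))
            if j != i]

sent = "Australian scientist discovers star with telescope".split()
print(dep_contexts(parse))   # links "discovers" directly to "telescope"
print(bow_contexts(sent))    # with k=2, "discovers" never sees "telescope"
```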
 • 75. • LSTMs • Attention Wanna Play ? Recent breakthroughs 75
  • 79. Attention Gregor et al (2015) DRAW: A Recurrent Neural Network For Image Generation (arxiv) (code)
  • 80. • Question-Answering Systems (&Memory) • Summarization • Text Generation • Dialogue Systems • Image Captioning & other multimodal tasks Wanna Play ? Recent breakthroughs 80
 • 82. Wanna Play ? QA & Memory 82 • Memory Networks (Weston et al 2015) [Facebook] • Dynamic Memory Network (Kumar et al 2015) [MetaMind] • Neural Turing Machine (Graves et al 2014) [DeepMind] Weston et al (2015) Memory Networks (arxiv)
 • 83. QA & Memory 83 Iyyer et al. (2014) A Neural Network for Factoid Question Answering over Paragraphs (paper)
 • 84. Wanna Play ? QA & Memory 84 • Memory Networks (Weston et al 2015) [Facebook] • Dynamic Memory Network (Kumar et al 2015) [MetaMind] • Neural Turing Machine (Graves et al 2014) [DeepMind] Zaremba & Sutskever (2015) Learning to Execute (arxiv)
 • 85. Wanna Play ? QA & Memory 85 bAbI Dataset
  • 86. • Question-Answering Systems (&Memory) • Summarization • Text Generation • Dialogue Systems • Image Captioning & other multimodal tasks Wanna Play ? Recent breakthroughs 86
  • 87. Wanna Play ? Text generation 87 Karpathy (2015), The Unreasonable Effectiveness of Recurrent Neural Networks (blog)
  • 90. Karpathy (2015), The Unreasonable Effectiveness of Recurrent Neural Networks (blog)
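The generation loop behind those samples is tiny: feed the last character in, take a softmax over the next character, sample, repeat. A sketch in the spirit of Karpathy’s post, with random untrained weights (so the output is gibberish), purely to show the mechanism:

```python
import numpy as np

chars = list("abcdefgh ")            # toy alphabet
V, H = len(chars), 16
rng = np.random.RandomState(7)
Wxh, Whh, Why = (rng.randn(H, V) * 0.1, rng.randn(H, H) * 0.1,
                 rng.randn(V, H) * 0.1)

def sample(seed_ix, n, temperature=1.0):
    """Generate n characters by repeatedly sampling the softmax output."""
    h, ix, out = np.zeros(H), seed_ix, []
    for _ in range(n):
        x = np.zeros(V); x[ix] = 1.0
        h = np.tanh(Wxh @ x + Whh @ h)
        logits = (Why @ h) / temperature   # temperature reshapes the softmax
        p = np.exp(logits - logits.max()); p /= p.sum()
        ix = rng.choice(V, p=p)
        out.append(chars[ix])
    return "".join(out)

print(sample(seed_ix=0, n=40))   # gibberish: the weights are untrained
```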
  • 91. • Question-Answering Systems (&Memory) • Summarization • Text Generation • Dialogue Systems • Image Captioning & other multimodal tasks Wanna Play ? Recent breakthroughs 91
  • 92. Image-Text Embeddings 92 Socher et al (2013) Zero Shot Learning Through Cross-Modal Transfer (info)
 • 93. Image-Captioning • Andrej Karpathy, Li Fei-Fei, 2015. Deep Visual-Semantic Alignments for Generating Image Descriptions (pdf) (info) (code) • Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, 2015. Show and Tell: A Neural Image Caption Generator (arxiv) • Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (arxiv) (info) (code)
  • 94. “A person riding a motorcycle on a dirt road.”??? Image-Captioning
  • 95. “Two hockey players are fighting over the puck.”??? Image-Captioning
  • 96. “A stop sign is flying in blue skies.” “A herd of elephants flying in the blue skies.” Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov, 2015. Generating Images from Captions with Attention (arxiv) (examples) Image-Captioning
 • 97. • TensorFlow: Recently released library by Google. http://tensorflow.org • Theano - CPU/GPU symbolic expression compiler in python (from LISA lab at University of Montreal). http://deeplearning.net/software/theano/ • Caffe - Computer Vision oriented Deep Learning framework: caffe.berkeleyvision.org • Torch - Matlab-like environment for state-of-the-art machine learning algorithms in lua (from Ronan Collobert, Clement Farabet and Koray Kavukcuoglu) http://torch.ch/ • more info: http://deeplearning.net/software_links/ Wanna Play ? General Deep Learning 97
  • 98. • RNNLM (Mikolov)
 http://rnnlm.org • NB-SVM
 https://github.com/mesnilgr/nbsvm • Word2Vec (skipgrams/cbow)
 https://code.google.com/p/word2vec/ (original)
 http://radimrehurek.com/gensim/models/word2vec.html (python) • GloVe
 http://nlp.stanford.edu/projects/glove/ (original)
 https://github.com/maciejkula/glove-python (python) • Socher et al / Stanford RNN Sentiment code:
 http://nlp.stanford.edu/sentiment/code.html • Deep Learning without Magic Tutorial:
 http://nlp.stanford.edu/courses/NAACL2013/ Wanna Play ? NLP 98
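For the Word2Vec entry above, a minimal gensim training sketch (toy corpus; sg=1 selects skip-gram, sg=0 CBOW; parameter names follow recent gensim releases):

```python
from gensim.models import Word2Vec

sentences = [
    "the students went to class".split(),
    "fruit flies like a banana".split(),
    "plays well with others".split(),
]

# sg=1 -> skip-gram, sg=0 -> CBOW; sizes are toy values
model = Word2Vec(sentences, vector_size=20, window=2,
                 min_count=1, sg=1, epochs=100)

print(model.wv["students"][:5])
print(model.wv.most_similar("students", topn=2))
```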