Open-ended
Visual Question-Answering
[thesis][web][code]
Issey Masuda Mora, Santiago Pascual de la Puente, Xavier Giró i Nieto
Roadmap
Introduction, Related Work, Methodology, Results, Conclusions, Future Work
Introduction
Visual Question-Answering
Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Lawrence Zitnick, C., & Parikh, D. (2015). Vqa: Visual question
answering. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2425-2433).
Predict the answer of
a given question
related to an image
Visual Question-Answering: Types
Real images / Abstract scenes
Open-ended
Q: Does it appear to be rainy? A: no
Q: What is just under the tree? A: a ball
Multi-Choice
Q: How many slices of pizza are there? A: 1, 2, 3, 4
Q: What is for dessert? A: cake, ice cream, cheesecake, pie
Example
Question: What is bobbing in the water other than the boats?
Answer: buoys
Motivation
New visual Turing test
Motivation: AI research
● Multidisciplinary tasks
● Models able to perform more complex activities
● Different sub-problems tackled at once
Fields involved: Computer Vision; Knowledge Representation and Reasoning; Natural Language Processing
Related Work
Deep Learning
Credit: Google
VQA: Common approach
Image: visual representation (CNN)
Question (What object is flying?): textual representation (word/sentence embedding + LSTM)
Merge both representations, then predict answer
Answer: Kite
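The merge-and-predict step of this common approach can be sketched in a few lines. This is only an illustration under stated assumptions, not the model from the slides: the feature vectors stand in for the CNN and LSTM outputs, the weights are random placeholders rather than trained parameters, the candidate answers are invented, and concatenation is assumed as the merge operation.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(image_features, question_features, weights, answers):
    """Merge both representations, then score every candidate answer."""
    # Assumed merge: plain concatenation of the two feature vectors.
    merged = image_features + question_features
    # One linear score per candidate answer.
    scores = [sum(w * x for w, x in zip(row, merged)) for row in weights]
    probs = softmax(scores)
    best = max(range(len(answers)), key=probs.__getitem__)
    return answers[best], probs


# Placeholder inputs: in a real system these come from a CNN and an LSTM.
rng = random.Random(0)
image_features = [rng.uniform(-1, 1) for _ in range(4)]
question_features = [rng.uniform(-1, 1) for _ in range(3)]
answers = ["kite", "bird", "plane"]  # hypothetical candidate answers
weights = [[rng.uniform(-1, 1) for _ in range(7)] for _ in answers]

answer, probs = predict_answer(image_features, question_features, weights, answers)
print(answer, [round(p, 3) for p in probs])
```

With untrained weights the chosen answer is arbitrary; the point is only the data flow from two representations to a single distribution over answers.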
Tools: Convolutional Neural Networks (CNN)
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In
Advances in neural information processing systems (pp. 1097-1105).
AlexNet
Tools: Word and Sentence embeddings
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases
and their compositionality. In Advances in neural information processing systems (pp. 3111-3119).
Experiments from: Socher et al. (2013b) and Collobert et al. (2011)
King - Man + Woman = Queen
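The analogy on this slide can be reproduced with a toy example. The vectors below are hand-made for illustration (three invented dimensions: royalty, male, female); real word embeddings are learned from text and have hundreds of dimensions.

```python
import math

# Hand-made toy vectors, NOT learned embeddings.
EMBEDDINGS = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "prince": [0.7, 1.0, 0.0],
    "princess": [0.7, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])]
    candidates = [w for w in EMBEDDINGS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[w], target))


result = analogy("king", "man", "woman")
print(result)  # queen
```

Excluding the three input words from the candidates is the usual convention for this kind of analogy query.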
Tools: Long Short-Term Memory networks (LSTM)
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
Methodology
First steps: Text-based QA
Extending text-based QA for VQA
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv
preprint arXiv:1409.1556.
Substitute VGG-16 with KCNN
Liu, Z. (2015). Kernelized Deep Convolutional Neural Network for Describing Complex Images. arXiv preprint arXiv:1509.04581.
Sentence embedding and image projection
Image
Question
Answer
Results
VQA Dataset: Real Images, Open-ended questions
Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Lawrence Zitnick, C., & Parikh, D. (2015). Vqa: Visual question
answering. ICCV 2015.
1 (image) x 3 (questions) x 10 (answers)
Evaluation
Metric: min(#humans that provided that answer / 3, 1)
Script:
● Characters to lowercase
● Remove periods (unless decimal periods)
● Number words to digits
● Remove articles
● Add apostrophe to contractions
● Replace punctuation with space
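The normalization steps listed above, together with the VQA accuracy defined in Antol et al. (2015) as min(#humans that provided that answer / 3, 1), can be sketched as follows. This is a simplified illustration and not the official evaluation script: the number-word and contraction tables are tiny hypothetical subsets, and the steps run in a slightly different order.

```python
import re

# Small illustrative subsets; a full script would use longer tables.
NUMBER_WORDS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
                "five": "5", "six": "6", "seven": "7", "eight": "8",
                "nine": "9", "ten": "10"}
ARTICLES = {"a", "an", "the"}
CONTRACTIONS = {"dont": "don't", "isnt": "isn't", "cant": "can't"}


def normalize(answer):
    """Apply a simplified version of the answer clean-up steps."""
    text = answer.lower().strip()
    # Remove periods unless they sit between two digits (decimal periods).
    text = re.sub(r"(?<!\d)\.|\.(?!\d)", "", text)
    # Replace remaining punctuation with a space (apostrophes are kept).
    text = re.sub(r"[^\w\s'.]", " ", text)
    words = []
    for word in text.split():
        word = NUMBER_WORDS.get(word, word)   # number words to digits
        word = CONTRACTIONS.get(word, word)   # add apostrophe to contractions
        if word not in ARTICLES:              # remove articles
            words.append(word)
    return " ".join(words)


def vqa_accuracy(predicted, human_answers):
    """An answer counts as fully correct if at least 3 humans gave it."""
    target = normalize(predicted)
    matches = sum(1 for a in human_answers if normalize(a) == target)
    return min(matches / 3, 1.0)


print(normalize("The ice-cream!"))                           # ice cream
print(vqa_accuracy("Kite.", ["a kite"] * 5 + ["bird"] * 5))  # 1.0
```

An answer given by only two of the ten annotators scores 2/3, which is why a prediction can be partially correct under this metric.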
VQA Challenge
CVPR 2016 VQA Challenge: 53.62%
Real Images Open-ended, test-standard dataset partition
Results in detail
          VALIDATION SET                     TEST SET
Model     Yes/No  Number  Other  Overall     Yes/No  Number  Other  Overall
Model 1   71.82   23.79   27.99  43.87       71.62   28.76   29.32  46.70
Model 3   75.02   28.60   29.30  46.32       -       -       -      -
Model 2   75.62   31.81   28.11  46.36       -       -       -      -
Model 5   78.15   32.79   33.91  50.32       78.15   36.20   35.26  53.03
Model 4   78.73   32.82   35.5   51.34       78.02   35.68   36.54  53.62
Results in context
Accuracy on a 0% to 100% scale:
Humans: 83.30%
UC Berkeley & Sony: 66.47%
Baseline LSTM&CNN: 54.06%
Ours: 53.62%
Baseline Nearest neighbor: 42.85%
Baseline Prior per question type: 37.47%
Baseline All yes: 29.88%
Comparison with the baseline
Our model
● Single word answer
● Generate answers
Baseline
● Multi word answers (hardcoded)
● Classify over the 1000 most common answers
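The baseline's "classify over the 1000 most common answers" setup amounts to building a fixed answer vocabulary from the training set. A minimal sketch, assuming answers are already normalized strings; the function names and the sample data are hypothetical and do not come from the baseline's code.

```python
from collections import Counter


def build_answer_vocabulary(training_answers, k=1000):
    """Keep the k most frequent answers; each one becomes a class."""
    counts = Counter(training_answers)
    return [answer for answer, _ in counts.most_common(k)]


def answer_to_class(answer, vocabulary):
    """Return the class index, or None for answers outside the vocabulary."""
    index = {a: i for i, a in enumerate(vocabulary)}
    return index.get(answer)


# Invented sample data for illustration.
training_answers = ["yes", "yes", "no", "2", "yes", "no"]
vocabulary = build_answer_vocabulary(training_answers, k=2)
print(vocabulary)                         # ['yes', 'no']
print(answer_to_class("2", vocabulary))   # None
```

Any answer outside the vocabulary can never be predicted by a classifier built this way, which is the limitation a generative model avoids.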
Qualitative results: I
Qualitative results: II
Deep Python Project
https://github.com/imatge-upc/vqa-2016-cvprw
Research contribution: Extended abstract
VQA workshop, CVPR 2016
Research contribution: Extended abstract - Poster
… ticket to Las Vegas
Presenting our poster and extended abstract at CVPR 2016, Las Vegas, USA
VQA Challenge statistics: Answering method
Conclusions
Conclusion
Goals accomplished
✓ Participate in the VQA Challenge, CVPR 2016
✓ First GPI project using text processing techniques
✓ Create a scalable VQA model
✓ Build a modular and reusable software package
✓ Extended abstract accepted to VQA workshop CVPR 2016
Conclusion
Personal overview
● Submission to VQA Challenge
● VQA, hot topic at CVPR 2016
● Model designed to generate answers instead of classifying them
● Question-Answer pair generation proposal
Future Work
Future work
Next steps
● Decoder for multiple word answers
● Character embedding
● Attention mechanisms
● Question-Answer pairs generation
Automatic Question-Answer Pairs Generation
Thank You!
Do you have any questions?
Project resource links
● Thesis: https://imatge.upc.edu/web/sites/default/files/pub/xMasuda-Mora_0.pdf
● Web page: http://imatge-upc.github.io/vqa-2016-cvprw/
● Source code: https://github.com/imatge-upc/vqa-2016-cvprw
Motivation: First steps towards QA Generation
AI System
Question
What is the man doing?
Answer
Surf
VQA: Counterexample
Dynamic Parameter Prediction Network (DPPnet)
Noh, H., Seo, P. H., & Han, B. (2016). Image question answering using convolutional neural network with dynamic parameter
prediction. CVPR 2016.
Experiments: Batch Normalization
Losses I
Losses II
Losses III
VQA Challenge statistics: Image modelling
VQA Challenge statistics: Question modelling
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 
