SlideShare a Scribd company logo
1 of 40
Download to read offline
Backpropagation
Day 2 Lecture 1
http://bit.ly/idl2020
Xavier Giro-i-Nieto
@DocXavi
xavier.giro@upc.edu
Associate Professor
Universitat Politècnica de Catalunya
2
Acknowledgements
Kevin McGuinness
kevin.mcguinness@dcu.ie
Research Fellow
Insight Centre for Data Analytics
Dublin City University
Elisa Sayrol
elisa.sayrol@upc.edu
Associate Professor
ETSETB TelecomBCN
Universitat Politècnica de Catalunya
3
Video lecture
Loss function - L(y, ŷ)
The loss function assesses the performance of our model by comparing its
predictions (ŷ) to an expected value (y), typically coming from annotations.
Example: the predicted price (ŷ) and one actually paid (y)
could be compared with the Euclidean distance (also
referred as L2 distance or Mean Square Error - MSE):
1
x1
x2
x3
y = {-∞, ∞}
b
w1
w2
w3
Discussion: Consider the
single-parameter model...
…....and that, given a pair (y, ŷ), we would
like to update the current wt
value to a
new wt+1
based on the loss function Lw
.
(a) Would you increase or decrease wt
?
(b) What operation could indicate which
way to go ?
(c) How much would you increase or
decrease wt
?
Loss function - L(y, ŷ)
ŷ = x · w
Lw
Motivation for this lecture:
if we had a way to estimate the
gradient of the loss (▽L)with respect
to the parameter(s), we could use
gradient descent to optimize them.
6
Gradient Descent (GD)
Descend
(minus sign)
Learning
rate (LR)
Lw
Backpropagation will allow us to compute the gradients of the loss function with
respect to:
● all model parameters (w & b) - final goal during training
● input/intermediate data - visualization & interpretability purposes.
Gradients will “flow” from the output of the model towards the input (“back”).
Gradient Descent (GD)
Computational graph of a simple perceptron
8
σ
1
x1
x2
y = {0, 1}
b
w1
w2
Question: What is the computational graph
(operations & order) of this perceptron with a
sigmoid activation ?
Computational graph of a perceptron
9
σ
1
x1
x2
y = {0, 1}
b
w1
w2
x
x
w1
x1
w2
x2
Example adapted from “CS231n: Convolutional Neural Networks for Visual Recognition”, Stanford University.
10
σ
1
x1
x2
y = {0, 1}
b
w1
w2
x
+
x
w1
x1
w2
x2
Example adapted from “CS231n: Convolutional Neural Networks for Visual Recognition”, Stanford University.
Computational graph of a perceptron
11
σ
1
x1
x2
y = {0, 1}
b
w1
w2
x
+
x
+
w1
x1
w2
x2
b
Example adapted from “CS231n: Convolutional Neural Networks for Visual Recognition”, Stanford University.
Computational graph of a perceptron
12
σ
1
x1
x2
y = {0, 1}
b
w1
w2
x
+
x
+ σ
w1
x1
w2
x2
b
Example adapted from “CS231n: Convolutional Neural Networks for Visual Recognition”, Stanford University.
y
Computational graph of a perceptron
13
σ
1
x1
x2
y = {0, 1}
b
w1
w2
x
+
x
+ σ
1 0.73
Computational graph of a perceptron
y
Forward pass
w1
x1
w2
x2
b
14
x
+
x
+ σ
Challenge: How to compute
the gradient of the loss
function with respect to w1
or
w2
?
w1
x1
w2
x2
b
y
ŷ
Computational graph of a perceptron
15
Gradients from composition (chain rule)
Forward pass
16
Gradients from composition (chain rule)
How does a variation
(“difference”) on the
input affect the
prediction ?
Backward pass
17
Gradients from composition (chain rule)
A variation in x5
directly affects on ŷ
with a 1:1 factor.
18
Gradients from composition (chain rule)
How does a
variation on x4
affect the
predicted ŷ ?
19
Gradients from composition (chain rule)
How does a
variation on x4
affect the
predicted ŷ ?
It corresponds to
how a variation of x5
affects ŷ...
20
Gradients from composition (chain rule)
...multiplied by how
a variation near the
input x4
affects the
output g4
(x4
).
It corresponds to
how a variation of x5
affects ŷ ...
How does a
variation on x4
affect the
predicted ŷ ?
21
Gradients from composition (chain rule)
Backward pass
The same reasoning can be iteratively applied until reaching :
22
Gradients from composition (chain rule)
In order to compute , we must:
1) Find the derivative function ➝ g’i
(·)
2) Evaluate g’i
(·) at xi
➝ g’i
(xi
)
3) Multiply g’i
(xi
) with the backpropagated gradient (δk
).
23
Gradients from composition (chain rule)
Backward pass
When training NN, we
will actually compute
the derivative over the
loss function, not over
the predicted value ŷ.
L(y, ŷ)
24
Gradients from composition (chain rule)
x
+
x
+ σ
w1
x1
w2
x2
b
Question: What are the
derivatives of the function
involved in the
computational graph of a
perceptron ?
● SIGMOID (σ)
● SUM (+)
● PRODUCT (x)
ŷ
25
Gradient backpropagation in a perceptron
We can now estimate the
sensitivity of the output y with
respect to each input
parameter wi
and xi
.
Example extracted from Andrej Karpathy’s notes for CS231n from Stanford University.
w1
x1
w2
x2
b
x
+
x
+ σ
1 0.73
dy/dy=1
Backward pass
ŷ
Gradient weights for sigmoid σ
Even more details: Arunava, “Derivative of the Sigmoid function” (2018)
x
+
x
+ σ
w1
x1
w2
x2
b
...which can be re-arranged as...
(*)
(*)
Figure: Andrej Karpathy
ŷ
27
Gradient backpropagation in a perceptron
w1
x1
w2
x2
b
x
+
x
+ σ
1 0.73
0,2
Example extracted from Andrej Karpathy’s notes for CS231n from Stanford University.
Backward pass
ŷ
dŷ/dŷ=1
28
Sum: Distributes the gradient to both branches.
w1
x1
w2
x2
b
x
+
x
+ σ
1 0.73
0,2
0,2
0,2
0,2
Gradient backpropagation in a perceptron
Example extracted from Andrej Karpathy’s notes for CS231n from Stanford University.
Backward passdy/db
0,2
ŷ
dŷ/dŷ=1
29
x
+
x
+ σ
1 0.73
0,2
0,2
0,2
-0,2
0,2
0,2
0,4
-0,4
-0,6
Product: Switches gradient weight values.
Gradient backpropagation in a perceptron
Example extracted from Andrej Karpathy’s notes for CS231n from Stanford University.
Backward pass
w1
x1
w2
x2
b
dy/dw1
dy/dx1
dy/dw2
dy/dx2
dy/db
ŷ
dŷ/dŷ=1
30
w1
x1
w2
x2
b
x
+
x
+ σ
1 0.73
0,2
0,2
0,2
-0,2
0,2
0,2
0,4
-0,4
-0,6
dŷ/dŷ=1
Gradient backpropagation in a perceptron
Example extracted from Andrej Karpathy’s notes for CS231n from Stanford University.
Normally, we will be
interested only on the
weights (wi
) and biases (b),
not the inputs (xi
). The
weights are the parameters
to learn in our models.
Backward pass
dy/dw1
dy/dx1
dy/dw2
dy/dx2
dy/db
ŷ
(bonus) Gradients weights for MAX & SPLIT
0,2
0,2
0
max
Max: Routes the gradient only to the higher input branch (not sensitive
to the lower branches).
Split: Branches that split in the forward pass and merge in the backward
pass, add gradients
+
(bonus) Gradient weights for ReLU
Figures: Andrej Karpathy
Backpropagation across layers
Backward pass
Gradients can flow across stacked layers of neurons to estimate their parameters.
Watch more
34
Gilbert Strang, “27. Backpropagation: Find
Partial Derivatives”. MIT 18.065 (2018)
Creative Commons, “Yoshua Bengio Extra
Footage 1: Brainstorm with students” (2018)
Learn more
35
READ
● Chris Olah, “Calculus on Computational Graphs: Backpropagation” (2015).
● Andrej Karpathy,, “Yes, you should understand backprop” (2016), and his “course notes at
Stanford University CS231n.
THREAD
Advanced discussion
36
Problem
Consider a perceptron with a ReLU as activation function designed to process a single-dimensional inputs x.
a) Draw the computational graph of the perceptron, drawing a circle around the parameters that need to
be estimated during training.
b) Compute the partial derivative of the output of the perceptron (y) with respect to each of its
parameters for the input sample x=2. Consider that all the trainable parameters of the perceptron are
initialized to 1.
c) Modify the results obtained in b) for the case in which all the trainable parameters of the perceptron
are initialized to -1.
d) Briefly comment and compare the results obtained in b) and c).
Problem (solved)
a) Draw the computational graph of the perceptron, drawing a circle around the parameters that need to
be estimated during training.
b) b) Compute the partial derivative of the output of the perceptron (y) with respect to each of its
parameters for the input sample x=2. Consider that all the trainable parameters of the perceptron are
initialized to 1.
Problem (solved)
c) Modify the results obtained in b) for the case in which all the trainable parameters of the perceptron are
initialized to -1.
d) Briefly comment and compare the results obtained in b) and c).
d) While in case b) the gradients can flow until the trainable parameters w1
and b, in case c) gradients are
“killed” by the ReLU.
Backpropagation for Deep Learning

More Related Content

What's hot

The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)Universitat Politècnica de Catalunya
 
MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningCharles Deledalle
 
From RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphsFrom RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphstuxette
 
“Can You See What I See? The Power of Deep Learning,” a Presentation from Str...
“Can You See What I See? The Power of Deep Learning,” a Presentation from Str...“Can You See What I See? The Power of Deep Learning,” a Presentation from Str...
“Can You See What I See? The Power of Deep Learning,” a Presentation from Str...Edge AI and Vision Alliance
 
MLIP - Chapter 2 - Preliminaries to deep learning
MLIP - Chapter 2 - Preliminaries to deep learningMLIP - Chapter 2 - Preliminaries to deep learning
MLIP - Chapter 2 - Preliminaries to deep learningCharles Deledalle
 
MLIP - Chapter 3 - Introduction to deep learning
MLIP - Chapter 3 - Introduction to deep learningMLIP - Chapter 3 - Introduction to deep learning
MLIP - Chapter 3 - Introduction to deep learningCharles Deledalle
 
Multimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAMultimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAJin-Hwa Kim
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier홍배 김
 
Lecture8 multi class_svm
Lecture8 multi class_svmLecture8 multi class_svm
Lecture8 multi class_svmStéphane Canu
 
Information-theoretic clustering with applications
Information-theoretic clustering  with applicationsInformation-theoretic clustering  with applications
Information-theoretic clustering with applicationsFrank Nielsen
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelstuxette
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysisrik0
 
Kernels and Support Vector Machines
Kernels and Support Vector  MachinesKernels and Support Vector  Machines
Kernels and Support Vector MachinesEdgar Marca
 
Unsupervised Learning of Object Landmarks through Conditional Image Generation
Unsupervised Learning of Object Landmarks through Conditional Image GenerationUnsupervised Learning of Object Landmarks through Conditional Image Generation
Unsupervised Learning of Object Landmarks through Conditional Image Generation哲东 郑
 
Epsrcws08 campbell isvm_01
Epsrcws08 campbell isvm_01Epsrcws08 campbell isvm_01
Epsrcws08 campbell isvm_01Cheng Feng
 
Introductions to Neural Networks,Basic concepts
Introductions to Neural Networks,Basic conceptsIntroductions to Neural Networks,Basic concepts
Introductions to Neural Networks,Basic conceptsQin Jian
 
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...홍배 김
 

What's hot (19)

Bayesian Core: Chapter 8
Bayesian Core: Chapter 8Bayesian Core: Chapter 8
Bayesian Core: Chapter 8
 
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
 
MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, Captioning
 
From RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphsFrom RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphs
 
“Can You See What I See? The Power of Deep Learning,” a Presentation from Str...
“Can You See What I See? The Power of Deep Learning,” a Presentation from Str...“Can You See What I See? The Power of Deep Learning,” a Presentation from Str...
“Can You See What I See? The Power of Deep Learning,” a Presentation from Str...
 
MLIP - Chapter 2 - Preliminaries to deep learning
MLIP - Chapter 2 - Preliminaries to deep learningMLIP - Chapter 2 - Preliminaries to deep learning
MLIP - Chapter 2 - Preliminaries to deep learning
 
MLIP - Chapter 3 - Introduction to deep learning
MLIP - Chapter 3 - Introduction to deep learningMLIP - Chapter 3 - Introduction to deep learning
MLIP - Chapter 3 - Introduction to deep learning
 
Multimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAMultimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QA
 
Chapter 1 - Introduction
Chapter 1 - IntroductionChapter 1 - Introduction
Chapter 1 - Introduction
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
Lecture8 multi class_svm
Lecture8 multi class_svmLecture8 multi class_svm
Lecture8 multi class_svm
 
Information-theoretic clustering with applications
Information-theoretic clustering  with applicationsInformation-theoretic clustering  with applications
Information-theoretic clustering with applications
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernels
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
Kernels and Support Vector Machines
Kernels and Support Vector  MachinesKernels and Support Vector  Machines
Kernels and Support Vector Machines
 
Unsupervised Learning of Object Landmarks through Conditional Image Generation
Unsupervised Learning of Object Landmarks through Conditional Image GenerationUnsupervised Learning of Object Landmarks through Conditional Image Generation
Unsupervised Learning of Object Landmarks through Conditional Image Generation
 
Epsrcws08 campbell isvm_01
Epsrcws08 campbell isvm_01Epsrcws08 campbell isvm_01
Epsrcws08 campbell isvm_01
 
Introductions to Neural Networks,Basic concepts
Introductions to Neural Networks,Basic conceptsIntroductions to Neural Networks,Basic concepts
Introductions to Neural Networks,Basic concepts
 
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
 

Similar to Backpropagation for Deep Learning

Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...Universitat Politècnica de Catalunya
 
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineFast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineSoma Boubou
 
CD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdf
CD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdfCD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdf
CD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdfRajJain516913
 
EFFINET - Initial Presentation
EFFINET - Initial PresentationEFFINET - Initial Presentation
EFFINET - Initial PresentationPantelis Sopasakis
 
Reduction of the small gain condition
Reduction of the small gain conditionReduction of the small gain condition
Reduction of the small gain conditionMKosmykov
 
Convolution Neural Network Lecture Slides
Convolution Neural Network Lecture SlidesConvolution Neural Network Lecture Slides
Convolution Neural Network Lecture SlidesAdnanHaider234505
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsElvis DOHMATOB
 
Digit recognizer by convolutional neural network
Digit recognizer by convolutional neural networkDigit recognizer by convolutional neural network
Digit recognizer by convolutional neural networkDing Li
 
Normalization cross correlation value of
Normalization cross correlation value ofNormalization cross correlation value of
Normalization cross correlation value ofeSAT Publishing House
 
2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machine2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machinenozomuhamada
 
On image intensities, eigenfaces and LDA
On image intensities, eigenfaces and LDAOn image intensities, eigenfaces and LDA
On image intensities, eigenfaces and LDARaghu Palakodety
 
Conference_paper.pdf
Conference_paper.pdfConference_paper.pdf
Conference_paper.pdfNarenRajVivek
 

Similar to Backpropagation for Deep Learning (20)

Backpropagation for Neural Networks
Backpropagation for Neural NetworksBackpropagation for Neural Networks
Backpropagation for Neural Networks
 
Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)
 
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
 
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
 
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
 
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineFast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
 
CD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdf
CD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdfCD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdf
CD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdf
 
EFFINET - Initial Presentation
EFFINET - Initial PresentationEFFINET - Initial Presentation
EFFINET - Initial Presentation
 
Reduction of the small gain condition
Reduction of the small gain conditionReduction of the small gain condition
Reduction of the small gain condition
 
Convolution Neural Network Lecture Slides
Convolution Neural Network Lecture SlidesConvolution Neural Network Lecture Slides
Convolution Neural Network Lecture Slides
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
 
MUMS: Transition & SPUQ Workshop - Dimension Reduction and Global Sensititvit...
MUMS: Transition & SPUQ Workshop - Dimension Reduction and Global Sensititvit...MUMS: Transition & SPUQ Workshop - Dimension Reduction and Global Sensititvit...
MUMS: Transition & SPUQ Workshop - Dimension Reduction and Global Sensititvit...
 
Digit recognizer by convolutional neural network
Digit recognizer by convolutional neural networkDigit recognizer by convolutional neural network
Digit recognizer by convolutional neural network
 
Normalization cross correlation value of
Normalization cross correlation value ofNormalization cross correlation value of
Normalization cross correlation value of
 
2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machine2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machine
 
EE658_Lecture_8.pdf
EE658_Lecture_8.pdfEE658_Lecture_8.pdf
EE658_Lecture_8.pdf
 
lecture 1.pptx
lecture 1.pptxlecture 1.pptx
lecture 1.pptx
 
On image intensities, eigenfaces and LDA
On image intensities, eigenfaces and LDAOn image intensities, eigenfaces and LDA
On image intensities, eigenfaces and LDA
 
Conference_paper.pdf
Conference_paper.pdfConference_paper.pdf
Conference_paper.pdf
 
Session 4 .pdf
Session 4 .pdfSession 4 .pdf
Session 4 .pdf
 

More from Universitat Politècnica de Catalunya

The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Universitat Politècnica de Catalunya
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Universitat Politècnica de Catalunya
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Universitat Politècnica de Catalunya
 
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Universitat Politècnica de Catalunya
 
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaUniversitat Politècnica de Catalunya
 
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaDeep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaUniversitat Politècnica de Catalunya
 

More from Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
 
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
 
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
 
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaDeep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
 

Recently uploaded

wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...KarteekMane1
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingsocarem879
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfSubhamKumar3239
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 

Recently uploaded (20)

wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 

Backpropagation for Deep Learning

  • 1. Backpropagation Day 2 Lecture 1 http://bit.ly/idl2020 Xavier Giro-i-Nieto @DocXavi xavier.giro@upc.edu Associate Professor Universitat Politècnica de Catalunya
  • 2. 2 Acknowledgements Kevin McGuinness kevin.mcguinness@dcu.ie Research Fellow Insight Centre for Data Analytics Dublin City University Elisa Sayrol elisa.sayrol@upc.edu Associate Professor ETSETB TelecomBCN Universitat Politècnica de Catalunya
  • 4. Loss function - L(y, ŷ) The loss function assesses the performance of our model by comparing its predictions (ŷ) to an expected value (y), typically coming from annotations. Example: the predicted price (ŷ) and one actually paid (y) could be compared with the Euclidean distance (also referred as L2 distance or Mean Square Error - MSE): 1 x1 x2 x3 y = {-∞, ∞} b w1 w2 w3
  • 5. Discussion: Consider the single-parameter model... …....and that, given a pair (y, ŷ), we would like to update the current wt value to a new wt+1 based on the loss function Lw . (a) Would you increase or decrease wt ? (b) What operation could indicate which way to go ? (c) How much would you increase or decrease wt ? Loss function - L(y, ŷ) ŷ = x · w Lw
  • 6. Motivation for this lecture: if we had a way to estimate the gradient of the loss (▽L)with respect to the parameter(s), we could use gradient descent to optimize them. 6 Gradient Descent (GD) Descend (minus sign) Learning rate (LR) Lw
  • 7. Backpropagation will allow us to compute the gradients of the loss function with respect to: ● all model parameters (w & b) - final goal during training ● input/intermediate data - visualization & interpretability purposes. Gradients will “flow” from the output of the model towards the input (“back”). Gradient Descent (GD)
  • 8. Computational graph of a simple perceptron 8 σ 1 x1 x2 y = {0, 1} b w1 w2 Question: What is the computational graph (operations & order) of this perceptron with a sigmoid activation ?
  • 9. Computational graph of a perceptron 9 σ 1 x1 x2 y = {0, 1} b w1 w2 x x w1 x1 w2 x2 Example adapted from “CS231n: Convolutional Neural Networks for Visual Recognition”, Stanford University.
  • 10. 10 σ 1 x1 x2 y = {0, 1} b w1 w2 x + x w1 x1 w2 x2 Example adapted from “CS231n: Convolutional Neural Networks for Visual Recognition”, Stanford University. Computational graph of a perceptron
  • 11. 11 σ 1 x1 x2 y = {0, 1} b w1 w2 x + x + w1 x1 w2 x2 b Example adapted from “CS231n: Convolutional Neural Networks for Visual Recognition”, Stanford University. Computational graph of a perceptron
  • 12. 12 σ 1 x1 x2 y = {0, 1} b w1 w2 x + x + σ w1 x1 w2 x2 b Example adapted from “CS231n: Convolutional Neural Networks for Visual Recognition”, Stanford University. y Computational graph of a perceptron
  • 13. 13 σ 1 x1 x2 y = {0, 1} b w1 w2 x + x + σ 1 0.73 Computational graph of a perceptron y Forward pass w1 x1 w2 x2 b
  • 14. 14 x + x + σ Challenge: How to compute the gradient of the loss function with respect to w1 or w2 ? w1 x1 w2 x2 b y ŷ Computational graph of a perceptron
  • 15. 15 Gradients from composition (chain rule) Forward pass
  • 16. 16 Gradients from composition (chain rule) How does a variation (“difference”) on the input affect the prediction ? Backward pass
  • 17. 17 Gradients from composition (chain rule) A variation in x5 directly affects on ŷ with a 1:1 factor.
  • 18. 18 Gradients from composition (chain rule) How does a variation on x4 affect the predicted ŷ ?
  • 19. 19 Gradients from composition (chain rule) How does a variation on x4 affect the predicted ŷ ? It corresponds to how a variation of x5 affects ŷ...
  • 20. 20 Gradients from composition (chain rule) ...multiplied by how a variation near the input x4 affects the output g4 (x4 ). It corresponds to how a variation of x5 affects ŷ ... How does a variation on x4 affect the predicted ŷ ?
  • 21. 21 Gradients from composition (chain rule) Backward pass The same reasoning can be iteratively applied until reaching :
  • 22. 22 Gradients from composition (chain rule) In order to compute , we must: 1) Find the derivative function ➝ g’i (·) 2) Evaluate g’i (·) at xi ➝ g’i (xi ) 3) Multiply g’i (xi ) with the backpropagated gradient (δk ).
  • 23. 23 Gradients from composition (chain rule) Backward pass When training NN, we will actually compute the derivative over the loss function, not over the predicted value ŷ. L(y, ŷ)
  • 24. 24 Gradients from composition (chain rule) x + x + σ w1 x1 w2 x2 b Question: What are the derivatives of the function involved in the computational graph of a perceptron ? ● SIGMOID (σ) ● SUM (+) ● PRODUCT (x) ŷ
  • 25. 25 Gradient backpropagation in a perceptron We can now estimate the sensitivity of the output y with respect to each input parameter wi and xi . Example extracted from Andrej Karpathy’s notes for CS231n from Stanford University. w1 x1 w2 x2 b x + x + σ 1 0.73 dy/dy=1 Backward pass ŷ
  • 26. Gradient weights for sigmoid σ Even more details: Arunava, “Derivative of the Sigmoid function” (2018) x + x + σ w1 x1 w2 x2 b ...which can be re-arranged as... (*) (*) Figure: Andrej Karpathy ŷ
  • 27. 27 Gradient backpropagation in a perceptron w1 x1 w2 x2 b x + x + σ 1 0.73 0,2 Example extracted from Andrej Karpathy’s notes for CS231n from Stanford University. Backward pass ŷ dŷ/dŷ=1
  • 28. 28 Sum: Distributes the gradient to both branches. w1 x1 w2 x2 b x + x + σ 1 0.73 0,2 0,2 0,2 0,2 Gradient backpropagation in a perceptron Example extracted from Andrej Karpathy’s notes for CS231n from Stanford University. Backward passdy/db 0,2 ŷ dŷ/dŷ=1
  • 29. 29 x + x + σ 1 0.73 0,2 0,2 0,2 -0,2 0,2 0,2 0,4 -0,4 -0,6 Product: Switches gradient weight values. Gradient backpropagation in a perceptron Example extracted from Andrej Karpathy’s notes for CS231n from Stanford University. Backward pass w1 x1 w2 x2 b dy/dw1 dy/dx1 dy/dw2 dy/dx2 dy/db ŷ dŷ/dŷ=1
  • 30. 30 w1 x1 w2 x2 b x + x + σ 1 0.73 0,2 0,2 0,2 -0,2 0,2 0,2 0,4 -0,4 -0,6 dŷ/dŷ=1 Gradient backpropagation in a perceptron Example extracted from Andrej Karpathy’s notes for CS231n from Stanford University. Normally, we will be interested only on the weights (wi ) and biases (b), not the inputs (xi ). The weights are the parameters to learn in our models. Backward pass dy/dw1 dy/dx1 dy/dw2 dy/dx2 dy/db ŷ
  • 31. (bonus) Gradients weights for MAX & SPLIT 0,2 0,2 0 max Max: Routes the gradient only to the higher input branch (not sensitive to the lower branches). Split: Branches that split in the forward pass and merge in the backward pass, add gradients +
  • 32. (bonus) Gradient weights for ReLU Figures: Andrej Karpathy
  • 33. Backpropagation across layers Backward pass Gradients can flow across stacked layers of neurons to estimate their parameters.
  • 34. Watch more 34 Gilbert Strang, “27. Backpropagation: Find Partial Derivatives”. MIT 18.065 (2018) Creative Commons, “Yoshua Bengio Extra Footage 1: Brainstorm with students” (2018)
  • 35. Learn more 35 READ ● Chris Olah, “Calculus on Computational Graphs: Backpropagation” (2015). ● Andrej Karpathy,, “Yes, you should understand backprop” (2016), and his “course notes at Stanford University CS231n. THREAD
  • 37. Problem Consider a perceptron with a ReLU as activation function designed to process a single-dimensional inputs x. a) Draw the computational graph of the perceptron, drawing a circle around the parameters that need to be estimated during training. b) Compute the partial derivative of the output of the perceptron (y) with respect to each of its parameters for the input sample x=2. Consider that all the trainable parameters of the perceptron are initialized to 1. c) Modify the results obtained in b) for the case in which all the trainable parameters of the perceptron are initialized to -1. d) Briefly comment and compare the results obtained in b) and c).
  • 38. Problem (solved) a) Draw the computational graph of the perceptron, drawing a circle around the parameters that need to be estimated during training. b) b) Compute the partial derivative of the output of the perceptron (y) with respect to each of its parameters for the input sample x=2. Consider that all the trainable parameters of the perceptron are initialized to 1.
  • 39. Problem (solved) c) Modify the results obtained in b) for the case in which all the trainable parameters of the perceptron are initialized to -1. d) Briefly comment and compare the results obtained in b) and c). d) While in case b) the gradients can flow until the trainable parameters w1 and b, in case c) gradients are “killed” by the ReLU.