Implementing Deep
Learning using cuDNN
이예하

VUNO Inc.
1
Contents
1. Deep Learning Review

2. Implementation on GPU using cuDNN

3. Optimization Issues

4. Introduction to VUNO-Net
2
1. Deep Learning Review
3
Brief History of Neural Network
4
Golden ages and dark ages ("AI Winter"), 1940s-2000s:
• 1943 Electronic Brain (S. McCulloch & W. Pitts): adjustable weights, but the weights are not learned
• 1957 Perceptron (F. Rosenblatt): learnable weights and threshold
• 1960 ADALINE (B. Widrow & M. Hoff)
• 1969 XOR Problem (M. Minsky & S. Papert): the perceptron cannot solve the XOR problem
• 1986 Multi-layered Perceptron, Backpropagation (D. Rumelhart, G. Hinton & R. Williams): forward activity, backward error; a solution to nonlinearly separable problems, but big computation, local optima and overfitting
• 1995 SVM (V. Vapnik & C. Cortes): limitations of learning prior knowledge; kernel functions require human intervention
• 2006 Deep Neural Network, Pretraining (G. Hinton & R. Salakhutdinov): hierarchical feature learning
Machine/Deep Learning is eating the world!
5
Building Blocks
6
• Restricted Boltzmann machine
• Auto-encoder
• Deep belief network
• Deep Boltzmann machines
• Generative stochastic networks
• Convolutional neural networks
• Recurrent neural networks
Convolutional Neural Networks
7
• Receptive field (Hubel & Wiesel, 1959)
• Neocognitron (Fukushima, 1980)
Convolutional Neural Networks
8
• LeNet-5 (Yann LeCun, 1998)
Convolutional Neural Networks
9
• AlexNet (Alex Krizhevsky et al., 2012)
• Clarifai (Matt Zeiler and Rob Fergus, 2013)
ImageNet Challenge
10
• Detection/classification and localization
• 1.2M images for training / 50K for validation
• 1,000 classes

Team             Error rate   Remark
Human            5.1%         Karpathy
Google (2014/8)  6.67%        Inception
Baidu (2015/1)   5.98%        Data augmentation
MS (2015/2)      4.94%        SPP, PReLU, weight initialization
Google (2015/2)  4.82%        Batch normalization
Baidu (2015/5)   4.58%        Cheating
Convolutional Neural Networks
11
VGG (K. Simonyan and A. Zisserman, 2015)
Network
• Softmax layer (output)
• Fully connected layer
• Pooling layer
• Convolution layer
Layer
• Input / output
• Weights
• Neuron activation
Neural Network
12
Forward pass: Conv Layer → FC Layer → Softmax Layer
Backward pass: Softmax Layer → FC Layer → Conv Layer
Convolution Layer - Forward
13
Example: a 3x3 input (x1 … x9, stored column-wise), a 2x2 filter (w1 w3 / w2 w4) that slides over the input, and the resulting 2x2 output (y1 y3 / y2 y4).
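Written out with the indexing above (stride 1, no padding, cross-correlation form), each output value is the inner product of the filter with one 2x2 patch of the input:

y1 = w1·x1 + w2·x2 + w3·x4 + w4·x5
y2 = w1·x2 + w2·x3 + w3·x5 + w4·x6
y3 = w1·x4 + w2·x5 + w3·x7 + w4·x8
y4 = w1·x5 + w2·x6 + w3·x8 + w4·x9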
Convolution Layer - Forward
14
How to evaluate the convolution layer efficiently? There are several ways to implement convolutions efficiently:
1. Lower the convolution into a matrix multiplication (cuDNN)
2. Fast Fourier Transform to compute the convolution (cuDNN_v3)
3. Computing the convolutions directly (cuda-convnet)
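As a rough illustration of approach 1, the lowering step (commonly called im2col) copies every input patch into a column of a matrix, after which the convolution is a single matrix multiplication against the filter matrix. A minimal single-channel sketch; this is illustrative, not cuDNN's actual implementation:

// im2col for one channel, stride 1, no padding (illustrative sketch).
// col has (kh*kw) rows and (out_h*out_w) columns, so the convolution becomes
// a (filterCnt x kh*kw) by (kh*kw x out_h*out_w) matrix multiplication.
void im2col(const float *input, int h, int w, int kh, int kw, float *col)
{
    int out_h = h - kh + 1, out_w = w - kw + 1;
    for (int oy = 0; oy < out_h; ++oy)
        for (int ox = 0; ox < out_w; ++ox)
            for (int fy = 0; fy < kh; ++fy)
                for (int fx = 0; fx < kw; ++fx)
                    col[(fy * kw + fx) * out_h * out_w + oy * out_w + ox] =
                        input[(oy + fy) * w + (ox + fx)];
}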
Fully Connected Layer - Forward
15
Each output is a weighted sum of all inputs, y = Wx; e.g. y1 = w11·x1 + w12·x2 + w13·x3 for the first row of the weight matrix.
• Matrix calculation is very fast on GPU
• cuBLAS library
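For example, the forward pass y = Wx over a whole minibatch is a single GEMM call. A minimal sketch using cuBLAS; the buffer names and dimensions (d_W, d_x, d_y, outCnt, inCnt, batch) are illustrative:

#include <cublas_v2.h>
// y (outCnt x batch) = W (outCnt x inCnt) * x (inCnt x batch), column-major.
void fcForward(cublasHandle_t blas, const float *d_W, const float *d_x,
               float *d_y, int outCnt, int inCnt, int batch)
{
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(blas, CUBLAS_OP_N, CUBLAS_OP_N,
                outCnt, batch, inCnt,
                &alpha, d_W, outCnt, d_x, inCnt,
                &beta, d_y, outCnt);
}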
Softmax Layer - Forward
16
yk = exp(xk) / Σj exp(xj)
Issues
1. How to efficiently evaluate the denominator?
2. Numerical instability when xk is too large or too small
Fortunately, the softmax layer is supported by cuDNN.
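The standard fix for issue 2 is to subtract the maximum input before exponentiation, which leaves the result unchanged:

yk = exp(xk − m) / Σj exp(xj − m),  where m = maxj xj

cuDNN's CUDNN_SOFTMAX_ACCURATE algorithm applies exactly this max-subtraction (see cudnnSoftmaxForward in Section 2).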
Learning on Neural Network
17
Update the network (i.e. the weights) to minimize the loss function
• Popular loss functions for classification:
  Sum-of-squared error: E = ½ Σk (yk − tk)²
  Cross-entropy error: E = −Σk tk log yk
• Necessary criterion: the gradient of the loss is zero, ∇E(w) = 0
Learning on Neural Network
18
• Due to the complexity of the loss, a closed-form solution is usually not possible
• Use an iterative approach: gradient descent or Newton's method
• How to evaluate the gradient?
• Error backpropagation
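For reference, the two update rules take their standard forms (η is the learning rate, H the Hessian of the loss E):

Gradient descent: w ← w − η ∇E(w)
Newton's method:  w ← w − H⁻¹ ∇E(w)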
Softmax Layer - Backward
19
Loss function: cross-entropy, E = −Σk tk log yk with a one-hot target t
• The error back-propagated from the softmax layer can be calculated as δk = yk − tk
• I.e., subtract 1 from the output value at the target index
• This can be done efficiently using GPU threads
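A minimal CUDA sketch of that subtraction, assuming one thread per sample; the kernel and buffer names (d_target in particular) are illustrative:

__global__ void softmaxBackward(float *delta, const float *output,
                                const int *target, int classCnt, int sampleCnt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per sample
    if (i < sampleCnt) {
        for (int k = 0; k < classCnt; ++k)           // delta_k = y_k
            delta[i * classCnt + k] = output[i * classCnt + k];
        delta[i * classCnt + target[i]] -= 1.0f;     // subtract 1 at the target index
    }
}
// launch: softmaxBackward<<<(sampleCnt + 255) / 256, 256>>>(d_outputDelta, d_output, d_target, classCnt, sampleCnt);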
Fully Connected Layer - Backward
20
Forward pass: y = f(Wx), with weight matrix W (rows w11, w12, …; w21, w22, …)
Error: the error passed to the previous layer is δin = Wᵀδout ⊙ f′(x) (an element-wise multiplication)
Gradient: ∂E/∂W = δout xᵀ
• Matrix calculation is very fast on GPU
• Element-wise multiplication can be done efficiently using GPU threads
Convolution Layer - Backward
21
Forward: the 2x2 filter (w1 w3 / w2 w4) slides over the 3x3 input (x1 … x9) to produce the 2x2 output (y1 y3 / y2 y4).
Error: propagating the error back to the input is itself a convolution: the output error is convolved with the filter rotated by 180° (w4 w2 / w3 w1).
Convolution Layer - Backward
22
Gradient: the filter gradient is obtained by convolving the input (x1 … x9) with the output error.
• The error and the gradient can both be computed with a convolution scheme
• cuDNN supports these operations
2. Implementation on GPU using cuDNN
23
Introduction to cuDNN
cuDNN is a GPU-accelerated library of primitives for deep
neural networks
• Convolution forward and backward
• Pooling forward and backward
• Softmax forward and backward
• Neuron activations forward and backward:
• Rectified linear (ReLU)
• Sigmoid
• Hyperbolic tangent (TANH)
• Tensor transformation functions
24
Introduction to cuDNN (version 2)
25 (Sharan Chetlur et al., 2015)
cuDNN's convolution routines aim for performance competitive with the fastest GEMM
• Lowering the convolutions into a matrix multiplication
Introduction to cuDNN
Benchmarks
26
https://github.com/soumith/convnet-benchmarks
https://developer.nvidia.com/cudnn
Goal
Training a VGG model using cuDNN
• Data Layer
• Convolution Layer
• Pooling Layer
• Fully Connected Layer
• Softmax Layer
27
Preliminary
28
Common data structure for a layer
• Device memory & tensor descriptors for the input/output data & error
• A tensor descriptor defines the dimensions of the data

cuDNN initialize & release:

cudaSetDevice( deviceID );
cudnnCreate( &cudnn_handle );
cudnnDestroy( cudnn_handle );
cudaDeviceReset();

Per-layer members:

float *d_input, *d_output, *d_inputDelta, *d_outputDelta;
cudnnTensorDescriptor_t inputDesc;
cudnnTensorDescriptor_t outputDesc;
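Put together, the per-layer state and the handle lifecycle might look like this minimal sketch (the struct and function names are illustrative):

#include <cuda_runtime.h>
#include <cudnn.h>

// Common per-layer state: device buffers for data/error plus the tensor
// descriptors that tell cuDNN their dimensions.
struct Layer {
    float *d_input, *d_output;             // forward activations
    float *d_inputDelta, *d_outputDelta;   // back-propagated errors
    cudnnTensorDescriptor_t inputDesc, outputDesc;
};

cudnnHandle_t cudnn_handle;

void initCudnn(int deviceID)
{
    cudaSetDevice(deviceID);      // select the GPU
    cudnnCreate(&cudnn_handle);   // create the cuDNN context
}

void releaseCudnn(void)
{
    cudnnDestroy(cudnn_handle);
    cudaDeviceReset();
}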
Data Layer
29

Create & set the tensor descriptor:

cudnnCreateTensorDescriptor(…);
cudnnSetTensor4dDescriptor(…);

cudnnSetTensor4dDescriptor(
    outputDesc,
    CUDNN_TENSOR_NCHW,
    CUDNN_DATA_FLOAT,
    sampleCnt,
    channels,
    height,
    width
);

Example: 2 images (3x3x2). In NCHW layout the 36 values are stored contiguously as sample #1 channel #1 (1-9), sample #1 channel #2 (10-18), then sample #2 channel #1 (19-27) and channel #2 (28-36).
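For the example above (2 samples, 2 channels, 3x3), the descriptor setup would be the following sketch:

cudnnCreateTensorDescriptor(&inputDesc);
cudnnSetTensor4dDescriptor(inputDesc,
                           CUDNN_TENSOR_NCHW,  // N, C, H, W memory layout
                           CUDNN_DATA_FLOAT,
                           2,    // sampleCnt (N)
                           2,    // channels  (C)
                           3,    // height    (H)
                           3);   // width     (W)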
Convolution Layer
1. Initialization
30
1.1 Create & set the filter descriptor: cudnnCreateFilterDescriptor(&filterDesc); cudnnSetFilter4dDescriptor(…);
1.2 Create & set the convolution descriptor: cudnnCreateConvolutionDescriptor(&convDesc); cudnnSetConvolution2dDescriptor(…);
1.3 Create & set the output tensor descriptor: cudnnGetConvolution2dForwardOutputDim(…); cudnnCreateTensorDescriptor(&dstTensorDesc); cudnnSetTensor4dDescriptor(…);
1.4 Get the convolution algorithm: cudnnGetConvolutionForwardAlgorithm(…); cudnnGetConvolutionForwardWorkspaceSize(…);
Convolution Layer
1.1 Set Filter Descriptor
31

cudnnSetFilter4dDescriptor(
    filterDesc,
    CUDNN_DATA_FLOAT,
    filterCnt,
    input_channelCnt,
    filter_height,
    filter_width
);

Example: 2 filters (2x2x2). The 16 weights are stored contiguously as filter #1 channel #1 (1-4), filter #1 channel #2 (5-8), then filter #2 channel #1 (9-12) and channel #2 (13-16).
Convolution Layer
1.2 Set Convolution Descriptor
32
The convolution mode is set to CUDNN_CROSS_CORRELATION (cuDNN also offers CUDNN_CONVOLUTION, which flips the filter).
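A sketch of the two calls against the cuDNN v2-era API (later versions changed this signature); the padding and stride values are illustrative, e.g. pad 1 and stride 1 for VGG-style 3x3 filters:

cudnnCreateConvolutionDescriptor(&convDesc);
cudnnSetConvolution2dDescriptor(convDesc,
                                1, 1,    // zero-padding: pad_h, pad_w
                                1, 1,    // stride: vertical, horizontal
                                1, 1,    // upscale factors (unused here)
                                CUDNN_CROSS_CORRELATION);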
Convolution Layer
1.3 Set Output Tensor Descriptor
33
• n, c, h and w indicate the output dimensions (returned by cudnnGetConvolution2dForwardOutputDim)
• A tensor descriptor defines the dimensions of the data
Convolution Layer
1.4 Get Convolution Algorithm
34
cudnnGetConvolutionForwardAlgorithm(…) takes inputDesc, filterDesc, convDesc and outputDesc, plus a preference such as CUDNN_CONVOLUTION_FWD_PREFER_FASTEST, and returns the algorithm to use.
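A sketch of the algorithm selection and workspace allocation (cuDNN v2-era API):

cudnnConvolutionFwdAlgo_t algo;
cudnnGetConvolutionForwardAlgorithm(cudnn_handle,
                                    inputDesc, filterDesc, convDesc, outputDesc,
                                    CUDNN_CONVOLUTION_FWD_PREFER_FASTEST,
                                    0,       // no memory limit
                                    &algo);

size_t workspaceSize = 0;
cudnnGetConvolutionForwardWorkspaceSize(cudnn_handle,
                                        inputDesc, filterDesc, convDesc, outputDesc,
                                        algo, &workspaceSize);
void *d_workspace = NULL;
if (workspaceSize > 0)
    cudaMalloc(&d_workspace, workspaceSize);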
Convolution Layer
2. Forward Pass
35
2.1 Convolution: cudnnConvolutionForward(…);
2.2 Activation: cudnnActivationForward(…);
Convolution Layer
2.1 Convolution
36
cudnnConvolutionForward(…) reads d_input (described by inputDesc) and writes d_output (described by outputDesc).
Convolution Layer
2.2 Activation
37
cudnnActivationForward(…) applies sigmoid, tanh or ReLU in place on d_output (described by outputDesc).
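A sketch of both forward steps together (cuDNN v2-era API); d_filter, the weight buffer, is an illustrative name:

const float alpha = 1.0f, beta = 0.0f;
// 2.1 Convolution: d_input -> d_output
cudnnConvolutionForward(cudnn_handle, &alpha,
                        inputDesc, d_input,
                        filterDesc, d_filter,
                        convDesc, algo, d_workspace, workspaceSize,
                        &beta, outputDesc, d_output);
// 2.2 Activation, applied in place on the convolution output
cudnnActivationForward(cudnn_handle, CUDNN_ACTIVATION_RELU,
                       &alpha, outputDesc, d_output,
                       &beta, outputDesc, d_output);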
Convolution Layer
3. Backward Pass
38
3.1 Activation backward: cudnnActivationBackward(…);
3.2 Calculate gradient: cudnnConvolutionBackwardFilter(…);
3.3 Error backpropagation: cudnnConvolutionBackwardData(…);
Convolution Layer
3.1 Activation Backward
39
• The errors back-propagated from layer l+1 (d_outputDelta) are multiplied by the derivative of the activation; see slide 22 (Convolution Layer - Backward)
• cudnnActivationBackward(…) runs in place: it reads d_output and d_outputDelta (both described by outputDesc) and overwrites d_outputDelta with the result
Convolution Layer
3.2 Calculate Gradient
40
cudnnConvolutionBackwardFilter(…) reads d_input (inputDesc) and d_outputDelta (outputDesc) and writes the filter gradient d_filterGradient (filterDesc).
Convolution Layer
3.3 Error Backpropagation
41
cudnnConvolutionBackwardData(…) reads d_outputDelta (outputDesc) and writes the error for the previous layer, d_inputDelta (inputDesc).
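A sketch of the full backward pass of the layer (cuDNN v2-era API); d_filter is again an illustrative name for the weight buffer:

// 3.1 Multiply the incoming error by the activation derivative, in place
cudnnActivationBackward(cudnn_handle, CUDNN_ACTIVATION_RELU, &alpha,
                        outputDesc, d_output,       // y
                        outputDesc, d_outputDelta,  // dE/dy
                        outputDesc, d_output,       // x (in place, same buffer as y)
                        &beta, outputDesc, d_outputDelta);  // result: dE/dx
// 3.2 Gradient with respect to the filter
cudnnConvolutionBackwardFilter(cudnn_handle, &alpha,
                               inputDesc, d_input,
                               outputDesc, d_outputDelta,
                               convDesc, &beta,
                               filterDesc, d_filterGradient);
// 3.3 Error propagated to the previous layer
cudnnConvolutionBackwardData(cudnn_handle, &alpha,
                             filterDesc, d_filter,
                             outputDesc, d_outputDelta,
                             convDesc, &beta,
                             inputDesc, d_inputDelta);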
Pooling Layer / Softmax Layer
42
Pooling layer
1. Initialization: cudnnCreatePoolingDescriptor(&poolingDesc); cudnnSetPooling2dDescriptor(…);
2. Forward pass: cudnnPoolingForward(…);
3. Backward pass: cudnnPoolingBackward(…);
Softmax layer
Forward pass: cudnnSoftmaxForward(…);
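A sketch of a 2x2 max-pooling layer with stride 2 and of the softmax forward pass (cuDNN v2-era API):

cudnnCreatePoolingDescriptor(&poolingDesc);
cudnnSetPooling2dDescriptor(poolingDesc, CUDNN_POOLING_MAX,
                            2, 2,   // window height, width
                            0, 0,   // vertical, horizontal padding
                            2, 2);  // vertical, horizontal stride

cudnnPoolingForward(cudnn_handle, poolingDesc, &alpha,
                    inputDesc, d_input, &beta, outputDesc, d_output);

cudnnPoolingBackward(cudnn_handle, poolingDesc, &alpha,
                     outputDesc, d_output, outputDesc, d_outputDelta,
                     inputDesc, d_input, &beta, inputDesc, d_inputDelta);

// Softmax forward; CUDNN_SOFTMAX_ACCURATE subtracts the max (cf. slide 16)
cudnnSoftmaxForward(cudnn_handle, CUDNN_SOFTMAX_ACCURATE,
                    CUDNN_SOFTMAX_MODE_INSTANCE,
                    &alpha, inputDesc, d_input,
                    &beta, outputDesc, d_output);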
3. Optimization Issues
43
Optimization
Learning Very Deep ConvNets
44
We know that deep ConvNets can be trained without pre-training, thanks to:
• weight sharing
• sparsity
• rectifier units
But "with fixed standard deviations very deep models have difficulties to converge" (Kaiming He et al., 2015)
• e.g. random initialization from a Gaussian distribution with std 0.01
• more than 8 convolution layers
Optimization
VGG (K. Simonyan and A. Zisserman, 2015) GoogLeNet (C. Szegedy et al., 2014)
45
Optimization
Initialization of Weights for Rectifiers (Kaiming He et al., 2015)
46
• Analyze the variance of the response in each layer
• Derive a sufficient condition that the gradient is not exponentially large/small
• Standard deviation for initialization: std = sqrt(2 / ((spatial filter size)² × filter count))
Optimization
Case study: 3x3 filters
47

Filter count    std
64              0.059
128             0.042
256             0.029
512             0.021

• When using a fixed std of 0.01 instead, the std of the gradient propagated backward from conv10 to conv1 shrinks at every layer: the error vanishes
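These values are exactly std = sqrt(2 / (k² · n)) with spatial filter size k = 3 and filter count n:

sqrt(2 / (9 · 64))  ≈ 0.059
sqrt(2 / (9 · 128)) ≈ 0.042
sqrt(2 / (9 · 256)) ≈ 0.029
sqrt(2 / (9 · 512)) ≈ 0.021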
Speed
Data loading & model learning
48
• Reducing data loading and augmentation time
• Data provider thread (dp_thread)
• Model learning thread (worker_thread)

readData()
{
    if (is_dp_thread_running) pthread_join(dp_thread);
    …
    if (is_data_remain) pthread_create(dp_thread);
}

readData();
for (…) {
    readData();
    pthread_create(worker_thread);
    …
    pthread_join(worker_thread);
}
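A compact runnable sketch of that overlap, assuming loadBatch() stands in for the actual I/O and augmentation work (all names illustrative):

#include <pthread.h>
#include <stdbool.h>

static pthread_t dp_thread;
static bool is_dp_thread_running = false;

static void *loadBatch(void *arg)   // data provider: read + augment one batch
{
    /* file I/O and augmentation happen here */
    return NULL;
}

static void readData(bool is_data_remain)
{
    if (is_dp_thread_running)                 // wait for the batch in flight
        pthread_join(dp_thread, NULL);
    is_dp_thread_running = false;
    if (is_data_remain) {                     // prefetch the next batch
        pthread_create(&dp_thread, NULL, loadBatch, NULL);
        is_dp_thread_running = true;
    }
}

int main(void)
{
    int iterCnt = 100;
    readData(true);                           // prime the first batch
    for (int i = 0; i < iterCnt; ++i) {
        readData(i + 1 < iterCnt);            // join the loaded batch, prefetch next
        /* run the worker (model learning) on the batch just joined; the slide
           does this on a separate worker_thread joined each iteration */
    }
    return 0;
}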
Speed
Multi-GPU
49
• Data parallelization vs. model parallelization
  • Distribute the model, use the same data: model parallelism
  • Distribute the data, use the same model: data parallelism
• Data parallelization & gradient averaging
  • One of the easiest ways to use multiple GPUs
  • The result is the same as using a single GPU
Parallelism
50
Data parallelization
• The good: easy to implement
• The bad: the cost of synchronization increases with the number of GPUs
Model parallelization
• The good: larger networks can be trained
• The bad: synchronization is necessary in all layers
Parallelism
51
Mixing Data Parallelization and Model Parallelization
(Krizhevsky, 2014)
Speed
Data Parallelization & Gradient Averaging
52
Single GPU: data (256) → Forward Pass → Backward Pass → Gradient → Update
Two GPUs: data1 (128) and data2 (128) each run Forward Pass → Backward Pass → Gradient; the two gradients are averaged, then the update is applied
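Why the result matches a single GPU: with the minibatch loss defined as an average over samples, the batch-256 gradient is exactly the mean of the two batch-128 gradients:

∇E_256 = (1/256) Σ_{i=1}^{256} ∇E_i = (1/2) [ (1/128) Σ_{i ∈ GPU1} ∇E_i + (1/128) Σ_{i ∈ GPU2} ∇E_i ]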
4. Introduction to VUNO-Net
53
The Team
54
VUNO-Net
55
Theano, Caffe, Cuda-ConvNet, Pylearn, RNNLIB, DL4J, Torch … and VUNO-Net
VUNO-Net
56
Features (Open Source / VUNO-Net only)
Structure: Convolution, LSTM, MD-LSTM (2D), Pooling, Spatial Pyramid Pooling, Fully Connection, Concatenation
Output: Softmax, Regression, Connectionist Temporal Classification
Learning: Multi-GPU Support, Batch Normalization, Parametric Rectifier, Initialization for Rectifier, Stochastic Gradient Descent, Dropout, Data Augmentation
Performance
57
State-of-the-art performance on image & speech
Speech recognition (TIMIT*): VUNO 84.3% vs. WR1 82.2%
Image classification (CIFAR-100): VUNO 82.1% vs. WR2 79.3% (Kaggle Top 1, 2014)

* TIMIT is one of the most popular benchmark datasets for the speech recognition task (Texas Instruments - MIT)
WR1 (World Record) - "Speech Recognition with Deep Recurrent Neural Networks", Alex Graves, ICASSP (2013)
WR2 (World Record) - Kaggle competition: https://www.kaggle.com/c/cifar-10
Application
We’ve achieved record breaking performance on medical
image analysis.
58
Application
Whole Lung Quantification
59
• Sliding window: pixel-level classification (normal vs. emphysema)
• But context information is more important
• Ongoing work: MD-LSTM, Recurrent CNN
Application
Whole Lung Quantification
60
Original Image (CT) VUNO Gold Standard
Example #1
Application
Whole Lung Quantification
61
Original Image (CT) VUNO Gold Standard
Example #2
Visualization
SVM Features (22 features)
62
Histogram, Gradient, Run-length, Co-occurrence matrix, Cluster analysis, Top-hat transformation
Visualization
Activation of top hidden layer
63
200 hidden nodes
Visualization
64
We Are Hiring!!
65
Algorithm Engineer, CUDA Programmer, Application Developer, Business Developer, Staff Member
Q&A
66
Thank You
67
