A Powerful, Flexible, and Intuitive
Deep Learning Framework
@ NVIDIA GTC, April 6th, 2016
Shohei Hido
Chief Research Officer
Preferred Networks, Inc.
Overview
l  Chainer is a Python-based deep learning framework
l  Chainer v1.0 was released as open source in June 2015
l  It DOESN'T rely on Theano, unlike other Python frameworks
l  Chainer uses a unique scheme named Define-by-Run

http://chainer.org/
l  Why do users still need another framework?
l  How different and effective is Chainer?
2
Preferred Networks (PFN)
A startup that applies deep learning to industrial IoT
l  Founded: March 2014
l  Headquarters: Tokyo, Japan
l  U.S. Subsidiary: San Mateo, California
l  Company size: 35 engineers & researchers
l  Investors: Toyota, FANUC, NTT
Deep learning	 Industrial IoT	
3
Manufacturing
Automotive
Healthcare
Partnering with world-leading companies using Chainer
l  R&D collaboration on industrial problems with real-world data
̶  Specific requirements, modified algorithms, many trials and errors, etc.
̶  Different from building a general-purpose recognition system
4
Toyota	 FANUC	
Panasonic	
NTT	
Cisco	 NVIDIA
Two types of background behind DL frameworks
1. Scalability-oriented
l  Use-cases in mind
̶  Image/speech recognition systems
̶  Fast DL as a service in the cloud
l  Problem type
̶  A few general applications
̶  10+ million training samples
̶  10+ node clusters w/ fast network
l  Possible bottleneck
̶  Tuning of well-known algorithms
̶  Distributed computation for
model/data-parallel training
2. Flexibility-oriented
l  Use-cases in mind
̶  Algorithm research
̶  R&D projects for new products
l  Problem type
̶  Various specific applications
̶  10+ k training samples
̶  1 node with multiple GPUs
l  Possible bottleneck
̶  Trial-and-error in prototyping
̶  Debugging, profiling & refactoring
̶  (wait time during compilation)
Designed for efficient research & development
l  Flexible: new kinds of complex models for various applications
l  Intuitive: rapid prototyping and efficient trial-and-error
l  Powerful: comparable performance for 1 node & multi-GPUs
6
[Diagram: frameworks placed on a spectrum from scalability-oriented to flexibility-oriented]
Agenda
l  Deep learning framework basics
l  Introduction to Chainer
l  CuPy: NumPy-compatible GPU library
l  Performance and applications
7
Neural network and computation
[Figure: a feed-forward network with inputs x1…xN, hidden units h1…hH and k1…kM, and outputs y1…yM. Forward computation maps inputs (text, image, sensor data) to outputs such as object: "Tulip", anomaly score: 0.35, or category: "Sports"; backward computation (backpropagation) propagates gradients in the reverse direction.]
8
Chainer focuses on network representation/training
l  Design choices for deep learning frameworks
̶  How to build neural networks?
̶  How to train neural networks?
̶  Which text format/language for modeling?
̶  Which language for computing?
̶  Run with GPU?
̶  Run on multiple GPUs?
̶  Run on multiple compute nodes?
9
Building and training neural networks:
Computational graph construction is the key
1.  Construct a computational graph
̶  Based on the network definition given by users
̶  Chains of functions and operations on input variables
2.  Compute loss and gradients
̶  Forward computation calculates the loss for a minibatch
̶  Backpropagation gives gradients for all parameters
3.  Optimize the model
̶  Update each parameter with its gradient
̶  Repeat until convergence
Step 1. is the most important and there are many approaches
10
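As a rough sketch, the three steps map onto a Chainer-style training loop like the following (model, optimizer, and the arrays x_tr/y_tr are placeholders here; the complete version appears in Code sample (1/4) below):

for i in range(0, datasize, batchsize):
    x = Variable(x_tr[i:i+batchsize])   # minibatch input
    t = Variable(y_tr[i:i+batchsize])   # minibatch labels
    model.zerograds()                   # clear old gradients
    loss = model(x, t)                  # steps 1-2: forward pass builds the graph and the loss
    loss.backward()                     # step 2: backprop fills .grad of every parameter
    optimizer.update()                  # step 3: update parameters with their gradients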
Building blocks
l  These functionalities are very similar across frameworks
l  But the structure, abstraction level, and interface are different
l  It comes down to the design of a domain-specific language for NNs
Array data structure
(vector/matrix/tensor)
Operations & functions
Network
(computational graph)
Optimizer
(SGD/AdaGrad/Adam)
11
Types of domain-specific language for neural networks
l  Text DSL
̶  Ex. Caffe (prototxt)
̶  Ex. CNTK (NDL)
l  Symbolic program
̶  Operations on symbols
̶  Ex. Theano
̶  Ex. TensorFlow
l  Imperative program
̶  Direct computations on raw data arrays
̶  Ex. Torch.nn
̶  Ex. Chainer
̶  Ex. MXNet (supports both symbolic and imperative styles)

%% Definition in text
f: {
   "A": "Variable",
   "B": "Variable",
   "C": ["B", "*", "A"],
   "ret": ["C", "+", 1]
}
# Compile
f = compile("f.txt")
d = f(A=np.ones(10),
      B=np.ones(10) * 2)

# Symbolic definition
A = Variable('A')
B = Variable('B')
C = B * A
D = C + Constant(1)
# Compile
f = compile(D)
d = f(A=np.ones(10),
      B=np.ones(10) * 2)

# Imperative declaration
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
d = c + 1
12
Comparison of DSL type
DSL type: Text DSL
  Pros: Human-readable definition; non-programmers can easily edit the network
  Cons: Users must study the format; the format might have to be extended for new algorithms
DSL type: Internal DSL, symbolic
  Pros: Static analysis at compile time; optimization before training; easy to parallelize
  Cons: Users must study special syntax; may need more effort to implement new algorithms
DSL type: Internal DSL, imperative
  Pros: Less effort to learn the syntax; easy debugging and profiling; suitable for new algorithms with complex logic
  Cons: Hard to optimize in advance; less efficient in memory allocation and parallelization
Chainer is at the extreme end of the imperative style, for high flexibility
13
Agenda
l  Deep learning framework basics
l  Introduction to Chainer
l  CuPy: NumPy-compatible GPU library
l  Performance and applications
14
Chainer as an open-source project
l  https://github.com/pfnet/chainer
l  50 contributors
l  1,277 stars & 255 forks
l  3,708 commits
l  Active development & releases over the last 10 months
̶  v1.0.0 (June 2015) to v1.7.2 (March 2016)
	
15
Original developer
Seiya Tokui
Chainer software stack
l  Chainer is built on top of NumPy and CUDA
l  CuPy is also introduced as an equivalent of NumPy on GPU
[Stack diagram: Chainer sits on NumPy (CPU, backed by BLAS) and on CuPy (NVIDIA GPU, backed by CUDA and cuDNN)]
16
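To see how the two sides of the stack are chosen in practice, here is a minimal sketch of moving a model between them (model is a hypothetical Chain; Link.to_gpu and cuda.to_gpu are the standard Chainer v1 helpers):

from chainer import cuda

model.to_gpu(0)                  # parameters become cupy.ndarray on GPU 0
x_gpu = cuda.to_gpu(x_cpu, 0)    # input data follows the model
# model.to_cpu() and cuda.to_cpu(x_gpu) move everything back to NumPy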
Graph build scheme (1/2) - Define-and-Run:
Most frameworks use this scheme (Chainer does not)
l  Define: build a computational graph based on the definition
l  Run: update the model (parameters) using the training dataset
[Diagram: in the Define phase, the network definition is compiled into a computational graph plus gradient functions via auto differentiation; in the Run phase, the fixed graph, its parameters, and the training data yield loss & gradients, which drive parameter updates]
17
Graph build scheme (2/2) - Define-by-Run:
Computational graph construction on the fly
l  No graph is constructed before training
l  Instead, the graph is built at each forward computation
l  The computational graph can be modified dynamically
for each iteration/sample, or depending on some conditions
[Diagram: the model definition, parameters, and training data drive forward computation, which builds the computational graph and gradient functions on the fly; conditions can change the graph dynamically before each update]
18
Define-by-Run example: MLP for MNIST
l  Only transformations between units are set before training
l  The connection is given as forward computation

l1 = Linear(784, n_units)
l2 = Linear(n_units, 10)

def forward(x):
    h1 = ReLU(l1(x))
    return l2(h1)

[Diagram: x → Linear l1 (W, bias) → ReLU → h1 → Linear l2 (W, bias) → y, for digit outputs such as 0, 5, 9]
19
Define-by-Run:
An interpreted language for neural networks
l  Idea
̶  Forward computation actually goes through the computational graph
̶  By remembering the history, the actual graph can be obtained
l  Advantages
̶  Flexibility for new algorithms with complex components
u  Ex. recurrent, recursive, attention, memory, adversarial, etc.
̶  Intuitive coding with a highly imperative network definition
u  Ex. a stochastic network whose graph changes at each iteration (see the sketch below)
l  Current drawbacks
̶  The graph is regenerated every time, even for fixed networks
̶  No optimization, even for the static parts of graphs
u  JIT-like analysis and subgraph caching might be useful
20
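As an illustration of such dynamic definitions, the following minimal sketch uses plain Python control flow inside the forward pass, so the graph differs from call to call (model.l1/l2/l3 are hypothetical Linear links and depth is an assumed per-call argument):

import chainer.functions as F

def forward(model, x, depth):
    # The graph is recorded while this code runs, so ordinary
    # Python control flow changes the network structure per call.
    h = F.relu(model.l1(x))
    for _ in range(depth):            # depth may differ on every iteration
        h = F.relu(model.l2(h)) + h   # reuse the same Link, residual-style
    return model.l3(h)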
Basic components (1/2): Variable and Function
l  Variable
̶  Variable wraps arrays (.data)
̶  It remembers its parent function (.creator)
̶  It will be assigned a gradient (.grad)
̶  It keeps track of not only data
but also computations
l  Function
̶  A transformation between Variables
̶  Stateless
̶  e.g. sigmoid, tanh, ReLU,
maxpooling, dropout
[Diagram: a Function maps input Variable x to output Variable y, e.g. x → h1 → y for digit outputs 0, 5, 9]
21
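A minimal sketch of these attributes in action (shapes and values are arbitrary):

import numpy as np
from chainer import Variable
import chainer.functions as F

x = Variable(np.array([[0.5, -1.0]], dtype=np.float32))
y = F.relu(x)                    # y is a Variable produced by a Function
print(y.data)                    # the wrapped array
print(y.creator)                 # the ReLU function that created y
y.grad = np.ones_like(y.data)    # seed the output gradient
y.backward()                     # walk the .creator chain backwards
print(x.grad)                    # gradient assigned to the input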
Basic components (2/2): Link and Chain
l  Link = function with state
̶  Parameters are also Variables,
and gradients will be assigned to them
̶  e.g. Linear (fully-connected), LSTM,
Convolution2D, word embedding
l  Chain = network
̶  A Chain has a set of child Links
̶  Forward computation is defined
in .__call__()
̶  e.g. MLP2, AlexNet, GoogLeNet,
RNNLM, seq2seq, ...
[Diagram: a Link (Linear) computes y = f(W*x + b) with parameter Variables W and b; a Chain (MLP2) composes Linear l1 (W, bias) → ReLU → Linear l2 (W, bias) → y]
22
Backpropagation through computational graph
l  Consider an objective (Link.Linear): L = f(x * W + b)
l  This computes the value of L in forward computation, and
simultaneously builds the following computational graph
l  The gradient of L can be computed with respect to
any variable by backpropagation
l  Then the optimizer updates the values of the parameters
[Graph: x and W feed a * node; its output and b feed a + node; the result passes through f to L. The nodes *, +, and f are Functions; x, W, b, and L are Variables]
23
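A minimal sketch of this objective with a Linear link (F.sum stands in for f here, and the shapes are arbitrary):

import numpy as np
import chainer.functions as F
import chainer.links as L
from chainer import Variable

lin = L.Linear(2, 1)              # holds W and b as parameter Variables
x = Variable(np.array([[1.0, 2.0]], dtype=np.float32))
loss = F.sum(lin(x))              # forward pass records x*W + b -> f
lin.zerograds()                   # clear old gradients
loss.backward()                   # backpropagate through the recorded graph
print(lin.W.grad, lin.b.grad)     # gradients w.r.t. the parameters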
Code sample (1/4): Multi-layer perceptron
class MLP2(Chain):
    def __init__(self):
        super(MLP2, self).__init__(
            l1=L.Linear(784, 100),
            l2=L.Linear(100, 10),
        )
    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        y = self.l2(h1)
        return y

class Classifier(Chain):
    def __init__(self, predictor):
        super(Classifier, self).__init__(predictor=predictor)
    def __call__(self, x, t):
        y = self.predictor(x)
        self.accuracy = F.accuracy(y, t)
        self.loss = F.softmax_cross_entropy(y, t)
        return self.loss, self.accuracy

# Model and optimizer setup
model = Classifier(MLP2())
optimizer = optimizers.SGD()
optimizer.setup(model)

# Training loop with minibatches
for i in range(0, datasize, batchsize):
    x = Variable(x_tr[i:i+batchsize])
    t = Variable(y_tr[i:i+batchsize])
    model.zerograds()
    loss, acc = model(x, t)
    loss.backward()
    optimizer.update()

[Diagram: Chain (MLP2) = Linear l1 (W, bias) → ReLU → Linear l2 (W, bias) → y]
24
Code sample (2/4): Convolutional neural network
class AlexNet(Chain):
    def __init__(self):
        super(AlexNet, self).__init__(
            conv1=L.Convolution2D(3, 96, 11, stride=4),
            conv2=L.Convolution2D(96, 256, 5, pad=2),
            conv3=L.Convolution2D(256, 384, 3, pad=1),
            conv4=L.Convolution2D(384, 384, 3, pad=1),
            conv5=L.Convolution2D(384, 256, 3, pad=1),
            fc6=L.Linear(9216, 4096),
            fc7=L.Linear(4096, 4096),
            fc8=L.Linear(4096, 1000),
        )
        self.train = True  # added: __call__ below refers to self.train
    def __call__(self, x, t):
        h = F.max_pooling_2d(F.relu(
            F.local_response_normalization(self.conv1(x))), 3, stride=2)
        h = F.max_pooling_2d(F.relu(
            F.local_response_normalization(self.conv2(h))), 3, stride=2)
        h = F.relu(self.conv3(h))
        h = F.relu(self.conv4(h))
        h = F.max_pooling_2d(F.relu(self.conv5(h)), 3, stride=2)
        h = F.dropout(F.relu(self.fc6(h)), train=self.train)
        h = F.dropout(F.relu(self.fc7(h)), train=self.train)
        y = self.fc8(h)
        return y

* ImageNet Classification with Deep Convolutional Neural Networks
http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf
[Diagram: five conv2d layers followed by three linear layers]
25
Code sample (3/4): Recurrent neural network
class SimpleRNN(Chain):
    def __init__(self, n_vocab, n_units):
        super(SimpleRNN, self).__init__(
            embed=L.EmbedID(n_vocab, n_units),
            x2h=L.Linear(n_units, n_units),
            h2h=L.Linear(n_units, n_units),
            h2y=L.Linear(n_units, n_vocab),)
        self.h = None
    def __call__(self, x):
        y, h_new = self.fwd_one_step(x, self.h)
        self.h = h_new
        return y
    def fwd_one_step(self, x, h):
        x = F.tanh(self.embed(x))
        if h is None:
            h = F.tanh(self.x2h(x))
        else:
            h = F.tanh(self.x2h(x) + self.h2h(h))
        y = F.softmax(self.h2y(h))
        return y, h

[Diagram: unrolled recurrence x_1…x_4 → recurrent state h → outputs y_1…y_4, with BPTT length = 3 (input word, recurrent state, output)]

# Truncated BPTT (length=3)
for i in range(0, datasize, batchsize):
    ...
    accum_loss += model(x, t)
    if i % bptt_length == 0:
        model.zerograds()
        accum_loss.backward()
        accum_loss.unchain_backward()
        optimizer.update()
26
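For context, a self-contained version of the truncated-BPTT loop above might look like this (batches, model, and optimizer are hypothetical; model(x, t) is assumed to return a scalar loss Variable, and the accumulator is reset after each update):

import numpy as np
from chainer import Variable

accum_loss = Variable(np.zeros((), dtype=np.float32))
for step, (x, t) in enumerate(batches, 1):
    accum_loss += model(x, t)             # the graph keeps growing
    if step % 3 == 0:                     # BPTT length = 3
        model.zerograds()
        accum_loss.backward()
        accum_loss.unchain_backward()     # cut the history to bound memory
        optimizer.update()
        accum_loss = Variable(np.zeros((), dtype=np.float32))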
Code sample (4/4): Deep Networks with Stochastic Depth
A paper published on arXiv, March 30, 2016
l  A variant of Residual Net that skips connections stochastically
̶  Outperformed the original Residual Net (ImageNet 2015 winner, MSR)
̶  Stochastic skip: H_l = ReLU(b_l * f_l(H_{l-1}) + H_{l-1}),
where b_l ~ Bernoulli(p_l) w/ survival probability p_l
Taken from http://arxiv.org/abs/1603.09382v2
G. Huang et al.

# Mock code in Chainer
class StochasticResNet(Chain):
    def __init__(self, prob, size, ...):
        super(StochasticResNet, self).__init__(
            ## Define f[i] the same as for Residual Net )
        self.p = prob      # Survival probabilities
        self.size = size   # Number of residual blocks
    def __call__(self, h):
        for i in range(self.size):
            b = numpy.random.binomial(1, self.p[i])
            c = self.f[i](h) + h if b == 1 else h
            h = F.relu(c)
        return h
27
Miscellaneous
l  Other features
̶  Install with pip in one line:
$ pip install chainer
̶  Multi-GPU support by explicitly selecting the device ID to use
̶  Pre-trained Caffe model import from the Model Zoo
̶  Model serialization & save/load: HDF5 or NumPy npz
l  Future directions (not only for Chainer)
̶  JIT-like optimization during Define-by-Run
̶  Memory consumption reduction (GPU memory is still small)
̶  Handling variable-length inputs without minibatching
̶  Maximizing performance in multi-node & multi-GPU environments
28
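A minimal sketch of the serialization feature (the file name is hypothetical; save_hdf5/load_hdf5 work the same way for HDF5):

from chainer import serializers

serializers.save_npz('mlp2.npz', model)   # write parameters to a NumPy npz file
serializers.load_npz('mlp2.npz', model)   # restore into an identically-structured model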
Agenda
l  Deep learning framework basics
l  Introduction to Chainer
l  CuPy: NumPy-compatible GPU library
l  Performance and applications
29
CuPy: (partially-)NumPy-compatible GPU library
l  Motivation: NumPy + CUDA = CuPy
̶  NumPy is the standard library for numerical computation in Python
̶  CUDA is the standard API for using GPUs for high performance
̶  Unfortunately, NumPy does NOT work with CUDA
l  CuPy supports:
̶  Fast computation using NVIDIA's cuBLAS and cuDNN
̶  Array indexing, slicing, transpose, and reshape
̶  Most operations/functions in NumPy
u  Chainer v1.7.2 already supports more than 170 functions
̶  User-defined functions and kernels
̶  All dtypes, broadcasting, memory pooling, etc.
30
How to use CuPy
l  Usage of CuPy: just replace NumPy with CuPy

import numpy, cupy
enable_cupy = True
xp = cupy if enable_cupy else numpy

l  Conversion between numpy.ndarray and cupy.ndarray

w_c = cupy.asarray(numpy.ones(10))  # cupy.ndarray
w_n = cupy.asnumpy(cupy.ones(10))   # numpy.ndarray

l  Ex. CPU/GPU-agnostic logsumexp function

from chainer import cuda

def logsumexp(x, axis=None):
    xp = cuda.get_array_module(x)  # Get CuPy or NumPy
    x_max = x.max(axis)
    exp_sum = xp.exp(x - x_max).sum(axis)
    return x_max + xp.log(exp_sum)
31
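Calling the function above is then identical on both backends, as in this short sketch:

import numpy, cupy

x_cpu = numpy.random.rand(3, 4).astype(numpy.float32)
x_gpu = cupy.asarray(x_cpu)
print(logsumexp(x_cpu))   # dispatches to NumPy on the CPU
print(logsumexp(x_gpu))   # the same code runs on the GPU via CuPy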
CuPy implementation:
Optimized for performance & NumPy compatibility
l  Uses Cython for cupy.core & cupy.cuda
l  Dynamic code generation & compilation
̶  CUDA code is generated for the specific tensor dimensions & data types
̶  On-the-fly compilation by nvcc with a binary cache (faster after first use)
[Stack diagram: cupy (tensor operations & functions) → cupy.core (ndarray; ufunc, elementwise, reduction) → cupy.cuda (CUDA Python wrapper) → CUDA libraries (cuBLAS, cuRAND, cuDNN)]
32
CuPy performance on linear algebra:
5 to 25 times faster than NumPy

import datetime
import numpy, cupy

def test(xp):
    a = xp.arange(1000000).reshape(1000, -1)
    return a.T * 2

test(numpy)  # warm-up
t1 = datetime.datetime.now()
for i in range(1000):
    test(numpy)
t2 = datetime.datetime.now()
print(t2 - t1)

test(cupy)  # warm-up (includes kernel compilation)
t1 = datetime.datetime.now()
for i in range(1000):
    test(cupy)
t2 = datetime.datetime.now()
print(t2 - t1)

                      msec   speed-up
NumPy                2,929        1.0
CuPy                   585        5.0
CuPy + Memory Pool     123       23.8

Intel Core i7-4790 @3.60GHz, 32GB, GeForce GTX 970
33
Use CuPy for GPU-based computation
l  Three patterns are supported as wrappers
̶  ElementwiseKernel: for element-wise computation
̶  ReductionKernel: for reduction operations along an axis
̶  ufunc: universal functions, as in NumPy
l  Ex. definition of an element-wise function
l  Usage (automatic broadcasting and type checks are supported)

squared_diff = cupy.ElementwiseKernel(
    'float32 x, float32 y',   # Input
    'float32 z',              # Output
    'z = (x - y) * (x - y)',  # Operation
    'squared_diff')           # Name

# Note: inputs must match the declared float32 type
squared_diff(cupy.arange(10, dtype=cupy.float32), 10)
34
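For the second pattern, here is the classic L2-norm ReductionKernel sketch (a standard example in the CuPy documentation style; names are illustrative):

import cupy

l2norm = cupy.ReductionKernel(
    'T x',           # input params
    'T y',           # output params
    'x * x',         # map each element
    'a + b',         # reduce two mapped values
    'y = sqrt(a)',   # post-reduction transform
    '0',             # identity of the reduction
    'l2norm')        # kernel name

x = cupy.arange(10, dtype=cupy.float32).reshape(2, 5)
print(l2norm(x, axis=1))   # per-row L2 norms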
Agenda
l  Deep learning framework basics
l  Introduction to Chainer
l  CuPy: NumPy-compatible GPU library
l  Performance and applications
35
Public benchmark results (CNN):
Chainer shows comparable performance
l  Forward computation time is almost the same as TensorFlow's
l  Training with backward computation is slower, but this can be
offset by having no compilation time while debugging/tuning
[Charts: forward and backward computation time (msec, 0-1200) for AlexNet, GoogLeNet, VGG-A, and OverFeat, comparing Torch, TensorFlow, Chainer, and Caffe (native)]
Taken from https://github.com/soumith/convnet-benchmarks, using cuDNN except Caffe  36
Chainer can benefit from the latest CUDA libraries:
Ex. Winograd algorithm in cuDNN v5
l  3x3 convolutions are common in CNNs & are now computed with Winograd
l  State-of-the-art CNN models (e.g., GoogLeNet, VGG-A)
can be accelerated up to 2.0x at test time (forward only)
[Charts: forward and backward computation time (msec, 0-600) for AlexNet, GoogLeNet, VGG-A, and OverFeat, comparing cuDNN v4 and cuDNN v5]
Independently measured with a modified version of soumith/convnet-benchmarks
cuDNN v5 can be used from Chainer v1.8.0  37
Algorithm implementation in Chainer:
A Neural Algorithm of Artistic Style (Gatys et al., 2015)
l  https://github.com/mattya/chainer-gogh
[Figure: content image (cat) + style image = new artistic image]
Main code (45 lines)  38
Chainer in industry:
Used in demonstrations & being commercialized
l  Many collaborations are ongoing w/ Chainer-based
computer vision, deep reinforcement learning, etc.
l  Ex. 1: Chainer-controlled toy cars in the Toyota booth at CES 2016
l  Ex. 2: FANUC's highly accurate bin-picking robot at IREX 2015
̶  8 hours of training to reach expert level; commercialization by the end of 2016
http://tinyurl.com/pfn-ces16  http://tinyurl.com/pfn-irex15
39
Summary
l  Chainer is a Python-based deep learning framework
with a dynamic network construction scheme and CuPy
l  It is designed for efficient research and prototyping, while
keeping comparable performance thanks to NVIDIA GPUs
l  Official web: http://chainer.org/
l  GitHub: https://github.com/pfnet/chainer
Your contributions will be appreciated & we are hiring!
40

Weitere ähnliche Inhalte

Was ist angesagt?

【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-Supervision
【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-Supervision【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-Supervision
【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-SupervisionDeep Learning JP
 
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Kenta Oono
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」Preferred Networks
 
Intro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular LabsIntro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular LabsKendall
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016MLconf
 
Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsTravis Oliphant
 
Profiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & SustainabilityProfiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & Sustainabilitygeetachauhan
 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingGrigory Sapunov
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through ExamplesSri Ambati
 
Introduction to deep learning in python and Matlab
Introduction to deep learning in python and MatlabIntroduction to deep learning in python and Matlab
Introduction to deep learning in python and MatlabImry Kissos
 
On-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on AndroidOn-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on AndroidYufeng Guo
 
An Introduction to TensorFlow architecture
An Introduction to TensorFlow architectureAn Introduction to TensorFlow architecture
An Introduction to TensorFlow architectureMani Goswami
 
[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkit[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkitde:code 2017
 
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016MLconf
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning ApplicationsNVIDIA Taiwan
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesAnirudh Koul
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computingbutest
 
Highly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world ApplicationsHighly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world ApplicationsBill Liu
 
PAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and BasicsPAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and BasicsAtsunori Kanemura
 

Was ist angesagt? (20)

【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-Supervision
【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-Supervision【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-Supervision
【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-Supervision
 
Pitch v2.2
Pitch v2.2Pitch v2.2
Pitch v2.2
 
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
 
Intro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular LabsIntro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular Labs
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
 
Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUs
 
Profiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & SustainabilityProfiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & Sustainability
 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image Processing
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through Examples
 
Introduction to deep learning in python and Matlab
Introduction to deep learning in python and MatlabIntroduction to deep learning in python and Matlab
Introduction to deep learning in python and Matlab
 
On-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on AndroidOn-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on Android
 
An Introduction to TensorFlow architecture
An Introduction to TensorFlow architectureAn Introduction to TensorFlow architecture
An Introduction to TensorFlow architecture
 
[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkit[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkit
 
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning Applications
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile Phones
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Highly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world ApplicationsHighly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world Applications
 
PAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and BasicsPAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and Basics
 

Andere mochten auch

知識を紡ぐための言語処理と、 そのための言語資源
知識を紡ぐための言語処理と、そのための言語資源知識を紡ぐための言語処理と、そのための言語資源
知識を紡ぐための言語処理と、 そのための言語資源Koji Matsuda
 
ディープラーニングの最新動向
ディープラーニングの最新動向ディープラーニングの最新動向
ディープラーニングの最新動向Preferred Networks
 
NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1
NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1
NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1NVIDIA Japan
 
Chainerを使って細胞を数えてみた
Chainerを使って細胞を数えてみたChainerを使って細胞を数えてみた
Chainerを使って細胞を数えてみたsamacoba1983
 
深層学習ライブラリの環境問題Chainer Meetup2016 07-02
深層学習ライブラリの環境問題Chainer Meetup2016 07-02深層学習ライブラリの環境問題Chainer Meetup2016 07-02
深層学習ライブラリの環境問題Chainer Meetup2016 07-02Yuta Kashino
 
俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstation
俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstation俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstation
俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstationYusuke HIDESHIMA
 
Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+Seiya Tokui
 
マシンパーセプション研究におけるChainer活用事例
マシンパーセプション研究におけるChainer活用事例マシンパーセプション研究におけるChainer活用事例
マシンパーセプション研究におけるChainer活用事例nlab_utokyo
 
On the benchmark of Chainer
On the benchmark of ChainerOn the benchmark of Chainer
On the benchmark of ChainerKenta Oono
 
Chainer, Cupy入門
Chainer, Cupy入門Chainer, Cupy入門
Chainer, Cupy入門Yuya Unno
 

Andere mochten auch (11)

知識を紡ぐための言語処理と、 そのための言語資源
知識を紡ぐための言語処理と、そのための言語資源知識を紡ぐための言語処理と、そのための言語資源
知識を紡ぐための言語処理と、 そのための言語資源
 
DeepLearningDay2016Spring
DeepLearningDay2016SpringDeepLearningDay2016Spring
DeepLearningDay2016Spring
 
ディープラーニングの最新動向
ディープラーニングの最新動向ディープラーニングの最新動向
ディープラーニングの最新動向
 
NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1
NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1
NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1
 
Chainerを使って細胞を数えてみた
Chainerを使って細胞を数えてみたChainerを使って細胞を数えてみた
Chainerを使って細胞を数えてみた
 
深層学習ライブラリの環境問題Chainer Meetup2016 07-02
深層学習ライブラリの環境問題Chainer Meetup2016 07-02深層学習ライブラリの環境問題Chainer Meetup2016 07-02
深層学習ライブラリの環境問題Chainer Meetup2016 07-02
 
俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstation
俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstation俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstation
俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstation
 
Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+
 
マシンパーセプション研究におけるChainer活用事例
マシンパーセプション研究におけるChainer活用事例マシンパーセプション研究におけるChainer活用事例
マシンパーセプション研究におけるChainer活用事例
 
On the benchmark of Chainer
On the benchmark of ChainerOn the benchmark of Chainer
On the benchmark of Chainer
 
Chainer, Cupy入門
Chainer, Cupy入門Chainer, Cupy入門
Chainer, Cupy入門
 

Ähnlich wie Chainer GTC 2016

Jubatus talk at HadoopSummit 2013
Jubatus talk at HadoopSummit 2013Jubatus talk at HadoopSummit 2013
Jubatus talk at HadoopSummit 2013Preferred Networks
 
Differences of Deep Learning Frameworks
Differences of Deep Learning FrameworksDifferences of Deep Learning Frameworks
Differences of Deep Learning FrameworksSeiya Tokui
 
License Plate Recognition System using Python and OpenCV
License Plate Recognition System using Python and OpenCVLicense Plate Recognition System using Python and OpenCV
License Plate Recognition System using Python and OpenCVVishal Polley
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureFei Chen
 
Ads team12 final_project_presentation
Ads team12 final_project_presentationAds team12 final_project_presentation
Ads team12 final_project_presentationPriti Agarwal
 
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusMonitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusDatabricks
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...All Things Open
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)dtz001
 
Pose extraction for real time workout assistant - milestone 1
Pose extraction for real time workout assistant - milestone 1Pose extraction for real time workout assistant - milestone 1
Pose extraction for real time workout assistant - milestone 1Zachary Christmas
 
IRJET- Python Libraries and Packages for Deep Learning-A Survey
IRJET-  	  Python Libraries and Packages for Deep Learning-A SurveyIRJET-  	  Python Libraries and Packages for Deep Learning-A Survey
IRJET- Python Libraries and Packages for Deep Learning-A SurveyIRJET Journal
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaIntel® Software
 
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Khai Tran
 
The Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACCThe Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACCinside-BigData.com
 
Model Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model AnalysisModel Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model AnalysisVivek Raja P S
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practicesLior Sidi
 
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...tdc-globalcode
 
Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Iulian Pintoiu
 
Continuous Delivery to the Cloud: Automate Thru Production with CI + Spinnaker
Continuous Delivery to the Cloud: Automate Thru Production with CI + SpinnakerContinuous Delivery to the Cloud: Automate Thru Production with CI + Spinnaker
Continuous Delivery to the Cloud: Automate Thru Production with CI + SpinnakerVMware Tanzu
 
Big Data for Testing - Heading for Post Process and Analytics
Big Data for Testing - Heading for Post Process and AnalyticsBig Data for Testing - Heading for Post Process and Analytics
Big Data for Testing - Heading for Post Process and AnalyticsOPNFV
 
MACHINE LEARNING AUTOMATIONS PIPELINE WITH CI/CD
MACHINE LEARNING AUTOMATIONS PIPELINE WITH CI/CDMACHINE LEARNING AUTOMATIONS PIPELINE WITH CI/CD
MACHINE LEARNING AUTOMATIONS PIPELINE WITH CI/CDIRJET Journal
 

Ähnlich wie Chainer GTC 2016 (20)

Jubatus talk at HadoopSummit 2013
Jubatus talk at HadoopSummit 2013Jubatus talk at HadoopSummit 2013
Jubatus talk at HadoopSummit 2013
 
Differences of Deep Learning Frameworks
Differences of Deep Learning FrameworksDifferences of Deep Learning Frameworks
Differences of Deep Learning Frameworks
 
License Plate Recognition System using Python and OpenCV
License Plate Recognition System using Python and OpenCVLicense Plate Recognition System using Python and OpenCV
License Plate Recognition System using Python and OpenCV
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
 
Ads team12 final_project_presentation
Ads team12 final_project_presentationAds team12 final_project_presentation
Ads team12 final_project_presentation
 
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusMonitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
 
Pose extraction for real time workout assistant - milestone 1
Pose extraction for real time workout assistant - milestone 1Pose extraction for real time workout assistant - milestone 1
Pose extraction for real time workout assistant - milestone 1
 
IRJET- Python Libraries and Packages for Deep Learning-A Survey
IRJET-  	  Python Libraries and Packages for Deep Learning-A SurveyIRJET-  	  Python Libraries and Packages for Deep Learning-A Survey
IRJET- Python Libraries and Packages for Deep Learning-A Survey
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and Anaconda
 
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
 
The Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACCThe Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACC
 
Model Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model AnalysisModel Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model Analysis
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practices
 
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
 
Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019
 
Continuous Delivery to the Cloud: Automate Thru Production with CI + Spinnaker
Continuous Delivery to the Cloud: Automate Thru Production with CI + SpinnakerContinuous Delivery to the Cloud: Automate Thru Production with CI + Spinnaker
Continuous Delivery to the Cloud: Automate Thru Production with CI + Spinnaker
 
Big Data for Testing - Heading for Post Process and Analytics
Big Data for Testing - Heading for Post Process and AnalyticsBig Data for Testing - Heading for Post Process and Analytics
Big Data for Testing - Heading for Post Process and Analytics
 
MACHINE LEARNING AUTOMATIONS PIPELINE WITH CI/CD
MACHINE LEARNING AUTOMATIONS PIPELINE WITH CI/CDMACHINE LEARNING AUTOMATIONS PIPELINE WITH CI/CD
MACHINE LEARNING AUTOMATIONS PIPELINE WITH CI/CD
 

Mehr von Shohei Hido

CuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUCuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUShohei Hido
 
Deep Learning Lab 異常検知入門
Deep Learning Lab 異常検知入門Deep Learning Lab 異常検知入門
Deep Learning Lab 異常検知入門Shohei Hido
 
ディープラーニングの産業応用とそれを支える技術
ディープラーニングの産業応用とそれを支える技術ディープラーニングの産業応用とそれを支える技術
ディープラーニングの産業応用とそれを支える技術Shohei Hido
 
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFAShohei Hido
 
Software for Edge Heavy Computing @ INTEROP 2016 Tokyo
Software for Edge Heavy Computing @ INTEROP 2016 TokyoSoftware for Edge Heavy Computing @ INTEROP 2016 Tokyo
Software for Edge Heavy Computing @ INTEROP 2016 TokyoShohei Hido
 
How AI revolutionizes robotics and automotive industries
How AI revolutionizes robotics and automotive industriesHow AI revolutionizes robotics and automotive industries
How AI revolutionizes robotics and automotive industriesShohei Hido
 
NIPS2015概要資料
NIPS2015概要資料NIPS2015概要資料
NIPS2015概要資料Shohei Hido
 
プロダクトマネージャのお仕事
プロダクトマネージャのお仕事プロダクトマネージャのお仕事
プロダクトマネージャのお仕事Shohei Hido
 
あなたの業務に機械学習を活用する5つのポイント
あなたの業務に機械学習を活用する5つのポイントあなたの業務に機械学習を活用する5つのポイント
あなたの業務に機械学習を活用する5つのポイントShohei Hido
 
PFIセミナー "「失敗の本質」を読む"発表資料
PFIセミナー "「失敗の本質」を読む"発表資料PFIセミナー "「失敗の本質」を読む"発表資料
PFIセミナー "「失敗の本質」を読む"発表資料Shohei Hido
 
NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...
NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...
NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...Shohei Hido
 
機械学習CROSS 後半資料
機械学習CROSS 後半資料機械学習CROSS 後半資料
機械学習CROSS 後半資料Shohei Hido
 
機械学習CROSS 前半資料
機械学習CROSS 前半資料機械学習CROSS 前半資料
機械学習CROSS 前半資料Shohei Hido
 
Jubatus Casual Talks #2 異常検知入門
Jubatus Casual Talks #2 異常検知入門Jubatus Casual Talks #2 異常検知入門
Jubatus Casual Talks #2 異常検知入門Shohei Hido
 
Jubatusが目指すインテリジェンス基盤
Jubatusが目指すインテリジェンス基盤Jubatusが目指すインテリジェンス基盤
Jubatusが目指すインテリジェンス基盤Shohei Hido
 
今年のKDDベストペーパーを実装・公開しました
今年のKDDベストペーパーを実装・公開しました今年のKDDベストペーパーを実装・公開しました
今年のKDDベストペーパーを実装・公開しましたShohei Hido
 
さらば!データサイエンティスト
さらば!データサイエンティストさらば!データサイエンティスト
さらば!データサイエンティストShohei Hido
 
ICML2013読み会 開会宣言
ICML2013読み会 開会宣言ICML2013読み会 開会宣言
ICML2013読み会 開会宣言Shohei Hido
 
ビッグデータはどこまで効率化できるか?
ビッグデータはどこまで効率化できるか?ビッグデータはどこまで効率化できるか?
ビッグデータはどこまで効率化できるか?Shohei Hido
 

Mehr von Shohei Hido (20)

CuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUCuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPU
 
Deep Learning Lab 異常検知入門
Deep Learning Lab 異常検知入門Deep Learning Lab 異常検知入門
Deep Learning Lab 異常検知入門
 
NIPS2017概要
NIPS2017概要NIPS2017概要
NIPS2017概要
 
ディープラーニングの産業応用とそれを支える技術
ディープラーニングの産業応用とそれを支える技術ディープラーニングの産業応用とそれを支える技術
ディープラーニングの産業応用とそれを支える技術
 
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
 
Software for Edge Heavy Computing @ INTEROP 2016 Tokyo
Software for Edge Heavy Computing @ INTEROP 2016 TokyoSoftware for Edge Heavy Computing @ INTEROP 2016 Tokyo
Software for Edge Heavy Computing @ INTEROP 2016 Tokyo
 
How AI revolutionizes robotics and automotive industries
How AI revolutionizes robotics and automotive industriesHow AI revolutionizes robotics and automotive industries
How AI revolutionizes robotics and automotive industries
 
NIPS2015概要資料
NIPS2015概要資料NIPS2015概要資料
NIPS2015概要資料
 
プロダクトマネージャのお仕事
プロダクトマネージャのお仕事プロダクトマネージャのお仕事
プロダクトマネージャのお仕事
 
あなたの業務に機械学習を活用する5つのポイント
あなたの業務に機械学習を活用する5つのポイントあなたの業務に機械学習を活用する5つのポイント
あなたの業務に機械学習を活用する5つのポイント
 
PFIセミナー "「失敗の本質」を読む"発表資料
PFIセミナー "「失敗の本質」を読む"発表資料PFIセミナー "「失敗の本質」を読む"発表資料
PFIセミナー "「失敗の本質」を読む"発表資料
 
NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...
NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...
NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...
 
機械学習CROSS 後半資料
機械学習CROSS 後半資料機械学習CROSS 後半資料
機械学習CROSS 後半資料
 
機械学習CROSS 前半資料
機械学習CROSS 前半資料機械学習CROSS 前半資料
機械学習CROSS 前半資料
 
Jubatus Casual Talks #2 異常検知入門
Jubatus Casual Talks #2 異常検知入門Jubatus Casual Talks #2 異常検知入門
Jubatus Casual Talks #2 異常検知入門
 
Jubatusが目指すインテリジェンス基盤
Jubatusが目指すインテリジェンス基盤Jubatusが目指すインテリジェンス基盤
Jubatusが目指すインテリジェンス基盤
 
今年のKDDベストペーパーを実装・公開しました
今年のKDDベストペーパーを実装・公開しました今年のKDDベストペーパーを実装・公開しました
今年のKDDベストペーパーを実装・公開しました
 
さらば!データサイエンティスト
さらば!データサイエンティストさらば!データサイエンティスト
さらば!データサイエンティスト
 
ICML2013読み会 開会宣言
ICML2013読み会 開会宣言ICML2013読み会 開会宣言
ICML2013読み会 開会宣言
 
ビッグデータはどこまで効率化できるか?
ビッグデータはどこまで効率化できるか?ビッグデータはどこまで効率化できるか?
ビッグデータはどこまで効率化できるか?
 

Kürzlich hochgeladen

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Kürzlich hochgeladen (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

Chainer GTC 2016

  • 2. Overview l  Chainer is a Python-based deep learning framework l  Chainer v1.0 was released as an open source on June 2015 l  It DOESN’T rely on Theano, unlike other Python frameworks l  Chainer uses a unique scheme named Define-by-Run http://chainer.org/ l  Why do users sOll need another framework? l  How different and effecOve Chainer is? 2
  • 3. Preferred Networks (PFN) A startup that applies deep learning to industrial IoT l  Founded: March 2014 l  Headquarter: Tokyo, Japan l  U.S. Subsidiary: San Mateo, California l  Company size: 35 engineers & researchers l  Investors: Toyota, FANUC, NTT Deep learning Industrial IoT 3 Manufacturing Automotive Healthcare
  • 4. Partnering with world-leading companies using Chainer l  R&D collaboraOon on industrial problems with real-world data ̶  Specific requirements, modified algorithms, many trials and errors, etc ̶  Different from making general-purpose recogniOon system 4 Toyota FANUC Panasonic NTT Cisco NVIDIA
  • 5. Two types of background behind DL frameworks 1. Scalability-oriented l  Use-cases in mind ̶  Image/speech recogniOon system ̶  Fast DL as a service in cloud l  Problem type ̶  A few general applicaOons ̶  10+ million training samples ̶  10+ nodes cluster w/ fast network l  Possible boZleneck ̶  Tuning of well-known algorithms ̶  Distributed computaOon for model/data-parallel training 2. Flexibility-oriented l  Use-cases in mind ̶  Algorithm research ̶  R&D projects for new products l  Problem type ̶  Various specific applicaOons ̶  10+ k training samples ̶  1 node with mulOple GPUs l  Possible boZleneck ̶  Trial-and-error in prototyping ̶  Debugging, profiling & refactoring ̶  (wait Ome during compilaOon)
  • 6. Designed for efficient research & development l  Flexible: new kinds of complex models for various applicaOons l  IntuiOve: rapid prototyping and efficient trial-and-error l  Powerful: comparable performance for 1 node & mulO-GPUs 6 Scalability-oriented Flexibility-oriented
  • 7. Agenda l  Deep learning framework basics l  IntroducOon to Chainer l  CuPy: NumPy-compaOble GPU library l  Performance and applicaOons 7
  • 8. Neural network and computation x1 xN ・・ h1 hH ・・・・ kM k1 yM y1 Forward computation Backward computation (backpropagation) ・・ ・・ Input Hidden units Output Text Image Sensor Object:
 Tulip Anomaly score:
 0.35 Category:
 Sports ・・ ・・・・ 8
  • 9. Chainer focuses on network representation/training l  Design choices for deep learning frameworks ̶  How to build neural networks? ̶  How to train neural networks? ̶  Which text format/language for modeling? ̶  Which language for compuOng? ̶  Run with GPU? ̶  Run on mulOple GPUs? ̶  Run on mulOple compute nodes? 9
  • 10. Building and training neural networks: Computational graph construction is the key 1.  Construct a computaOonal graph ̶  Based on network definiOon given by users ̶  Chains of funcOons and operaOons on input variables 2.  Compute loss and gradients ̶  Forward computaOon to calculate loss for a minibatch ̶  BackpropagaOon gives gradients to all of parameters 3.  OpOmize model ̶  Update each parameter with the gradient ̶  Repeat unOl convergence Step 1. is the most important and there are many approaches 10
  • 11. Building blocks l  These funcOonaliOes are very similar between frameworks l  But the structure, abstracOon level, and interface are different l  It comes to the design of domain-specific language for NN Array data structure (vector/matrix/tensor) Operations & functions Network (computational graph) Optimizer (SGD/AdaGrad/Adam) 11
  • 12. Types of domain-specific language for neural networks l  Text DSL ̶  Ex. Caffe (prototxt) ̶  Ex. CNTK (NDL) l  Symbolic program ̶  OperaOons on symbols ̶  Ex. Theano ̶  Ex. TensorFlow l  ImperaOve program ̶  Direct computaOons on raw data arrays ̶  Ex. Torch.nn ̶  Ex. Chainer # Symbolic definiOon A = Variable(‘A’) B = Variable(‘B’) C = B * A D = C + Constant(1) # Compile f = compile(D) d = f(A=np.ones(10), B=np.ones(10) * 2) # ImperaOve declaraOon a = np.ones(10) b = np.ones(10) * 2 c = b * a d = c + 1 %% DefiniOon in text f: { “A”: “Variable”, “B”: “Variable”, “C”: [“B”, “*”, “A”], “ret”: [“C”, “+”, 1] } # Compile f = compile(“f.txt”) d = f(A=np.ones(10), B=np.ones(10) * 2) 12 Ex. MXNet
  • 13. Comparison of DSL type DSL type Pros. Cons. Text DSL •  Human-readable definiOon •  Non-programmer can easily edit the network •  Users must study the format •  Format might have to be extended for new algorithms Internal DSL Symbolic •  StaOc analysis at compile •  OpOmizaOon before training •  Easy to parallelize •  Users must study special syntax •  May need more efforts to implement new algorithms ImperaOve •  Less efforts to learn syntax •  Easy debugging and profiling •  Suitable for new algorithms with complex logic •  Hard to opOmize in advance •  Less efficient in memory allocaOon and parallelizaOon Chainer is at the extreme end of imperaOve program for high flexibility 13
 • 14. Agenda
l  Deep learning framework basics
l  Introduction to Chainer
l  CuPy: NumPy-compatible GPU library
l  Performance and applications
 • 15. Chainer as an open-source project
l  https://github.com/pfnet/chainer
l  50 contributors
l  1,277 stars & 255 forks
l  3,708 commits
l  Active development & releases over the last 10 months
̶  v1.0.0 (June 2015) to v1.7.2 (March 2016)
l  Original developer: Seiya Tokui
 • 16. Chainer software stack
l  Chainer is built on top of NumPy and CUDA
l  CuPy is also introduced as an equivalent of NumPy on GPU

(Stack, top to bottom: Chainer → NumPy on CPU via BLAS, and CuPy on NVIDIA GPU via CUDA & cuDNN)
 • 17. Graph build scheme (1/2) - Define-and-Run: most frameworks use this scheme (Chainer does not)
l  Define: build a computational graph from the network definition; auto differentiation derives the gradient functions
l  Run: update the model (parameters) using the training dataset; each pass computes loss & gradients and applies the update
 • 18. Graph build scheme (2/2) - Define-by-Run: computational graph construction on the fly
l  No graph is constructed before training
l  Instead, the graph is built at each forward computation
l  The computational graph can be modified dynamically for each iteration/sample, or depending on some conditions (see the sketch below)
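To make this concrete, here is a minimal sketch (not from the slides) of a Define-by-Run network whose graph depends on the data; the class name DynamicNet and the branching condition are illustrative assumptions:

import numpy as np
import chainer.functions as F
import chainer.links as L
from chainer import Chain, Variable

class DynamicNet(Chain):  # hypothetical example
    def __init__(self):
        super(DynamicNet, self).__init__(
            l1=L.Linear(10, 10),
            l2=L.Linear(10, 10),
        )

    def __call__(self, x):
        h = F.relu(self.l1(x))
        # Ordinary Python control flow decides the graph for THIS pass;
        # a Define-and-Run framework would have to fix it in advance
        if float(h.data.mean()) > 0.1:
            h = F.relu(self.l2(h))  # extra layer only for some inputs
        return h

net = DynamicNet()
x = Variable(np.random.rand(1, 10).astype(np.float32))
y = net(x)  # the graph is recorded during this very call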
 • 19. Define-by-Run example: MLP for MNIST
l  Only the transformations between units are set up before training
l  The connections are given as forward computation

l1 = Linear(784, n_units)
l2 = Linear(n_units, 10)

def forward(x):
    h1 = ReLU(l1(x))
    return l2(h1)

(Diagram: x → Linear l1 → ReLU → h1 → Linear l2 → y, with the 10 outputs corresponding to digits 0-9)
 • 20. Define-by-Run: an interpreted language for neural networks
l  Idea
̶  Forward computation actually goes through the computational graph
̶  By remembering the history, the actual graph can be obtained
l  Advantages
̶  Flexibility for new algorithms with complex components
u  Ex. recurrent, recursive, attention, memory, adversarial, etc.
̶  Intuitive coding with a highly imperative network definition
u  Ex. stochastic networks whose graph changes for each iteration
l  Current drawbacks
̶  The graph is regenerated every time, even for fixed networks
̶  No optimization, even for static parts of the graph
u  JIT-like analysis and subgraph caching might be useful
 • 21. Basic components (1/2): Variable and Function
l  Variable
̶  Variable wraps arrays (.data)
̶  It remembers its parent function (.creator)
̶  It will be assigned a gradient (.grad)
̶  It keeps track of not only data but also computations
l  Function
̶  A transformation between Variables
̶  Stateless
̶  e.g. sigmoid, tanh, ReLU, max pooling, dropout
(A minimal sketch follows below.)
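As a minimal sketch of how a Variable records its history (the concrete numbers are arbitrary):

import numpy as np
from chainer import Variable
import chainer.functions as F

x = Variable(np.array([[0.5, -1.2, 2.0]], dtype=np.float32))
y = F.sigmoid(x)   # a stateless Function applied to a Variable

print(y.data)      # the wrapped array
print(y.creator)   # the Sigmoid function that produced y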
 • 22. Basic components (2/2): Link and Chain
l  Link = function with state
̶  Parameters are also Variables, and gradients will be assigned to them
̶  e.g. Linear (fully-connected): y = f(W*x + b); LSTM; Convolution2D; word embedding
l  Chain = network
̶  A Chain has a set of child Links
̶  Forward computation is defined in .__call__()
̶  e.g. MLP2, AlexNet, GoogLeNet, RNNLM, seq2seq
(A small sketch follows below.)
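A small sketch of a Link holding stateful parameters (the shapes are chosen arbitrarily):

import numpy as np
import chainer.links as L
from chainer import Variable

l = L.Linear(3, 2)     # a Link: holds the parameters W (2x3) and b (2,)
x = Variable(np.ones((1, 3), dtype=np.float32))
y = l(x)               # y = W*x + b, shape (1, 2)

print(l.W.data.shape)  # (2, 3): parameters are Variables too
print(l.b.data.shape)  # (2,)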
 • 23. Backpropagation through the computational graph
l  Consider an objective (using Link.Linear): L = f(x * W + b)
l  Computing the value of L in the forward pass simultaneously builds the computational graph: x, W, b flow through *, +, and f to L, where x, W, b, L are Variables and *, +, f are Functions
l  The gradient of L can then be computed with respect to any variable by backpropagation
l  The optimizer then updates the values of the parameters
(A worked sketch follows below.)
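A worked sketch of this graph, taking f to be sigmoid (the slide leaves f abstract) and using scalar toy values:

import numpy as np
from chainer import Variable
import chainer.functions as F

x = Variable(np.array([1.0], dtype=np.float32))
W = Variable(np.array([0.5], dtype=np.float32))
b = Variable(np.array([-0.1], dtype=np.float32))

L = F.sum(F.sigmoid(x * W + b))  # forward pass; the graph is built here
L.backward()                     # backpropagation through f, +, and *

print(W.grad, b.grad)            # gradients of L w.r.t. the parameters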
 • 24. Code sample (1/4): Multi-layer perceptron

# Assumed imports for this and the following samples:
# import numpy as np
# import chainer.functions as F
# import chainer.links as L
# from chainer import Chain, Variable, optimizers

class MLP2(Chain):
    def __init__(self):
        super(MLP2, self).__init__(
            l1=L.Linear(784, 100),
            l2=L.Linear(100, 10),
        )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        y = self.l2(h1)
        return y

class Classifier(Chain):
    def __init__(self, predictor):
        super(Classifier, self).__init__(predictor=predictor)

    def __call__(self, x, t):
        y = self.predictor(x)
        self.accuracy = F.accuracy(y, t)
        self.loss = F.softmax_cross_entropy(y, t)
        return self.loss, self.accuracy

# Model and optimizer setup
model = Classifier(MLP2())
optimizer = optimizers.SGD()
optimizer.setup(model)

# Training loop with minibatches
for i in range(0, datasize, batchsize):
    x = Variable(x_tr[i:i+batchsize])
    t = Variable(y_tr[i:i+batchsize])
    model.zerograds()
    loss, acc = model(x, t)
    loss.backward()
    optimizer.update()
 • 25. Code sample (2/4): Convolutional neural network

class AlexNet(Chain):
    def __init__(self):
        super(AlexNet, self).__init__(
            conv1=L.Convolution2D(3, 96, 11, stride=4),
            conv2=L.Convolution2D(96, 256, 5, pad=2),
            conv3=L.Convolution2D(256, 384, 3, pad=1),
            conv4=L.Convolution2D(384, 384, 3, pad=1),
            conv5=L.Convolution2D(384, 256, 3, pad=1),
            fc6=L.Linear(9216, 4096),
            fc7=L.Linear(4096, 4096),
            fc8=L.Linear(4096, 1000),
        )
        self.train = True  # added here: switches dropout on/off

    def __call__(self, x, t):
        h = F.max_pooling_2d(F.relu(
            F.local_response_normalization(self.conv1(x))), 3, stride=2)
        h = F.max_pooling_2d(F.relu(
            F.local_response_normalization(self.conv2(h))), 3, stride=2)
        h = F.relu(self.conv3(h))
        h = F.relu(self.conv4(h))
        h = F.max_pooling_2d(F.relu(self.conv5(h)), 3, stride=2)
        h = F.dropout(F.relu(self.fc6(h)), train=self.train)
        h = F.dropout(F.relu(self.fc7(h)), train=self.train)
        y = self.fc8(h)
        return y

* ImageNet Classification with Deep Convolutional Neural Networks
  http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf
 • 26. Code sample (3/4): Recurrent neural network

class SimpleRNN(Chain):
    def __init__(self, n_vocab, n_units):
        super(SimpleRNN, self).__init__(
            embed=L.EmbedID(n_vocab, n_units),
            x2h=L.Linear(n_units, n_units),
            h2h=L.Linear(n_units, n_units),
            h2y=L.Linear(n_units, n_vocab),
        )
        self.h = None

    def __call__(self, x):
        y, h_new = self.fwd_one_step(x, self.h)
        self.h = h_new
        return y

    def fwd_one_step(self, x, h):
        x = F.tanh(self.embed(x))
        if h is None:
            h = F.tanh(self.x2h(x))
        else:
            h = F.tanh(self.x2h(x) + self.h2h(h))
        y = F.softmax(self.h2y(h))
        return y, h

# Truncated BPTT (length = 3)
for i in range(0, datasize, batchsize):
    ...
    accum_loss += model(x, t)
    if i % bptt_length == 0:
        model.zerograds()
        accum_loss.backward()
        accum_loss.unchain_backward()
        optimizer.update()

(Diagram: input word x_t → recurrent state h → output y_t, unrolled over time; BPTT length = 3)
 • 27. Code sample (4/4): Deep Networks with Stochastic Depth
A paper published on arXiv, March 30, 2016 (G. Huang et al., http://arxiv.org/abs/1603.09382v2)
l  A variant of Residual Net that skips connections stochastically
̶  Outperformed the original Residual Net (ImageNet 2015 winner, MSR)
̶  Stochastic skip: each residual block f[i] survives with probability p[i]

# Mock code in Chainer
class StochasticResNet(Chain):
    def __init__(self, prob, size, ...):
        super(StochasticResNet, self).__init__(
            # Define f[i] the same way as for Residual Net
        )
        self.size = size
        self.p = prob  # survival probabilities

    def __call__(self, h):
        for i in range(self.size):
            b = numpy.random.binomial(1, self.p[i])
            c = self.f[i](h) + h if b == 1 else h
            h = F.relu(c)
        return h
 • 28. Miscellaneous
l  Other features
̶  Install with pip in one line: $ pip install chainer
̶  Multi-GPU support by explicitly selecting the device ID to use
̶  Pre-trained Caffe model import from Model Zoo
̶  Model serialization & save & load: HDF5 or NumPy npz (sketched below)
l  Future directions (not only for Chainer)
̶  JIT-like optimization during Define-by-Run
̶  Memory consumption reduction (GPU memory is still small)
̶  Handling variable-length inputs without minibatching
̶  Maximizing performance in multi-node & multi-GPU environments
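For illustration, a hedged sketch of the serialization and GPU-selection calls listed above, applied to the model object from the earlier MLP sample (the file name is hypothetical):

from chainer import cuda, serializers

# Save / load model parameters (HDF5 backend; the npz backend is analogous)
serializers.save_hdf5('mlp.h5', model)
serializers.load_hdf5('mlp.h5', model)

# Multi-GPU: select the device ID explicitly and move the model there
cuda.get_device(0).use()
model.to_gpu(0)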
 • 29. Agenda
l  Deep learning framework basics
l  Introduction to Chainer
l  CuPy: NumPy-compatible GPU library
l  Performance and applications
 • 30. CuPy: (partially-)NumPy-compatible GPU library
l  Motivation: NumPy + CUDA = CuPy
̶  NumPy is the standard library in Python for numerical computation
̶  CUDA is the standard API for using GPUs for high performance
̶  Unfortunately, NumPy does NOT work with CUDA
l  CuPy supports:
̶  Fast computation using NVIDIA's cuBLAS and cuDNN
̶  Array indexing, slicing, transpose, and reshape
̶  Most of the operations/functions in NumPy
u  Chainer v1.7.2 already supports more than 170 functions
̶  User-defined functions and kernels
̶  All dtypes, broadcasting, memory pools, etc.
 • 31. How to use CuPy
l  Usage of CuPy: just replace NumPy with CuPy

import numpy, cupy
enable_cupy = True
xp = cupy if enable_cupy else numpy

l  Conversion between numpy.ndarray and cupy.ndarray

w_c = cupy.asarray(numpy.ones(10))  # cupy.ndarray
w_n = cupy.asnumpy(cupy.ones(10))   # numpy.ndarray

l  Ex. a CPU/GPU-agnostic logsumexp function (usage sketched below)

from chainer import cuda

def logsumexp(x, axis=None):
    xp = cuda.get_array_module(x)  # get CuPy or NumPy
    x_max = x.max(axis)
    exp_sum = xp.exp(x - x_max).sum(axis)
    return x_max + xp.log(exp_sum)
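Usage sketch for the logsumexp function above: the same code runs on either device, dispatched by the type of the input array (the random input is chosen arbitrarily):

import numpy, cupy

x_cpu = numpy.random.rand(1000).astype(numpy.float32)
x_gpu = cupy.asarray(x_cpu)   # copy to the GPU

print(logsumexp(x_cpu))       # dispatched to NumPy on the CPU
print(logsumexp(x_gpu))       # same function runs via CuPy on the GPU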
 • 32. CuPy implementation: optimized for performance & NumPy compatibility
l  Uses Cython for cupy.core & cupy.cuda
l  Dynamic code generation & compilation
̶  CUDA code is generated for the specific tensor dimension & data type
̶  On-the-fly compilation by nvcc, with a binary cache (faster after first use)

(Stack, top to bottom: cupy, tensor operations & functions → cupy.core, ndarray with ufunc/elementwise/reduction → cupy.cuda, CUDA Python wrapper → CUDA libraries: cuBLAS, cuRAND, cuDNN)
 • 33. CuPy performance on linear algebra: 5 to 25 times faster than NumPy

def test(xp):
    a = xp.arange(1000000).reshape(1000, -1)
    return a.T * 2

# Warm up, then time 1,000 runs for each backend
test(numpy)
t1 = datetime.datetime.now()
for i in range(1000):
    test(numpy)
t2 = datetime.datetime.now()
print(t2 - t1)

test(cupy)
t1 = datetime.datetime.now()
for i in range(1000):
    test(cupy)
t2 = datetime.datetime.now()
print(t2 - t1)

Results (Intel Core i7-4790 @3.60GHz, 32GB RAM, GeForce GTX 970):
                      msec   speed-up
NumPy                 2,929    1.0x
CuPy                    585    5.0x
CuPy + Memory Pool      123   23.8x
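The "CuPy + Memory Pool" row enables CuPy's memory pool so repeated allocations reuse GPU memory instead of hitting cudaMalloc each time; a sketch of turning it on, assuming the allocator API of the CuPy version bundled with Chainer v1.x:

import cupy

pool = cupy.cuda.MemoryPool()         # pooled GPU memory allocator
cupy.cuda.set_allocator(pool.malloc)  # route CuPy allocations through it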
 • 34. Use CuPy for GPU-based computation
l  Three patterns are supported as wrappers
̶  ElementwiseKernel: for element-wise computation
̶  ReductionKernel: for reduce operations along an axis (sketched below)
̶  ufunc: universal functions as in NumPy
l  Ex. definition of an element-wise function

squared_diff = cupy.ElementwiseKernel(
    'float32 x, float32 y',   # input
    'float32 z',              # output
    'z = (x - y) * (x - y)',  # operation
    'squared_diff')           # name

l  Usage (automatic broadcasting and type checking are supported)

squared_diff(cupy.arange(10), 10)
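ReductionKernel, listed above but not shown on the slide, follows the same style; a sketch computing an L2 norm along an axis (the kernel name and shapes are illustrative):

import cupy

l2norm = cupy.ReductionKernel(
    'T x',           # input params
    'T y',           # output params
    'x * x',         # map: applied to each element
    'a + b',         # reduce: combine mapped values
    'y = sqrt(a)',   # post-map: transform the reduced value
    '0',             # identity of the reduction
    'l2norm')        # kernel name

x = cupy.arange(10, dtype=cupy.float32).reshape(2, 5)
print(l2norm(x, axis=1))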
 • 35. Agenda
l  Deep learning framework basics
l  Introduction to Chainer
l  CuPy: NumPy-compatible GPU library
l  Performance and applications
 • 36. Public benchmark results (CNN): Chainer shows comparable performance
l  Forward computation is almost the same as TensorFlow
l  Training with backward computation is slower, but this can be offset by having no compilation time while debugging/tuning
(Charts: forward and backward computation time in msec for AlexNet, GoogLeNet, VGG-A, and OverFeat, comparing Torch, TensorFlow, Chainer, and Caffe (native); taken from https://github.com/soumith/convnet-benchmarks, using cuDNN except for Caffe)
 • 37. Chainer can benefit from the latest CUDA libraries: Ex. the Winograd algorithm in cuDNN v5
l  3x3 convolutions are common in CNNs & are now computed with the Winograd algorithm
l  State-of-the-art CNN models (e.g., GoogLeNet, VGG-A) can be accelerated by up to 2.0x at test time (forward only)
l  cuDNN v5 can be used from Chainer v1.8.0
(Charts: forward and backward computation time in msec for AlexNet, GoogLeNet, VGG-A, and OverFeat, cuDNN v4 vs. v5; independently measured with a modified version of soumith/convnet-benchmarks)
 • 38. Algorithm implementation in Chainer: A Neural Algorithm of Artistic Style (Gatys et al., 2015)
l  https://github.com/mattya/chainer-gogh
l  Content image (cat) + style image = new artistic image
l  Main code: 45 lines
 • 39. Chainer in industry: used in demonstrations & being commercialized
l  Many collaborations are on-going with Chainer-based computer vision, deep reinforcement learning, etc.
l  Ex. 1: Chainer-controlled toy cars in the Toyota booth at CES 2016 (http://tinyurl.com/pfn-ces16)
l  Ex. 2: Highly accurate FANUC bin-picking robot at IREX 2015 (http://tinyurl.com/pfn-irex15)
̶  8 hours of training to reach expert level; commercialization by the end of 2016