Deep Learning with Python
Getting started and getting from
ideas to insights in minutes
PyData Seattle 2015
Alex Korbonits (@korbonits)
About Me
Alex Korbonits
• Data Scientist at Nuiku, Inc.
• Seattleite
• Huge math/philosophy/music/art nerd
You may think you need to have…
… in order to do Deep Learning. That is not the case.
There’s a lot you can do with a little.
Yann LeCun, Geoff Hinton, Yoshua Bengio, and Andrew Ng
What is deep learning?
• Subset of machine learning and AI.
• Yes, artificial neural networks are inspired by the brain;
• BUT they are usually created to perform a specific task, rather than mimic the brain.
• “Deep”: many-layered neural networks.
Perceptron
• Rosenblatt, 1957, Cornell Aeronautical Laboratory, funded by the Office of Naval Research.
• A linear classifier, designed for image recognition.
• Inputs x and weights w are linearly combined to produce some sort of output y (a minimal sketch follows below).
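As a concrete illustration, here is a minimal NumPy sketch of the perceptron learning rule on a toy linearly separable problem (the OR function; data and hyperparameters are made up for illustration):

```python
import numpy as np

# Toy linearly separable data: the OR function (hypothetical example).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

for epoch in range(10):
    for xi, target in zip(X, y):
        # Linear combination of inputs and weights, thresholded to 0/1.
        pred = int(np.dot(w, xi) + b > 0)
        # Perceptron update rule: nudge weights toward misclassified points.
        w += lr * (target - pred) * xi
        b += lr * (target - pred)

print([int(np.dot(w, xi) + b > 0) for xi in X])  # [0, 1, 1, 1]
```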
XOR
• What’s great about perceptrons? They are linear classifiers.
• What’s wrong with this picture?
• They can’t classify non-linearly separable problems such as XOR (the counterexample to everything).
• Minsky & Papert in Perceptrons (1969): it’s impossible for single-layer perceptrons to learn XOR.
Multilayer Perceptrons
[Figure: a single-layer perceptron vs. a multilayer perceptron]
Enter the multilayer perceptron
• With one hidden layer, a multilayer perceptron – which can now figure out XOR (see the sketch below) – is capable of arbitrary function approximation.
  – This is where the math nerds get excited. Woot!
• Supervised, semi-supervised, unsupervised, and reinforcement learning applications.
• Flexible architectural components – layer types, connection types, regularization techniques – allow for empirical tinkering. Think of playing with Lego®.
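A minimal NumPy sketch of a one-hidden-layer MLP learning XOR via backpropagation (the architecture, seed, and learning rate are illustrative choices; convergence can vary with initialization):

```python
import numpy as np

rng = np.random.RandomState(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 4 units is enough to carve out XOR's regions.
W1 = rng.randn(2, 4); b1 = np.zeros(4)
W2 = rng.randn(4, 1); b2 = np.zeros(1)
lr = 1.0

for step in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass (the chain rule), squared-error loss.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

print(out.round(2).ravel())  # approaches [0, 1, 1, 0]
```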
DON’T BE SCARED BY THE MATH!
• Who remembers their first quarter of calculus?
• All we’re going to do is take a derivative.
• This diagram is a representation of the chain rule.
Backpropagation
Here, we take the derivative of z, which is a function of two variables x and y, each a function of the variables s and t. A numeric check follows below.
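A quick numeric sanity check of that chain rule, using made-up functions for x, y, and z:

```python
# Numeric illustration of the multivariate chain rule pictured on the slide:
# z depends on x and y, each of which depends on s and t.
def x(s, t): return s + t
def y(s, t): return s * t
def z(s, t): return x(s, t) * y(s, t)

s, t = 2.0, 3.0

# Analytic: dz/ds = (dz/dx)(dx/ds) + (dz/dy)(dy/ds) = y*1 + x*t.
analytic = y(s, t) * 1 + x(s, t) * t          # 6 + 15 = 21

# Finite-difference check of the same derivative.
eps = 1e-6
numeric = (z(s + eps, t) - z(s - eps, t)) / (2 * eps)

print(analytic, round(numeric, 4))  # 21.0 21.0
```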
Backpropagation
• A simple learning algorithm that takes some total output error E defined by some loss function.
• For example, a typical loss function for a multi-class classification task is log loss.
Backpropagation
• E is a function of all of its inputs.
• I.e., all of the incoming connections to the output unit of a neural network.
• I.e., a function that outputs a class membership prediction and whose prediction is checked against a ground truth/label.
Backpropagation
We then show:
• A simple derivation of the change in error as a function of each connection weight w_ij.
• This gives a formula for updating each w_ij according to the learning algorithm.
• There are different algorithms to do this, such as SGD (sketched below).
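A hedged sketch of the update rule w_ij ← w_ij − η·∂E/∂w_ij in its simplest setting: logistic regression with log loss, trained by SGD (toy data invented for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)
# Hypothetical toy data: 100 points, 3 features, binary labels.
X = rng.randn(100, 3)
true_w = np.array([1.5, -2.0, 0.5])
y = (X @ true_w > 0).astype(float)

w = np.zeros(3)
lr = 0.1

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

for epoch in range(50):
    for i in rng.permutation(len(X)):   # stochastic: one example at a time
        p = sigmoid(X[i] @ w)           # predicted class probability
        grad = (p - y[i]) * X[i]        # dE/dw for log loss on one sample
        w -= lr * grad                  # the update rule from the slide

print(w.round(2))
```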
APPLICATIONS AND TOOLS
Wherefore and how
Motivation
• We’re at PyData, and we’ve got some motivating deep learning concepts.
• What are some of the practical applications and tools you can use?
• Deep learning techniques have recently beaten many long-standing benchmarks.
Some common applications
• Computer vision tasks:
– Classification
– Segmentation
– Facial recognition
• NLP tasks:
– Automatic Speech Recognition (ASR)
– Machine translation
– POS tagging
– Sentiment analysis
– Natural Language Understanding (NLU)
Some common tools
• Torch (NYU, Facebook AI, Google Deepmind)
• Caffe (Berkeley, Google)
• Theano (Univ. Montreal)
• Graphlab-Create (Dato, Inc.)
• Under active development:
– Neon (Nervana Systems)
– DeepLearning4j running on Apache Spark
Torch
• Created/Used by NYU, Facebook, Google DeepMind
• De rigueur for deep learning research
• Its language is Lua, NOT Python
• Lua’s syntax is somewhat Pythonic. Check it out.
• Torch’s main strengths are its features, which is why I mention it even though we’re at PyData.
• See http://bit.ly/1KzuFhd for a closer look.
Caffe
• Created/Used by Berkeley, Google
• Best tool to get started with:
– Lots of pre-trained reference models
– Lots of standard deep learning datasets
• Easy to configure networks with config files.
• See http://bit.ly/1Db2bHT to get started.
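For flavor, here is a minimal pycaffe sketch of loading a pre-trained model from config files; the file names are placeholders, not files Caffe ships under these exact names:

```python
import numpy as np
import caffe

# Placeholder paths: point these at one of Caffe's reference models
# (e.g. the BVLC CaffeNet deploy prototxt and caffemodel from the zoo).
net = caffe.Net('deploy.prototxt',        # network architecture (config file)
                'reference.caffemodel',   # pre-trained weights
                caffe.TEST)

# Feed one input; shape must match whatever the deploy prototxt declares.
# A real pipeline would preprocess an image here instead of random data.
net.blobs['data'].data[...] = np.random.randn(*net.blobs['data'].data.shape)
out = net.forward()
print(out['prob'].argmax())  # index of the most probable class
```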
Theano
• Created/Used by University of Montreal
• Very flexible, very sophisticated:
– Lower level interface allows for lots of customization
– Lots of libraries being built ON TOP of Theano, e.g.:
• Keras, PyLearn2, Lasagne, etc.
• Pythonic API, and very well documented.
• See http://bit.ly/1KBsMAv to get started.
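A tiny Theano sketch showing the symbolic differentiation that the libraries built on top of it rely on:

```python
import theano
import theano.tensor as T

# Symbolically define a function and let Theano derive its gradient --
# the same machinery that powers backprop in Theano-based libraries.
x = T.dscalar('x')
y = x ** 2 + 3 * x
dy_dx = T.grad(y, x)            # symbolic differentiation

f = theano.function([x], [y, dy_dx])
print(f(2.0))                   # [10.0, 7.0]
```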
GraphLab-Create
• Created by the wonderful folks at Dato, Inc.
• User friendly, picks intelligent defaults.
• TONS of features, AND all are state of the art.
• Blazing fast out-of-core computations on
small/medium/big data.
• Pythonic API, with amazing documentation.
• See http://bit.ly/1LZVqLS to get started.
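A hedged GraphLab-Create sketch of those intelligent defaults; the SFrame path and column names are assumptions, not a real dataset:

```python
import graphlab as gl

# Hypothetical labeled dataset stored as an SFrame; the path and the
# 'label' column name are assumptions for illustration.
data = gl.SFrame('image_data.sframe')
train, test = data.random_split(0.8)

# GraphLab-Create picks a sensible default network architecture for you.
model = gl.neuralnet_classifier.create(train, target='label')

predictions = model.classify(test)
print(model.evaluate(test))
```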
Under Active Development
• Neon
  – Nervana Systems has released a blazing fast engine for training and testing DNNs, beating a lot of benchmarks compared to other leading tools.
• DeepLearning4j
  – Being developed to run on top of Apache Spark.
  – The PySpark possibilities there are huge.
NETWORK TOPOLOGIES
Applications and examples
Convolutional Neural Networks
• Named for one of the principal layer types: a “convolutional layer”.
• MNIST and LeNet
  – Used in the ’80s by folks such as Yann LeCun for handwritten digit recognition for ATMs.
• ImageNet and AlexNet
  – New-ish computer vision competition.
  – In 2012, the winning submission used a deep CNN.
  – This has completely changed how submissions are made: from hand-crafted features developed over decades, to deep nets.
• Text understanding from scratch.
  – Character-level inputs into CNNs for high-level semantic knowledge.
Convolution
• What is a convolution?
• One way to think of it is kind of like REDUCE, but our example (next slide) is 2D since we’re doing convolutions of 2D images!
• Here’s a short clip to guide intuition (next slide), plus a small worked example below.
Convolution
http://bit.ly/1gquFDB
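To make the REDUCE analogy concrete, a small 2D convolution with SciPy (the image values and kernel are invented for illustration):

```python
import numpy as np
from scipy.signal import convolve2d

# A tiny grayscale "image" and a 3x3 edge-detection kernel.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]], dtype=float)

# Slide the kernel over the image; each output pixel is the weighted
# sum of the input pixels under the kernel (the REDUCE-like step).
feature_map = convolve2d(image, kernel, mode='valid')
print(feature_map.shape)  # (3, 3): a 3x3 kernel over a 5x5 image
```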
Let’s talk about computer vision.
Let’s look at AlexNet.
AlexNet (Krizhevsky et al. 2012)
• Won the 2012 ImageNet competition
– Hard and interesting: classification of 1000 objects
• BEAT THE PANTS off of all previous attempts,
– which included hand-engineered features;
– that had been studied and improved for decades:
– AlexNet’s millions of params learned via backprop!
AlexNet (Krizhevsky et al. 2012)
When AlexNet is processing an image, this is what is happening at each layer:
• The size of the last layer is the number of classes.
• The last layer takes a lot of abstraction and richness as its input.
• It then outputs a vote of confidence as to which class the image belongs.
• The class with the highest likelihood is the one the DNN selects.
• In this case… it’s a cat!
AlexNet
• This is an example of classification with AlexNet.
• Top five class predictions for each image.
• The correct classification is shown in red.
GoogLeNet
• Networks keep getting larger and larger, with no end in sight.
• Remember AlexNet? It was a monster in 2012 with its eight learned layers.
• GoogLeNet, from 2014, uses what it calls “Inception modules” to improve its convolutions.
They’re getting deeper.
Recurrent Neural Networks
• Learning sequences of words/characters/anything.
• A few well-known varieties:
  – “Plain vanilla” RNNs
  – Long Short-Term Memory (LSTM) RNNs
  – Attention mechanisms
• HOT right now for video scene descriptions, question-and-answer systems, and text.
Recurrent Neural Networks
• RNNs differ from convolutional nets in that they don’t only connect up and down.
• They can connect sideways within the same layer.
• There are even architectures that can go in both directions.
Word2Vec: Neural network for finding a high-dimensional representation per word
Mikolov et al. ’13
Skip-gram model: from a word, predict nearby words in the sentence.
[Figure: the neural net maps each word of “Awesome deep learning talk at PyData” to its own 300-dim representation; these can be viewed as deep features.]
Related words are placed nearby in the high-dimensional space.
Projecting the 300-dim space into 2 dims with PCA (Mikolov et al. ’13).
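A hedged gensim sketch of training a skip-gram model like the one diagrammed above (toy corpus; parameter names follow the gensim API of that era, where size= was later renamed vector_size=):

```python
from gensim.models import Word2Vec

# A toy corpus; a real model needs vastly more text than this.
sentences = [["awesome", "deep", "learning", "talk", "at", "pydata"],
             ["deep", "learning", "talk", "in", "seattle"]]

# sg=1 selects the skip-gram architecture described on the slide;
# size=300 matches the 300-dimensional vectors from Mikolov et al.
model = Word2Vec(sentences, size=300, window=2, min_count=1, sg=1)

vector = model["deep"]              # the 300-dim representation of a word
print(vector.shape)                 # (300,)
print(model.most_similar("deep", topn=3))
```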
Ulysses on Fire with Torch
This is how my favorite book, James Joyce’s 1922 novel Ulysses, famously begins and famously ends:
Ulysses on Fire with Torch
– I –
Stately, plump Buck Mulligan came from the stairhead, bearing a bowl of
lather on which a mirror and a razor lay crossed.
...
yes I said yes I will Yes.
Trieste-Zurich-Paris 1914-1921
Ulysses on Fire with Torch
After 17 passes over the training data, this is what my LSTM RNN can generate:
Generating Joycean Prose
Bloom works. Quick! Pollyman. An a lot it was seeming, mide,
says, up and the rare borns at Leopolters! Cilleynan's face. Childs
hell my milk by their doubt in thy last, unhall sit attracted with
source
The door of Kildan
and the followed their stowabout over that of three constant
trousantly Vinisis Henry Doysed and let up to a man with hands in
surresses afraid quarts to here over someware as cup to a whie
yellow accept thicks answer to me.
Ulysses is a tough example
• Remember that Ulysses is only 1.5 MB, and that this is trained character by character. It has no knowledge of English or language.
• Notice some of the emergent properties of this prose: punctuation, indentation, and more.
• Longer samples correctly show underlining (markdown formatted) and properly formed parentheticals (a classically tough problem in NLP due to variable-length issues). A character-level sketch follows below.
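The talk’s model was trained with Torch (Lua); as a Python-side sketch, here is a character-level LSTM in ~1.x-era Keras, loosely following Keras’s own text-generation example (the file name and hyperparameters are illustrative; nb_epoch was later renamed epochs):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

# Hypothetical local copy of the training text.
text = open('ulysses.txt').read()
chars = sorted(set(text))
char_to_ix = {c: i for i, c in enumerate(chars)}

# Cut the text into fixed-length sequences, each predicting its next char.
seq_len = 40
X, y = [], []
for i in range(0, len(text) - seq_len, 3):
    X.append([char_to_ix[c] for c in text[i:i + seq_len]])
    y.append(char_to_ix[text[i + seq_len]])

# One-hot encode inputs and targets.
Xoh = np.zeros((len(X), seq_len, len(chars)), dtype=bool)
yoh = np.zeros((len(y), len(chars)), dtype=bool)
for i, seq in enumerate(X):
    for t, ix in enumerate(seq):
        Xoh[i, t, ix] = 1
    yoh[i, y[i]] = 1

# One LSTM layer feeding a softmax over the character vocabulary.
model = Sequential()
model.add(LSTM(128, input_shape=(seq_len, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit(Xoh, yoh, batch_size=128, nb_epoch=17)  # 17 passes, as on the slide
```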
Recursive Neural Tensor Networks
• Capturing natural language’s recursive nature and handling variable-length sentences.
• Created by applying the same set of weights recursively over a structure.
• Natural language inference
  – Learn logical semantics
• Learn vector representations of words, multi-word phrases, grammar, and multi-lingual phrase pairs.
http://nlp.stanford.edu/projects/DeepLearningInNaturalLanguageProcessing.shtml
Deep Unsupervised Learning
• It’s possible to train neurons to be selective for high-level concepts using entirely unlabeled data.
• Le et al. 2012 used a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization.
• 1 billion parameters trained on 10 million images.
• 15.8% accuracy on ImageNet’s 22,000 categories; great at recognizing cats & humans. Totally unsupervised!
QuocNet
Optimal stimulus for two units according to numerical constraint optimization.
Transfer Learning
• Old idea, explored by Donahue et al., 2014.
• Steps (sketched below):
  – Get some data. Get a pre-trained DNN.
  – Propagate unseen data (that fits the DNN) through it.
  – Extract the outputs of some layer before the final output.
  – Use these as feature vectors.
  – Can do supervised/unsupervised learning with these.
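A hedged pycaffe sketch of the feature-extraction step; the file paths and the layer name 'fc7' are assumptions that match CaffeNet/AlexNet-style reference models:

```python
import numpy as np
import caffe

# Placeholder file names; use a real deploy prototxt and weights file.
net = caffe.Net('deploy.prototxt', 'pretrained.caffemodel', caffe.TEST)

def deep_features(image_array):
    """Propagate an input and return the activations of the fc7 layer."""
    net.blobs['data'].data[...] = image_array
    net.forward()
    return net.blobs['fc7'].data.copy().ravel()

# Stand-in input with the right shape; real use needs proper preprocessing.
dummy = np.random.randn(*net.blobs['data'].data.shape)
print(deep_features(dummy).shape)  # e.g. (4096,) for CaffeNet's fc7
```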
Example: image similarity
[Figure: 3×3 matrix of pairwise distances between images A, B, and C]
• Distance between images’ extracted features: each set of extracted features forms a vector.
• Images whose deep visual features are similar have similar sets of extracted features.
• We can measure quantitatively how similar two images are by taking the Euclidean distance between these sets of features.
• More similar images are closer together, distance-wise, in that space.
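And the distance computation itself, on hypothetical extracted feature vectors (random stand-ins here, in place of real fc7 activations):

```python
import numpy as np

# Hypothetical deep feature vectors for images A, B, and C
# (e.g. fc7 activations from the extraction sketch above).
feat = {name: np.random.randn(4096) for name in "ABC"}

def similarity_distance(a, b):
    # Euclidean distance between feature vectors: smaller = more similar.
    return np.linalg.norm(feat[a] - feat[b])

for pair in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(pair, round(similarity_distance(*pair), 2))
```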
IMAGE SIMILARITY IPYTHON NOTEBOOK
http://bit.ly/1Shtnvh
Deep Reinforcement Learning
• DeepMind’s Deep Q-network agent
• Pixels and the game score are the only inputs
• Comparable to a pro human game tester
• Across a set of 49 games…
• Same algorithm, net, hyperparameters.
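DQN’s core update is ordinary Q-learning with a deep net standing in for the table; as a sketch of that underlying update, here is tabular Q-learning on a made-up 5-state chain (toy environment invented for illustration):

```python
import numpy as np

# Tabular Q-learning on a toy 5-state chain; DQN replaces this table
# with a deep network whose input is raw pixels.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.99, 0.1
rng = np.random.RandomState(0)

def step(s, a):
    # Toy dynamics: action 1 moves right, action 0 moves left;
    # reaching the last state pays reward 1.
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r

s = 0
for t in range(5000):
    # Epsilon-greedy action selection.
    a = rng.randint(n_actions) if rng.rand() < eps else Q[s].argmax()
    s2, r = step(s, a)
    # The Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = 0 if r > 0 else s2   # reset after reaching the goal

print(Q.argmax(axis=1))  # learned policy: mostly "move right"
```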
APPENDIX I: VISUALIZATION
What’s going on under the hood?
A view of AlexNet (Krizhevsky et al. 2012)
Helpful, but doesn’t give intuition
• On the following slides, we show:
• Random test images; with
• A subset of the feature activation maps in the indicated layer.
Each of the following eight slides shows random test images with activation maps from successive stages of the pipeline (conv1, pool1, conv2, pool2, conv3, conv4, conv5, pool3):
data -> conv1 -> pool1 -> conv2 -> pool2 -> conv3 -> conv4 -> conv5 -> pool3
pool3 -> … -> output
Classification: Labrador retriever
APPENDIX II: PITFALLS
We’ve still got a lot of learnin’ to do
DNN INTERPRETATION AND INTUITION
• DNNs are hard to interpret: parameters are learned via backpropagation.
• DNNs have counter-intuitive properties.
• DNNs’ expressive powers come with subtle limitations.
Fool me once, shame on you
Szegedy et al., 2013
• The authors imperceptibly alter correctly classified images to fool DNNs:
  – LeNet
  – AlexNet
  – QuocNet
• They call such inputs “adversarial examples”.
“ostrich, Struthio camelus”, right?
WRONG
Left: correctly predicted sample. Center: 10x difference between the Left and Right columns. Right: “ostrich, Struthio camelus”.
Fool me twice, shame on me
Nguyen et al., 2014
• The authors look at counter-intuitive properties of DNNs per Szegedy et al., 2013.
• It is easy to produce images that are:
  – unrecognizable to humans; such that
  – DNNs are almost certain that these are in familiar classes.
• The authors call these “fooling images”.
Directly encoded fooling images
These evolved images are unrecognizable to humans, yet DNNs trained on ImageNet believe with near certainty that they are familiar objects.
Indirectly encoded fooling images
These evolved images are unrecognizable to humans, yet DNNs trained on ImageNet believe with near certainty that they are familiar objects.
Tip: training with adversarial examples adds more regularization than dropout!
Szegedy et al., 2013: “These results suggest that the deep neural networks that are learned by backpropagation have nonintuitive characteristics and intrinsic blind spots, whose structure is connected to the data distribution in a non-obvious way.”

Nguyen et al., 2014: “The fact that DNNs are increasingly used in a wide variety of industries, including safety-critical ones such as driverless cars, raises the possibility of costly exploits via techniques that generate fooling images.”
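One standard way to generate such adversarial examples is the fast gradient sign method of Goodfellow et al., 2014 (cited in the bibliography). A sketch on a plain logistic model, where the gradient of the loss with respect to the input has a closed form (weights and input are made up):

```python
import numpy as np

rng = np.random.RandomState(0)
w = rng.randn(100)            # a "trained" model's weights (made up)
x = rng.randn(100)            # an input the model classifies confidently
y = 1.0                       # its true label

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

p = sigmoid(x @ w)
# For log loss on a logistic unit, dJ/dx = (p - y) * w.
grad_x = (p - y) * w
# FGSM: take a small step in the direction that most increases the loss.
x_adv = x + 0.1 * np.sign(grad_x)

print(sigmoid(x @ w), sigmoid(x_adv @ w))  # confidence before vs. after
```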
Bibliography
Csáji, Balázs Csanád. "Approximation with artificial neural networks." Faculty of Sciences, Eötvös Loránd University, Hungary 24 (2001).
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., and Darrell, T. "DeCAF: A deep convolutional activation feature for generic visual recognition." In JMLR, 2014.
Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv
preprint arXiv:1412.6572 (2014).
Hermann, Karl Moritz, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil
Blunsom. "Teaching Machines to Read and Comprehend." arXiv preprint arXiv:1506.03340 (2015).
Hornik, Kurt, Maxwell Stinchcombe, and Halbert White. "Multilayer feedforward networks are universal
approximators." Neural networks 2, no. 5 (1989): 359-366.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural
networks." In Advances in neural information processing systems, pp. 1097-1105. 2012.
Le, Quoc V., Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, and Andrew Y.
Ng. "Building high-level features using large scale unsupervised learning." arXiv preprint arXiv:1112.6209 (2011).
Bibliography
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. "Efficient estimation of word representations in vector
space." arXiv preprint arXiv:1301.3781 (2013).
Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin
Riedmiller. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
Nguyen, Anh, Jason Yosinski, and Jeff Clune. "Deep neural networks are easily fooled: High confidence predictions for
unrecognizable images." arXiv preprint arXiv:1412.1897 (2014).
Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy,
Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual
Recognition Challenge. arXiv:1409.0575, 2014.
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent
Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).
Szegedy, Christian, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus.
"Intriguing properties of neural networks." arXiv preprint arXiv:1312.6199 (2013).
Yang, J., Li, Y., Tian, Y., Duan, L., and Gao, W. "Group-sensitive multiple kernel learning for object categorization." In ICCV, 2009.
THANKS!
twitter: @korbonits
email: alexkorbonits@gmail.com
blog: korbonits.github.io
Editor's Notes
Hi, my name is Alex Korbonits, and I'm a Data Scientist at Nuiku, Inc., where we're beginning to explore deep learning applications to natural language processing. I have been using Python for a few years now almost every day for all kinds of data-related tasks, mostly data engineering and data science. Recently I’ve been diving deeper into deep neural networks and some of the Python libraries that you can use to do deep learning.