Deep Learning is an area of machine learning and one of the most talked-about trends in business and computer science today.
In this talk, I will give a review of Deep Learning explaining what it is, what kinds of tasks it can do today, and what it probably could do in the future.
2. Agenda
• Neural Networks (NN) training
• Deep Learning = NN + … + …
• Deep Learning (DL) projects
• Topics: HW, IoT
3. Easy for humans, hard for machines
Q: What is the goal of this talk?
To give you an intuitive understanding of what DL is
and why it works.
The needs:
• Need to perceive and understand the world
• Basic speech and vision capabilities
• Language understanding
How can we do this?
• We cannot write a separate algorithm for each task we want to accomplish
• We need to write general algorithms that learn from observations
4. Why is this hard?
You see this:
But the camera sees this:
5. Example: Handwritten digit recognition
• The goal: SW to recognize the digit in each image (Classifier)
• Source: “MNIST database of handwritten digits”, 60,000 examples
• Typical human error: 2.5%. Common confusion between {2, 7} , {4,9}
6. ‘1’ versus ‘5’ – feature engineering
• Features (properties): ‘intensity’ and ‘symmetry’
x1 -> ‘Intensity’ = average pixel value over the image
x2 -> ‘Symmetry’: ‘1’ is more symmetric than ‘5’
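These two features can be computed in a few lines. The following is a minimal NumPy sketch of my own (the toy 4×4 “images” and the exact symmetry formula are illustrative assumptions, not the talk’s code): x1 is the mean pixel value, and x2 scores how little an image differs from its left–right mirror.

```python
import numpy as np

def intensity(img):
    """x1: average pixel value over the whole image."""
    return img.mean()

def symmetry(img):
    """x2: negative mean absolute difference between the image and
    its left-right mirror; 0 means perfectly symmetric."""
    return -np.abs(img - np.fliplr(img)).mean()

# Toy 4x4 "images": a mirror-symmetric blob vs. an off-center stroke
symmetric = np.array([[0, 1, 1, 0],
                      [1, 1, 1, 1],
                      [1, 1, 1, 1],
                      [0, 1, 1, 0]], dtype=float)
asymmetric = np.array([[1, 0, 0, 0],
                       [1, 0, 0, 0],
                       [1, 0, 0, 0],
                       [1, 0, 0, 0]], dtype=float)

print(intensity(symmetric), symmetry(symmetric))    # 0.75 0.0
print(intensity(asymmetric), symmetry(asymmetric))  # 0.25 -0.5
```

Plotting (x1, x2) for many labeled images gives exactly the kind of 2-D scatter the slide alludes to, where ‘1’s and ‘5’s form separable clusters.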
7. Digits recognition – dream solution
• We are looking for features …
• If this is possible at all (I don’t know for sure), it requires exceptional domain expertise.
Ideas for additional features:
- the number of separate, connected regions of white pixels: 1, 2, 3, 5 and 7 tend to have one contiguous region of white space, while the loops in 6, 8 and 9 create more.
- ask experts
Do you like the process?
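The “connected white regions” idea is easy to prototype with a plain flood fill. Below is a sketch of mine over toy binary grids; the tiny hand-drawn “1” and “0” are hypothetical stand-ins for real MNIST images.

```python
from collections import deque

def count_white_regions(grid):
    """Count 4-connected regions of white (0) cells in a binary image
    where 1 = ink. Digits with loops enclose extra white regions
    beyond the background."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for sy in range(h):
        for sx in range(w):
            if grid[sy][sx] == 0 and not seen[sy][sx]:
                regions += 1                      # new region found
                q = deque([(sy, sx)])
                seen[sy][sx] = True
                while q:                          # flood fill it
                    y, x = q.popleft()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w and \
                           grid[ny][nx] == 0 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
    return regions

# A crude "1" (one white region) vs. a crude "0" (background + hole)
one = [[0, 0, 0],
       [0, 1, 0],
       [0, 1, 0]]
zero = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
print(count_white_regions(one), count_white_regions(zero))  # 1 2
```

On real 28×28 MNIST images you would first threshold the grayscale values to 0/1 before counting.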
9. “Hand-Crafted Feature Engineering” Limitations
• Generalization:
• How to recognize handwritten text?
• Printed text in different fonts?
• Time-consuming (of the data scientist’s time)
• Not scalable
• Can’t achieve human performance
DNN – Deep Neural Networks
10. Let's be inspired by nature, but not too much
• “Fly like a bird”
• The dream
• Aerodynamics: we figured out that feathers and wing flapping weren’t crucial
• Flight envelope: speed, altitude etc
• Brain Inspiration
Biological function vs. biological structure
11. Deep Learning (Neural Networks)
Neuroscience: how does the cortex learn perception?
• Does the cortex “run” a single, general learning algorithm (or a small number of them)?
Deep Learning addresses the problem of learning hierarchical
representations with a single algorithm
• or perhaps with a few algorithms
Concrete (pixels) → Abstract (objects)
Deep Learning
17. ImageNet Large Scale Visual Recognition Challenge
The “World Cup” of computer vision (CV) and machine learning (ML)
1,000 object classes
1.2M training images
Resolution: 256x256 pixels
Our NN:
Input layer: 256×256 = 65,536 neurons
Output layer: 1,000 neurons (one per class)
18. NN: Back-propagation
Learning algorithm
• while not done:
• pick a random training case (Xi, Yi)
• run the NN on input Xi
• modify the connection weights to make the prediction closer to Yi
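That loop can be sketched at minimal scale with a single sigmoid neuron instead of a deep network (the toy data, cycling instead of random sampling, and the learning rate are my choices; the update (p − y)·x is the standard logistic-loss gradient):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny "network": one sigmoid neuron with 3 weights
w = rng.normal(size=3)
lr = 0.5

# Toy training cases (Xi, Yi)
data = [(np.array([1.0, 0.0, 1.0]), 1.0),
        (np.array([0.0, 1.0, 1.0]), 0.0)]

for step in range(200):
    x, y = data[step % len(data)]   # pick a training case (cycled here)
    p = sigmoid(w @ x)              # run the net on input x
    w -= lr * (p - y) * x           # nudge weights so p moves toward y

# Predictions should now sit close to their targets 1 and 0
print(sigmoid(data[0][0] @ w), sigmoid(data[1][0] @ w))
```

In a real deep net, the same “nudge toward Yi” step is propagated backward through every layer by the chain rule, which is all that back-propagation is.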
20. Q: What do the individual neurons look for in an image?
21. DL Leaders
• Geoffrey Hinton – U. Toronto; Google (2013)
• Yann LeCun – NYU (20%); Facebook (80%) (2013)
• Yoshua Bengio – U. Montreal
• Andrew Ng – Stanford/Coursera; Google Brain (2011), Baidu (2014)
• Jeff Dean – Google
27. Projects: mining for structure
• Datasets, private and public:
• ImageNet
• YouTube as a data source
• Architectures
• RNN, ConvNet
• AlexNet
28. “Google Brain” (2012)
• The goal: find ways to improve DL networks that can find deeper and
more meaningful patterns in data using less processing power.
• Famous for recognizing cats in YouTube videos
• Architecture:
• Autoencoder
• 1 billion connections
• Training procedure (2012):
• Train on 10 million unlabeled images (YouTube)
• a cluster of 1,000 machines (16,000 cores) for 1 week
• Training procedure (2015):
• 32 GPUs (HW cost ~$32,000)
Cat neuron
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
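At toy scale, the autoencoder idea can be sketched as the classic 4-2-4 “encoder” network: the net must compress four one-hot patterns through a 2-unit bottleneck and reconstruct them. This plain-NumPy version is my illustration only; Google Brain’s autoencoder had a billion connections and learned from images.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Four one-hot patterns must pass through a 2-unit bottleneck
X = np.eye(4)
W1 = rng.normal(scale=0.5, size=(4, 2)); b1 = np.zeros(2)   # encoder
W2 = rng.normal(scale=0.5, size=(2, 4)); b2 = np.zeros(4)   # decoder
lr = 1.0

for _ in range(10000):
    H = sigmoid(X @ W1 + b1)        # 2-dim hidden code
    Y = sigmoid(H @ W2 + b2)        # reconstruction of the input
    dY = (Y - X) * Y * (1 - Y)      # backprop: squared error at output
    dH = (dY @ W2.T) * H * (1 - H)  # backprop through the bottleneck
    W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)

recon = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print(np.round(recon, 2))   # ideally approaches the 4x4 identity matrix
```

Because the only training signal is “reproduce your input”, no labels are needed, which is what let Google Brain train on 10 million unlabeled YouTube frames.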
29. Deep Learning @Google
• Google has invested decades of person-years in building state-of-the-art infrastructure
• Leverage thousands of CPUs and GPUs to learn from billions of data
samples in parallel
• Publish frequently, and often place first in academic challenges in
image recognition, speech recognition, etc
• Extensive and accelerating experience in using DL in real products:
47 production launches in the last 2 years.
• e.g. Photo search, Android speech recognition, StreetView, Ads placement...
36. Self-driving cars
• Mobileye
• Google
• Tesla
• Apple
Autonomous Driving, clip by Mobileye
https://www.youtube.com/watch?v=yjRtGKtwOlc
37. Risks: unknown “Failure Modes”
• We will use DL/AI without anyone fully understanding how it works
• Reminder: the human brain and DL are different
Reference: “Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images”
40. Risks: fooling NN
• These images are classified with >99.6% confidence as the shown
class by a Convolutional Network.
41. Is AI research safe?
• Social impact
• Employment impact
• Military usage
42. Risk: less privacy
Facebook’s Moments as illustrative example:
“Today we launched a new app called Moments that helps you sync
photos with your friends. Moments recognizes which of your friends are
in the photos you take, and lets you share those photos with those
people in one tap. If you use it, your friends will sync to you a lot of the
photos of you they have hidden in their camera rolls.
This is a simple example of AI at work. By building a system that
learned to recognize people and objects in images, we could enable this
new service.”
Mark Zuckerberg’s blog, June 15, 2015
43. Hardware
• Nodes with 4 to 8 GPUs. Google has 10,000+ GPUs
• Google is building custom hardware, based on
FPGAs, to run its NNs. Microsoft also. Facebook?
• Mobileye: ConvNet chip for automotive
• Orcam: low-power ConvNet chip
44. Open-Source Frameworks for DL
• Torch7 (Lua). Facebook, Google, Twitter and Intel
• Caffe. The community shares models in “Model Zoo”
• NVIDIA cuDNN – DL library
45. Money
Example: DeepMind – 75 employees, no product, acquired for £400 million
Google AI and robotics purchases timeline
October 1, 2012 Viewdle Facial recognition
March 12, 2013 DNNresearch Inc. Deep neural networks
April 23, 2013 Wavii Natural language processing
October 2, 2013 Flutter Gesture recognition technology
December 2, 2013 Schaft Robotics, humanoid robots
December 3, 2013 Industrial Perception Robotic arms, computer vision
December 4, 2013 Redwood Robotics Robotic arms
December 5, 2013 Meka Robotics Robots
December 6, 2013 Holomni Robotic wheels
December 7, 2013 Bot & Dolly Robotic cameras
December 10, 2013 Boston Dynamics Robotics
January 26, 2014 DeepMind Technologies Artificial intelligence
August 17, 2014 Jetpac Artificial intelligence, image recognition
October 23, 2014 Dark Blue Labs Artificial Intelligence
October 23, 2014 Vision Factory Artificial Intelligence
46. Transfer learning + fine tuning
• “training time” vs “execution time” compute differs by 5 to 8 orders of magnitude
• So DL could be embedded in cars, IoT devices and smartphones
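The embedded-deployment argument can be sketched as follows (my toy illustration: a fixed random projection stands in for an expensively pretrained network body, which is an assumption of this sketch): the frozen features never change on-device, and fine-tuning only fits a small new head, which is cheap.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Stand-in for a pretrained network body: a frozen nonlinear
# feature extractor (here just a fixed random projection + tanh).
W_frozen = rng.normal(size=(2, 16))

def features(x):
    return np.tanh(x @ W_frozen)    # frozen: never updated

# New task data: classify points by which side of a line they fall on
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

F = features(X)                     # computed once: the body is frozen
w_head = np.zeros(16)               # only this small head is trained
lr = 0.5
for _ in range(2000):
    p = sigmoid(F @ w_head)
    w_head -= lr * F.T @ (p - y) / len(y)   # logistic-regression step

acc = ((sigmoid(F @ w_head) > 0.5) == (y > 0.5)).mean()
print(acc)
```

Only the 16-weight head is learned here; in real transfer learning the frozen part is a network trained for days on clusters, while the head can be fitted on a phone-class device.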