Deep Learning is an area of machine learning and one of the most talked-about trends in business and computer science today.
In this talk, I will give a review of Deep Learning explaining what it is, what kinds of tasks it can do today, and what it probably could do in the future.
2. Agenda
• Neural Networks (NN) training
• Deep Learning = NN + … + …
• Deep Learning (DL) projects
• Topics: HW, IoT
3. Easy for humans, hard for machines
Q: What is the goal of this talk?
To give you an intuitive understanding of what DL is
and why it works.
The needs:
• Need to perceive and understand the world
• Basic speech and vision capabilities
• Language understanding
How can we do this?
• We cannot write a separate algorithm for each task we want to accomplish
• We need to write general algorithms that learn from observations
4. Why is this hard?
You see this:
But the camera sees this:
5. Example: Handwritten digit recognition
• The goal: SW to recognize the digit in each image (Classifier)
• Source: “MNIST database of handwritten digits”, 60,000 examples
• Typical human error: 2.5%. Common confusion between {2, 7} , {4,9}
6. ‘1’ versus ‘5’ – feature engineering
• Features (properties): ‘intensity’ and ‘symmetry’
x1 -> ‘Intensity’ = average pixel value over the image
x2 -> ‘Symmetry’: ‘1’ is more symmetric than ‘5’
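These two features can be computed in a few lines. The following is a minimal NumPy sketch of my own (the toy 4×4 “images” and the exact symmetry formula are illustrative assumptions, not the talk’s code): x1 is the mean pixel value, and x2 scores how little an image differs from its left–right mirror.

```python
import numpy as np

def intensity(img):
    """x1: average pixel value over the whole image."""
    return img.mean()

def symmetry(img):
    """x2: negative mean absolute difference between the image and
    its left-right mirror; 0 means perfectly symmetric."""
    return -np.abs(img - np.fliplr(img)).mean()

# Toy 4x4 "images": a mirror-symmetric blob vs. an off-center stroke
symmetric = np.array([[0, 1, 1, 0],
                      [1, 1, 1, 1],
                      [1, 1, 1, 1],
                      [0, 1, 1, 0]], dtype=float)
asymmetric = np.array([[1, 0, 0, 0],
                       [1, 0, 0, 0],
                       [1, 0, 0, 0],
                       [1, 0, 0, 0]], dtype=float)

print(intensity(symmetric), symmetry(symmetric))    # 0.75 0.0
print(intensity(asymmetric), symmetry(asymmetric))  # 0.25 -0.5
```

Plotting (x1, x2) for many labeled images gives exactly the kind of 2-D scatter the slide alludes to, where ‘1’s and ‘5’s form separable clusters.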
7. Digits recognition – dream solution
• We are looking for features …
• If this is possible at all (I don’t know for sure), it requires exceptional domain expertise.
Ideas for additional features:
- the number of separate, connected regions of white pixels: 1, 2, 3, 5 and 7 tend to have one contiguous region of white space, while the loops in 6, 8 and 9 create more.
- ask experts
Do you like the process?
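The “connected white regions” idea is easy to prototype with a plain flood fill. Below is a sketch of mine over toy binary grids; the tiny hand-drawn “1” and “0” are hypothetical stand-ins for real MNIST images.

```python
from collections import deque

def count_white_regions(grid):
    """Count 4-connected regions of white (0) cells in a binary image
    where 1 = ink. Digits with loops enclose extra white regions
    beyond the background."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for sy in range(h):
        for sx in range(w):
            if grid[sy][sx] == 0 and not seen[sy][sx]:
                regions += 1                      # new region found
                q = deque([(sy, sx)])
                seen[sy][sx] = True
                while q:                          # flood fill it
                    y, x = q.popleft()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w and \
                           grid[ny][nx] == 0 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
    return regions

# A crude "1" (one white region) vs. a crude "0" (background + hole)
one = [[0, 0, 0],
       [0, 1, 0],
       [0, 1, 0]]
zero = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
print(count_white_regions(one), count_white_regions(zero))  # 1 2
```

On real 28×28 MNIST images you would first threshold the grayscale values to 0/1 before counting.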
9. “Hand-Crafted Feature Engineering” Limitations
• Generalization:
• How to recognize handwritten text?
• Printed text in different fonts?
• Time-consuming (of the data scientist’s time)
• Not scalable
• Can’t achieve human performance
DNN – Deep Neural Networks
10. Let's be inspired by nature, but not too much
• “Fly like a bird”
• The dream
• Aerodynamics: we figured out that feathers and wing flapping weren’t crucial
• Flight envelope: speed, altitude etc
• Brain Inspiration
Biological function vs. biological structure
11. Deep Learning (Neural Networks)
Neuroscience: how does the cortex learn perception?
• Does the cortex “run” a single, general learning algorithm (or a small number of them)?
Deep Learning addresses the problem of learning hierarchical
representations with a single algorithm
• or perhaps with a few algorithms
Concrete (pixels) → Abstract (objects)
Deep Learning
17. ImageNet Large Scale Visual Recognition Challenge
The “World Cup” of computer vision (CV) and machine learning (ML)
1,000 object classes
1.2M training images
Resolution: 256x256 pixels
Our NN:
Input layer: 256×256 = 65,536 neurons
Output layer: 1,000 neurons (one per class)
18. NN: Back-propagation
Learning algorithm
• while not done:
• pick a random training case (Xi, Yi)
• run the NN on input Xi
• modify the connection weights to make the prediction closer to Yi
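That loop can be sketched at minimal scale with a single sigmoid neuron instead of a deep network (the toy data, cycling instead of random sampling, and the learning rate are my choices; the update (p − y)·x is the standard logistic-loss gradient):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny "network": one sigmoid neuron with 3 weights
w = rng.normal(size=3)
lr = 0.5

# Toy training cases (Xi, Yi)
data = [(np.array([1.0, 0.0, 1.0]), 1.0),
        (np.array([0.0, 1.0, 1.0]), 0.0)]

for step in range(200):
    x, y = data[step % len(data)]   # pick a training case (cycled here)
    p = sigmoid(w @ x)              # run the net on input x
    w -= lr * (p - y) * x           # nudge weights so p moves toward y

# Predictions should now sit close to their targets 1 and 0
print(sigmoid(data[0][0] @ w), sigmoid(data[1][0] @ w))
```

In a real deep net, the same “nudge toward Yi” step is propagated backward through every layer by the chain rule, which is all that back-propagation is.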
20. Q: What do the individual neurons look for in an image?
21. DL Leaders
• Geoffrey Hinton – U. Toronto; Google (2013)
• Yann LeCun – NYU (20%); Facebook (80%) (2013)
• Yoshua Bengio – U. Montreal
• Andrew Ng – Stanford/Coursera; Google Brain (2011), Baidu (2014)
• Jeff Dean – Google
27. Projects: mining for structure
• Datasets, private and public:
• ImageNet
• YouTube as a data source
• Architectures
• RNN, ConvNet
• AlexNet
28. “Google Brain” (2012)
• The goal: find ways to improve DL networks that can find deeper and
more meaningful patterns in data using less processing power.
• Famous for recognizing cats in YouTube videos
• Architecture:
• Autoencoder
• 1 billion connections
• Training procedure (2012):
• Train on 10 million unlabeled images (YouTube)
• a cluster of 1,000 machines (16,000 cores) for 1 week
• Training procedure (2015):
• 32 GPUs (HW cost ~$32,000)
Cat neuron
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
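At toy scale, the autoencoder idea can be sketched as the classic 4-2-4 “encoder” network: the net must compress four one-hot patterns through a 2-unit bottleneck and reconstruct them. This plain-NumPy version is my illustration only; Google Brain’s autoencoder had a billion connections and learned from images.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Four one-hot patterns must pass through a 2-unit bottleneck
X = np.eye(4)
W1 = rng.normal(scale=0.5, size=(4, 2)); b1 = np.zeros(2)   # encoder
W2 = rng.normal(scale=0.5, size=(2, 4)); b2 = np.zeros(4)   # decoder
lr = 1.0

for _ in range(10000):
    H = sigmoid(X @ W1 + b1)        # 2-dim hidden code
    Y = sigmoid(H @ W2 + b2)        # reconstruction of the input
    dY = (Y - X) * Y * (1 - Y)      # backprop: squared error at output
    dH = (dY @ W2.T) * H * (1 - H)  # backprop through the bottleneck
    W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)

recon = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print(np.round(recon, 2))   # ideally approaches the 4x4 identity matrix
```

Because the only training signal is “reproduce your input”, no labels are needed, which is what let Google Brain train on 10 million unlabeled YouTube frames.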
29. Deep Learning @Google
• Google has invested decades of person-years in building state-of-the-art infrastructure
• Leverage thousands of CPUs and GPUs to learn from billions of data
samples in parallel
• Publish frequently, and often place first in academic challenges in
image recognition, speech recognition, etc
• Extensive and accelerating experience in using DL in real products:
47 production launches in the last 2 years.
• e.g. Photo search, Android speech recognition, StreetView, Ads placement...
36. Self-driving cars
• Mobileye
• Google
• Tesla
• Apple
Autonomous Driving, clip by Mobileye
https://www.youtube.com/watch?v=yjRtGKtwOlc
37. Risks: unknown “Failure Modes”
• We will use DL/AI without anyone fully understanding how it works
• Reminder: the human brain and DL are different
Reference: “Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images”
40. Risks: fooling NN
• These images are classified with >99.6% confidence as the shown
class by a Convolutional Network.
41. Is AI research safe?
• Social impact
• Employment impact
• Military usage
42. Risk: less privacy
Facebook’s Moments as illustrative example:
“Today we launched a new app called Moments that helps you sync
photos with your friends. Moments recognizes which of your friends are
in the photos you take, and lets you share those photos with those
people in one tap. If you use it, your friends will sync to you a lot of the
photos of you they have hidden in their camera rolls.
This is a simple example of AI at work. By building a system that
learned to recognize people and objects in images, we could enable this
new service.”
Mark Zuckerberg’s blog, June 15, 2015
43. Hardware
• Nodes with 4 to 8 GPUs. Google has 10,000+ GPUs
• Google is building custom hardware, based on
FPGAs, to run its NNs. Microsoft also. Facebook?
• Mobileye: ConvNet chip for automotive
• Orcam: low-power ConvNet chip
44. Open-Source Frameworks for DL
• Torch7 (Lua). Facebook, Google, Twitter and Intel
• Caffe. The community shares models in “Model Zoo”
• NVIDIA cuDNN – DL library
45. Money
Example: DeepMind – 75 employees, no product, acquired for £400 million
Google AI and robotics purchases timeline
October 1, 2012 Viewdle Facial recognition
March 12, 2013 DNNresearch Inc. Deep neural networks
April 23, 2013 Wavii Natural language processing
October 2, 2013 Flutter Gesture recognition technology
December 2, 2013 Schaft Robotics, humanoid robots
December 3, 2013 Industrial Perception Robotic arms, computer vision
December 4, 2013 Redwood Robotics Robotic arms
December 5, 2013 Meka Robotics Robots
December 6, 2013 Holomni Robotic wheels
December 7, 2013 Bot & Dolly Robotic cameras
December 10, 2013 Boston Dynamics Robotics
January 26, 2014 DeepMind Technologies Artificial intelligence
August 17, 2014 Jetpac Artificial intelligence, image recognition
October 23, 2014 Dark Blue Labs Artificial Intelligence
October 23, 2014 Vision Factory Artificial Intelligence
46. Transfer learning + fine tuning
• “training time” vs “execution time” compute differs by 5 to 8 orders of magnitude
• So DL could be embedded in cars, IoT devices and smartphones
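The embedded-deployment argument can be sketched as follows (my toy illustration: a fixed random projection stands in for an expensively pretrained network body, which is an assumption of this sketch): the frozen features never change on-device, and fine-tuning only fits a small new head, which is cheap.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Stand-in for a pretrained network body: a frozen nonlinear
# feature extractor (here just a fixed random projection + tanh).
W_frozen = rng.normal(size=(2, 16))

def features(x):
    return np.tanh(x @ W_frozen)    # frozen: never updated

# New task data: classify points by which side of a line they fall on
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

F = features(X)                     # computed once: the body is frozen
w_head = np.zeros(16)               # only this small head is trained
lr = 0.5
for _ in range(2000):
    p = sigmoid(F @ w_head)
    w_head -= lr * F.T @ (p - y) / len(y)   # logistic-regression step

acc = ((sigmoid(F @ w_head) > 0.5) == (y > 0.5)).mean()
print(acc)
```

Only the 16-weight head is learned here; in real transfer learning the frozen part is a network trained for days on clusters, while the head can be fitted on a phone-class device.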