SlideShare ist ein Scribd-Unternehmen logo
1 von 72
Downloaden Sie, um offline zu lesen
Distributed Deep Learning
with Hadoop and TensorFlow
Image Classification- 2016
Human Performance AI Performance
https://arxiv.org/pdf/1602.07261.pdf
95% 97%
The ability to understand the content of an image by using machine learning
4
AI beats human in games - 2016
Komodo beasts H. Nakamura in 2016AlphaGo beats L. Sedols in 2016
Go 4:1 Chess 2:1
Breast Cancer Diagnoses - 2017
Pathologist Performance AI Performance
https://research.googleblog.com/2017/03/assisting-pathologists-in-detecting.html
73% 92%
Doctors often use additional tests to find or diagnose breast cancer
The pathologist ended up
spending 30 hours on this
task on 130 slides
A closeup of a lymph node biopsy.
Google TPU
The power of 12 GB HBM2 memory and 640 Tensor
Cores, delivering 110 TeraFLOPS of performance.
AI history à Perceptron
1958 F. Rosenblatt,
“Perceptron” model,
neuronal networks
1943 W. McCulloch,
W. Pitts, “Neuron” as
logical element
OR function XOR function
1969 M. Minsky,
S. Papert, triggers
first AI winter
feed forward
AI history à AI winter
1958 F. Rosenblatt,
Perzeptron model,
neuronal networks
1987-1993 the second
AI winter, desktop
computer, LISP
machines expensive
1943 W. McCulloch,
W. Pitts, neuron as
logical element
1980 Boom expert
systems, Q&A using
logical rules, Prolog
1969 M. Minsky,
S. Papert, trigger
first AI winter
1993-2001
Moore’s law, Deep
blue chess-
playing, Standford
DARPA challenge
12
Machine Learning Problem Types
Structured data
80% of world’s data is unstructured
Fishing in the sea versus fishing in the lake
Data Warehouse Data Lake
Business Intellingence helps find
answers to questions you know.
Data Science helps you find the
question itself.
Any kind of data & schema-on-readStructured data & schema-on-write
Parallel processing on big dataSQL-ish queries on database tables
Extract, Transform, Load Extract, Load, Transform-on-the-fly
Low cost on commodity hardwareExpensive for large data
More Data + Bigger Models
Accuracy
Scale (data size, model size)
other approaches
neural networks
1990s
https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI
More Data + Bigger Models + More Computation
Accuracy
Scale (data size, model size)
other approaches
neural networks
Now
https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI
more compute
More Data + Bigger Models + More Computation
= Better Results in Machine Learning
Millions of “trip”
events each day globally
400+ billion viewing-
related events per day
Five billion data points
for Price Tip feature
Movie
recommendation
Price
optimization
Routing and price
optimization
How to start?
Single machineML specialist Small data
Single machineML specialist Small data
Single machineML specialist Small data
Single machineML specialist Small data
Single machineML specialist Small data
X X
Single machineML specialist Big data
Single machineML specialist Big data
X X
Train and evaluate machine learning models at scale
Single machine Data center
How to run more experiments faster and in parallel?
How to share and reproduce research?
How to go from research to real products?
Distributed Machine Learning
Data Size
Model Size
Model parallelism
Single machine
Data center
Data
parallelism
training very large models exploring several model
architectures, hyper-
parameter optimization,
training several
independent models
speeds up the training
Compute Workload for Training and Evaluation
I/O intensive
Compute
intensive
Single machine
Data center
I/O Workload for Simulation and Testing
I/O intensive
Compute
intensive
Single machine
Data center
Distributed Machine Learning
Distributed Machine Learning
X
The new rising star
12/19/17 31
TensorFlow
Standalone
TensorFlow
On YARN
TensorFlow
On multi-
colored YARN
TensorFlow
On Spark
TensorFrames
TensorFlow
On
Kubernetes
TensorFlow
On Mesos
Distributed TensorFlow on
Hadoop, Mesos, Kubernetes,
Spark
https://www.slideshare.net/jwiegelmann/distributed
-tensorflow-on-hadoop-mesos-kubernetes-spark
Data Parallel vs. Model Parallel
http://books.nips.cc/papers/files/nips25/NIPS2012_0598.pdf
Between-Graph Replication In-Graph Replication
Data Shards vs. Data Combined
https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf
Synchronous vs. Asynchronous
https://arxiv.org/pdf/1603.04467.pdf
TensorFlow Standalone
https://www.tensorflow.org/
TensorFlow Standalone
Dedicated cluster
Short & long running jobs
Flexibility
Manual scheduling of workers
No shared resources
Hard to share data with other
applications
No data locality
TensorFlow On YARN (Intel) v3
https://github.com/Intel-bigdata/TensorFlowOnYARN
released March 12, 2017 / YARN-6043
TensorFlow On YARN (Intel)
Shared cluster and data
Optimised long running jobs
Scheduling
Data locality (not yet implemented)
Not easy to have rapid adoption
from upstream
Fault tolerance not yet implemented
GPU still not seen as a “native”
resource on yarn
No use of yarn elasticity
TensorFlow On multi-colored YARN (Hortonworks)
v3
Not yet implemented!
https://hortonworks.com/blog/distributed-tensorflow-assembly-hadoop-yarn/
TensorFlow On multi-colored YARN (Hortonworks)
Shared cluster
GPUs shared by multiple tenants
and applications
Centralised scheduling
YARN-3611 Docker support
YARN-4793 Native processes
Needs YARN wrapper of NVIDIA
Docker (GPU driver)
Not implemented yet!
TensorFlow On Spark (Yahoo) v2
https://github.com/yahoo/TensorFlowOnSpark
released January 22, 2017
TensorFlow On Spark (Yahoo)
Shared cluster and data
Data locality through HDFS or
other Spark sources
Add-hoc training and evaluation
Slice and dice data with Spark
distributed transformations
Scheduling not optimal
Necessary to “convert” existing
TensorFlow application, although
simple process
Might need to restart Spark cluster
No GPU resource management
TensorFrames (Databricks) v2
Scala binding to TF via JNI https://github.com/databricks/tensorframes
released Feb 28, 2016
TensorFrames (Databricks)
Possible shared cluster
TensorFrame infers the shapes
for small tensors (no analyse
required)
Data locality via RDD
Experimental
Still not centralised scheduling, TF
and Spark need to be deployed
and scheduled separately
TF and Spark might not be
collocated
Might need data transfer between
some nodes
TensorFlow On Kubernetes
https://github.com/tensorflow/ecosystem
TensorFlow On Kubernetes
Shared cluster
Centralised scheduling by
Kubernetes
Solved network orchestration,
federation etc.
Experimental support for
managing NVIDIA GPUs (at this
time better than yarn however)
Fault tolerance
Data locality
TensorFlow On Mesos
Marathon
https://github.com/douban/tfmesos
TensorFlow On Mesos
Shared cluster
GPU-based scheduling
Short and long running jobs
Memory footprint
Number of services relative to
Kubernetes
Fault tolerance
Data locality
Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Google, 2015
Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Google, 2015
http://stevenwhang.com/tfx_paper.pdf
TFX: A TensorFlow-Based Production-Scale
Machine Learning Platform
Google, 2017
https://eng.uber.com/michelangelo/
Michelangelo: Uber’s Machine Learning Platform
http://searchbusinessanalytics.techtarget.com/feature/Machine-learning-platforms-comparison-Amazon-Azure-Google-IBM
Pricing for 890,000 real-time predictions w/o training
AWS:
Compute Fees + Prediction Fees = $8.40 + $96.44
= $104.84 per month
Google:
Prediction $0.10 per thousand predictions, plus $0.40 per hour
= $377 per month
Azure:
Packages $0, $100,13, $1.000,06, $9.999,98
= $1.000 per month
Q3, 2017
LESSONS LEARNED
High-level Development Process for Autonomous Vehicles
1 Collect
sensors data
3 Autonomous
Driving
2 Model
Engineering
Data Logger Control Unit
Big Data Trained Model
Data Center
Agenda
Sensors Udacity Lincoln MKZ
Camera 3x Blackfly GigE Camera, 20 Hz
Lidar Velodyne HDL-32E, 9.5 Hz
IMU Xsens, 400 Hz
GPS 2x fixed, 1 Hz
CAN bus, 1,1 kHz
Robot Operating System
Data 3 GB per minute
https://github.com/udacity/self-driving-car
Sensors Spec
Sensor blinding,
sunlight,
darkness
rain, fog,
snow
non-metal
objects
wind/ high
velocity
resolution range data
Ultrasonic yes yes yes no + + +
Lidar yes no yes yes +++ ++ +
Radar yes yes no yes ++ +++ +
Camera no no yes yes +++ +++ +++
Machine Learning 101
Observations
State
Estimation
Modeling &
Prediction
Planning
Controls
f(x)
Controls
Observations
Machine Learning for Autonomous Driving
+ Sensor Fusion clustering, segmentation, pattern recognition
+ Road ego-motion, image processing and pattern recognition
+ Localization simultaneous localization and mapping
+ Situation Understanding detection and classification
+ Trajectory Planning motion planning and control
+ Control Strategy reinforcement and supervised learning
+ Driver Model image processing and pattern recognition
Machine Learning Cycle
Data collection
for training/test
Feature
engineering
I/O workload
Model development
and architecture
Compute workload I/O workload
Training and
evaluation
Re- Simulation
and Testing
Scaling and
monitoring
Model deployment
versioning
1 2 3
Model tuning
Flux – Open Machine Learning Stack
Training & Test data
Compute + Network + Storage
Deploy model
ML Development & Catalog & REST API
ML-Specialists
Feature
Engineering
Training
Evaluation
Re-Simulation
Testing
CaffeOnSpark
Sample Model Prediction Batch Regression Cluster
Dataset Correlation Centroid Anomaly Test Scores
ü Mainly open source
ü No vendor lock in
ü Scale-out architecture
ü Multi user support
ü Resource management
ü Job scheduling
ü Speed-up training
ü Speed-up simulation
Feature Engineering
+ Hadoop InputFormat and
Record Reader for Rosbag
+ Process Rosbag with Spark,
Yarn, MapReduce, Hadoop
Streaming API, …
+ Spark RDD are cached and
optimized for analysis
Ros
bag
Processing
Engine
Computer
Network
Storage
Advanced
Analytics
RDD
Record
Reader
RDD
DataFrame, DataSet
SQL, Spark APIs
NumPy
Ros
Msg
Training & Evaluation
+ Tensorflow ROSRecordDataset
+ Protocol Buffers to serialize
records
+ Save time because data
conversion not needed
+ Save storage because data
duplication not needed
Training
Engine
Machine
Learning
Ros
bag
Computer
Network
Storage
ROS
Dataset
Ros
msg
Re-Simulation & Testing
+ Use Spark for preprocessing,
transformation, cleansing,
aggregation, time window
selection before publish to ROS
topics
+ Use Re-Simulation framework
of choice to subscribe to the
ROS topics
Engine
Re-Simulation
with framework
of choice
Computer
Network
Storage
Ros
bag
Ros
topic
core
subscribe
publish
Time Travel
fold(left)
t
fold(right)
reduce/
shuffle
HOW TO START?
Think Big Business Strategy
Data Strategy
Technology Strategy
Agile Delivery Model
Business Case Validation
Prototypes, MVPs
Data Exploration
Data AcquisitionStart Small
Value
Proposition
+ Classification, Regression, Clustering,
Collaborative Filtering, Anomaly Detection
+ Supervised/Unsupervised Reinforcement
Learning, Deep Learning, CNN
+ Model Training, Evaluation, Testing,
Simulation, Inference
+ Big Data Strategy, Consulting, Data
Lab, Data Science as a Service
+ Data Collection, Cleaning, Analyzing,
Modeling, Validation, Visualization
+ Business Case Validation,
Prototyping, MVPs, Dashboards
Data Science Machine Learning
+ Architecture, DevOps, Cloud Building
+ App. Management Hadoop Ecosystem
+ Managed Infrastructure Services
+ Compute, Network, Storage, Firewall,
Loadbalancer, DDoS, Protection
+ Continuous Integration and Deployment
+ Data Pipelines (Acquisition,
Ingestion, Analytics, Visualization)
+ Distributed Data Architectures
+ Data Processing Backend
+ Hadoop Ecosystem
+ Test Automation and Testing
Data Engineering Data Operations
“Culture eats strategy for breakfast,
technology for lunch, and products for dinner,
and soon thereafter everything else too.”
Peter Drucker
thank you

Weitere ähnliche Inhalte

Was ist angesagt?

How Spark is Making an Impact at Goldman Sachs by Vincent Saulys
How Spark is Making an Impact at Goldman Sachs by Vincent SaulysHow Spark is Making an Impact at Goldman Sachs by Vincent Saulys
How Spark is Making an Impact at Goldman Sachs by Vincent SaulysSpark Summit
 
Latent diffusions vs DALL-E v2
Latent diffusions vs DALL-E v2Latent diffusions vs DALL-E v2
Latent diffusions vs DALL-E v2Vitaly Bondar
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 
Machine Learning at Scale with MLflow and Apache Spark
Machine Learning at Scale with MLflow and Apache SparkMachine Learning at Scale with MLflow and Apache Spark
Machine Learning at Scale with MLflow and Apache SparkDatabricks
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph DatabasesDataStax
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekVenkata Naga Ravi
 
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Yongho Ha
 
Building modern data lakes
Building modern data lakes Building modern data lakes
Building modern data lakes Minio
 
Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormMd. Shamsur Rahim
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Simplilearn
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data HadoopApache Apex
 
Image classification using CNN
Image classification using CNNImage classification using CNN
Image classification using CNNNoura Hussein
 
Using Graph Algorithms For Advanced Analytics - Part 4 Similarity 30 graph al...
Using Graph Algorithms For Advanced Analytics - Part 4 Similarity 30 graph al...Using Graph Algorithms For Advanced Analytics - Part 4 Similarity 30 graph al...
Using Graph Algorithms For Advanced Analytics - Part 4 Similarity 30 graph al...TigerGraph
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Sri Ambati
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Kevin Weil
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 

Was ist angesagt? (20)

How Spark is Making an Impact at Goldman Sachs by Vincent Saulys
How Spark is Making an Impact at Goldman Sachs by Vincent SaulysHow Spark is Making an Impact at Goldman Sachs by Vincent Saulys
How Spark is Making an Impact at Goldman Sachs by Vincent Saulys
 
Latent diffusions vs DALL-E v2
Latent diffusions vs DALL-E v2Latent diffusions vs DALL-E v2
Latent diffusions vs DALL-E v2
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Machine Learning at Scale with MLflow and Apache Spark
Machine Learning at Scale with MLflow and Apache SparkMachine Learning at Scale with MLflow and Apache Spark
Machine Learning at Scale with MLflow and Apache Spark
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
 
Building modern data lakes
Building modern data lakes Building modern data lakes
Building modern data lakes
 
Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache Storm
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data Hadoop
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Image classification using CNN
Image classification using CNNImage classification using CNN
Image classification using CNN
 
Using Graph Algorithms For Advanced Analytics - Part 4 Similarity 30 graph al...
Using Graph Algorithms For Advanced Analytics - Part 4 Similarity 30 graph al...Using Graph Algorithms For Advanced Analytics - Part 4 Similarity 30 graph al...
Using Graph Algorithms For Advanced Analytics - Part 4 Similarity 30 graph al...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 

Ähnlich wie Distributed Deep Learning with Hadoop and TensorFlow

Deep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingDeep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingJan Wiegelmann
 
DN 2017 | Machine Learning for Self-Driving Cars | Jan Wiegelmann | Valtech
DN 2017 |  Machine Learning for Self-Driving Cars | Jan Wiegelmann | ValtechDN 2017 |  Machine Learning for Self-Driving Cars | Jan Wiegelmann | Valtech
DN 2017 | Machine Learning for Self-Driving Cars | Jan Wiegelmann | ValtechDataconomy Media
 
Machine Learning for Self-Driving Cars
Machine Learning for Self-Driving CarsMachine Learning for Self-Driving Cars
Machine Learning for Self-Driving CarsJan Wiegelmann
 
END-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKEND-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKJan Wiegelmann
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioAlluxio, Inc.
 
Petascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big AnalyticsPetascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big AnalyticsHeiko Joerg Schick
 
Innovation with ai at scale on the edge vt sept 2019 v0
Innovation with ai at scale  on the edge vt sept 2019 v0Innovation with ai at scale  on the edge vt sept 2019 v0
Innovation with ai at scale on the edge vt sept 2019 v0Ganesan Narayanasamy
 
Machine Learning and Hadoop
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and HadoopJosh Patterson
 
Big Data - Need of Converged Data Platform
Big Data - Need of Converged Data PlatformBig Data - Need of Converged Data Platform
Big Data - Need of Converged Data PlatformGeekNightHyderabad
 
Democratizing AI with Apache Spark
Democratizing AI with Apache SparkDemocratizing AI with Apache Spark
Democratizing AI with Apache SparkSpark Summit
 
NVIDIA Deep Learning Institute 2017 基調講演
NVIDIA Deep Learning Institute 2017 基調講演NVIDIA Deep Learning Institute 2017 基調講演
NVIDIA Deep Learning Institute 2017 基調講演NVIDIA Japan
 
GPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 KeynoteGPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 KeynoteNVIDIA
 
Big data analytics 1
Big data analytics 1Big data analytics 1
Big data analytics 1gauravsc36
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big DataOmnia Safaan
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridEvert Lammerts
 
Data Science und Machine Learning im Kubernetes-Ökosystem
Data Science und Machine Learning im Kubernetes-ÖkosystemData Science und Machine Learning im Kubernetes-Ökosystem
Data Science und Machine Learning im Kubernetes-Ökosysteminovex GmbH
 

Ähnlich wie Distributed Deep Learning with Hadoop and TensorFlow (20)

Deep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingDeep Learning for Autonomous Driving
Deep Learning for Autonomous Driving
 
AI meets Big Data
AI meets Big DataAI meets Big Data
AI meets Big Data
 
DN 2017 | Machine Learning for Self-Driving Cars | Jan Wiegelmann | Valtech
DN 2017 |  Machine Learning for Self-Driving Cars | Jan Wiegelmann | ValtechDN 2017 |  Machine Learning for Self-Driving Cars | Jan Wiegelmann | Valtech
DN 2017 | Machine Learning for Self-Driving Cars | Jan Wiegelmann | Valtech
 
Machine Learning for Self-Driving Cars
Machine Learning for Self-Driving CarsMachine Learning for Self-Driving Cars
Machine Learning for Self-Driving Cars
 
END-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKEND-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACK
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
Petascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big AnalyticsPetascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big Analytics
 
BigData
BigDataBigData
BigData
 
Innovation with ai at scale on the edge vt sept 2019 v0
Innovation with ai at scale  on the edge vt sept 2019 v0Innovation with ai at scale  on the edge vt sept 2019 v0
Innovation with ai at scale on the edge vt sept 2019 v0
 
Machine Learning and Hadoop
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and Hadoop
 
Big Data - Need of Converged Data Platform
Big Data - Need of Converged Data PlatformBig Data - Need of Converged Data Platform
Big Data - Need of Converged Data Platform
 
Democratizing AI with Apache Spark
Democratizing AI with Apache SparkDemocratizing AI with Apache Spark
Democratizing AI with Apache Spark
 
The Revolution of Deep Learning
The Revolution of Deep LearningThe Revolution of Deep Learning
The Revolution of Deep Learning
 
NVIDIA Deep Learning Institute 2017 基調講演
NVIDIA Deep Learning Institute 2017 基調講演NVIDIA Deep Learning Institute 2017 基調講演
NVIDIA Deep Learning Institute 2017 基調講演
 
GPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 KeynoteGPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 Keynote
 
Microsoft Dryad
Microsoft DryadMicrosoft Dryad
Microsoft Dryad
 
Big data analytics 1
Big data analytics 1Big data analytics 1
Big data analytics 1
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big Data
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
 
Data Science und Machine Learning im Kubernetes-Ökosystem
Data Science und Machine Learning im Kubernetes-ÖkosystemData Science und Machine Learning im Kubernetes-Ökosystem
Data Science und Machine Learning im Kubernetes-Ökosystem
 

Mehr von Jan Wiegelmann

Analytics for Autonomous Driving with ROS
Analytics for Autonomous Driving with ROSAnalytics for Autonomous Driving with ROS
Analytics for Autonomous Driving with ROSJan Wiegelmann
 
Challenges of Deep Learning in the Automotive Industry and Autonomous Driving
Challenges of Deep Learning in the Automotive Industry and Autonomous DrivingChallenges of Deep Learning in the Automotive Industry and Autonomous Driving
Challenges of Deep Learning in the Automotive Industry and Autonomous DrivingJan Wiegelmann
 
Data Analytics and Artificial Intelligence in the era of Digital Transformation
Data Analytics and Artificial Intelligence in the era of Digital TransformationData Analytics and Artificial Intelligence in the era of Digital Transformation
Data Analytics and Artificial Intelligence in the era of Digital TransformationJan Wiegelmann
 
Flux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineFlux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineJan Wiegelmann
 
Distributed TensorFlow on Hadoop, Mesos, Kubernetes, Spark
Distributed TensorFlow on Hadoop, Mesos, Kubernetes, SparkDistributed TensorFlow on Hadoop, Mesos, Kubernetes, Spark
Distributed TensorFlow on Hadoop, Mesos, Kubernetes, SparkJan Wiegelmann
 
How Artificial Intelligence is Reducing Costs and Improving Outcomes in Pharm...
How Artificial Intelligence is Reducing Costs and Improving Outcomes in Pharm...How Artificial Intelligence is Reducing Costs and Improving Outcomes in Pharm...
How Artificial Intelligence is Reducing Costs and Improving Outcomes in Pharm...Jan Wiegelmann
 
10 things A.I. can do better than you
10 things A.I. can do better than you10 things A.I. can do better than you
10 things A.I. can do better than youJan Wiegelmann
 

Mehr von Jan Wiegelmann (7)

Analytics for Autonomous Driving with ROS
Analytics for Autonomous Driving with ROSAnalytics for Autonomous Driving with ROS
Analytics for Autonomous Driving with ROS
 
Challenges of Deep Learning in the Automotive Industry and Autonomous Driving
Challenges of Deep Learning in the Automotive Industry and Autonomous DrivingChallenges of Deep Learning in the Automotive Industry and Autonomous Driving
Challenges of Deep Learning in the Automotive Industry and Autonomous Driving
 
Data Analytics and Artificial Intelligence in the era of Digital Transformation
Data Analytics and Artificial Intelligence in the era of Digital TransformationData Analytics and Artificial Intelligence in the era of Digital Transformation
Data Analytics and Artificial Intelligence in the era of Digital Transformation
 
Flux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineFlux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / Pipeline
 
Distributed TensorFlow on Hadoop, Mesos, Kubernetes, Spark
Distributed TensorFlow on Hadoop, Mesos, Kubernetes, SparkDistributed TensorFlow on Hadoop, Mesos, Kubernetes, Spark
Distributed TensorFlow on Hadoop, Mesos, Kubernetes, Spark
 
How Artificial Intelligence is Reducing Costs and Improving Outcomes in Pharm...
How Artificial Intelligence is Reducing Costs and Improving Outcomes in Pharm...How Artificial Intelligence is Reducing Costs and Improving Outcomes in Pharm...
How Artificial Intelligence is Reducing Costs and Improving Outcomes in Pharm...
 
10 things A.I. can do better than you
10 things A.I. can do better than you10 things A.I. can do better than you
10 things A.I. can do better than you
 

Kürzlich hochgeladen

International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...ssuserf63bd7
 
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / NcrCall Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncrdollysharma2066
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfJos Voskuil
 
APRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfAPRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfRbc Rbcua
 
Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchirictsugar
 
Case study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detailCase study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detailAriel592675
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfrichard876048
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdfKhaled Al Awadi
 
Market Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMarket Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMintel Group
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607dollysharma2066
 
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckPitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckHajeJanKamps
 
Organizational Structure Running A Successful Business
Organizational Structure Running A Successful BusinessOrganizational Structure Running A Successful Business
Organizational Structure Running A Successful BusinessSeta Wicaksana
 
8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCR8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCRashishs7044
 
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCRashishs7044
 
India Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample ReportIndia Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample ReportMintel Group
 
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu MenzaYouth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menzaictsugar
 
Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyotictsugar
 
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCRashishs7044
 
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfIntro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfpollardmorgan
 
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607dollysharma2066
 

Kürzlich hochgeladen (20)

International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...
 
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / NcrCall Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdf
 
APRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfAPRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdf
 
Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchir
 
Case study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detailCase study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detail
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdf
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
 
Market Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMarket Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 Edition
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
 
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckPitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
 
Organizational Structure Running A Successful Business
Organizational Structure Running A Successful BusinessOrganizational Structure Running A Successful Business
Organizational Structure Running A Successful Business
 
8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCR8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCR
 
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
 
India Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample ReportIndia Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample Report
 
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu MenzaYouth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
 
Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyot
 
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
 
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfIntro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
 
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
 

Distributed Deep Learning with Hadoop and TensorFlow

  • 1. Distributed Deep Learning with Hadoop and TensorFlow
  • 2.
  • 3. Image Classification- 2016 Human Performance AI Performance https://arxiv.org/pdf/1602.07261.pdf 95% 97% The ability to understand the content of an image by using machine learning
  • 4. 4 AI beats human in games - 2016 Komodo beasts H. Nakamura in 2016AlphaGo beats L. Sedols in 2016 Go 4:1 Chess 2:1
  • 5. Breast Cancer Diagnoses - 2017 Pathologist Performance AI Performance https://research.googleblog.com/2017/03/assisting-pathologists-in-detecting.html 73% 92% Doctors often use additional tests to find or diagnose breast cancer The pathologist ended up spending 30 hours on this task on 130 slides A closeup of a lymph node biopsy.
  • 7.
  • 8. The power of 12 GB HBM2 memory and 640 Tensor Cores, delivering 110 TeraFLOPS of performance.
  • 9.
  • 10. AI history à Perceptron 1958 F. Rosenblatt, “Perceptron” model, neuronal networks 1943 W. McCulloch, W. Pitts, “Neuron” as logical element OR function XOR function 1969 M. Minsky, S. Papert, triggers first AI winter feed forward
  • 11. AI history à AI winter 1958 F. Rosenblatt, Perzeptron model, neuronal networks 1987-1993 the second AI winter, desktop computer, LISP machines expensive 1943 W. McCulloch, W. Pitts, neuron as logical element 1980 Boom expert systems, Q&A using logical rules, Prolog 1969 M. Minsky, S. Papert, trigger first AI winter 1993-2001 Moore’s law, Deep blue chess- playing, Standford DARPA challenge
  • 13. Structured data 80% of world’s data is unstructured
  • 14. Fishing in the sea versus fishing in the lake Data Warehouse Data Lake Business Intellingence helps find answers to questions you know. Data Science helps you find the question itself. Any kind of data & schema-on-readStructured data & schema-on-write Parallel processing on big dataSQL-ish queries on database tables Extract, Transform, Load Extract, Load, Transform-on-the-fly Low cost on commodity hardwareExpensive for large data
  • 15. More Data + Bigger Models Accuracy Scale (data size, model size) other approaches neural networks 1990s https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI
  • 16. More Data + Bigger Models + More Computation Accuracy Scale (data size, model size) other approaches neural networks Now https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI more compute
  • 17. More Data + Bigger Models + More Computation = Better Results in Machine Learning
  • 18. Millions of “trip” events each day globally 400+ billion viewing- related events per day Five billion data points for Price Tip feature Movie recommendation Price optimization Routing and price optimization
  • 21. Single machineML specialist Small data Single machineML specialist Small data
  • 22. Single machineML specialist Small data Single machineML specialist Small data X X
  • 23. Single machineML specialist Big data Single machineML specialist Big data X X
  • 24. Train and evaluate machine learning models at scale Single machine Data center How to run more experiments faster and in parallel? How to share and reproduce research? How to go from research to real products?
  • 25. Distributed Machine Learning Data Size Model Size Model parallelism Single machine Data center Data parallelism training very large models exploring several model architectures, hyper- parameter optimization, training several independent models speeds up the training
  • 26. Compute Workload for Training and Evaluation I/O intensive Compute intensive Single machine Data center
  • 27. I/O Workload for Simulation and Testing I/O intensive Compute intensive Single machine Data center
  • 31. 12/19/17 31 TensorFlow Standalone TensorFlow On YARN TensorFlow On multi- colored YARN TensorFlow On Spark TensorFrames TensorFlow On Kubernetes TensorFlow On Mesos Distributed TensorFlow on Hadoop, Mesos, Kubernetes, Spark https://www.slideshare.net/jwiegelmann/distributed -tensorflow-on-hadoop-mesos-kubernetes-spark
  • 32. Data Parallel vs. Model Parallel http://books.nips.cc/papers/files/nips25/NIPS2012_0598.pdf Between-Graph Replication In-Graph Replication
  • 33. Data Shards vs. Data Combined https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf
  • 36. TensorFlow Standalone Dedicated cluster Short & long running jobs Flexibility Manual scheduling of workers No shared resources Hard to share data with other applications No data locality
  • 37. TensorFlow On YARN (Intel) v3 https://github.com/Intel-bigdata/TensorFlowOnYARN released March 12, 2017 / YARN-6043
  • 38. TensorFlow On YARN (Intel) Shared cluster and data Optimised long running jobs Scheduling Data locality (not yet implemented) Not easy to have rapid adoption from upstream Fault tolerance not yet implemented GPU still not seen as a “native” resource on yarn No use of yarn elasticity
  • 39. TensorFlow On multi-colored YARN (Hortonworks) v3 Not yet implemented! https://hortonworks.com/blog/distributed-tensorflow-assembly-hadoop-yarn/
  • 40. TensorFlow On multi-colored YARN (Hortonworks) Shared cluster GPUs shared by multiple tenants and applications Centralised scheduling YARN-3611 Docker support YARN-4793 Native processes Needs YARN wrapper of NVIDIA Docker (GPU driver) Not implemented yet!
  • 41. TensorFlow On Spark (Yahoo) v2 https://github.com/yahoo/TensorFlowOnSpark released January 22, 2017
  • 42. TensorFlow On Spark (Yahoo) Shared cluster and data Data locality through HDFS or other Spark sources Add-hoc training and evaluation Slice and dice data with Spark distributed transformations Scheduling not optimal Necessary to “convert” existing TensorFlow application, although simple process Might need to restart Spark cluster No GPU resource management
  • 43. TensorFrames (Databricks) v2 Scala binding to TF via JNI https://github.com/databricks/tensorframes released Feb 28, 2016
  • 44. TensorFrames (Databricks) Possible shared cluster TensorFrame infers the shapes for small tensors (no analyse required) Data locality via RDD Experimental Still not centralised scheduling, TF and Spark need to be deployed and scheduled separately TF and Spark might not be collocated Might need data transfer between some nodes
  • 46. TensorFlow On Kubernetes Shared cluster Centralised scheduling by Kubernetes Solved network orchestration, federation etc. Experimental support for managing NVIDIA GPUs (at this time better than yarn however) Fault tolerance Data locality
  • 48. TensorFlow On Mesos Shared cluster GPU-based scheduling Short and long running jobs Memory footprint Number of services relative to Kubernetes Fault tolerance Data locality
  • 49. Hidden Technical Debt in Machine Learning Systems https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf Google, 2015
  • 50. Hidden Technical Debt in Machine Learning Systems https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf Google, 2015
  • 51. http://stevenwhang.com/tfx_paper.pdf TFX: A TensorFlow-Based Production-Scale Machine Learning Platform Google, 2017
  • 54. Pricing for 890,000 real-time predictions w/o training AWS: Compute Fees + Prediction Fees = $8.40 + $96.44 = $104.84 per month Google: Prediction $0.10 per thousand predictions, plus $0.40 per hour = $377 per month Azure: Packages $0, $100,13, $1.000,06, $9.999,98 = $1.000 per month Q3, 2017
  • 56. High-level Development Process for Autonomous Vehicles 1 Collect sensors data 3 Autonomous Driving 2 Model Engineering Data Logger Control Unit Big Data Trained Model Data Center Agenda
  • 57. Sensors Udacity Lincoln MKZ Camera 3x Blackfly GigE Camera, 20 Hz Lidar Velodyne HDL-32E, 9.5 Hz IMU Xsens, 400 Hz GPS 2x fixed, 1 Hz CAN bus, 1,1 kHz Robot Operating System Data 3 GB per minute https://github.com/udacity/self-driving-car
  • 58. Sensors Spec Sensor blinding, sunlight, darkness rain, fog, snow non-metal objects wind/ high velocity resolution range data Ultrasonic yes yes yes no + + + Lidar yes no yes yes +++ ++ + Radar yes yes no yes ++ +++ + Camera no no yes yes +++ +++ +++
  • 59. Machine Learning 101 Observations State Estimation Modeling & Prediction Planning Controls f(x) Controls Observations
  • 60. Machine Learning for Autonomous Driving + Sensor Fusion clustering, segmentation, pattern recognition + Road ego-motion, image processing and pattern recognition + Localization simultaneous localization and mapping + Situation Understanding detection and classification + Trajectory Planning motion planning and control + Control Strategy reinforcement and supervised learning + Driver Model image processing and pattern recognition
  • 61. Machine Learning Cycle Data collection for training/test Feature engineering I/O workload Model development and architecture Compute workload I/O workload Training and evaluation Re- Simulation and Testing Scaling and monitoring Model deployment versioning 1 2 3 Model tuning
  • 62. Flux – Open Machine Learning Stack Training & Test data Compute + Network + Storage Deploy model ML Development & Catalog & REST API ML-Specialists Feature Engineering Training Evaluation Re-Simulation Testing CaffeOnSpark Sample Model Prediction Batch Regression Cluster Dataset Correlation Centroid Anomaly Test Scores ü Mainly open source ü No vendor lock in ü Scale-out architecture ü Multi user support ü Resource management ü Job scheduling ü Speed-up training ü Speed-up simulation
  • 63. Feature Engineering + Hadoop InputFormat and Record Reader for Rosbag + Process Rosbag with Spark, Yarn, MapReduce, Hadoop Streaming API, … + Spark RDD are cached and optimized for analysis Ros bag Processing Engine Computer Network Storage Advanced Analytics RDD Record Reader RDD DataFrame, DataSet SQL, Spark APIs NumPy Ros Msg
  • 64. Training & Evaluation + Tensorflow ROSRecordDataset + Protocol Buffers to serialize records + Save time because data conversion not needed + Save storage because data duplication not needed Training Engine Machine Learning Ros bag Computer Network Storage ROS Dataset Ros msg
  • 65. Re-Simulation & Testing + Use Spark for preprocessing, transformation, cleansing, aggregation, time window selection before publish to ROS topics + Use Re-Simulation framework of choice to subscribe to the ROS topics Engine Re-Simulation with framework of choice Computer Network Storage Ros bag Ros topic core subscribe publish
  • 68. Think Big Business Strategy Data Strategy Technology Strategy Agile Delivery Model Business Case Validation Prototypes, MVPs Data Exploration Data AcquisitionStart Small Value Proposition
  • 69. + Classification, Regression, Clustering, Collaborative Filtering, Anomaly Detection + Supervised/Unsupervised Reinforcement Learning, Deep Learning, CNN + Model Training, Evaluation, Testing, Simulation, Inference + Big Data Strategy, Consulting, Data Lab, Data Science as a Service + Data Collection, Cleaning, Analyzing, Modeling, Validation, Visualization + Business Case Validation, Prototyping, MVPs, Dashboards Data Science Machine Learning
  • 70. + Architecture, DevOps, Cloud Building + App. Management Hadoop Ecosystem + Managed Infrastructure Services + Compute, Network, Storage, Firewall, Loadbalancer, DDoS, Protection + Continuous Integration and Deployment + Data Pipelines (Acquisition, Ingestion, Analytics, Visualization) + Distributed Data Architectures + Data Processing Backend + Hadoop Ecosystem + Test Automation and Testing Data Engineering Data Operations
  • 71. “Culture eats strategy for breakfast, technology for lunch, and products for dinner, and soon thereafter everything else too.” Peter Drucker