This session analyzes the enterprise machine learning market. It covers the vendors, platforms and best practices that companies should consider when implementing data science solutions at enterprise scale.
2. About Us
• Helping great companies become great software companies
• Building software solutions powered by disruptive enterprise software trends
- Machine learning and data science
- Cyber-security
- Enterprise IoT
- Powered by Cloud and Mobile
• Bringing innovation from startups and academic institutions to the enterprise
• Award-winning agency: Inc 500, American Business Awards, International Business Awards
3. About This Webinar
• Research that brings together big enterprise software trends, exciting startups and academic research
• Best practices based on real-world implementation experience
• No sales pitches
8. Modern Machine Learning
• Advances in storage, compute and data science research are making machine learning part of mainstream technology platforms
• Big data movement
• Machine learning platforms are optimized with developer-friendly interfaces
• Platform-as-a-service providers have drastically lowered the entry point for machine learning applications
• R and Python are leading the charge
10. Cloud Machine Learning Platforms: Benefits
• Service abstraction layer over the machine learning infrastructure
• Rich visual modeling tools
• Rich monitoring and tracking interfaces
• Combine multiple platforms: R, Python, etc.
• Enable programmatic access to ML models
11. Cloud Machine Learning Platforms: Challenges
• Integration with on-premise data stores
• Extensibility
• Security and privacy
12. On-Premise Machine Learning Platforms: Benefits
• Control
• Security
• Integration with on-premise data stores
• Integrated with R and Python machine learning frameworks
13. On-Premise Machine Learning Platforms: Challenges
• Code-based modeling interfaces
• Scalability
• Tightly coupled with Hadoop distributions
• Monitoring and management
• Data quality and curation
17. Azure Machine Learning
• Native machine learning capabilities as part of the Azure cloud
• Elastic infrastructure that scales based on the model requirements
• Supports over 30 supervised and unsupervised machine learning algorithms
• Integration with R and Python machine learning libraries
• Exposes machine learning models via programmable interfaces
• Integrated with the Cortana Analytics suite
• Integrated with Power BI
18. Visual Model Creation
• Supports both supervised and unsupervised models
• Integrated with Azure HDInsight
• Large library of models and sample gallery
• Support for R and Python code
19. Rich Monitoring and Management Interface
• Visual dashboard to track the execution of ML models
• Track execution of different steps within an ML model
• Integrated monitoring experience with other Azure services
20. Programmatic Access to ML Models
• Expose machine learning models as web service APIs
• Integrate ML models with Azure API Gateway
• Retrain and extend models via ML APIs
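As a rough illustration of what calling a model exposed as a web service might look like, the sketch below builds (but does not send) an HTTP scoring request. The endpoint URL, the API key, the column names and the exact payload shape (`Inputs`, `input1`, `ColumnNames`, `Values`, `GlobalParameters`) are assumptions for illustration, loosely modeled on a typical request-response scoring format; the service's own documentation defines the real contract.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, columns, rows):
    """Build (but do not send) a POST request for a published scoring endpoint."""
    # Assumed payload layout; a real service publishes its own schema.
    payload = {
        "Inputs": {"input1": {"ColumnNames": columns, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


request = build_scoring_request(
    "https://example.invalid/score",  # hypothetical endpoint
    "YOUR_API_KEY",                   # hypothetical key
    ["age", "income"],                # hypothetical feature columns
    [[42, 55000]],
)
# Sending it would be: urllib.request.urlopen(request).read()
```

Keeping request construction separate from the network call makes the payload easy to inspect and test before pointing it at a live endpoint.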
22. AWS Machine Learning
• Native machine learning service in AWS
• Provides data exploration and visualization tools
• Supports supervised and unsupervised algorithms
• Integrated data transformation models
• APIs for dynamically creating machine learning models
23. Sophisticated ML Model Authoring
• Programmatic creation of machine learning models
• Large number of algorithms and recipes
• Data transformation models included in the language
24. Monitoring ML Model Execution
• Sophisticated monitoring for evaluating ML models
• Integrated with Amazon CloudWatch
• KPIs that evaluate the efficiency of ML models
25. Embedded Data Transformation
• Optimized DSL for data transformation
• Recipes that abstract common transformations
• Reuse transformation recipes across ML models
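The idea of a reusable transformation recipe can be sketched in plain Python. This is not the AWS recipe language; the `RECIPE` structure, the helper functions and the sample column names are all hypothetical, chosen only to show how one declarative description of transformations can be applied to the data feeding different models.

```python
def lowercase(text):
    """Normalize text to lower case."""
    return text.lower()


def no_punct(text):
    """Drop every character that is neither alphanumeric nor whitespace."""
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


def bin_value(edges):
    """Return a transform that maps a number to the index of its bin."""
    def transform(value):
        return sum(value >= edge for edge in edges)
    return transform


# Hypothetical recipe: output feature -> (input column, ordered steps).
RECIPE = {
    "title_clean": ("title", [lowercase, no_punct]),
    "age_bin": ("age", [bin_value([18, 35, 60])]),
}


def apply_recipe(recipe, record):
    """Apply every step of the recipe to one input record."""
    features = {}
    for name, (column, steps) in recipe.items():
        value = record[column]
        for step in steps:
            value = step(value)
        features[name] = value
    return features


row = {"title": "Hello, World!", "age": 42}
features = apply_recipe(RECIPE, row)
print(features)
```

Because the recipe is data rather than code, the same `RECIPE` can be passed to `apply_recipe` for the training set of one model and the scoring input of another.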
28. Databricks Machine Learning
• Scaling Spark machine learning pipelines
• Integrated data visualization tools
• Sophisticated ML monitoring tools
• Combine Python, Scala and R in a single platform
29. Notebook-Based Authoring
• Implement machine learning models using notebooks
• Publish notebooks to a centralized catalog
• Leverage Python, Scala or R to implement machine learning models
30. Machine Learning Data Visualization
• Integrate data visualization into machine learning pipelines
• Reuse data visualization notebooks across applications
• Evaluate the efficiency of machine learning pipelines using visualizations
31. Monitoring and Management
• Monitor the execution of machine learning pipelines
• Run machine learning pipelines manually
• Rapidly modify and deploy machine learning pipelines
33. Large Variety of Cognitive Services
• Personality Insights
• Tradeoff Analytics
• Relationship Extraction
• Concept Insights
• Speech to Text
• Text to Speech
• Visual Recognition
• Natural Language Classifier
• Language Identification
• Language Translation
• Question and Answer
• Concept Expansion
• Message Resonance
• AlchemyAPI Services
34. Rich Developer Interfaces
• Access services via REST APIs
• SDKs available for different languages
• Integration with different services in the Bluemix platform
40. Revolution Analytics (Microsoft)
All of open source R plus:
• Big Data scalability
• High-performance analytics
• Development and deployment tools
• Data source connectivity
• Application integration framework
• Multi-platform architecture
• Support, Training and Services
41. Write Once, Deploy Anywhere
Components: DistributedR, ScaleR, ConnectR, DeployR
Supported platforms:
• In the Cloud: Amazon AWS
• Workstations & Servers: Windows; Red Hat and SUSE Linux
• Clustered Systems: IBM Platform LSF; Microsoft HPC
• EDW: IBM Netezza; Teradata
• Hadoop: Hortonworks; Cloudera
42. Integrate R Scripts Into Third Party Applications
DeployR does not provide any application UI. Three integration modes embed real-time R results into existing interfaces (web app, mobile app, desktop app, BI tool, Excel, …):
• RBroker Framework: simple, high-performance API for Java, .NET and JavaScript apps. Supports transactional, on-demand analytics on a stateless R session.
• Client Libraries: flexible control of R services from Java, .NET and JavaScript apps. Also supports stateful R integrations (e.g. complex GUIs).
• DeployR Web Services API: integrate R using almost any client language.
44. Spark MLlib
• Built on Apache Spark, a fast and general engine for large-scale data processing
• Runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk
• Write applications quickly in Java, Scala or Python
45. Beyond Machine Learning
• Integrated with Spark SQL for data queries and transformations
• Integrated with Spark GraphX for graph processing
• Integrated with Spark Streaming for real-time data processing
46. Spark MLlib + SparkR
• Run R and machine learning models using the same infrastructure
• Leverage R scripts from Spark MLlib models
• Scale R models as part of a Spark cluster
• Execute R models programmatically using Java APIs
48. Dato
• Makes Python machine learning enterprise-ready
• GraphLab Create
• Dato Distributed
• Dato Predictive Services
51. Sophisticated ML made easy - Toolkits
Principles:
• Get started fast
• Rapidly iterate
• Combine for new apps

import graphlab as gl

# Load the ratings table (expected to contain user, movie and rating columns)
data = gl.SFrame.read_csv('my_data.csv')
# Train a recommender on the (user, item, rating) triples
model = gl.recommender.create(data,
                              user_id='user',
                              item_id='movie',
                              target='rating')
# Top 5 recommendations per user
recommendations = model.recommend(k=5)

Toolkits: Recommender, Image Search, Sentiment Analysis, Data Matching, Auto Tagging, Churn Predictor, Click Prediction, Product Sentiment, Object Detector, Search Ranking, Summarization, …
53. Google's TensorFlow
• Powers deep learning capabilities in dozens of Google's products
• Interfaces for modeling machine learning and deep learning algorithms
• Platform for executing those algorithms
• Scales from mobile devices to a cluster with thousands of nodes
• Became one of the most popular projects on GitHub in less than a week
54. TensorFlow Programming Model
• Based on the principle of a dataflow graph
• Nodes can perform data operations but also send or receive data
• Python and C++ libraries; NodeJS, Go and others in the pipeline
import tensorflow as tf

# Assumes x, y_, y_conv, keep_prob, mnist and an interactive session `sess`
# were defined earlier (not shown on this slide).
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())
for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i%100 == 0:
        # Report training accuracy every 100 steps, with dropout disabled
        train_accuracy = accuracy.eval(feed_dict={
            x:batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g"%(i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
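The dataflow-graph principle behind this slide can be illustrated without TensorFlow itself. The sketch below is a toy, not TensorFlow's implementation: the `Node` class and `run` function are hypothetical names. It shows the two phases the programming model relies on: first build a graph of operations, then evaluate a chosen node by feeding values into placeholder nodes.

```python
import math


class Node:
    """One operation in the graph; `inputs` are the nodes it depends on."""

    def __init__(self, op=None, *inputs):
        self.op = op          # None marks a placeholder that is fed at run time
        self.inputs = inputs


def run(node, feed, cache=None):
    """Evaluate `node`, computing each upstream node at most once."""
    if cache is None:
        cache = {}
    if node in feed:
        return feed[node]
    if node not in cache:
        args = [run(dep, feed, cache) for dep in node.inputs]
        cache[node] = node.op(*args)
    return cache[node]


# Phase 1: build the graph. Nothing is computed yet.
y_true = Node()
y_pred = Node()
log_pred = Node(lambda p: [math.log(v) for v in p], y_pred)
product = Node(lambda t, l: [a * b for a, b in zip(t, l)], y_true, log_pred)
cross_entropy = Node(lambda xs: -sum(xs), product)

# Phase 2: run the graph by feeding values into the placeholders.
loss = run(cross_entropy, {y_true: [0.0, 1.0], y_pred: [0.5, 0.5]})
print(loss)
```

Separating graph construction from execution is what lets a runtime decide where and when each node executes, which is the property the next slide's placement discussion depends on.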
55. TensorFlow Implementation
• Scales from a single device to a large cluster of nodes
• TensorFlow uses a placement algorithm based on heuristics to place tasks on the different nodes in a graph
• The execution engine assigns tasks for fault tolerance
• Linear scalability model
56. TensorFlow Graph Visualization
• TensorFlow includes an engine that enables the visual representation of the execution graph
• Visualizations include summary statistics of the different states of the model
• The visualization engine is included in the current open source release
59. Machine Learning in the Enterprise
• Enable foundational building blocks
- Data quality
- Data discovery
- Functional and integration testing
• Predictions are tempting, but classification and clustering are easier
• Run multiple models at once
• Enable programmatic interfaces to interact with ML models
• Start small, deliver quickly, iterate…
60. Summary
• Machine learning is becoming one of the most important elements of modern enterprise solutions
• Innovation in machine learning is happening in both the on-premise and cloud spaces
• Cloud machine learning innovators include Azure ML, AWS ML, Databricks and IBM Watson
• On-premise machine learning innovators include Spark MLlib, Microsoft's Revolution R, Dato and TensorFlow
• Enterprise machine learning solutions should include elements such as data quality and data governance
• Start small and use real use cases
63. Scikit-Learn
• Extensions to SciPy (Scientific Python) are called SciKits. Scikit-Learn provides machine learning algorithms.
• Algorithms for supervised and unsupervised learning
• Built on SciPy and NumPy
• Standard Python API interface
• Sits on top of C libraries, LAPACK, LibSVM, and Cython
• Open source: BSD license (packaged in many Linux distributions)
• Probably the best general ML framework out there
64. Very Simple Prediction Model
[Diagram] Stages: Raw Data; Load & Transform Data; Feature Extraction; Build Model; Evaluate Model; Feature Evaluation
65. Simple Programming Model: Cross Validation (classification)
Assess how the model will generalize to an independent data set (e.g. data not in the training set).
1. Divide data into training and test splits
2. Fit model on training, predict on test
3. Determine accuracy, precision and recall
4. Repeat k times with different splits, then average the scores (e.g. F1)

            Predicted Class A   Predicted Class B   Total
Actual A    True A              False B             #A
Actual B    False A             True B              #B
Total       #P(A)               #P(B)               total
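The four steps can be sketched in plain Python, with no external libraries. Everything here is an illustrative assumption: the toy one-dimensional threshold classifier, the synthetic data and the function names. In practice a library such as Scikit-Learn provides the splitting and scoring. Accuracy is averaged over folds, while precision, recall and F1 for class A are computed from confusion counts pooled across folds so that they stay defined even when a test fold contains no class A samples.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_threshold(xs, ys):
    """Toy 1-D classifier: cut halfway between the two class means.

    Assumes both classes appear in the training data.
    """
    mean_a = sum(x for x, y in zip(xs, ys) if y == "A") / ys.count("A")
    mean_b = sum(x for x, y in zip(xs, ys) if y == "B") / ys.count("B")
    cut = (mean_a + mean_b) / 2.0
    low, high = ("A", "B") if mean_a < mean_b else ("B", "A")
    return lambda x: low if x < cut else high


def cross_validate(xs, ys, k=5):
    """Return mean fold accuracy and pooled precision, recall, F1 for class A."""
    tp = fp = fn = 0
    accuracies = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        predict = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        correct = 0
        for i in test_idx:
            pred = predict(xs[i])
            correct += pred == ys[i]
            tp += pred == "A" and ys[i] == "A"
            fp += pred == "A" and ys[i] == "B"
            fn += pred == "B" and ys[i] == "A"
        accuracies.append(correct / len(test_idx))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": sum(accuracies) / k, "precision": precision,
            "recall": recall, "f1": f1}


# Synthetic, well-separated data: class A near 0, class B near 10.
xs = [i / 10 for i in range(10)] + [10 + i / 10 for i in range(10)]
ys = ["A"] * 10 + ["B"] * 10
scores = cross_validate(xs, ys, k=5)
print(scores)
```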
66. Data Visualization
How to evaluate clusters? Visualization (but only in 2D)
68. PredictionIO
• Developer-friendly machine learning platform
• Completely open source
• Based on Apache Spark
69. A Simple Architecture
• PredictionIO platform: a machine learning stack for building, evaluating and deploying engines with machine learning algorithms
• Event Server: an open source machine learning analytics layer for unifying events from multiple platforms
• Template Gallery: engine templates for different types of machine learning applications
70. Model Execution
• Execute models asynchronously via the event interface
• Query data programmatically via the REST interface
• Various SDKs provided as part of the platform
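The split between the event interface and the query interface can be sketched as the two JSON payloads a client would prepare. The server addresses, the `accessKey` value, the `rate` event name and the `user`/`num` query fields are assumptions based on a typical local recommendation setup, and nothing is sent over the network here; the platform's own SDKs wrap these calls.

```python
import json
from urllib.parse import urlencode

EVENT_SERVER = "http://localhost:7070"   # assumed local Event Server address
ENGINE_SERVER = "http://localhost:8000"  # assumed local deployed-engine address


def rating_event(user_id, item_id, rating):
    """Describe one user-rates-item event for the Event Server."""
    return {
        "event": "rate",
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "properties": {"rating": rating},
    }


def event_url(access_key):
    """URL that events would be POSTed to, authenticated by an access key."""
    return EVENT_SERVER + "/events.json?" + urlencode({"accessKey": access_key})


def query_url():
    """URL that prediction queries would be POSTed to."""
    return ENGINE_SERVER + "/queries.json"


event_body = json.dumps(rating_event("u1", "i42", 4.0))
query_body = json.dumps({"user": "u1", "num": 5})  # hypothetical query fields

print(event_url("YOUR_ACCESS_KEY"), event_body)
print(query_url(), query_body)
```

Events flow in continuously and are consumed when the engine is trained, while queries are answered synchronously by the deployed engine, which is why the two use separate endpoints.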
71. Rich Model Creation Interface
• Visual interface for model creation
• Integrated with a template gallery
• Ability to test and validate engines