SlideShare a Scribd company logo
1 of 35
Download to read offline
Building Machine
Learning Applications
with Sparkling Water
04/15/2015 Meetup
Michal Malohlava and Alex Tellez and H2O.ai
TBD
Head of Sales
Distributed
Systems
Engineers
Making

ML Scale!
Team@H2O.ai
Scalable 

Machine Learning
For Smarter
Applications
Smarter Applications
Scalable Applications
Distributed
Able to process huge amount of data from
different sources
Easy to develop and experiment
Powerful machine learning engine inside
BUT
how to build
them?
Build an application
with …
?
…with Spark and H2O
Open-source distributed execution platform
User-friendly API for data transformation based on RDD
Platform components - SQL, MLLib, text mining
Multitenancy
Large and active community
Open-source scalable machine
learning platform
Tuned for efficient computation
and memory use
Production ready machine
learning algorithms
R, Python, Java, Scala APIs
Interactive UI, robust data parser
Ensembles
Deep Neural Networks
• Generalized Linear Models : Binomial, Gaussian,
Gamma, Poisson and Tweedie
• Cox Proportional Hazards Models
• Naïve Bayes
• Distributed Random Forest : Classification or
regression models
• Gradient Boosting Machine : Produces an
ensemble of decision trees with increasing refined
approximations
• Deep learning : Create multi-layer feed forward
neural networks starting with an input layer
followed by multiple layers of nonlinear
transformations
Statistical Analysis
Dimensionality
Reduction
Anomaly
Detection
• K-means : Partitions observations into k
clusters/groups of the same spatial size
• Principal Component Analysis : Linearly
transforms correlated variables to
independent components
• Autoencoders: Find outliers using a
nonlinear dimensionality reduction using
deep learning
Clustering
Supervised
Learning
Unsupervised
Learning
Sparkling Water
Provides
Transparent integration of H2O with Spark ecosystem
Transparent use of H2O data structures and
algorithms with Spark API
Excels in existing Spark workflows
requiring advanced Machine Learning
algorithms
Platform for building
Smarter
Applications
Sparkling Water Design
spark-submit
Spark
Master
JVM
Spark
Worker
JVM
Spark
Worker
JVM
Spark
Worker
JVM
Sparkling Water Cluster
Spark
Executor
JVM
H2O
Spark
Executor
JVM
H2O
Spark
Executor
JVM
H2O
Sparkling
App
implements
?
Regular Spark application
containing also
Sparkling Water
classes
Data Distribution
H2O
H2O
H2O
Sparkling Water Cluster
Spark Executor JVM
Data
Source
(e.g.
HDFS)
H2O
RDD
Spark Executor JVM
Spark Executor JVM
Spark
RDD
RDDs and DataFrames
share same memory
space
toRDD
toH2OFrame
Lets build
an application !
OR
Detect spam text messages
Data example
case class SMS(target: String, fv: Vector)
ham / spam feature vector
ML Workflow
1. Extract data
2. Transform, tokenize messages
3. Build Tf-IDF
4. Create and evaluate 

Deep Learning model
5. Score new messages
Goal: For a given text message identify if
it is spam or not
Application
environment
sparkling-shell
Lego #1: Data load
// Data load

def load(dataFile: String): RDD[Array[String]] = {

sc.textFile(dataFile).map(l => l.split(“t"))
.filter(r => !r(0).isEmpty)

}
Produces [response,message]
Lego #2: Ad-hoc
Tokenization
def tokenize(data: RDD[String]): RDD[Seq[String]] = {

val ignoredWords = Seq("the", “a", …)

val ignoredChars = Seq(',', ‘:’, …)



val texts = data.map( r => {

var smsText = r.toLowerCase

for( c <- ignoredChars) {

smsText = smsText.replace(c, ' ')

}



val words =smsText.split(" ").filter(w =>
!ignoredWords.contains(w) && w.length>2).distinct

words.toSeq

})

texts

}
Message Bag of words
Lego #3: Tf-IDF
def buildIDFModel(tokens: RDD[Seq[String]],

minDocFreq:Int = 4,

hashSpaceSize:Int = 1 << 10):
(HashingTF, IDFModel, RDD[Vector]) = {

// Hash strings into the given space

val hashingTF = new HashingTF(hashSpaceSize)

val tf = hashingTF.transform(tokens)

// Build term frequency-inverse document frequency

val idfModel = new IDF(minDocFreq=minDocFreq).fit(tf)

val expandedText = idfModel.transform(tf)

(hashingTF, idfModel, expandedText)

}
Hash words

into large 

space
Term freq scale
“Thank for the order…”
[…,0,3.5,0,1,0,0.3,0,1.3,0,0,…]
Thank Order
Bag of words
hashing
functions
term freq

model
Feature
vectors representing text
Lego #4: Build a model
def buildDLModel(train: Frame, valid: Frame,

epochs: Int = 10, l1: Double = 0.001, l2: Double = 0.0,

hidden: Array[Int] = Array[Int](200, 200))

(implicit h2oContext: H2OContext): DeepLearningModel = {

import h2oContext._

// Build a model

val dlParams = new DeepLearningParameters()

dlParams._destination_key = Key.make("dlModel.hex").asInstanceOf[Key[Frame]]

dlParams._train = train

dlParams._valid = valid

dlParams._response_column = 'target

dlParams._epochs = epochs

dlParams._l1 = l1

dlParams._hidden = hidden



// Create a job

val dl = new DeepLearning(dlParams)

val dlModel = dl.trainModel.get



// Compute metrics on both datasets

dlModel.score(train).delete()

dlModel.score(valid).delete()



dlModel

}
Deep Learning: Create
multi-layer feed forward
neural networks starting
with an input layer
followed by multiple
l a y e r s o f n o n l i n e a r
transformations
H2O model

builder API
Do final scoring

on train/test data
Assembly application
// Data load

val data = load(DATAFILE)

// Extract response spam or ham

val hamSpam = data.map( r => r(0))

val message = data.map( r => r(1))

// Tokenize message content

val tokens = tokenize(message)



// Build IDF model

var (hashingTF, idfModel, tfidf) = buildIDFModel(tokens)



// Merge response with extracted vectors

val resultRDD: SchemaRDD = hamSpam.zip(tfidf).map(v => SMS(v._1, v._2))

val table:DataFrame = resultRDD



// Split table

val keys = Array[String]("train.hex", "valid.hex")

val ratios = Array[Double](0.8)

val frs = split(table, keys, ratios)

val (train, valid) = (frs(0), frs(1))

table.delete()



// Build a model

val dlModel = buildDLModel(train, valid)
Split dataset
Build model
Data munging
Data exploration
Model evaluation
val trainMetrics = binomialMM(dlModel, train)

val validMetrics = binomialMM(dlModel, valid)
Collect Model 

Metrics
Spam predictor
def isSpam(msg: String,

dlModel: DeepLearningModel,

hashingTF: HashingTF,

idfModel: IDFModel,

hamThreshold: Double = 0.5):Boolean = {

val msgRdd = sc.parallelize(Seq(msg))

val msgVector: SchemaRDD = idfModel.transform(

hashingTF.transform (

tokenize (msgRdd)))
.map(v => SMS("?", v))

val msgTable: DataFrame = msgVector

msgTable.remove(0) // remove first column

val prediction = dlModel.score(msgTable)

prediction.vecs()(1).at(0) < hamThreshold

}
Prepared models
Default decision
threshold
Scoring
Predict spam
isSpam("Michal, beer
tonight in MV?")
isSpam("We tried to contact
you re your reply
to our offer of a Video
Handset? 750
anytime any networks mins?
UNLIMITED TEXT?")
Interactions with
application from R
You need to install
H2O’s R-package from:
http://h2o-release.s3.amazonaws.com/h2o-dev/master/
1109/index.html#R
Use app from R
Collect Model 

Metrics
Where is the code?
https://github.com/h2oai/sparkling-water/
blob/master/examples/scripts/
Sparkling Water Download
http://h2o.ai/download/
http://h2o-release.s3.amazonaws.com/
sparkling-water/master/93/index.html
Checkout H2O.ai Training Books
http://learn.h2o.ai/

Checkout H2O.ai Blog
http://h2o.ai/blog/

Checkout H2O.ai Youtube Channel
https://www.youtube.com/user/0xdata

Checkout GitHub
https://github.com/h2oai/sparkling-water
Meetups
https://meetup.com/
More info
Learn more at h2o.ai
Follow us at @h2oai
Thank you!
Sparkling Water is
open-source

ML application platform
combining

power of Spark and H2O

More Related Content

What's hot

Introduction to Machine Learning with Spark
Introduction to Machine Learning with SparkIntroduction to Machine Learning with Spark
Introduction to Machine Learning with Sparkdatamantra
 
Spark Sql and DataFrame
Spark Sql and DataFrameSpark Sql and DataFrame
Spark Sql and DataFramePrashant Gupta
 
Compilers Are Databases
Compilers Are DatabasesCompilers Are Databases
Compilers Are DatabasesMartin Odersky
 
C, C++ Training Institute in Chennai , Adyar
C, C++ Training Institute in Chennai , AdyarC, C++ Training Institute in Chennai , Adyar
C, C++ Training Institute in Chennai , AdyarsasikalaD3
 
Think Like Spark
Think Like SparkThink Like Spark
Think Like SparkAlpine Data
 
Advanced Functional Programming in Scala
Advanced Functional Programming in ScalaAdvanced Functional Programming in Scala
Advanced Functional Programming in ScalaPatrick Nicolas
 
Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3Rohit Agrawal
 
R programming language
R programming languageR programming language
R programming languageKeerti Verma
 
Intake 38 data access 3
Intake 38 data access 3Intake 38 data access 3
Intake 38 data access 3Mahmoud Ouf
 
Towards a language server protocol infrastructure for graphical modeling
Towards a language server protocol infrastructure for graphical modelingTowards a language server protocol infrastructure for graphical modeling
Towards a language server protocol infrastructure for graphical modelingRoberto Rodriguez-Echeverria
 
R as supporting tool for analytics and simulation
R as supporting tool for analytics and simulationR as supporting tool for analytics and simulation
R as supporting tool for analytics and simulationAlvaro Gil
 

What's hot (13)

Introduction to Machine Learning with Spark
Introduction to Machine Learning with SparkIntroduction to Machine Learning with Spark
Introduction to Machine Learning with Spark
 
Spark Sql and DataFrame
Spark Sql and DataFrameSpark Sql and DataFrame
Spark Sql and DataFrame
 
Compilers Are Databases
Compilers Are DatabasesCompilers Are Databases
Compilers Are Databases
 
Spark core
Spark coreSpark core
Spark core
 
C, C++ Training Institute in Chennai , Adyar
C, C++ Training Institute in Chennai , AdyarC, C++ Training Institute in Chennai , Adyar
C, C++ Training Institute in Chennai , Adyar
 
Think Like Spark
Think Like SparkThink Like Spark
Think Like Spark
 
Advanced Functional Programming in Scala
Advanced Functional Programming in ScalaAdvanced Functional Programming in Scala
Advanced Functional Programming in Scala
 
Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3
 
R programming language
R programming languageR programming language
R programming language
 
Intake 38 data access 3
Intake 38 data access 3Intake 38 data access 3
Intake 38 data access 3
 
Getting Started with R
Getting Started with RGetting Started with R
Getting Started with R
 
Towards a language server protocol infrastructure for graphical modeling
Towards a language server protocol infrastructure for graphical modelingTowards a language server protocol infrastructure for graphical modeling
Towards a language server protocol infrastructure for graphical modeling
 
R as supporting tool for analytics and simulation
R as supporting tool for analytics and simulationR as supporting tool for analytics and simulation
R as supporting tool for analytics and simulation
 

Viewers also liked

Smart Water Networks need Smart Data Presentation - IT&Water Feb 10 2015
Smart Water Networks need Smart Data Presentation - IT&Water Feb 10 2015Smart Water Networks need Smart Data Presentation - IT&Water Feb 10 2015
Smart Water Networks need Smart Data Presentation - IT&Water Feb 10 2015Luc Stakenborg
 
Waternomics: Key impacts for smart water management
Waternomics: Key impacts for smart water managementWaternomics: Key impacts for smart water management
Waternomics: Key impacts for smart water managementWaternomics
 
H2O World - Self Guiding Applications with Venkatesh Yadav
H2O World - Self Guiding Applications with Venkatesh YadavH2O World - Self Guiding Applications with Venkatesh Yadav
H2O World - Self Guiding Applications with Venkatesh YadavSri Ambati
 
H2O World - What's New in H2O with Cliff Click
H2O World - What's New in H2O with Cliff ClickH2O World - What's New in H2O with Cliff Click
H2O World - What's New in H2O with Cliff ClickSri Ambati
 
H2O World - Python Pipelines - Spencer Aiello
H2O World - Python Pipelines - Spencer AielloH2O World - Python Pipelines - Spencer Aiello
H2O World - Python Pipelines - Spencer AielloSri Ambati
 
H2O World - Translating Advanced Analytics for Business Users - Conor Jensen
H2O World - Translating Advanced Analytics for Business Users - Conor JensenH2O World - Translating Advanced Analytics for Business Users - Conor Jensen
H2O World - Translating Advanced Analytics for Business Users - Conor JensenSri Ambati
 
Basic H2O for Python with Eric Eckstrand
Basic H2O for Python with Eric EckstrandBasic H2O for Python with Eric Eckstrand
Basic H2O for Python with Eric EckstrandSri Ambati
 
Smart Citizens - Water sensor demo
Smart Citizens - Water sensor demoSmart Citizens - Water sensor demo
Smart Citizens - Water sensor demoPieter van Boheemen
 
H2O World - Munging, modeling, and pipelines using Python - Hank Roark
H2O World - Munging, modeling, and pipelines using Python - Hank RoarkH2O World - Munging, modeling, and pipelines using Python - Hank Roark
H2O World - Munging, modeling, and pipelines using Python - Hank RoarkSri Ambati
 
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerH2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerSri Ambati
 
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...Sri Ambati
 
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudH2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudSri Ambati
 
H2O World - Building a Smarter Application - Tom Kraljevic
H2O World - Building a Smarter Application - Tom KraljevicH2O World - Building a Smarter Application - Tom Kraljevic
H2O World - Building a Smarter Application - Tom KraljevicSri Ambati
 
Data & Data Alliances - Scott Mclellan
Data & Data Alliances - Scott MclellanData & Data Alliances - Scott Mclellan
Data & Data Alliances - Scott MclellanSri Ambati
 
H2O World - What you need before doing predictive analysis - Keen.io
H2O World - What you need before doing predictive analysis - Keen.ioH2O World - What you need before doing predictive analysis - Keen.io
H2O World - What you need before doing predictive analysis - Keen.ioSri Ambati
 
H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian BharadwajH2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian BharadwajSri Ambati
 
The Joys of Clean Data with Matt Dowle
The Joys of Clean Data with Matt DowleThe Joys of Clean Data with Matt Dowle
The Joys of Clean Data with Matt DowleSri Ambati
 
H2O World - A Look Under Progressive's Big Data Hood - Pawan Divakarla & Bria...
H2O World - A Look Under Progressive's Big Data Hood - Pawan Divakarla & Bria...H2O World - A Look Under Progressive's Big Data Hood - Pawan Divakarla & Bria...
H2O World - A Look Under Progressive's Big Data Hood - Pawan Divakarla & Bria...Sri Ambati
 

Viewers also liked (20)

Solution smart water
Solution smart waterSolution smart water
Solution smart water
 
Smart Water Networks need Smart Data Presentation - IT&Water Feb 10 2015
Smart Water Networks need Smart Data Presentation - IT&Water Feb 10 2015Smart Water Networks need Smart Data Presentation - IT&Water Feb 10 2015
Smart Water Networks need Smart Data Presentation - IT&Water Feb 10 2015
 
Waternomics: Key impacts for smart water management
Waternomics: Key impacts for smart water managementWaternomics: Key impacts for smart water management
Waternomics: Key impacts for smart water management
 
H2O World - Self Guiding Applications with Venkatesh Yadav
H2O World - Self Guiding Applications with Venkatesh YadavH2O World - Self Guiding Applications with Venkatesh Yadav
H2O World - Self Guiding Applications with Venkatesh Yadav
 
H2O World - What's New in H2O with Cliff Click
H2O World - What's New in H2O with Cliff ClickH2O World - What's New in H2O with Cliff Click
H2O World - What's New in H2O with Cliff Click
 
H2O World - Python Pipelines - Spencer Aiello
H2O World - Python Pipelines - Spencer AielloH2O World - Python Pipelines - Spencer Aiello
H2O World - Python Pipelines - Spencer Aiello
 
H2O World - Translating Advanced Analytics for Business Users - Conor Jensen
H2O World - Translating Advanced Analytics for Business Users - Conor JensenH2O World - Translating Advanced Analytics for Business Users - Conor Jensen
H2O World - Translating Advanced Analytics for Business Users - Conor Jensen
 
Basic H2O for Python with Eric Eckstrand
Basic H2O for Python with Eric EckstrandBasic H2O for Python with Eric Eckstrand
Basic H2O for Python with Eric Eckstrand
 
Smart Citizens - Water sensor demo
Smart Citizens - Water sensor demoSmart Citizens - Water sensor demo
Smart Citizens - Water sensor demo
 
H2O World - Munging, modeling, and pipelines using Python - Hank Roark
H2O World - Munging, modeling, and pipelines using Python - Hank RoarkH2O World - Munging, modeling, and pipelines using Python - Hank Roark
H2O World - Munging, modeling, and pipelines using Python - Hank Roark
 
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerH2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
 
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
 
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudH2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks Cloud
 
H2O World - Building a Smarter Application - Tom Kraljevic
H2O World - Building a Smarter Application - Tom KraljevicH2O World - Building a Smarter Application - Tom Kraljevic
H2O World - Building a Smarter Application - Tom Kraljevic
 
Data & Data Alliances - Scott Mclellan
Data & Data Alliances - Scott MclellanData & Data Alliances - Scott Mclellan
Data & Data Alliances - Scott Mclellan
 
H2O World - What you need before doing predictive analysis - Keen.io
H2O World - What you need before doing predictive analysis - Keen.ioH2O World - What you need before doing predictive analysis - Keen.io
H2O World - What you need before doing predictive analysis - Keen.io
 
H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian BharadwajH2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
 
The Joys of Clean Data with Matt Dowle
The Joys of Clean Data with Matt DowleThe Joys of Clean Data with Matt Dowle
The Joys of Clean Data with Matt Dowle
 
Smart Water Management
Smart Water Management Smart Water Management
Smart Water Management
 
H2O World - A Look Under Progressive's Big Data Hood - Pawan Divakarla & Bria...
H2O World - A Look Under Progressive's Big Data Hood - Pawan Divakarla & Bria...H2O World - A Look Under Progressive's Big Data Hood - Pawan Divakarla & Bria...
H2O World - A Look Under Progressive's Big Data Hood - Pawan Divakarla & Bria...
 

Similar to Sparkling Water Meetup 4.15.15

2015 03 27_ml_conf
2015 03 27_ml_conf2015 03 27_ml_conf
2015 03 27_ml_confSri Ambati
 
Building Machine Learning Applications with Sparkling Water
Building Machine Learning Applications with Sparkling WaterBuilding Machine Learning Applications with Sparkling Water
Building Machine Learning Applications with Sparkling WaterSri Ambati
 
Boston Spark Meetup event Slides Update
Boston Spark Meetup event Slides UpdateBoston Spark Meetup event Slides Update
Boston Spark Meetup event Slides Updatevithakur
 
Brief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICMEBrief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICMEPaco Nathan
 
TI1220 Lecture 14: Domain-Specific Languages
TI1220 Lecture 14: Domain-Specific LanguagesTI1220 Lecture 14: Domain-Specific Languages
TI1220 Lecture 14: Domain-Specific LanguagesEelco Visser
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkC4Media
 
How Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscapeHow Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscapePaco Nathan
 
Getting Started on Hadoop
Getting Started on HadoopGetting Started on Hadoop
Getting Started on HadoopPaco Nathan
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Databricks
 
20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 Andrey Vykhodtsev
 
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and CascadingBoulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and CascadingPaco Nathan
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkDatabricks
 
20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and SharkYahooTechConference
 
MLlib sparkmeetup_8_6_13_final_reduced
MLlib sparkmeetup_8_6_13_final_reducedMLlib sparkmeetup_8_6_13_final_reduced
MLlib sparkmeetup_8_6_13_final_reducedChao Chen
 
Hadoop trainingin bangalore
Hadoop trainingin bangaloreHadoop trainingin bangalore
Hadoop trainingin bangaloreappaji intelhunt
 
Stanford CS347 Guest Lecture: Apache Spark
Stanford CS347 Guest Lecture: Apache SparkStanford CS347 Guest Lecture: Apache Spark
Stanford CS347 Guest Lecture: Apache SparkReynold Xin
 
H2O World - Sparkling Water - Michal Malohlava
H2O World - Sparkling Water - Michal MalohlavaH2O World - Sparkling Water - Michal Malohlava
H2O World - Sparkling Water - Michal MalohlavaSri Ambati
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQLjeykottalam
 

Similar to Sparkling Water Meetup 4.15.15 (20)

2015 03 27_ml_conf
2015 03 27_ml_conf2015 03 27_ml_conf
2015 03 27_ml_conf
 
Building Machine Learning Applications with Sparkling Water
Building Machine Learning Applications with Sparkling WaterBuilding Machine Learning Applications with Sparkling Water
Building Machine Learning Applications with Sparkling Water
 
Boston Spark Meetup event Slides Update
Boston Spark Meetup event Slides UpdateBoston Spark Meetup event Slides Update
Boston Spark Meetup event Slides Update
 
Brief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICMEBrief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICME
 
TI1220 Lecture 14: Domain-Specific Languages
TI1220 Lecture 14: Domain-Specific LanguagesTI1220 Lecture 14: Domain-Specific Languages
TI1220 Lecture 14: Domain-Specific Languages
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache Spark
 
How Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscapeHow Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscape
 
Getting Started on Hadoop
Getting Started on HadoopGetting Started on Hadoop
Getting Started on Hadoop
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
 
20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 20150716 introduction to apache spark v3
20150716 introduction to apache spark v3
 
Spark training-in-bangalore
Spark training-in-bangaloreSpark training-in-bangalore
Spark training-in-bangalore
 
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and CascadingBoulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark
 
MLlib sparkmeetup_8_6_13_final_reduced
MLlib sparkmeetup_8_6_13_final_reducedMLlib sparkmeetup_8_6_13_final_reduced
MLlib sparkmeetup_8_6_13_final_reduced
 
MYSQL -1.pptx
MYSQL -1.pptxMYSQL -1.pptx
MYSQL -1.pptx
 
Hadoop trainingin bangalore
Hadoop trainingin bangaloreHadoop trainingin bangalore
Hadoop trainingin bangalore
 
Stanford CS347 Guest Lecture: Apache Spark
Stanford CS347 Guest Lecture: Apache SparkStanford CS347 Guest Lecture: Apache Spark
Stanford CS347 Guest Lecture: Apache Spark
 
H2O World - Sparkling Water - Michal Malohlava
H2O World - Sparkling Water - Michal MalohlavaH2O World - Sparkling Water - Michal Malohlava
H2O World - Sparkling Water - Michal Malohlava
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
 

More from Sri Ambati

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxSri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thSri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionSri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMsSri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the WaySri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OSri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersSri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email AgainSri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneySri Ambati
 

More from Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Recently uploaded

Praise and worship slides will lyrics and pictures
Praise and worship slides will lyrics and picturesPraise and worship slides will lyrics and pictures
Praise and worship slides will lyrics and picturesmrbeandone
 
Deerfoot Church of Christ Bulletin 2 25 24
Deerfoot Church of Christ Bulletin 2 25 24Deerfoot Church of Christ Bulletin 2 25 24
Deerfoot Church of Christ Bulletin 2 25 24deerfootcoc
 
Secrets of Divine Love - A Spiritual Journey into the Heart of Islam - A. Helwa
Secrets of Divine Love - A Spiritual Journey into the Heart of Islam - A. HelwaSecrets of Divine Love - A Spiritual Journey into the Heart of Islam - A. Helwa
Secrets of Divine Love - A Spiritual Journey into the Heart of Islam - A. HelwaNodd Nittong
 
Deerfoot Church of Christ Bulletin 3 31 24
Deerfoot Church of Christ Bulletin 3 31 24Deerfoot Church of Christ Bulletin 3 31 24
Deerfoot Church of Christ Bulletin 3 31 24deerfootcoc
 
empathy map for students very useful.pptx
empathy map for students very useful.pptxempathy map for students very useful.pptx
empathy map for students very useful.pptxGeorgePhilips7
 
The King 'Great Goodness' Part 1 Mahasilava Jataka (Eng. & Chi.).pptx
The King 'Great Goodness' Part 1 Mahasilava Jataka (Eng. & Chi.).pptxThe King 'Great Goodness' Part 1 Mahasilava Jataka (Eng. & Chi.).pptx
The King 'Great Goodness' Part 1 Mahasilava Jataka (Eng. & Chi.).pptxOH TEIK BIN
 
Ayodhya Temple saw its first Big Navratri Festival!
Ayodhya Temple saw its first Big Navratri Festival!Ayodhya Temple saw its first Big Navratri Festival!
Ayodhya Temple saw its first Big Navratri Festival!All in One Trendz
 
Deerfoot Church of Christ Bulletin 4 14 24
Deerfoot Church of Christ Bulletin 4 14 24Deerfoot Church of Christ Bulletin 4 14 24
Deerfoot Church of Christ Bulletin 4 14 24deerfootcoc
 
A Tsunami Tragedy ~ Wise Reflections for Troubled Times (Eng. & Chi.).pptx
A Tsunami Tragedy ~ Wise Reflections for Troubled Times (Eng. & Chi.).pptxA Tsunami Tragedy ~ Wise Reflections for Troubled Times (Eng. & Chi.).pptx
A Tsunami Tragedy ~ Wise Reflections for Troubled Times (Eng. & Chi.).pptxOH TEIK BIN
 
Monthly Khazina-e-Ruhaniyaat April’2024 (Vol.14, Issue 12)
Monthly Khazina-e-Ruhaniyaat April’2024 (Vol.14, Issue 12)Monthly Khazina-e-Ruhaniyaat April’2024 (Vol.14, Issue 12)
Monthly Khazina-e-Ruhaniyaat April’2024 (Vol.14, Issue 12)Darul Amal Chishtia
 
Gangaur Celebrations 2024 - Rajasthani Sewa Samaj Karimnagar, Telangana State...
Gangaur Celebrations 2024 - Rajasthani Sewa Samaj Karimnagar, Telangana State...Gangaur Celebrations 2024 - Rajasthani Sewa Samaj Karimnagar, Telangana State...
Gangaur Celebrations 2024 - Rajasthani Sewa Samaj Karimnagar, Telangana State...INDIAN YOUTH SECURED ORGANISATION
 
A357 Hate can stir up strife, but love can cover up all mistakes. hate, love...
A357 Hate can stir up strife, but love can cover up all mistakes.  hate, love...A357 Hate can stir up strife, but love can cover up all mistakes.  hate, love...
A357 Hate can stir up strife, but love can cover up all mistakes. hate, love...franktsao4
 
Codex Singularity: Search for the Prisca Sapientia
Codex Singularity: Search for the Prisca SapientiaCodex Singularity: Search for the Prisca Sapientia
Codex Singularity: Search for the Prisca Sapientiajfrenchau
 
The-Clear-Quran,-A-Thematic-English-Translation-by-Dr-Mustafa-Khattab.pdf
The-Clear-Quran,-A-Thematic-English-Translation-by-Dr-Mustafa-Khattab.pdfThe-Clear-Quran,-A-Thematic-English-Translation-by-Dr-Mustafa-Khattab.pdf
The-Clear-Quran,-A-Thematic-English-Translation-by-Dr-Mustafa-Khattab.pdfSana Khan
 
Meaningful Pursuits: Pursuing Obedience_Ecclesiastes.pptx
Meaningful Pursuits: Pursuing Obedience_Ecclesiastes.pptxMeaningful Pursuits: Pursuing Obedience_Ecclesiastes.pptx
Meaningful Pursuits: Pursuing Obedience_Ecclesiastes.pptxStephen Palm
 
Prach Autism AI - Artificial Intelligence
Prach Autism AI - Artificial IntelligencePrach Autism AI - Artificial Intelligence
Prach Autism AI - Artificial Intelligenceprachaibot
 
"There are probably more Nobel Laureates who are people of faith than is gen...
 "There are probably more Nobel Laureates who are people of faith than is gen... "There are probably more Nobel Laureates who are people of faith than is gen...
"There are probably more Nobel Laureates who are people of faith than is gen...Steven Camilleri
 

Recently uploaded (20)

Praise and worship slides will lyrics and pictures
Praise and worship slides will lyrics and picturesPraise and worship slides will lyrics and pictures
Praise and worship slides will lyrics and pictures
 
English - The Dangers of Wine Alcohol.pptx
English - The Dangers of Wine Alcohol.pptxEnglish - The Dangers of Wine Alcohol.pptx
English - The Dangers of Wine Alcohol.pptx
 
Deerfoot Church of Christ Bulletin 2 25 24
Deerfoot Church of Christ Bulletin 2 25 24Deerfoot Church of Christ Bulletin 2 25 24
Deerfoot Church of Christ Bulletin 2 25 24
 
Secrets of Divine Love - A Spiritual Journey into the Heart of Islam - A. Helwa
Secrets of Divine Love - A Spiritual Journey into the Heart of Islam - A. HelwaSecrets of Divine Love - A Spiritual Journey into the Heart of Islam - A. Helwa
Secrets of Divine Love - A Spiritual Journey into the Heart of Islam - A. Helwa
 
Deerfoot Church of Christ Bulletin 3 31 24
Deerfoot Church of Christ Bulletin 3 31 24Deerfoot Church of Christ Bulletin 3 31 24
Deerfoot Church of Christ Bulletin 3 31 24
 
empathy map for students very useful.pptx
empathy map for students very useful.pptxempathy map for students very useful.pptx
empathy map for students very useful.pptx
 
The King 'Great Goodness' Part 1 Mahasilava Jataka (Eng. & Chi.).pptx
The King 'Great Goodness' Part 1 Mahasilava Jataka (Eng. & Chi.).pptxThe King 'Great Goodness' Part 1 Mahasilava Jataka (Eng. & Chi.).pptx
The King 'Great Goodness' Part 1 Mahasilava Jataka (Eng. & Chi.).pptx
 
Ayodhya Temple saw its first Big Navratri Festival!
Ayodhya Temple saw its first Big Navratri Festival!Ayodhya Temple saw its first Big Navratri Festival!
Ayodhya Temple saw its first Big Navratri Festival!
 
Top 8 Krishna Bhajan Lyrics in English.pdf
Top 8 Krishna Bhajan Lyrics in English.pdfTop 8 Krishna Bhajan Lyrics in English.pdf
Top 8 Krishna Bhajan Lyrics in English.pdf
 
Deerfoot Church of Christ Bulletin 4 14 24
Deerfoot Church of Christ Bulletin 4 14 24Deerfoot Church of Christ Bulletin 4 14 24
Deerfoot Church of Christ Bulletin 4 14 24
 
A Tsunami Tragedy ~ Wise Reflections for Troubled Times (Eng. & Chi.).pptx
A Tsunami Tragedy ~ Wise Reflections for Troubled Times (Eng. & Chi.).pptxA Tsunami Tragedy ~ Wise Reflections for Troubled Times (Eng. & Chi.).pptx
A Tsunami Tragedy ~ Wise Reflections for Troubled Times (Eng. & Chi.).pptx
 
Monthly Khazina-e-Ruhaniyaat April’2024 (Vol.14, Issue 12)
Monthly Khazina-e-Ruhaniyaat April’2024 (Vol.14, Issue 12)Monthly Khazina-e-Ruhaniyaat April’2024 (Vol.14, Issue 12)
Monthly Khazina-e-Ruhaniyaat April’2024 (Vol.14, Issue 12)
 
Gangaur Celebrations 2024 - Rajasthani Sewa Samaj Karimnagar, Telangana State...
Gangaur Celebrations 2024 - Rajasthani Sewa Samaj Karimnagar, Telangana State...Gangaur Celebrations 2024 - Rajasthani Sewa Samaj Karimnagar, Telangana State...
Gangaur Celebrations 2024 - Rajasthani Sewa Samaj Karimnagar, Telangana State...
 
A357 Hate can stir up strife, but love can cover up all mistakes. hate, love...
A357 Hate can stir up strife, but love can cover up all mistakes.  hate, love...A357 Hate can stir up strife, but love can cover up all mistakes.  hate, love...
A357 Hate can stir up strife, but love can cover up all mistakes. hate, love...
 
Codex Singularity: Search for the Prisca Sapientia
Codex Singularity: Search for the Prisca SapientiaCodex Singularity: Search for the Prisca Sapientia
Codex Singularity: Search for the Prisca Sapientia
 
The-Clear-Quran,-A-Thematic-English-Translation-by-Dr-Mustafa-Khattab.pdf
The-Clear-Quran,-A-Thematic-English-Translation-by-Dr-Mustafa-Khattab.pdfThe-Clear-Quran,-A-Thematic-English-Translation-by-Dr-Mustafa-Khattab.pdf
The-Clear-Quran,-A-Thematic-English-Translation-by-Dr-Mustafa-Khattab.pdf
 
Meaningful Pursuits: Pursuing Obedience_Ecclesiastes.pptx
Meaningful Pursuits: Pursuing Obedience_Ecclesiastes.pptxMeaningful Pursuits: Pursuing Obedience_Ecclesiastes.pptx
Meaningful Pursuits: Pursuing Obedience_Ecclesiastes.pptx
 
The spiritual moderator of vincentian groups
The spiritual moderator of vincentian groupsThe spiritual moderator of vincentian groups
The spiritual moderator of vincentian groups
 
Prach Autism AI - Artificial Intelligence
Prach Autism AI - Artificial IntelligencePrach Autism AI - Artificial Intelligence
Prach Autism AI - Artificial Intelligence
 
"There are probably more Nobel Laureates who are people of faith than is gen...
 "There are probably more Nobel Laureates who are people of faith than is gen... "There are probably more Nobel Laureates who are people of faith than is gen...
"There are probably more Nobel Laureates who are people of faith than is gen...
 

Sparkling Water Meetup 4.15.15

  • 1. Building Machine Learning Applications with Sparkling Water 04/15/2015 Meetup Michal Malohlava and Alex Tellez and H2O.ai
  • 3. Scalable 
 Machine Learning For Smarter Applications
  • 5. Scalable Applications Distributed Able to process huge amount of data from different sources Easy to develop and experiment Powerful machine learning engine inside
  • 9. Open-source distributed execution platform User-friendly API for data transformation based on RDD Platform components - SQL, MLLib, text mining Multitenancy Large and active community
  • 10. Open-source scalable machine learning platform Tuned for efficient computation and memory use Production ready machine learning algorithms R, Python, Java, Scala APIs Interactive UI, robust data parser
  • 11. Ensembles Deep Neural Networks • Generalized Linear Models : Binomial, Gaussian, Gamma, Poisson and Tweedie • Cox Proportional Hazards Models • Naïve Bayes • Distributed Random Forest : Classification or regression models • Gradient Boosting Machine : Produces an ensemble of decision trees with increasing refined approximations • Deep learning : Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations Statistical Analysis Dimensionality Reduction Anomaly Detection • K-means : Partitions observations into k clusters/groups of the same spatial size • Principal Component Analysis : Linearly transforms correlated variables to independent components • Autoencoders: Find outliers using a nonlinear dimensionality reduction using deep learning Clustering Supervised Learning Unsupervised Learning
  • 12.
  • 13. Sparkling Water Provides Transparent integration of H2O with Spark ecosystem Transparent use of H2O data structures and algorithms with Spark API Excels in existing Spark workflows requiring advanced Machine Learning algorithms Platform for building Smarter Applications
  • 14. Sparkling Water Design spark-submit Spark Master JVM Spark Worker JVM Spark Worker JVM Spark Worker JVM Sparkling Water Cluster Spark Executor JVM H2O Spark Executor JVM H2O Spark Executor JVM H2O Sparkling App implements ? Regular Spark application containing also Sparkling Water classes
  • 15. Data Distribution H2O H2O H2O Sparkling Water Cluster Spark Executor JVM Data Source (e.g. HDFS) H2O RDD Spark Executor JVM Spark Executor JVM Spark RDD RDDs and DataFrames share same memory space toRDD toH2OFrame
  • 18. Data example case class SMS(target: String, fv: Vector) ham / spam feature vector
  • 19. ML Workflow 1. Extract data 2. Transform, tokenize messages 3. Build Tf-IDF 4. Create and evaluate 
 Deep Learning model 5. Score new messages Goal: For a given text message identify if it is spam or not
  • 21. Lego #1: Data load // Data load
 def load(dataFile: String): RDD[Array[String]] = {
 sc.textFile(dataFile).map(l => l.split(“t")) .filter(r => !r(0).isEmpty)
 } Produces [response,message]
  • 22. Lego #2: Ad-hoc Tokenization def tokenize(data: RDD[String]): RDD[Seq[String]] = {
 val ignoredWords = Seq("the", “a", …)
 val ignoredChars = Seq(',', ‘:’, …)
 
 val texts = data.map( r => {
 var smsText = r.toLowerCase
 for( c <- ignoredChars) {
 smsText = smsText.replace(c, ' ')
 }
 
 val words =smsText.split(" ").filter(w => !ignoredWords.contains(w) && w.length>2).distinct
 words.toSeq
 })
 texts
 } Message Bag of words
  • 23. Lego #3: Tf-IDF def buildIDFModel(tokens: RDD[Seq[String]],
 minDocFreq:Int = 4,
 hashSpaceSize:Int = 1 << 10): (HashingTF, IDFModel, RDD[Vector]) = {
 // Hash strings into the given space
 val hashingTF = new HashingTF(hashSpaceSize)
 val tf = hashingTF.transform(tokens)
 // Build term frequency-inverse document frequency
 val idfModel = new IDF(minDocFreq=minDocFreq).fit(tf)
 val expandedText = idfModel.transform(tf)
 (hashingTF, idfModel, expandedText)
 } Hash words
 into large 
 space Term freq scale “Thank for the order…” […,0,3.5,0,1,0,0.3,0,1.3,0,0,…] Thank Order Bag of words hashing functions term freq
 model Feature vectors representing text
  • 24. Lego #4: Build a model def buildDLModel(train: Frame, valid: Frame,
 epochs: Int = 10, l1: Double = 0.001, l2: Double = 0.0,
 hidden: Array[Int] = Array[Int](200, 200))
 (implicit h2oContext: H2OContext): DeepLearningModel = {
 import h2oContext._
 // Build a model
 val dlParams = new DeepLearningParameters()
 dlParams._destination_key = Key.make("dlModel.hex").asInstanceOf[Key[Frame]]
 dlParams._train = train
 dlParams._valid = valid
 dlParams._response_column = 'target
 dlParams._epochs = epochs
 dlParams._l1 = l1
 dlParams._hidden = hidden
 
 // Create a job
 val dl = new DeepLearning(dlParams)
 val dlModel = dl.trainModel.get
 
 // Compute metrics on both datasets
 dlModel.score(train).delete()
 dlModel.score(valid).delete()
 
 dlModel
 } Deep Learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple l a y e r s o f n o n l i n e a r transformations H2O model
 builder API Do final scoring
 on train/test data
  • 25. Assembly application // Data load
 val data = load(DATAFILE)
 // Extract response spam or ham
 val hamSpam = data.map( r => r(0))
 val message = data.map( r => r(1))
 // Tokenize message content
 val tokens = tokenize(message)
 
 // Build IDF model
 var (hashingTF, idfModel, tfidf) = buildIDFModel(tokens)
 
 // Merge response with extracted vectors
 val resultRDD: SchemaRDD = hamSpam.zip(tfidf).map(v => SMS(v._1, v._2))
 val table:DataFrame = resultRDD
 
 // Split table
 val keys = Array[String]("train.hex", "valid.hex")
 val ratios = Array[Double](0.8)
 val frs = split(table, keys, ratios)
 val (train, valid) = (frs(0), frs(1))
 table.delete()
 
 // Build a model
 val dlModel = buildDLModel(train, valid) Split dataset Build model Data munging
  • 27. Model evaluation val trainMetrics = binomialMM(dlModel, train)
 val validMetrics = binomialMM(dlModel, valid) Collect Model 
 Metrics
  • 28. Spam predictor def isSpam(msg: String,
 dlModel: DeepLearningModel,
 hashingTF: HashingTF,
 idfModel: IDFModel,
 hamThreshold: Double = 0.5):Boolean = {
 val msgRdd = sc.parallelize(Seq(msg))
 val msgVector: SchemaRDD = idfModel.transform(
 hashingTF.transform (
 tokenize (msgRdd))) .map(v => SMS("?", v))
 val msgTable: DataFrame = msgVector
 msgTable.remove(0) // remove first column
 val prediction = dlModel.score(msgTable)
 prediction.vecs()(1).at(0) < hamThreshold
 } Prepared models Default decision threshold Scoring
  • 29. Predict spam isSpam("Michal, beer tonight in MV?") isSpam("We tried to contact you re your reply to our offer of a Video Handset? 750 anytime any networks mins? UNLIMITED TEXT?")
  • 30. Interactions with application from R You need to install H2O’s R-package from: http://h2o-release.s3.amazonaws.com/h2o-dev/master/ 1109/index.html#R
  • 31. Use app from R Collect Model 
 Metrics
  • 32. Where is the code? https://github.com/h2oai/sparkling-water/ blob/master/examples/scripts/
  • 34. Checkout H2O.ai Training Books http://learn.h2o.ai/
 Checkout H2O.ai Blog http://h2o.ai/blog/
 Checkout H2O.ai Youtube Channel https://www.youtube.com/user/0xdata
 Checkout GitHub https://github.com/h2oai/sparkling-water Meetups https://meetup.com/ More info
  • 35. Learn more at h2o.ai Follow us at @h2oai Thank you! Sparkling Water is open-source
 ML application platform combining
 power of Spark and H2O