APPLICATIONS OF MACHINE LEARNING
Alex Tellez + Amy Wang + H2O Team
USC, 4/8/2015
AGENDA
1. Introduction to Big Data / ML
2. What is H2O.ai?
3. Use Cases:
a) Beat Bill Belichick
b) Fight Crime in Chicago
c) Whiskey Recommendation Engine
d) Bordeaux Wine Vintage
4. Data Science Competition
1. INTRO TO BIG DATA / ML
BIG DATA IS LIKE TEENAGE SEX:
everyone talks about it,
nobody really knows how to do it,
everyone thinks everyone else is
doing it, so everyone claims
they are doing it…
Dan Ariely, Prof. @ Duke
BIG VS. SMALL DATA
When you try to open a file in Excel, Excel CRASHES
SMALL = Data fits in RAM
BIG = Data does NOT fit in RAM
Basically… Big Data is data too big to process using conventional methods (e.g. Excel, Access)
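One way to work with data that does NOT fit in RAM is to stream it: read one record at a time and keep only a running summary. The Python sketch below is not from the talk (the column name and file name are hypothetical); it computes a column mean in a single pass, so memory use stays flat no matter how large the file is. Parallel engines such as H2O go further and split the work across cores and machines.

```python
import csv
import io
from typing import TextIO


def streaming_column_mean(stream: TextIO, column: str) -> float:
    """Mean of one numeric CSV column, computed in a single pass.

    Only the current row plus a running total and count are kept in memory,
    so the input can be much larger than RAM.
    """
    total = 0.0
    count = 0
    for row in csv.DictReader(stream):
        total += float(row[column])
        count += 1
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


if __name__ == "__main__":
    # For a real file: with open("crimes.csv", newline="") as f: ...
    # ("crimes.csv" is a hypothetical name). An in-memory buffer keeps this self-contained.
    sample = io.StringIO("id,temperature\n1,50.0\n2,70.0\n3,60.0\n")
    print(streaming_column_mean(sample, "temperature"))  # 60.0
```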
V + V + V
Today, we have access to more data than we know what to do with!
1. Wearables (Fitbit, Apple Watch, etc.)
2. Click streams from web visitors
3. Sensor readings
4. Social media outlets (e.g. Twitter, Facebook, etc.)
Volume - Data volumes are becoming unmanageable
Variety - More data types are being captured
Velocity - Data arrives rapidly and must be processed / stored
THE HOPE OF BIG DATA
1. Data contains information of great business / personal value
Examples:
a) Predicting future stock movements = $$$
b) Netflix movie recommendations = Better experience = $$$
2. IF you can extract those insights from the data, you can make better decisions
So how the hell do you do it?
Enter Machine Learning (ML)…
MACHINE LEARNING
The Wikipedia Definition:
…a scientific discipline that explores the construction and study
of algorithms that can learn from data. Such algorithms operate
by building a model…. ZZZzzzzzZZZzzzzzz
My Definition:
The development, analysis, and application of algorithms that enable machines to make predictions and / or better understand data
2 Types of Learning:
SUPERVISED + UNSUPERVISED
SUPERVISED LEARNING
What is it?
Methods that infer a function from labeled training data. Key task: Predicting ________. (Insert your task here)
Examples of supervised learning tasks:
1. Classification Tasks - Benign / Malignant tumor
2. Regression Tasks - Predicting future stock market prices
3. Image Recognition - Highlighting faces in pictures
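To make "infer a function from labeled training data" concrete, here is a minimal classification sketch in plain Python. k-nearest neighbors is not named in the talk; it is used here only because it is one of the simplest supervised methods. The tumor features and labels are made up for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# One labeled training example: (feature vector, label).
Example = Tuple[Sequence[float], str]


def knn_predict(train: List[Example], query: Sequence[float], k: int = 3) -> str:
    """Predict a label for `query` by majority vote among its k nearest
    labeled neighbors (Euclidean distance)."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    nearest = sorted(train, key=lambda example: math.dist(example[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up toy data: (tumor size in cm, irregularity score) -> label.
    training_data = [
        ((1.0, 0.1), "benign"),
        ((1.2, 0.2), "benign"),
        ((0.8, 0.3), "benign"),
        ((3.5, 0.9), "malignant"),
        ((4.0, 0.8), "malignant"),
        ((3.8, 0.7), "malignant"),
    ]
    print(knn_predict(training_data, (1.1, 0.2)))   # benign
    print(knn_predict(training_data, (3.9, 0.85)))  # malignant
```

The "learned function" here is just the stored examples plus a distance rule; model-based methods (GLMs, trees, deep learning) instead fit parameters once and then predict without scanning the training data.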
UNSUPERVISED LEARNING
What is it?
Methods to understand the general structure of input data where no prediction is needed.
Examples of unsupervised learning tasks:
1. Clustering - Discovering customer segments
2. Topic Extraction - What topics are people tweeting about?
3. Information Retrieval - IBM Watson: Question + Answer
4. Anomaly Detection - Detecting irregular heartbeats
NO CURATION NEEDED!
2. WHAT IS H2O?
What is H2O? (water, duh!)
It is ALSO an open-source, parallel processing engine for machine learning.
What makes H2O different?
Cutting-edge algorithms + parallel architecture + ease-of-use
=
Happy Data Scientists / Analysts
TEAM @ H2O.AI
16,000 commits
H2O World Conference 2014
COMMUNITY REACH
120 meetups in 2014
11,000 installations
2,000 corporations
First Friday Hack-A-Thons
TRY IT!
Don’t take my word for it… www.h2o.ai
Simple Instructions
1. cd to the download location
2. Unzip the H2O file
3. java -jar h2o.jar
4. Point your browser to: localhost:54321
[Screenshots: H2O web GUI and R interface]
3. USE CASES (LOTS OF ’EM)
BEAT BILL BELICHICK
TB + BB
Bill Belichick + Tom Brady = 15 years together, 4 Super Bowls
PASS OR RUN?
On any given offensive play… Coach Bill can either call a PASS or a RUN
What determines this?
Game situation
Opposing team
Time remaining
Yards to go (until 1st down)
Personnel, etc.
Basically, LOTS of stuff.
BUT WHAT IF??
Question:
Can we try to predict whether the next play will be a PASS or a RUN using historical data?
Approach:
Download every offensive play from the Belichick-Brady era since 2000
Extract known features to build model inputs
Use various Machine Learning approaches to model PASS / RUN
Disclaimer: I’m not a Seahawks fan!
DATA COLLECTION
Data:
13 years of data (2002 - 2013 seasons)
194 games total
14,547 total offensive plays (excludes punts, kickoffs, returns)
Response Variable: PASS / RUN
Model Inputs:
Quarter, Minutes, Seconds, Opposing Team, Down, Distance, Line of Scrimmage, NE Score, Opposing Team Score, Season, Formation, Game Status (is NE losing / winning / tied)
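The slide lists the model inputs but not how they were encoded. As a hypothetical sketch (the field names, the meaning of Minutes / Seconds as game clock left in the quarter, and the 15-minute regulation quarter are all assumptions; overtime is ignored), here is how raw play-by-play columns could be turned into inputs such as Game Status and a single time-remaining feature:

```python
from dataclasses import dataclass


@dataclass
class Play:
    """One offensive play (hypothetical field names)."""
    quarter: int   # 1-4; overtime is not handled in this sketch
    minutes: int   # minutes left on the quarter's game clock (assumed meaning)
    seconds: int   # seconds left on the quarter's game clock (assumed meaning)
    ne_score: int
    opp_score: int


def game_status(play: Play) -> str:
    """Is New England losing, winning, or tied when the play is called?"""
    if play.ne_score > play.opp_score:
        return "winning"
    if play.ne_score < play.opp_score:
        return "losing"
    return "tied"


def score_margin(play: Play) -> int:
    """Positive when New England leads, negative when it trails."""
    return play.ne_score - play.opp_score


def seconds_left_in_game(play: Play, quarter_minutes: int = 15) -> int:
    """Fold Quarter, Minutes, and Seconds into one time-remaining feature."""
    if not 1 <= play.quarter <= 4:
        raise ValueError("only regulation quarters (1-4) are supported")
    later_quarters = 4 - play.quarter
    return later_quarters * quarter_minutes * 60 + play.minutes * 60 + play.seconds


if __name__ == "__main__":
    # Trailing by 4 with two minutes left in the 4th quarter.
    play = Play(quarter=4, minutes=2, seconds=0, ne_score=24, opp_score=28)
    print(game_status(play), score_margin(play), seconds_left_in_game(play))
    # losing -4 120
```

Combining the clock columns into one number lets a model see that "2:00 left in Q4" and "2:00 left in Q1" are very different situations.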
FIGHTING CRIME IN CHICAGO
Spark + H2O
OPEN CRIME DATA
Crime Dataset: Crimes from 2001 to present day
~4.6 million crimes
THE WINDY CITY
Harvest Chicago Weather data since 2001
SOCIOECONOMIC FACTORS
Crimes segmented into Community Area IDs
Percent of households below poverty, unemployed, etc.
SPARK + H2O
[Diagram: Crime + Weather + Census data → data munging → Spark SQL join → Deep Learning → evaluate models]
GOAL: For a given crime, predict if an arrest is more / less likely to be made!
JOIN DATASETS
[Diagram: crime data + weather data + census data]
Using Spark, we join 3 datasets together to make one mega dataset!
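The talk does this join with Spark SQL. As a plain-Python stand-in (this is not Spark's API), the sketch below performs the same kind of inner join on tiny in-memory tables. The join keys are assumptions: each crime is matched to weather by date and to census data by its Community Area ID; all rows are made up.

```python
from typing import Dict, List


def join_crime_data(
    crimes: List[dict],
    weather_by_date: Dict[str, dict],
    census_by_area: Dict[int, dict],
) -> List[dict]:
    """Inner-join each crime with the weather on its date and the census
    row for its community area. Crimes missing either match are dropped,
    as in an SQL INNER JOIN."""
    joined = []
    for crime in crimes:
        weather = weather_by_date.get(crime["date"])
        census = census_by_area.get(crime["community_area"])
        if weather is None or census is None:
            continue
        # On a column-name clash, later dicts win, so keep names distinct upstream.
        joined.append({**crime, **weather, **census})
    return joined


if __name__ == "__main__":
    # Made-up rows for illustration only.
    crimes = [
        {"id": 1, "date": "2014-07-04", "community_area": 8, "arrest": True},
        {"id": 2, "date": "2014-01-15", "community_area": 25, "arrest": False},
        {"id": 3, "date": "2014-03-01", "community_area": 8, "arrest": False},
    ]
    weather = {"2014-07-04": {"temp_f": 80}, "2014-01-15": {"temp_f": 10}}
    census = {8: {"pct_poverty": 10.0}, 25: {"pct_poverty": 30.0}}
    # Crime 3 is dropped: there is no weather row for its date.
    for row in join_crime_data(crimes, weather, census):
        print(row)
```

Using dictionaries keyed on the join columns makes each lookup constant-time, which is the same idea as a hash join; Spark applies it across a cluster.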
DATA VISUALIZATION
[Charts: arrest rate by season of crime, by temperature during crime, and by community the crime is committed in]
SPLIT DATA INTO TEST / TRAIN SETS
Train the model on one segment (80% of the data); validate the model on the remaining 20%.
[Chart: training-set arrest rate vs. test-set arrest rate]
~40% of crimes lead to arrest
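The slide gives the 80/20 proportions but not whether the split was stratified. The sketch below makes that choice explicitly: it splits arrests and non-arrests separately so both sets keep the same ~40% arrest rate. It is plain Python on synthetic rows, not the Spark / H2O split used in the talk.

```python
import random
from collections import defaultdict
from operator import itemgetter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stratified_split(
    rows: Sequence[T],
    label_of: Callable[[T], Hashable],
    train_fraction: float = 0.8,
    seed: int = 42,
) -> Tuple[List[T], List[T]]:
    """Split rows into train/test sets, splitting each label group separately
    so both sets keep roughly the same label mix."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    train: List[T] = []
    test: List[T] = []
    for members in groups.values():
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


def positive_rate(rows: Sequence[T], label_of: Callable[[T], Hashable], positive: Hashable) -> float:
    """Share of rows whose label equals `positive` (e.g. the arrest rate)."""
    return sum(1 for row in rows if label_of(row) == positive) / len(rows)


if __name__ == "__main__":
    # Synthetic stand-in for the crime table: 1,000 rows, 40% ending in arrest.
    crimes = [{"id": i, "arrest": i < 400} for i in range(1000)]
    is_arrest = itemgetter("arrest")
    train, test = stratified_split(crimes, is_arrest)
    print(len(train), len(test))  # 800 200
    print(positive_rate(train, is_arrest, True), positive_rate(test, is_arrest, True))  # 0.4 0.4
```

A plain random 80/20 split would also land near 40% in each set on millions of rows; stratifying just removes that source of variation on smaller samples.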
DEEP LEARNING
Problem: For a given crime, is an arrest more / less likely?
Deep Learning: A multi-layer feed-forward neural network that starts with an input layer (crime + weather data) followed by multiple layers of non-linear transformations
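To make "multiple layers of non-linear transformations" concrete, here is a forward pass through a tiny feed-forward network in plain Python. It is a sketch only: the weights are hand-picked rather than learned (training by back-propagation, which H2O does for you, is omitted), and the ReLU / sigmoid activations are chosen for illustration, not taken from the talk's model.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weights[out][in], biases[out])


def dense(inputs: Sequence[float], weights: Matrix, biases: Sequence[float]) -> List[float]:
    """Fully connected layer: out[j] = sum_i weights[j][i] * inputs[i] + biases[j]."""
    if any(len(row) != len(inputs) for row in weights) or len(weights) != len(biases):
        raise ValueError("layer shapes do not match")
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values: Sequence[float]) -> List[float]:
    """Non-linear transformation applied after each hidden layer."""
    return [max(0.0, v) for v in values]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x: Sequence[float], layers: List[Layer]) -> float:
    """Feed x through hidden layers (linear + ReLU) and a one-unit output
    layer (linear + sigmoid); the result reads as a probability, e.g. P(arrest)."""
    *hidden, (w_out, b_out) = layers
    h = list(x)
    for weights, biases in hidden:
        h = relu(dense(h, weights, biases))
    return sigmoid(dense(h, w_out, b_out)[0])


if __name__ == "__main__":
    # Hand-picked weights, not trained ones: 2 inputs -> 2 hidden units -> 1 output.
    layers = [
        ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
        ([[1.0, 1.0]], [-3.0]),
    ]
    print(forward([1.0, 2.0], layers))  # 0.5
```

Without the ReLU between layers, stacking linear layers would collapse into a single linear layer; the non-linearity is what lets depth add modeling power.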
HOW’D WE DO?
[Results screenshot, annotated: "nice!" and "~10 mins"]
SINGLE-MALT SCOTCH
Single-Malt Scotch: A whiskey made at one particular distillery from a mash that only uses malted grain (barley)
Solid Standards:
Must be aged at least 3 years in oak casks
Many famous distilleries are located in the northern regions of Scotland
OF COURSE, THERE’S A DATASET FOR THAT!
THE Single Malt Dataset
85 distilleries from Northern Scotland
12 descriptor features: e.g. Sweetness, Smoky, Tobacco, Honey, Spicy, Malty, etc.
Each descriptor rated 0 (weak) to 4 (strong)
Problem:
Can we build a whiskey recommendation engine based on whiskeys I
have tried (and liked!) already?
DIMENSIONALITY REDUCTION + K-MEANS
First, let’s reduce the 12 features to a lower-dimensional space using a linear transformation (Principal Component Analysis)
7 principal components explain ~85% of the variance in the dataset
Then let’s run a clustering algorithm (k-means) on the PCA-transformed dataset to group similar whiskeys
11 clusters are appropriate
Pipe out the cluster assignments and start buying whiskey!
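The talk runs PCA first and then clusters the PCA'd data with 11 clusters. The plain-Python sketch below skips the PCA step and shows only the cluster-then-recommend idea: a basic k-means (Lloyd's algorithm) on made-up two-feature flavor profiles, followed by "recommend every whiskey in the same cluster as one you like". Distillery names and scores are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's k-means: return the cluster index assigned to each point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so we have converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


def recommend(names: Sequence[str], labels: Sequence[int], liked: str) -> List[str]:
    """Whiskeys that share a cluster with one you already like."""
    cluster = labels[names.index(liked)]
    return [name for name, label in zip(names, labels) if label == cluster and name != liked]


if __name__ == "__main__":
    # Made-up 2-D profiles (smoky, sweet) on the dataset's 0-4 scale.
    names = ["Distillery A", "Distillery B", "Distillery C", "Distillery D"]
    profiles = [(4.0, 0.0), (3.5, 0.5), (0.0, 4.0), (0.5, 3.5)]
    labels = kmeans(profiles, k=2)
    print(recommend(names, labels, "Distillery A"))  # ['Distillery B']
```

k-means can land in different local optima depending on the starting centroids, so in practice it is run several times (or with smarter seeding) and the best result kept.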
MODEL RESULTS
I ENJOY:
OTHER WHISKEYS THAT CLUSTER WITH THESE:
OTHER POPULAR BRANDS
APPARENTLY, LOTS OF PEOPLE LIKE:
OTHER WHISKEYS THAT CLUSTER WITH THESE:
AUTOENCODER + H2O
[Diagram: autoencoder. Inputs x1 to x4 pass through a layer of hidden features and are reconstructed as outputs x1 to x4; information flows from input to output]
Dogs, Dogs and Dogs
ANOMALY DETECTION OF VINTAGE-YEAR BORDEAUX WINE
BORDEAUX WINE
Largest wine-growing region in France
700+ million bottles of wine produced / year!
Some years are better than others: Great ($$$) vs. Typical ($)
Last Great years: 2010, 2009, 2005, 2000
GREAT VS. TYPICAL VINTAGE?
Question:
Can we study weather patterns in Bordeaux leading up to harvest to identify ‘anomalous’ weather years that correlate with Great ($$$) vs. Typical ($) vintages?
The Bordeaux Dataset (yearly, 1952 - 2014)
Amount of Winter Rain (Oct to Apr of harvest year)
Average Summer Temp (Apr to Sept of harvest year)
Rain during Harvest (Aug to Sept)
Years since last Great Vintage
AUTOENCODER + ANOMALY DETECTION
Goal:
‘en primeur of en primeur’ - Can we use weather patterns to identify anomalous years that indicate great vintage quality?
ML Workflow:
1) Train an autoencoder to learn the ‘typical’ vintage weather pattern
2) Append ‘great’ vintage year weather data to the original dataset
3) IF great vintage year weather data does NOT match the learned weather pattern, the autoencoder will produce a high reconstruction error (MSE)
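Step 3 of this workflow, scoring each year by reconstruction error, can be shown on its own. The sketch below assumes an autoencoder has already been trained (in the talk, with H2O) and has produced a reconstruction for each year; it computes each year's mean squared error and flags years above the 0.10 cutoff used on the results slide. The rows are made up, and because an MSE threshold only makes sense relative to how the features were scaled, treat 0.10 as specific to the talk's setup.

```python
from typing import Dict, List, Sequence


def reconstruction_mse(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error between one input row and the autoencoder's reconstruction."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("rows must be non-empty and the same length")
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)


def flag_anomalies(errors: Dict[str, float], threshold: float = 0.10) -> List[str]:
    """Rows whose reconstruction error is above the threshold (MSE > 0.10 on the slide)."""
    return sorted(key for key, err in errors.items() if err > threshold)


if __name__ == "__main__":
    # Made-up, already-scaled rows and reconstructions, for illustration only:
    # (winter rain, summer temp, harvest rain, years since last great vintage)
    rows = {
        "year A": ([0.2, 0.5, 0.1, 0.3], [0.2, 0.5, 0.1, 0.3]),  # fits the learned pattern
        "year B": ([0.9, 0.1, 0.8, 0.0], [0.3, 0.5, 0.2, 0.4]),  # does not
    }
    errors = {year: reconstruction_mse(x, x_hat) for year, (x, x_hat) in rows.items()}
    print(errors)
    print(flag_anomalies(errors))  # ['year B']
```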
RESULTS (MSE > 0.10)
[Chart: mean squared error by year; labeled great vintages (V): 1961, 1982, 1989, 1990, 2000, 2005, 2009, 2010]
2014 BORDEAUX??
[Chart: mean squared error for 2013 and 2014; 2014 marked "?"]
4. DATA SCIENCE COMPETITION
Apply / Learn More @: apps.h2o.ai
Check out our YouTube channel for last year’s talks @ H2O World
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DFML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
 

Recently uploaded

20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기Chiwon Song
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Neo4j
 
Growing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesGrowing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesSoftwareMill
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadIvo Andreev
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIIvo Andreev
 
AI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyAI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyRaymond Okyere-Forson
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxAutus Cyber Tech
 
Watermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesWatermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesShyamsundar Das
 
Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies
 
eAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionseAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionsNirav Modi
 
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfTobias Schneck
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?AmeliaSmith90
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdfMeon Technology
 
Mastering Kubernetes - Basics and Advanced Concepts using Example Project
Mastering Kubernetes - Basics and Advanced Concepts using Example ProjectMastering Kubernetes - Basics and Advanced Concepts using Example Project
Mastering Kubernetes - Basics and Advanced Concepts using Example Projectwajrcs
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeNeo4j
 
Top Software Development Trends in 2024
Top Software Development Trends in  2024Top Software Development Trends in  2024
Top Software Development Trends in 2024Mind IT Systems
 
Kubernetes go-live checklist for your microservices.pptx
Kubernetes go-live checklist for your microservices.pptxKubernetes go-live checklist for your microservices.pptx
Kubernetes go-live checklist for your microservices.pptxPrakarsh -
 
Generative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilGenerative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilVICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!
 
Growing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesGrowing Oxen: channel operators and retries
Growing Oxen: channel operators and retries
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and Bad
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AI
 
AI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyAI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human Beauty
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptx
 
Watermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesWatermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security Challenges
 
Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in Trivandrum
 
eAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionseAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspections
 
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdf
 
Mastering Kubernetes - Basics and Advanced Concepts using Example Project
Mastering Kubernetes - Basics and Advanced Concepts using Example ProjectMastering Kubernetes - Basics and Advanced Concepts using Example Project
Mastering Kubernetes - Basics and Advanced Concepts using Example Project
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
Top Software Development Trends in 2024
Top Software Development Trends in  2024Top Software Development Trends in  2024
Top Software Development Trends in 2024
 
Kubernetes go-live checklist for your microservices.pptx
Kubernetes go-live checklist for your microservices.pptxKubernetes go-live checklist for your microservices.pptx
Kubernetes go-live checklist for your microservices.pptx
 
Generative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilGenerative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-Council
 
Program with GUTs
Program with GUTsProgram with GUTs
Program with GUTs
 

Applications of Machine Learning at USC

  • 1. APPLICATIONS OF MACHINE LEARNING. Alex Tellez + Amy Wang + H2O Team, USC, 4/8/2015
  • 2. AGENDA: 1. Introduction to Big Data / ML; 2. What is H2O.ai?; 3. Use Cases: a) Beat Bill Belichick, b) Fight Crime in Chicago, c) Whiskey Recommendation Engine, d) Bordeaux Wine Vintage; 4. Data Science Competition
  • 3. 1. INTRO TO BIG DATA / ML. "BIG DATA IS LIKE TEENAGE SEX: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…" (Dan Ariely, Prof. @ Duke)
  • 4. BIG VS. SMALL DATA. When you try to open the file in Excel, Excel CRASHES. SMALL = data fits in RAM; BIG = data does NOT fit in RAM. Basically… Big Data is data too big to process using conventional methods (e.g., Excel, Access).
  • 5. V + V + V. Today, we have access to more data than we know what to do with! 1) Wearables (Fitbit, iWatch, etc.) 2) Click streams from web visitors 3) Sensor readings 4) Social media outlets (e.g., Twitter, Facebook, etc.). Volume: data volumes are becoming unmanageable. Variety: more data types are being captured. Velocity: data arrives rapidly and must be processed / stored.
  • 6. THE HOPE OF BIG DATA. 1. Data contains information of great business / personal value. Examples: a) Predicting future stock movements = $$$ b) Netflix movie recommendations = better experience = $$$. 2. IF you can extract those insights from the data, you can make better decisions. Enter Machine Learning (ML)… So how the hell do you do it?
  • 7. MACHINE LEARNING. The Wikipedia definition: "…a scientific discipline that explores the construction and study of algorithms that can learn from data. Such algorithms operate by building a model…." ZZZzzzzzZZZzzzzzz. My definition: the development, analysis, and application of algorithms that enable machines to make predictions and / or better understand data. 2 types of learning: SUPERVISED + UNSUPERVISED
  • 8. SUPERVISED LEARNING. What is it? Methods that infer a function from labeled training data. Key task: predicting ________ (insert your task here). Examples of supervised learning tasks: 1. Classification tasks: benign / malignant tumor 2. Regression tasks: predicting future stock market prices 3. Image recognition: highlighting faces in pictures
  • 9. UNSUPERVISED LEARNING. What is it? Methods to understand the general structure of input data where no prediction is needed. NO CURATION NEEDED! (no labeled data required) Examples of unsupervised learning tasks: 1. Clustering: discovering customer segments 2. Topic extraction: what topics are people tweeting about? 3. Information retrieval: IBM Watson question + answer 4. Anomaly detection: detecting irregular heartbeats
  • 10. 2. WHAT IS H2O? What is H2O? (Water, duh!) It is ALSO an open-source, parallel-processing engine for machine learning. What makes H2O different? Cutting-edge algorithms + parallel architecture + ease of use = happy data scientists / analysts
  • 11. TEAM @ H2O.AI 16,000 commits H2O World Conference 2014
  • 12. COMMUNITY REACH: 120 meetups in 2014; 11,000 installations; 2,000 corporations; First Friday Hack-A-Thons
  • 13. TRY IT! Don't take my word for it… www.h2o.ai. Simple instructions: 1. cd to the download location 2. Unzip the h2o file 3. java -jar h2o.jar 4. Point your browser to localhost:54321 [Screenshots: GUI and R interfaces]
  • 14. 3. USE CASES (LOTS OF 'EM): BEAT BILL BELICHICK
  • 15. TB + BB: Tom Brady + Bill Belichick = 15 years together, 3 Super Bowls
  • 16. PASS OR RUN? On any given offensive play… Coach Bill can call either a PASS or a RUN. What determines this? Game situation, opposing team, time remaining, yards to go (until 1st down), personnel, etc. Basically, LOTS of stuff.
  • 17. BUT WHAT IF?? Question: Can we predict whether the next play will be a PASS or a RUN using historical data? Approach: Download every offensive play from the Belichick-Brady era since 2000. Extract known features to build model inputs. Use various machine learning approaches to model PASS / RUN. Disclaimer: I'm not a Seahawks fan!
  • 18. DATA COLLECTION. Data: 13 years of data (2002 - 2013 seasons), 194 games total, 14,547 total offensive plays (excludes punts, kickoffs, returns). Response variable: PASS / RUN. Model inputs: Quarter, Minutes, Seconds, Opposing Team, Down, Distance, Line of Scrimmage, NE Score, Opposing Team Score, Season, Formation, Game Status (is NE losing / winning / tied)
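Most of the inputs listed on slide 18 are raw columns, but Game Status is derived from the two scores. The sketch below turns one play record into a feature row. It is a minimal illustration, not the talk's code: the field names (`ne_score`, `opp_score`, etc.), the string labels for Game Status, and folding Minutes and Seconds into a single `clock` value are all hypothetical choices.

```python
from typing import Dict, Union

Feature = Union[int, str]


def game_status(ne_score: int, opp_score: int) -> str:
    # "Game Status" input: is New England winning, losing, or tied at the snap?
    if ne_score > opp_score:
        return "winning"
    if ne_score < opp_score:
        return "losing"
    return "tied"


def clock_seconds(minutes: int, seconds: int) -> int:
    # Hypothetical: collapse the separate Minutes / Seconds inputs into one number
    return minutes * 60 + seconds


def play_features(play: Dict[str, int]) -> Dict[str, Feature]:
    # Build one model-input row from a (hypothetical) raw play record
    return {
        "quarter": play["quarter"],
        "clock": clock_seconds(play["minutes"], play["seconds"]),
        "down": play["down"],
        "distance": play["distance"],
        "game_status": game_status(play["ne_score"], play["opp_score"]),
    }


# Made-up example: 3rd-and-7 with 2:05 left in the 4th quarter, NE trailing 17-20
play = {"quarter": 4, "minutes": 2, "seconds": 5, "down": 3,
        "distance": 7, "ne_score": 17, "opp_score": 20}
print(play_features(play))
```

For this made-up play, `clock` comes out as 125 and `game_status` as "losing"; a classifier would then be trained on many such rows with PASS / RUN as the label.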
  • 19. FIGHTING CRIME IN CHICAGO Spark + H2O
  • 20. OPEN CRIME DATA. Crime dataset: crimes from 2001 to present day (~4.6 million crimes)
  • 21. THE WINDY CITY Harvest Chicago Weather data since 2001
  • 22. SOCIOECONOMIC FACTORS. Crimes are segmented into community area IDs; for each area: percent of households below poverty, percent unemployed, etc.
  • 23. SPARK + H2O. [Pipeline diagram: Crimes, Census, and Weather data → data munging → Spark SQL join → Deep Learning → evaluate models] GOAL: For a given crime, predict whether an arrest is more / less likely to be made!
  • 24. JOIN DATASETS: crime data + weather data + census data. Using Spark, we join the 3 datasets together to make one mega dataset!
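The join on slide 24 is done with Spark SQL; the plain-Python stand-in below only illustrates the logic and is not the talk's code. It assumes (the slides do not say) that weather is joined on calendar date and census data on community area ID. All field names and sample values are made up.

```python
from typing import Dict, List

Row = Dict[str, object]

# Made-up sample rows; the real inputs are Chicago crime, weather, and census records
crimes: List[Row] = [
    {"id": 1, "date": "2014-07-04", "community_area": 25, "arrest": True},
    {"id": 2, "date": "2014-01-15", "community_area": 8, "arrest": False},
    {"id": 3, "date": "2014-01-16", "community_area": 8, "arrest": False},
]
weather: Dict[str, Row] = {"2014-07-04": {"temp_f": 84}, "2014-01-15": {"temp_f": 12}}
census: Dict[int, Row] = {25: {"pct_below_poverty": 20.0}, 8: {"pct_below_poverty": 10.0}}


def join_all(crimes: List[Row], weather: Dict[str, Row], census: Dict[int, Row]) -> List[Row]:
    joined = []
    for crime in crimes:
        w = weather.get(crime["date"])           # assumed join key 1: calendar date
        c = census.get(crime["community_area"])  # assumed join key 2: community area ID
        if w is None or c is None:
            continue  # inner-join semantics: drop crimes without a match
        joined.append({**crime, **w, **c})
    return joined


mega = join_all(crimes, weather, census)
print(len(mega), mega[0])
```

Crime 3 has no weather row, so the inner join drops it and two enriched rows remain. Each lookup table is a dict, so the join is a simple hash join; in Spark the same logic would be expressed as SQL joins on the same keys.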
  • 25. DATA VISUALIZATION [Plots of arrest rate by season of crime, by temperature during the crime, and by the community the crime is committed in]
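Each plot on slide 25 comes down to a group-by: for every group, it shows the fraction of crimes that ended in an arrest. Here is a minimal sketch of that computation using seasons as the grouping. The month-to-season mapping (Dec to Feb = winter, and so on) and the records are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def season_of(month: int) -> str:
    # Meteorological seasons: an assumption, the slides do not define them
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "fall"


def arrest_rate_by(records: List[Dict], key: Callable[[Dict], str]) -> Dict[str, float]:
    # Count crimes and arrests per group, then divide
    totals: Dict[str, int] = defaultdict(int)
    arrests: Dict[str, int] = defaultdict(int)
    for r in records:
        k = key(r)
        totals[k] += 1
        arrests[k] += int(r["arrest"])
    return {k: arrests[k] / totals[k] for k in totals}


# Made-up records
records = [
    {"month": 7, "arrest": True},
    {"month": 7, "arrest": False},
    {"month": 1, "arrest": False},
    {"month": 12, "arrest": True},
    {"month": 2, "arrest": True},
    {"month": 10, "arrest": False},
]
rates = arrest_rate_by(records, lambda r: season_of(r["month"]))
print(rates)
```

Swapping the `key` function (for example, a temperature bucket or the community area ID) gives the other plots.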
  • 26. SPLIT DATA INTO TEST / TRAIN SETS. Train the model on the training segment (80% of the data); validate the model on the test segment (remaining 20%). [Plots of training set and test set arrest rates: ~40% of crimes lead to arrest]
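The 80/20 split on slide 26 can be sketched as a seeded shuffle followed by a cut. The rows below are synthetic, with exactly 40% arrests to mirror the slide's ~40% figure. The function names are hypothetical, and the seed is only there to make the split reproducible.

```python
import random
from typing import Dict, List, Tuple


def train_test_split(rows: List[Dict], train_frac: float = 0.8,
                     seed: int = 42) -> Tuple[List[Dict], List[Dict]]:
    # Shuffle a copy so the split is random but reproducible for a given seed
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def arrest_rate(rows: List[Dict]) -> float:
    return sum(r["arrest"] for r in rows) / len(rows) if rows else 0.0


rows = [{"id": i, "arrest": i % 5 < 2} for i in range(1000)]  # synthetic: 40% arrests
train, test = train_test_split(rows)
print(len(train), len(test))
print(round(arrest_rate(train), 2), round(arrest_rate(test), 2))
```

Comparing `arrest_rate(train)` with `arrest_rate(test)` is the check the slide's two plots make. A purely random split only keeps the two rates close on average. A stratified split (shuffling arrests and non-arrests separately) would keep them equal by construction.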
  • 27. DEEP LEARNING. Problem: for a given crime, is an arrest more / less likely? Deep Learning: a multi-layer feed-forward neural network that starts with an input layer (crime + weather data) followed by multiple layers of non-linear transformations.
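To make slide 27's description concrete, here is a sketch of only the forward pass of such a network. An input vector passes through fully connected layers with non-linear activations, and a sigmoid output is read as P(arrest). Several things are assumptions for illustration: the weights are random and untrained (training by backpropagation is omitted), tanh is chosen as the activation, and the layer sizes and input values are hypothetical. It is not H2O's implementation.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]


def dense(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    # Fully connected layer: out_j = sum_i W[j][i] * x_i + b_j
    return [sum(xi * wji for xi, wji in zip(x, row)) + bj
            for row, bj in zip(weights, bias)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    # Small random weights, zero biases (untrained)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x: List[float], layers: List[Layer]) -> float:
    h = x
    for weights, bias in layers[:-1]:
        h = [math.tanh(a) for a in dense(h, weights, bias)]  # non-linear hidden layers
    out_w, out_b = layers[-1]
    return sigmoid(dense(h, out_w, out_b)[0])  # single output read as P(arrest)


rng = random.Random(0)
sizes = [4, 8, 8, 1]  # inputs -> two hidden layers -> one output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
x = [0.3, -1.2, 0.8, 0.0]  # hypothetical standardized crime + weather features
p = forward(x, layers)
print(f"P(arrest) = {p:.3f}")
```

Stacking more hidden layers just means adding entries to `sizes`; "deep" refers to that stack of non-linear transformations.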
  • 29. SINGLE-MALT SCOTCH: a whiskey made at one particular distillery from a mash that uses only malted grain (barley). Solid standards: must be aged at least 3 years in oak casks. Many famous distilleries are located in the northern regions of Scotland.
  • 30. OF COURSE, THERE'S A DATASET FOR THAT! THE Single Malt Dataset: 85 distilleries from Northern Scotland; 12 descriptor features (e.g., Sweetness, Smoky, Tobacco, Honey, Spicy, Malty, etc.), each rated 0 (weak) to 4 (strong). Problem: Can we build a whiskey recommendation engine based on whiskeys I have tried (and liked!) already?
  • 31. DIMENSIONALITY REDUCTION + K-MEANS. First, let's reduce the 12 features to a lower-dimensional space using a linear transformation (Principal Components Analysis): 7 principal components explain ~85% of the variance in the dataset. Then let's run a clustering algorithm (K-means) on the new PCA'd dataset to group similar whiskeys: 11 clusters are appropriate. Pipe out the cluster assignments and start buying whiskey!
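The two steps on slide 31 can be sketched separately. `n_components_for` picks the smallest number of leading principal components whose share of total variance reaches a target (85% on the slide), given the PCA eigenvalues. `kmeans` is plain Lloyd's algorithm on points assumed to be already projected onto those components. The PCA projection itself is not shown. The eigenvalues, 2-D points, and k = 2 are made up for the example (the slide uses 11 clusters), and this is not H2O's implementation.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def n_components_for(eigenvalues: Sequence[float], target: float = 0.85) -> int:
    # Smallest number of leading components whose cumulative variance share >= target
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    running = 0.0
    for k, v in enumerate(vals, start=1):
        running += v
        if running / total >= target:
            return k
    return len(vals)


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    rng = random.Random(seed)
    centroids = list(rng.sample(points, k))  # start from k distinct data points
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels, centroids


print(n_components_for([4.0, 3.0, 2.0, 1.0], target=0.85))
points = [(0.0, 0.1), (0.2, 0.0), (-0.1, -0.1), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

With the made-up eigenvalues, three components cover 90% of the variance, so 3 is returned. The six points form two well-separated groups, and K-means assigns each group its own label. K-means results depend on the initial centroids, so real use typically runs several seeds and keeps the lowest within-cluster error.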
  • 32. MODEL RESULTS I ENJOY: OTHER WHISKEYS THAT CLUSTER WITH THESE:
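Slide 32's "other whiskeys that cluster with these" amounts to a lookup on the cluster assignments. Here is a minimal sketch, assuming the assignments are available as a name-to-cluster mapping. The distillery names ("A" to "E") and cluster IDs are placeholders, not results from the talk.

```python
from typing import Dict, List


def recommend(liked: List[str], cluster_of: Dict[str, int]) -> List[str]:
    # Clusters that contain at least one whiskey the user already likes
    liked_clusters = {cluster_of[name] for name in liked if name in cluster_of}
    # Suggest every other whiskey that shares one of those clusters
    return sorted(name for name, c in cluster_of.items()
                  if c in liked_clusters and name not in liked)


# Placeholder names and cluster IDs
cluster_of = {"A": 0, "B": 0, "C": 1, "D": 2, "E": 1}
print(recommend(["A", "C"], cluster_of))
```

Liking A (cluster 0) and C (cluster 1) yields B and E. A whiskey alone in its cluster, like D here, produces no suggestions. A distance-based ranking within the reduced space would be a natural refinement.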
  • 33. OTHER POPULAR BRANDS. APPARENTLY, LOTS OF PEOPLE LIKE: … OTHER WHISKEYS THAT CLUSTER WITH THESE: …
  • 34. AUTOENCODER + H2O. [Diagram: inputs x1, x2, x3, x4 → hidden features → reconstructed outputs x1, x2, x3, x4; information flows from input to output] Dogs, Dogs and Dogs
  • 36. BORDEAUX WINE: largest wine-growing region in France, with 700 million bottles of wine produced per year! Some years are better than others: Great ($$$) vs. Typical ($). Last Great years: 2010, 2009, 2005, 2000
  • 37. GREAT VS. TYPICAL VINTAGE? Question: Can we study weather patterns in Bordeaux leading up to harvest to identify 'anomalous' weather years that correlate with a Great ($$$) vs. Typical ($) vintage? The Bordeaux Dataset (1952 - 2014, yearly): amount of winter rain (Oct of the previous year to Apr of the harvest year), average summer temperature (Apr to Sept of the harvest year), rain during harvest (Aug to Sept), years since last Great vintage
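Two of slide 37's inputs need a little bookkeeping. Winter rain spans the new year, and "years since last Great vintage" depends on the list of earlier great years. Here is a minimal sketch, assuming monthly rainfall keyed by (year, month) and counting only great vintages strictly before the year in question. The rainfall values are flat made-up numbers, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

Monthly = Dict[Tuple[int, int], float]  # (year, month) -> rainfall

# Month windows as (year offset from harvest year, month); Oct to Apr spans the new year
WINTER_RAIN = [(-1, 10), (-1, 11), (-1, 12), (0, 1), (0, 2), (0, 3), (0, 4)]
HARVEST_RAIN = [(0, 8), (0, 9)]


def window_total(monthly: Monthly, harvest_year: int,
                 window: List[Tuple[int, int]]) -> float:
    # Sum rainfall over the window; a missing month raises KeyError instead of counting as 0
    return sum(monthly[(harvest_year + offset, month)] for offset, month in window)


def years_since_last_great(year: int, great_years: List[int]) -> Optional[int]:
    # great_years must be sorted ascending; only vintages strictly before `year` count
    i = bisect_left(great_years, year)
    return year - great_years[i - 1] if i > 0 else None


monthly = {(y, m): 10.0 for y in (2013, 2014) for m in range(1, 13)}  # flat, made-up rainfall
great = [2000, 2005, 2009, 2010]  # "Last Great years" listed on slide 36
print(window_total(monthly, 2014, WINTER_RAIN), window_total(monthly, 2014, HARVEST_RAIN))
print(years_since_last_great(2014, great))
```

Average summer temperature follows the same pattern, averaging instead of summing over the window (0, 4) to (0, 9).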
  • 38. AUTOENCODER + ANOMALY DETECTION. Goal: 'en primeur of en primeur': can we use weather patterns to identify anomalous years, which would indicate great vintage quality? ML workflow: 1) Train an autoencoder to learn the 'typical' vintage weather pattern 2) Append the 'great' vintage years' weather data to the original dataset 3) IF a great vintage year's weather data does NOT match the learned weather pattern, the autoencoder will produce a high reconstruction error (MSE)
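Step 3 of slide 38 reduces to computing each year's reconstruction error and comparing it with a cutoff; the 0.10 threshold comes from slide 39. In the sketch below, the reconstructions are assumed to come from an already-trained autoencoder (not shown). The normalized weather rows and their reconstructions are made-up numbers chosen to show one typical year and one anomalous year.

```python
from typing import Dict, List, Sequence, Tuple


def reconstruction_mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    # Mean squared error between an input row and its autoencoder reconstruction
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def flag_anomalies(rows: Dict[int, Tuple[Sequence[float], Sequence[float]]],
                   threshold: float = 0.10) -> List[int]:
    # Years whose reconstruction error exceeds the threshold are flagged as anomalous
    return sorted(year for year, (x, x_hat) in rows.items()
                  if reconstruction_mse(x, x_hat) > threshold)


# Made-up, already-normalized weather rows paired with made-up reconstructions
rows = {
    1999: ([0.1, 0.5, 0.2, 0.4], [0.12, 0.48, 0.25, 0.38]),  # close match: typical
    2005: ([0.9, 0.1, 0.0, 0.8], [0.3, 0.5, 0.3, 0.4]),      # poor match: anomalous
}
print(flag_anomalies(rows))
```

The autoencoder only learns to reproduce patterns it has seen often. A year whose weather it cannot rebuild well (MSE about 0.19 here versus about 0.001) stands out, which is the signal the talk uses as a proxy for a great vintage.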
  • 39. RESULTS (MSE > 0.10) [Plot of mean squared error per vintage; vintages above the threshold: 1961, 2009, 2005, 2000, 1990, 1989, 1982, 2010]
  • 40. 2014 BORDEAUX?? [Plot of mean squared error for the 2013 and 2014 vintages; 2014 marked "?"]
  • 41. 4. DATA SCIENCE COMPETITION. Apply / learn more @ apps.h2o.ai. Check out our YouTube channel for last year's talks @ H2O World