Gentle Introduction:
Bayesian Modelling and Probabilistic Programming in R
Geneva R Users Group
Speaker: Marco Wirthlin
@marcowirthlin
Image Source: https://i.stack.imgur.com/GONoV.jpg
Uploaded Version
This talk was made for Geneva R Users:
After a long Search!
You got a Job!
Your Boss: “Can you give us a hand?”
“Look at this complex machine.
Sometimes it malfunctions and
produces items that will have
faults difficult to spot.
Can you predict when and
why this happens?”
The Data: 10 TB of Joy
How would you solve this? (Discriminative Edition)
Raw Data → Tidy Data → ML Ready Data → Trained Classifier → Prediction / Classification
● Cleaning
● Munging
● Exploratory Analysis
● KNN
● PCA/ICA
● Random Forest
● Feature Engineering
● Regularization
● Model Tuning
● Training
● Validation
How would you solve this? (Generative Edition)
Raw Data → Tidy Data → Candidate Model(s) → Phenomenon Simulations + Gain Understanding → Fix Problem (?)
Domain Knowledge
● Cleaning
● Munging
● Exploratory Analysis
● KNN
● PCA/ICA
● Random Forest
● Feature Engineering
● Regularization
● (Re)parametrization
● Refinement
● Prior/Posterior Simulations
● Model Selection
● Scientific Comm.
● Apply Knowledge
● Know Uncertainty
What is a generative model?
● Ng, A. Y. and Jordan, M. I. (2002). On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. In Advances in neural information processing systems, pages 841–848.
● Rasmus Bååth, Video Introduction to Bayesian Data Analysis, Part 1: What is Bayes?: https://www.youtube.com/watch?time_continue=366&v=3OJEae7Qb_o
Hypothesis of underlying mechanisms
AKA: “Learning the class”
No shortcuts!
Data: $D = [2, 8, \ldots, 9]$
Parameters: $\theta, \mu, \xi, \ldots$
Model
Bayesian Inference
Recap: When to use which approach
● http://www.fharrell.com/post/stat-ml/
● http://www.fharrell.com/post/stat-ml2/
Criterion | Statistical Models | Machine Learning
Data | Little/Expensive/Inaccessible | Abundant
Uncertainty | Is relevant | Not relevant
Num. of Param. | Isolate effects of few | Many
Interpretability | Are transparent | Black Box
Assumptions | Many, Explicit | Some, Implicit
Goal | Understanding predictors | Overall Prediction
* Very general guidelines!
● E.g. Bayesian models can scale to many parameters and to large data sets thanks to inter- and intra-chain GPU parallelization.
● Example hybrid methods: Deep (Hierarchical) Bayesian Neural Networks, Bayesian Optimization, Gaussian Mixture Models
Bayesian Inference (BI)
Background: Likelihoods, Frequentist Statistics
Implementation: Graphical Models, Probabilistic Programming
Likelihoods
Normal Distribution
$x \sim N(\mu, \sigma^2)$: “X is normally distributed”
$L = p(D \mid \theta)$
$L = p(D \mid \mu, \sigma^2)$: “The probability that D belongs to a distribution with mean μ and SD σ”
PDF: Fix parameters, vary data
L: Fix data, vary parameters
● https://www.youtube.com/watch?v=ScduwntrMzc
Applet: https://seneketh.shinyapps.io/Likelihood_Intuition
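The "fix parameters, vary data" versus "fix data, vary parameters" distinction can be sketched in a few lines. This is an illustrative Python sketch rather than code from the talk; the data values and the grid over μ are made up, and σ is assumed to be known and equal to 1.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(data, mu, sigma):
    """ln L(mu, sigma): sum of ln p(x_i | mu, sigma) over independent observations."""
    return sum(math.log(normal_pdf(x, mu, sigma)) for x in data)

# Hypothetical observations, invented for this illustration.
data = [4.1, 5.3, 4.8, 5.9, 5.0]

# PDF view: the parameters are fixed (mu = 5, sigma = 1) and the data value varies.
pdf_curve = [(x, normal_pdf(x, 5.0, 1.0)) for x in (3.0, 4.0, 5.0, 6.0, 7.0)]

# Likelihood view: the data are fixed and the parameter mu varies over a grid.
mu_grid = [3.0 + 0.01 * i for i in range(401)]  # 3.00, 3.01, ..., 7.00
best_mu = max(mu_grid, key=lambda mu: log_likelihood(data, mu, 1.0))
sample_mean = sum(data) / len(data)
```

With σ held fixed, the grid value that maximises the log-likelihood sits at the grid point closest to the sample mean, which is the textbook result for the normal model.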
Interlude: Frequentist Inference
$Y = [7, \ldots, 2]$
$X = [2, \ldots, 9]$
$Y = a \cdot X + b$
$Y \sim N(a \cdot X + b, \sigma^2)$
$L = p(D \mid a, b, \sigma^2)$
MLE: $\arg\max_{a, b, \sigma^2} \sum \ln p(D \mid a, b, \sigma^2)$
“True” Population
“True” unique values
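For the straight-line model with normal noise, the maximum of the log-likelihood has a closed form: the least-squares slope and intercept, plus the mean squared residual as the variance estimate. The Python sketch below illustrates that standard result; the function names and the X and Y values are hypothetical, since the slide elides the data.

```python
import math

def fit_line_mle(xs, ys):
    """Closed-form maximum likelihood estimates for Y ~ N(a*X + b, sigma2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx                      # slope
    b = y_bar - a * x_bar              # intercept
    # The MLE of the variance divides by n (not n - 2).
    sigma2 = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / n
    return a, b, sigma2

def log_likelihood(xs, ys, a, b, sigma2):
    """Sum of ln p(y_i | a, b, sigma2) under the normal noise model."""
    n = len(xs)
    rss = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - rss / (2.0 * sigma2)

# Hypothetical data, invented for this illustration.
xs = [2.0, 4.0, 5.0, 7.0, 9.0]
ys = [5.9, 8.1, 8.8, 11.2, 12.9]

a_hat, b_hat, s2_hat = fit_line_mle(xs, ys)
```

Each estimate is a single number: in the frequentist reading, the parameters are fixed "true" values of the population, and the data are what varies.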
Interlude: Frequentist Inference
“True” Population
Sample: N = 3, $D = [7, 3, 2]$
Test statistic: Inter-group var. / Intra-group var.
Sampling Distribution: e.g. F distribution
Figure labels: ∞, H0, mean
Central Limit Theorem
“Long range” probability
● Sampling distribution applet: http://onlinestatbook.com/stat_sim/sampling_dist/index.html
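A sampling distribution can also be simulated directly: draw many samples from a population, compute the statistic for each, and look at how the results spread. The Python sketch below does this for the sample mean. The skewed population, the sample sizes, and the number of repeats are assumptions chosen for illustration and do not come from the slides.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical skewed "true" population (exponential with mean 1).
population = [rng.expovariate(1.0) for _ in range(10_000)]

def sample_means(population, sample_size, n_repeats, rng):
    """Means of repeated random samples drawn with replacement."""
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_repeats)
    ]

# Sampling distributions of the mean for a tiny and a moderate sample size.
means_n3 = sample_means(population, 3, 5000, rng)
means_n30 = sample_means(population, 30, 5000, rng)

spread_n3 = statistics.stdev(means_n3)
spread_n30 = statistics.stdev(means_n30)
```

The means centre on the population mean in both cases, and the spread shrinks as the sample size grows, which is the behaviour the Central Limit Theorem describes.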
Interlude: Frequentist Inference
“When a frequentist says that the probability for "heads" in a coin toss is 0.5 (50%), she means that in infinitely many such coin tosses, 50% of the coins will show "heads".”
Bayesian Inference
● John Kruschke: Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan Chapter 5
● Rasmus Bååth: http://www.sumsar.net/blog/2017/02/introduction-to-bayesian-data-analysis-part-two/
● Richard McElreath: Statistical Rethinking book and lectures (https://www.youtube.com/watch?v=4WVelCswXo4)
Bayesian Inference
Discrete Values: Just sum it up! :)
Cont. Values: Integration over complete parameter space... :(
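The equation these two remarks annotate is not preserved in this transcript. Assuming they refer to the normalising constant p(D) in Bayes' theorem, the standard form is sketched below, with θ standing for all model parameters.

```latex
% Bayes' theorem: posterior = likelihood * prior / evidence
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}

% Discrete parameter values: the evidence is a sum
p(D) = \sum_{\theta} p(D \mid \theta)\, p(\theta)

% Continuous parameter values: the evidence is an integral over the whole parameter space
p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta
```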
Bayesian Inference:
Averaging over the complete parameter space via integration is impractical!
Solution: We sample from the posterior distribution, which is proportional to the joint probability distribution, with smart MCMC algorithms!
(Subject of another talk)
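To make the idea of sampling instead of integrating concrete, here is a minimal random-walk Metropolis sampler in Python. It is a deliberately simple stand-in and not the algorithm used by Stan or any other tool named in the talk. The model (a normal mean with known σ = 1 and an N(0, 10) prior), the data, and the tuning values are all assumptions made for illustration.

```python
import math
import random

def log_posterior_unnormalised(mu, data, prior_mean=0.0, prior_sd=10.0, sigma=1.0):
    """ln prior + ln likelihood, up to an additive constant (no p(D) needed)."""
    log_prior = -0.5 * ((mu - prior_mean) / prior_sd) ** 2
    log_lik = sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)
    return log_prior + log_lik

def metropolis(log_target, start, n_samples, step_sd, seed=0):
    """Random-walk Metropolis: propose a nearby value, accept with prob. min(1, ratio)."""
    rng = random.Random(seed)
    current = start
    current_lp = log_target(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_target(proposal)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

# Hypothetical observations, invented for this illustration.
data = [4.1, 5.3, 4.8, 5.9, 5.0]

draws = metropolis(lambda mu: log_posterior_unnormalised(mu, data),
                   start=0.0, n_samples=20_000, step_sd=1.0)
kept = draws[2_000:]                      # discard the warm-up draws
posterior_mean = sum(kept) / len(kept)
```

Only ratios of the unnormalised posterior enter the acceptance step, so the intractable integral p(D) never has to be computed. For this toy model the exact posterior mean is known, which makes it a convenient check on the sampler.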
Bayesian Inference
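Let's compute this and sample from it!
$Y = a \cdot X + b$
$Y \sim N(a \cdot X + b, \sigma^2)$
Quantify all model parts with uncertainty
$p(D, a, b, \sigma^2) = p(D \mid a, b, \sigma^2) \, p(a) \, p(b) \, p(\sigma^2)$
Priors $p(a)$, $p(b)$, $p(\sigma^2)$: $a \sim N(1, 0.1)$, $b \sim N(4, 0.5)$, $\sigma^2 \sim G(1, 0.1)$
Likelihood $p(D \mid a, b, \sigma^2)$, i.e. $p(D \mid \theta)$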
From model to code
$Y = a \cdot X + b$
$a \sim N(1, 0.1)$
$b \sim N(4, 0.5)$
$\sigma^2 \sim G(1, 0.1)$
$Y \sim N(a \cdot X + b, \sigma^2)$
● More examples: https://mc-stan.org/users/documentation/case-studies
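The code shown on this slide is not preserved in the transcript. As a language-neutral sketch of what "model to code" means, the Python below writes the log of the joint density p(D, a, b, σ²) for the model above, one line per model statement. Two things are assumptions here: that the second argument of N(·, ·) is a standard deviation, and that G(1, 0.1) is a Gamma distribution with shape 1 and rate 0.1. The data are invented.

```python
import math

def normal_logpdf(x, mean, sd):
    """ln density of a normal distribution with the given mean and SD."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def gamma_logpdf(x, shape, rate):
    """ln density of a Gamma(shape, rate) distribution; -inf outside its support."""
    if x <= 0.0:
        return -math.inf
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)

def log_joint(a, b, sigma2, xs, ys):
    """ln p(D, a, b, sigma2) = ln p(D | a, b, sigma2) + ln p(a) + ln p(b) + ln p(sigma2)."""
    if sigma2 <= 0.0:
        return -math.inf
    log_prior = (normal_logpdf(a, 1.0, 0.1)          # a ~ N(1, 0.1), SD assumed
                 + normal_logpdf(b, 4.0, 0.5)        # b ~ N(4, 0.5), SD assumed
                 + gamma_logpdf(sigma2, 1.0, 0.1))   # sigma2 ~ G(1, 0.1), shape/rate assumed
    sd = math.sqrt(sigma2)
    log_lik = sum(normal_logpdf(y, a * x + b, sd)    # Y ~ N(a*X + b, sigma2)
                  for x, y in zip(xs, ys))
    return log_prior + log_lik

# Hypothetical data, invented for this illustration.
xs = [2.0, 4.0, 5.0, 7.0, 9.0]
ys = [5.9, 8.1, 8.8, 11.2, 12.9]

value_near_prior = log_joint(1.0, 4.0, 0.1, xs, ys)
value_far_away = log_joint(2.0, 4.0, 0.1, xs, ys)
```

A probabilistic programming language takes a declaration of the same model and builds this function for you, then hands it to a sampler to produce posterior draws.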
Implementation in R
Example: Deep Bayesian Neural Nets
● https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/
● https://twiecki.io/blog/2018/08/13/hierarchical_bayesian_neural_network/
Example: Bayesian Inference and Volatility Modeling Using Stan
https://luisdamiano.github.io/personal/volatility_stan2018.pdf
Credit: Michael Weylandt, Luis Damiano
Thank you for your attention… and endurance!
Additional Slides
Sources, links and more!
All sources in one place!
About Generative vs. Discriminative models:
Ng, A. Y. and Jordan, M. I. (2002). On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. In Advances in neural information processing systems, pages 841–848.
Rasmus Bååth:
Video Introduction to Bayesian Data Analysis, Part 1: What is Bayes?:
https://www.youtube.com/watch?time_continue=366&v=3OJEae7Qb_o
When to use ML vs. Statistical Modelling:
Frank Harrell's Blog:
http://www.fharrell.com/post/stat-ml/
http://www.fharrell.com/post/stat-ml2/
Frequentist approach: How do sampling distributions work (applet):
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
Bayesian inference and computation:
John Kruschke: Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Chapter 5
Rasmus Bååth: http://www.sumsar.net/blog/2017/02/introduction-to-bayesian-data-analysis-part-two/
Richard McElreath: Statistical Rethinking book and lectures (https://www.youtube.com/watch?v=4WVelCswXo4)
Many model examples in Stan:
https://mc-stan.org/users/documentation/case-studies
About Bayesian Neural Networks:
https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/
https://twiecki.io/blog/2018/08/13/hierarchical_bayesian_neural_network/
Volatility Examples:
Hidden Markov Models:
https://github.com/luisdamiano/rfinance17
Volatility Garch Model and Bayesian Workflow:
https://luisdamiano.github.io/personal/volatility_stan2018.pdf
Dictionary: Stats ↔ ML
https://ubc-mds.github.io/resources_pages/terminology/
The Bayesian Workflow:
https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html
Algorithm explanation applet for MCMC exploration of the parameter space:
http://elevanth.org/blog/2017/11/28/build-a-better-markov-chain/
Probabilistic Programming Conference Talks:
https://www.youtube.com/watch?v=crvNIGyqGSU
Who to follow on Twitter?
● Chris Fonnesbeck @fonnesbeck (pyMC3)
● Thomas Wiecki @twiecki (pyMC3)
Blog: https://twiecki.io/ (nice intros)
● Bayes Dose @BayesDose (general info and papers)
● Richard McElreath @rlmcelreath (ecology, Bayesian statistics expert)
All his lectures: https://www.youtube.com/channel/UCNJK6_DZvcMqNSzQdEkzvzA
● Michael Betancourt @betanalpha (Stan)
Blog: https://betanalpha.github.io/writing/
Specifically: https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html
● Rasmus Bååth @rabaath
Great video series: http://www.sumsar.net/blog/2017/02/introduction-to-bayesian-data-analysis-part-one/
● Frank Harrell @f2harrell (statistics sage)
Great Blog: http://www.fharrell.com/
● Andrew Gelman @StatModeling (statistics sage)
https://statmodeling.stat.columbia.edu/
● Judea Pearl @yudapearl
Book of Why: http://bayes.cs.ucla.edu/WHY/ (more about causality, BN and DAG)
● AND MANY MORE!
Dictionary: Stats ↔ ML
Check: https://ubc-mds.github.io/resources_pages/terminology/ for more terminology
Statistics ~ Machine learning / AI
Estimation/Fitting ~ Learning
Hypothesis ~ Classification rule
Data Point ~ Example/Instance
Regression ~ Supervised Learning
Classification ~ Supervised Learning
Covariates ~ Features
Parameters ~ Weights
Response ~ Label
Factor ~ Factor (categorical variables)
Likelihood ~ Cost Function (sometimes)
Data Science + AI + ML + Stats
Credit: Zoubin Ghahramani, CTO UBER. Talk: "Probabilistic Machine Learning: From theory to industrial impact"

User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 

Gentle Introduction: Bayesian Modelling and Probabilistic Programming in R

  • 1. Gentle Introduction: Bayesian Modelling and Probabilistic Programming in R Geneva R Users Group Speaker: Marco Wirthlin @marcowirthlin Image Source: https://i.stack.imgur.com/GONoV.jpg Uploaded Version
  • 2. This talk was made for Geneva R Users: Image Source: https://i.stack.imgur.com/GONoV.jpg
  • 3. After a long Search! You got a Job!
  • 4. Your Boss: “Can you give us a hand?” “Look at this complex machine. Sometimes it malfunctions and produces items that will have faults difficult to spot. Can you predict when and why this happens?”
  • 5. The Data: 10 TB of Joy
  • 6. How would you solve this? (Discriminative Edition) Pipeline: Raw Data → Tidy Data → ML Ready Data → Trained Classifier → Prediction / Classification. Steps along the way: ● Cleaning ● Munging ● Exploratory Analysis ● KNN ● PCA/ICA ● Random Forest ● Feature Engineering ● Regularization ● Model Tuning ● Training ● Validation
  • 7. How would you solve this? (Generative Edition) Raw Data → Tidy Data (● Cleaning ● Munging ● Exploratory Analysis ● KNN ● PCA/ICA ● Random Forest ● Feature Engineering ● Regularization) → Candidate Model(s) + Domain Knowledge (● (Re)parametrization ● Refinement ● Prior/Posterior Simulations ● Model Selection ● Scientific Comm.) → Phenomenon Simulations + Gain Understanding (● Apply Knowledge ● Know Uncertainty) → Fix Problem (?)
  • 8. What is a generative model? Hypothesis of underlying mechanisms. AKA: “Learning the class”. No shortcuts! Data: D = [2, 8, ..., 9]. Parameters: θ, μ, ξ, ... Model. Bayesian Inference. ● Ng, A. Y. and Jordan, M. I. (2002). On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Advances in Neural Information Processing Systems, pages 841–848. ● Rasmus Bååth, Video Introduction to Bayesian Data Analysis, Part 1: What is Bayes?: https://www.youtube.com/watch?time_continue=366&v=3OJEae7Qb_o
  • 9. Recap: When to use which approach*
      Aspect: Statistical Models | Machine Learning
      Data: Little/Expensive/Inaccessible | Abundant
      Uncertainty: Is relevant | Not relevant
      Num. of Param.: Isolate effects of few | Many
      Interpretability: Are transparent | Black Box
      Assumptions: Many, Explicit | Some, Implicit
      Goal: Understanding predictors | Overall Prediction
      * Very general guidelines!
      ● E.g. Bayesian models can scale well with many parameters and also with data, thanks to inter- and intra-chain GPU parallelization.
      ● Example hybrid methods: Deep (Hierarchical) Bayesian Neural Networks, Bayesian Optimization. Gaussian Mixture Models
      ● http://www.fharrell.com/post/stat-ml/
      ● http://www.fharrell.com/post/stat-ml2/
  • 11. Likelihoods. Normal Distribution: x ~ N(μ, σ²), “x is normally distributed”. L = p(D | θ) = p(D | μ, σ²): “The probability that D belongs to a distribution with mean μ and SD σ”. PDF: Fix parameters, vary data. L: Fix data, vary parameters. ● https://www.youtube.com/watch?v=ScduwntrMzc Applet: https://seneketh.shinyapps.io/Likelihood_Intuition
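Slide 11's distinction between the PDF (parameters fixed, data varies) and the likelihood (data fixed, parameters vary) can be sketched in a few lines of plain Python. The data values, the fixed σ = 3 and the grid of candidate means are all assumptions made for illustration; they do not come from the talk.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(data, mu, sigma):
    """Log-likelihood of independent observations under N(mu, sigma^2)."""
    return sum(math.log(normal_pdf(x, mu, sigma)) for x in data)

# Hypothetical observations (illustrative only).
data = [2.0, 8.0, 5.0, 9.0]

# PDF view: parameters are fixed (mu=6, sigma=3), the data value x varies.
pdf_curve = [(x, normal_pdf(x, mu=6.0, sigma=3.0)) for x in range(0, 13)]

# Likelihood view: the data are fixed, the parameter mu varies (sigma held at 3).
mu_grid = [m / 10.0 for m in range(0, 121)]
ll_curve = [(m, log_likelihood(data, m, 3.0)) for m in mu_grid]

# The mu that maximises the likelihood coincides with the sample mean.
best_mu = max(ll_curve, key=lambda pair: pair[1])[0]
sample_mean = sum(data) / len(data)
```

Both curves use the same function `normal_pdf`; only the argument that is allowed to move changes.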
  • 12. Interlude: Frequentist Inference. Y = [7, ..., 2], X = [2, ..., 9]. Y = a * X + b; Y ~ N(a * X + b, σ²). L = p(D | a, b, σ²). MLE: argmax over a, b, σ² of Σ ln p(D | a, b, σ²). “True” Population: “True” unique values of a, b, σ².
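For the straight-line model on slide 12, the argmax of the summed log-likelihood has a closed form: the least-squares slope and intercept, plus the mean squared residual as the variance estimate. The sketch below assumes small made-up `xs` and `ys` (the slide elides its actual values), and the function names are hypothetical.

```python
import math

def fit_line_mle(xs, ys):
    """Maximum-likelihood estimates (a, b, sigma2) for Y ~ N(a * X + b, sigma2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx                      # slope
    b = y_bar - a * x_bar              # intercept
    # The MLE of the variance divides by n (not n - 2).
    sigma2 = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / n
    return a, b, sigma2

def log_lik(a, b, sigma2, xs, ys):
    """Sum of ln p(y | a, b, sigma2) over all data points."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma2) - (y - (a * x + b)) ** 2 / (2.0 * sigma2)
        for x, y in zip(xs, ys)
    )

# Hypothetical data, roughly following Y = X + 4.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.1, 5.9, 7.1, 7.9]
a_hat, b_hat, sigma2_hat = fit_line_mle(xs, ys)
```

Evaluating `log_lik` at any other (a, b, σ²) gives a lower value than at the estimates, which is what "argmax" means here.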
  • 13. Interlude: Frequentist Inference. “True” Population. Sample: N=3, D = [7, 3, 2]. Test statistic: inter-group var. / intra-group var. Sampling Distribution, e.g. F distribution (figure labels: ∞, H0 mean). Central Limit Theorem. “Long-run” probability. ● Sampling distribution applet: http://onlinestatbook.com/stat_sim/sampling_dist/index.html
  • 14. Interlude: Frequentist Inference ● Sampling distribution applet: http://onlinestatbook.com/stat_sim/sampling_dist/index.html “When a frequentist says that the probability for "heads" in a coin toss is 0.5 (50%) she means that in infinitively many such coin tosses, 50% of the coins will show "head"”.
  • 15. Bayesian Inference ● John Kruschke: Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan Chapter 5 ● Rasmus Bååth: http://www.sumsar.net/blog/2017/02/introduction-to-bayesian-data-analysis-part-two/ ● Richard McElreath: Statistical Rethinking book and lectures (https://www.youtube.com/watch?v=4WVelCswXo4)
  • 16. Bayesian Inference
  • 17. Bayesian Inference Discrete Values: Just sum it up! :) Cont. Values: Integration over complete parameter space... :(
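The "just sum it up" case from slide 17 can be illustrated with a grid approximation: restrict the parameter to a finite set of values, multiply prior by likelihood at each one, and normalise by the sum. The coin-toss data (6 heads in 9 tosses), the flat prior and the grid size are assumptions chosen for illustration, not taken from the talk.

```python
def grid_posterior(heads, tosses, n_grid=101):
    """Posterior over a coin's heads probability on an evenly spaced grid."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    prior = [1.0 / n_grid] * n_grid              # flat prior (an assumption)
    likelihood = [t ** heads * (1.0 - t) ** (tosses - heads) for t in grid]
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalised)                 # p(D): a plain sum, no integral
    posterior = [u / evidence for u in unnormalised]
    return grid, posterior

grid, posterior = grid_posterior(heads=6, tosses=9)
mode = grid[posterior.index(max(posterior))]     # most probable grid value
```

With one parameter a grid of 101 points is cheap. With k parameters the same grid needs 101^k evaluations, which is why the continuous, high-dimensional case gets the sad face on the slide.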
  • 18. Bayesian Inference: Averaging over the complete parameter space via integration is impractical! Solution: We sample from the posterior probability distribution with smart MCMC algorithms! (Subject of another talk)
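As a rough idea of what sampling instead of integrating looks like, here is a minimal random-walk Metropolis sketch. It is the simplest MCMC algorithm, not the "smart" variants the slide alludes to (Stan, for instance, uses Hamiltonian Monte Carlo). The coin-toss target (6 heads in 9 tosses, flat prior), step size, seed and burn-in length are all illustrative assumptions.

```python
import math
import random

def log_unnormalised_posterior(theta, heads=6, tosses=9):
    """ln(prior * likelihood) for a coin's heads probability, flat prior on (0, 1)."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return heads * math.log(theta) + (tosses - heads) * math.log(1.0 - theta)

def metropolis(log_target, start, n_draws, step, seed=1):
    """Random-walk Metropolis: only needs the target up to a constant."""
    rng = random.Random(seed)
    current = start
    current_lp = log_target(current)
    draws = []
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws

draws = metropolis(log_unnormalised_posterior, start=0.5, n_draws=20000, step=0.2)
kept = draws[2000:]                       # discard an initial burn-in stretch
posterior_mean = sum(kept) / len(kept)    # exact answer here is 7/11
```

The normalising constant p(D) never appears: the acceptance step only uses a ratio of unnormalised densities, which is what makes the approach usable when the integral is out of reach.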
  • 19. Bayesian Inference Let's compute this and sample from it!
  • 20. Y = a * X + b; Y ~ N(a * X + b, σ²). Quantify all model parts with uncertainty: p(D, a, b, σ²) = p(D | a, b, σ²) * p(a) * p(b) * p(σ²), with a ~ N(1, 0.1), b ~ N(4, 0.5), σ² ~ G(1, 0.1). Priors: p(a), p(b), p(σ²). Likelihood: p(D | a, b, σ²) = p(D | θ).
  • 21. From model to code. Y = a * X + b; a ~ N(1, 0.1); b ~ N(4, 0.5); σ² ~ G(1, 0.1); Y ~ N(a * X + b, σ²). ● More examples: https://mc-stan.org/users/documentation/case-studies
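The code shown on slide 21 did not survive in this transcript. As a stand-in, the sketch below writes the joint density p(D, a, b, σ²) from slide 20 as a plain Python log-density function. It is not the speaker's code and not Stan. Two assumptions are made about the slide's notation: the second argument of N(·, ·) in the priors is read as a standard deviation, and G(1, 0.1) is read as Gamma(shape = 1, rate = 0.1), following Stan's conventions. The data `xs`, `ys` are made up.

```python
import math

def normal_logpdf(x, mean, sd):
    """ln density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)

def gamma_logpdf(x, shape, rate):
    """ln density of Gamma(shape, rate) at x; -inf outside the support."""
    if x <= 0.0:
        return float("-inf")
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)

def log_joint(a, b, sigma2, xs, ys):
    """ln p(D, a, b, sigma2) = ln p(D | a, b, sigma2) + ln p(a) + ln p(b) + ln p(sigma2)."""
    log_prior = (normal_logpdf(a, 1.0, 0.1)          # a ~ N(1, 0.1), 0.1 read as SD
                 + normal_logpdf(b, 4.0, 0.5)        # b ~ N(4, 0.5), 0.5 read as SD
                 + gamma_logpdf(sigma2, 1.0, 0.1))   # sigma2 ~ G(1, 0.1), shape/rate
    if log_prior == float("-inf"):
        return log_prior
    sd = math.sqrt(sigma2)                           # the model is stated with a variance
    log_lik = sum(normal_logpdf(y, a * x + b, sd) for x, y in zip(xs, ys))
    return log_prior + log_lik

# Hypothetical data, roughly following Y = X + 4.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.1, 5.9, 7.1, 7.9]
plausible = log_joint(1.0, 4.0, 1.0, xs, ys)
implausible = log_joint(3.0, 0.0, 1.0, xs, ys)
```

A function like `log_joint` is all an MCMC sampler needs as input. Probabilistic programming languages generate the equivalent from the model statements so it does not have to be written by hand.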
  • 23. Example: Deep Bayesian Neural Nets ● https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/ ● https://twiecki.io/blog/2018/08/13/hierarchical_bayesian_neural_network/
  • 24. Example: Bayesian Inference and Volatility Modeling Using Stan https://luisdamiano.github.io/personal/volatility_stan2018.pdf Credit: Michael Weylandt, Luis Damiano
  • 25. Example: Bayesian Inference and Volatility Modeling Using Stan https://luisdamiano.github.io/personal/volatility_stan2018.pdf Credit: Michael Weylandt, Luis Damiano
  • 27. Thank you for your attention… and endurance!
  • 29. All sources in one place!
      About Generative vs. Discriminative models:
      ● Ng, A. Y. and Jordan, M. I. (2002). On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Advances in Neural Information Processing Systems, pages 841–848.
      ● Rasmus Bååth: Video Introduction to Bayesian Data Analysis, Part 1: What is Bayes?: https://www.youtube.com/watch?time_continue=366&v=3OJEae7Qb_o
      When to use ML vs. Statistical Modelling:
      ● Frank Harrell's Blog: http://www.fharrell.com/post/stat-ml/ and http://www.fharrell.com/post/stat-ml2/
      Frequentist approach:
      ● How do sampling distributions work (applet): http://onlinestatbook.com/stat_sim/sampling_dist/index.html
      Bayesian inference and computation:
      ● John Kruschke: Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Chapter 5
      ● Rasmus Bååth: http://www.sumsar.net/blog/2017/02/introduction-to-bayesian-data-analysis-part-two/
      ● Richard McElreath: Statistical Rethinking book and lectures (https://www.youtube.com/watch?v=4WVelCswXo4)
      Many model examples in Stan:
      ● https://mc-stan.org/users/documentation/case-studies
      About Bayesian Neural Networks:
      ● https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/
      ● https://twiecki.io/blog/2018/08/13/hierarchical_bayesian_neural_network/
      Volatility Examples:
      ● Hidden Markov Models: https://github.com/luisdamiano/rfinance17
      ● Volatility Garch Model and Bayesian Workflow: https://luisdamiano.github.io/personal/volatility_stan2018.pdf
      Dictionary: Stats ↔ ML:
      ● https://ubc-mds.github.io/resources_pages/terminology/
      The Bayesian Workflow:
      ● https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html
      Algorithm explanation applet for MCMC exploration of the parameter space:
      ● http://elevanth.org/blog/2017/11/28/build-a-better-markov-chain/
      Probabilistic Programming Conference Talks:
      ● https://www.youtube.com/watch?v=crvNIGyqGSU
  • 30. Who to follow on Twitter?
      ● Chris Fonnesbeck @fonnesbeck (pyMC3)
      ● Thomas Wiecki @twiecki (pyMC3) Blog: https://twiecki.io/ (nice intros)
      ● Bayes Dose @BayesDose (general info and papers)
      ● Richard McElreath @rlmcelreath (ecology, Bayesian statistics expert) All his lectures: https://www.youtube.com/channel/UCNJK6_DZvcMqNSzQdEkzvzA
      ● Michael Betancourt @betanalpha (Stan) Blog: https://betanalpha.github.io/writing/ Specifically: https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html
      ● Rasmus Bååth @rabaath Great video series: http://www.sumsar.net/blog/2017/02/introduction-to-bayesian-data-analysis-part-one/
      ● Frank Harrell @f2harrell (statistics sage) Great Blog: http://www.fharrell.com/
      ● Andrew Gelman @StatModeling (statistics sage) https://statmodeling.stat.columbia.edu/
      ● Judea Pearl @yudapearl Book of Why: http://bayes.cs.ucla.edu/WHY/ (more about causality, BN and DAG)
      ● AND MANY MORE!
  • 31. Dictionary: Stats ↔ ML. Check https://ubc-mds.github.io/resources_pages/terminology/ for more terminology.
      Statistics ~ Machine learning / AI
      ● Estimation/Fitting ~ Learning
      ● Hypothesis ~ Classification rule
      ● Data Point ~ Example/Instance
      ● Regression ~ Supervised Learning
      ● Classification ~ Supervised Learning
      ● Covariates ~ Features
      ● Parameters ~ Features
      ● Response ~ Label
      ● Factor ~ Factor (categorical variables)
      ● Likelihood ~ Cost Function (sometimes)
  • 32. Data Science + AI + ML + Stats Credit: Zoubin Ghahramani, CTO UBER. Talk: "Probabilistic Machine Learning: From theory to industrial impact"