Bayesian statistics has lately been discussed mostly in the context of deep learning. Unfortunately, this obscures its main advantage over standard machine learning approaches: unlike black-box models, the Bayesian approach to modelling is white-box. Being white-box is both a blessing and a curse.
The analyst needs a full understanding of the nature of the problem; only then does the Bayesian approach work at full power. It lets us account not only for what "the data tell us" but also for what "common sense tells us". This talk covers why and when all of this is needed, and how to run and interpret such an analysis in Python.
1. PyMC3 – Bayesian Statistical Modelling in Python
Max Kochurov
22 June, 2019
Max Kochurov PyMC3 – Bayesian Statistical Modelling in Python PyData – Moscow 1 / 29
2. About me
Max Kochurov tg,slack:@ferres / github:ferrine
Geoopt
3. Bayesian Statistics
Figure: Updating the prior p(λ)

p(λ | D) = p(D | λ) p(λ) / p(D)

• p(λ) – Prior, base knowledge
• p(D | λ) – Likelihood, new information
• p(λ | D) – Posterior, updated knowledge
• p(D) – Evidence, surprise in data
5. Bayesian Statistics
Compared to the frequentist approach:
• An elegant way to put assumptions into the model
• Can work even with little data!
• No need for p-values: all you need is p(λ | D)
• . . .

p(λ | D) = p(D | λ) p(λ) / p(D)

• p(λ) – Prior, base knowledge
• p(D | λ) – Likelihood, new information
• p(λ | D) – Posterior, updated knowledge
• p(D) – Evidence, surprise in data
13. Where do people use Bayesian inference?
• Finance (Quantopian): estimating the performance of trading algorithms
• Music Streaming (Sounds): A/B testing, churn prediction, lifetime value, defining user sessions
• E-Commerce (Salesforce): A/B tests, combining disparate sources of information, hierarchical models
• Astronomy: estimating orbits of space objects, getting an image of a black hole
• Life science: epidemic analysis
• Medicine: calculating effect sizes for drugs
• . . .
Some will be covered later
17. Fair Coin Flips – expect fair
p ∼ Beta(3, 3)
flips ∼ Binomial(N, p)
Data: 12 out of 20 flips
What is p?
We expect an angel and see an angel, so we are confident
20. Fair Coin Flips – expect unfair
p ∼ Beta(1, 5)
flips ∼ Binomial(N, p)
Data: 12 out of 20 flips
What is p?
We expect a devil but see an angel, so we are less confident
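In the talk these two models are fit with PyMC3, but the Beta prior is conjugate to the Binomial likelihood, so the posterior is available in closed form. A minimal scipy sketch comparing the two priors from the slides on the same 12-out-of-20 data:

```python
from scipy import stats

# Conjugacy: Beta(a, b) prior + k successes out of n flips
# gives the posterior Beta(a + k, b + n - k) exactly.
k, n = 12, 20  # data from the slides: 12 out of 20 flips

# "Expect fair" prior Beta(3, 3) vs "expect unfair" prior Beta(1, 5)
fair_post = stats.beta(3 + k, 3 + (n - k))      # Beta(15, 11)
unfair_post = stats.beta(1 + k, 5 + (n - k))    # Beta(13, 13)

print("posterior mean, expect fair:  ", fair_post.mean())    # 15/26
print("posterior mean, expect unfair:", unfair_post.mean())  # 13/26
```

With the pessimistic Beta(1, 5) prior the same data pull the posterior mean down to exactly 0.5: we saw an angel but expected a devil, so we end up less convinced the coin favours angels.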
23. Hierarchical Models
What if we have more diverse data? We can estimate the share of devils, plus the uncertainty!
Data: [8, 2, 2, 10, 0, 0, 2, 5, 6, 10] out of 20 flips each

λ_Devil ∼ Exponential(1)
λ_Angel ∼ Exponential(1)
p_i ∼ Beta(λ_Angel, λ_Angel + λ_Devil)
flips_i ∼ Binomial(N_i, p_i)
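Before fitting, it helps to check a hierarchical model by forward-simulating it. A numpy sketch of the generative process above (the talk fits the real data with PyMC3 and MCMC instead; the seed and sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward-simulate the model from the slide: shared hyper-priors
# lambda_angel, lambda_devil control how the per-coin biases p_i
# are drawn; each coin i is then flipped N_i = 20 times.
n_coins, n_flips = 10, 20

lam_devil = rng.exponential(1.0)
lam_angel = rng.exponential(1.0)
p = rng.beta(lam_angel, lam_angel + lam_devil, size=n_coins)
flips = rng.binomial(n_flips, p)

print(flips)  # one synthetic dataset, same shape as [8, 2, 2, 10, ...]
```

If simulated datasets look nothing like the observed [8, 2, 2, 10, 0, 0, 2, 5, 6, 10], that is a sign the priors encode the wrong beliefs.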
26. Markov Chain Monte Carlo (MCMC) & Variational Inference
A few simple rules:
• Try MCMC first
• If it is slow: wait a bit
• Got tired: try VI
• Didn’t work: feel sad
Still did not work for you?
goto https://discourse.pymc.io
27. Inspecting Your Model
What is the amount of devils? (MCMC used): pm.traceplot(trace)
Figure: trace plots, posterior histograms and sample traces for devils, angels, and p
31. A/B testing
34. A/B testing, why Bayes
We would accept that result:
Figure: posterior of the difference, diff=0.1, p-val=0.24
We wait too long to check a small diff:
Figure: diff=0.01, p-val=0.24
WTF?
Figure: diff=0.02, p-val=0.08
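The figures on this slide come from a specific simulation; to see the failure mode in miniature, here is a hypothetical two-proportion z-test sketch showing that very different effect sizes can yield nearly the same p-value once sample sizes differ (all counts below are made up, not the data behind the slide's plots):

```python
import math
from scipy import stats

def two_prop_pvalue(k_a, n_a, k_b, n_b):
    """Two-sided two-proportion z-test with pooled variance,
    the frequentist baseline the slides argue against."""
    p_a, p_b = k_a / n_a, k_b / n_b
    p_pool = (k_a + k_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * stats.norm.sf(abs(z))

# diff = 0.10 on small samples vs diff = 0.01 on huge samples:
# the p-values come out almost identical, though the business
# impact differs by 10x.
print(two_prop_pvalue(40, 100, 50, 100))          # diff = 0.10
print(two_prop_pvalue(4000, 10000, 4100, 10000))  # diff = 0.01
```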
38. Expected Loss Framework
p-values are useless. What about calculating the loss of ignoring the better model?
• We now have to decide whether to use model A or B
• A has some CTR α, B has some CTR β
• Define a loss (it can be any) for ignoring the better model, given that we use x:

Loss(α, β, x) = max(β − α, 0) if x = A; max(α − β, 0) if x = B

• The loss depends on the magnitude of the difference (p-values do not)
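The expected loss is easy to estimate by Monte Carlo from posterior samples of α and β. A minimal numpy sketch under assumed Beta posteriors (the click/view counts are illustrative, not the data behind the slide's figures):

```python
import numpy as np

rng = np.random.default_rng(42)

# Posterior CTRs: with Beta(1, 1) priors and binomial click data,
# the posteriors are Beta(1 + clicks, 1 + views - clicks).
clicks_a, views_a = 40, 400
clicks_b, views_b = 50, 400
alpha = rng.beta(1 + clicks_a, 1 + views_a - clicks_a, size=100_000)
beta = rng.beta(1 + clicks_b, 1 + views_b - clicks_b, size=100_000)

# Monte Carlo estimate of the expected loss of sticking with each arm:
# L(A) = E[max(beta - alpha, 0)], L(B) = E[max(alpha - beta, 0)].
loss_a = np.maximum(beta - alpha, 0).mean()
loss_b = np.maximum(alpha - beta, 0).mean()
print(f"L(A) = {loss_a:.4f}, L(B) = {loss_b:.4f}")
```

Decide for the arm with the smaller expected loss, and stop the experiment once that loss drops below a threshold you are willing to tolerate.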
43. Expected Loss Framework
L(x) = E_{p(α,β)} Loss(α, β, x) → min over x

Figure: diff=0.1, p-val=0.24: L(A) = 0.122, L(B) = 0.019
Figure: diff=0.01, p-val=0.24: L(A) = 0.012, L(B) = 0.002
Figure: diff=0.02, p-val=0.08: L(A) = 0.02, L(B) = 0.0005

Without p-values we stop the first experiment earlier, as B is a clear winner
49. Real(!) Impact of prior
Do you believe in a CTR > 0.9? Or > 0.8? Or 0?
You have your p(ctr) in mind, so use it!
• You iterate quicker
• You can accept small improvements
• Fewer problems with continuous monitoring
Do not use an overconfident prior; choose a slightly pessimistic one
Choose a smarter loss (“accept improvements of at least δ”):

Loss(α, β, δ, x) = max((β − δ) − α, 0) if x = A; max(α − (β − δ), 0) if x = B

Convert the loss into $$$
Figure: informative prior vs non-informative prior
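The δ-version of the loss and the "convert into $$$" step can be sketched the same way; every number here (the posteriors, δ, the traffic, the value per click) is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

# Threshold delta: only count B as "better" when it beats A by at
# least delta, the minimum improvement worth shipping.
delta = 0.01
alpha = rng.beta(41, 361, size=100_000)  # assumed posterior CTR of A
beta = rng.beta(51, 351, size=100_000)   # assumed posterior CTR of B

loss_a = np.maximum((beta - delta) - alpha, 0).mean()  # cost of keeping A
loss_b = np.maximum(alpha - (beta - delta), 0).mean()  # cost of switching to B

# The loss is in CTR units, so scaling by traffic times value per
# click turns it into money at risk.
views_per_day, value_per_click = 100_000, 0.5
print("daily $ at risk if we keep A:", loss_a * views_per_day * value_per_click)
```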
55. Finance (Quantopian) case
A hedge fund in Boston:
• Crowd-sourcing of trading strategies, more than 700k of them
• People tend to overfit the leaderboard, and that’s OK
• But then there is no out-of-sample data left for them
• Data changes over time, and we should take care of that
How do we link two periods of algorithm evaluation to use more data?
58. Finance (Quantopian) case
Core ideas:
• Use a Gaussian process to capture volatility and returns changing over time
• Allow structural changes in the model
  • in mean returns
  • but not in volatility
• Use posterior model returns to optimize an expected risk/return objective
Figure: Risk vs Returns
59. Finance (Quantopian) case
Figure: Sharpe Ratio
65. Costs and Profits
Profit ∼ Stock | Demand    Profit ∼ Demand | Stock
How do we combine them?
66. Expected Loss Framework
Profits: Frequentist vs Bayesian
71. Takeaways
The Bayesian framework lets you:
• effectively use your prior knowledge
• take uncertainty into account
• understand your data / problem better
BUT you should really understand your problem
Tutorials/misc:
pymc3 docs: https://docs.pymc.io/nb_tutorials/index.html
case studies: https://twiecki.io