Bayesian statistics has lately been discussed mostly in the context of deep learning. Unfortunately, this obscures its main advantage over standard machine learning approaches: unlike black-box models, the Bayesian approach to modelling is white-box. Being white-box is both a blessing and a curse.
The analyst needs a full understanding of the nature of the problem; only then does the Bayesian approach work at full power. It lets us account not only for what "the data tell us" but also for what "common sense tells us". This talk covers why and when all of this is needed, and how to run and interpret such an analysis in Python.
1. PyMC3 – Bayesian Statistical Modelling in Python
Max Kochurov
22 June, 2019
Max Kochurov PyMC3 – Bayesian Statistical Modelling in Python PyData – Moscow 1 / 29
2. About me
Max Kochurov tg,slack:@ferres / github:ferrine
Geoopt
3. Bayesian Statistics
Figure: Updating the prior p(λ)

p(λ | D) = p(D | λ) p(λ) / p(D)

• p(λ) – Prior, base knowledge
• p(D | λ) – Likelihood, new information
• p(λ | D) – Posterior, updated knowledge
• p(D) – Evidence, surprise in data
5. Bayesian Statistics
Compared to the frequentist approach:
• An elegant way to put assumptions into the model
• Can work even with little data!
• No need for p-values: all you need is p(λ | D)
• . . .

p(λ | D) = p(D | λ) p(λ) / p(D)

• p(λ) – Prior, base knowledge
• p(D | λ) – Likelihood, new information
• p(λ | D) – Posterior, updated knowledge
• p(D) – Evidence, surprise in data
13. Where do people use Bayesian inference?
• Finance (Quantopian): estimating the performance of trading algorithms
• Music Streaming (Sounds): A/B testing, churn prediction, lifetime value, defining user sessions
• E-Commerce (Salesforce): A/B tests, combining disparate sources of information, hierarchical models
• Astronomy: estimating orbits of space objects, getting an image of a black hole
• Life science: epidemic analysis
• Medicine: calculating effect sizes for drugs
• . . .
Some will be covered later
17. Fair Coin Flips – expect fair
p ∼ Beta(3, 3)
flips ∼ Binomial(N, p)
Data: 12 out of 20 flips
What is p?
We expect an angel and see an angel, so we are confident
20. Fair Coin Flips – expect unfair
p ∼ Beta(1, 5)
flips ∼ Binomial(N, p)
Data: 12 out of 20 flips
What is p?
We expect a devil but see an angel, so we are less confident
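In the talk these two models are fit with PyMC3, but the Beta prior is conjugate to the Binomial likelihood, so the posterior is available in closed form. A minimal scipy sketch comparing the two priors from the slides on the same 12-out-of-20 data:

```python
from scipy import stats

# Conjugacy: Beta(a, b) prior + k successes out of n flips
# gives the posterior Beta(a + k, b + n - k) exactly.
k, n = 12, 20  # data from the slides: 12 out of 20 flips

# "Expect fair" prior Beta(3, 3) vs "expect unfair" prior Beta(1, 5)
fair_post = stats.beta(3 + k, 3 + (n - k))      # Beta(15, 11)
unfair_post = stats.beta(1 + k, 5 + (n - k))    # Beta(13, 13)

print("posterior mean, expect fair:  ", fair_post.mean())    # 15/26
print("posterior mean, expect unfair:", unfair_post.mean())  # 13/26
```

With the pessimistic Beta(1, 5) prior the same data pull the posterior mean down to exactly 0.5: we saw an angel but expected a devil, so we end up less convinced the coin favours angels.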
23. Hierarchical Models
What if we have more diverse data? We can estimate the share of devils, plus the uncertainty!
Data: [8, 2, 2, 10, 0, 0, 2, 5, 6, 10] out of 20 flips each

λ_Devil ∼ Exponential(1)
λ_Angel ∼ Exponential(1)
p_i ∼ Beta(λ_Angel, λ_Angel + λ_Devil)
flips_i ∼ Binomial(N_i, p_i)
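Before fitting, it helps to check a hierarchical model by forward-simulating it. A numpy sketch of the generative process above (the talk fits the real data with PyMC3 and MCMC instead; the seed and sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward-simulate the model from the slide: shared hyper-priors
# lambda_angel, lambda_devil control how the per-coin biases p_i
# are drawn; each coin i is then flipped N_i = 20 times.
n_coins, n_flips = 10, 20

lam_devil = rng.exponential(1.0)
lam_angel = rng.exponential(1.0)
p = rng.beta(lam_angel, lam_angel + lam_devil, size=n_coins)
flips = rng.binomial(n_flips, p)

print(flips)  # one synthetic dataset, same shape as [8, 2, 2, 10, ...]
```

If simulated datasets look nothing like the observed [8, 2, 2, 10, 0, 0, 2, 5, 6, 10], that is a sign the priors encode the wrong beliefs.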
26. Markov Chain Monte Carlo (MCMC) & Variational Inference
A few simple rules:
• Try MCMC first
• If it is slow: wait a bit
• Got tired: try VI
• Didn’t work: feel sad
Still did not work for you?
goto https://discourse.pymc.io
27. Inspecting Your Model
What is the amount of devils? (MCMC used): pm.traceplot(trace)
Figure: trace plots, posterior histograms and sample traces for devils, angels, and p
31. A/B testing
34. A/B testing, why Bayes
We would accept that result:
Figure: posterior of the difference, diff=0.1, p-val=0.24
We wait too long to check a small diff:
Figure: diff=0.01, p-val=0.24
WTF?
Figure: diff=0.02, p-val=0.08
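The figures on this slide come from a specific simulation; to see the failure mode in miniature, here is a hypothetical two-proportion z-test sketch showing that very different effect sizes can yield nearly the same p-value once sample sizes differ (all counts below are made up, not the data behind the slide's plots):

```python
import math
from scipy import stats

def two_prop_pvalue(k_a, n_a, k_b, n_b):
    """Two-sided two-proportion z-test with pooled variance,
    the frequentist baseline the slides argue against."""
    p_a, p_b = k_a / n_a, k_b / n_b
    p_pool = (k_a + k_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * stats.norm.sf(abs(z))

# diff = 0.10 on small samples vs diff = 0.01 on huge samples:
# the p-values come out almost identical, though the business
# impact differs by 10x.
print(two_prop_pvalue(40, 100, 50, 100))          # diff = 0.10
print(two_prop_pvalue(4000, 10000, 4100, 10000))  # diff = 0.01
```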
38. Expected Loss Framework
p-values are useless. What about calculating the loss of ignoring the better model?
• We now have to decide whether to use model A or B
• A has some CTR α, B has some CTR β
• Define a loss (it can be any) for ignoring the better model, given that we use x:

Loss(α, β, x) = max(β − α, 0) if x = A; max(α − β, 0) if x = B

• The loss depends on the magnitude of the difference (p-values do not)
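The expected loss is easy to estimate by Monte Carlo from posterior samples of α and β. A minimal numpy sketch under assumed Beta posteriors (the click/view counts are illustrative, not the data behind the slide's figures):

```python
import numpy as np

rng = np.random.default_rng(42)

# Posterior CTRs: with Beta(1, 1) priors and binomial click data,
# the posteriors are Beta(1 + clicks, 1 + views - clicks).
clicks_a, views_a = 40, 400
clicks_b, views_b = 50, 400
alpha = rng.beta(1 + clicks_a, 1 + views_a - clicks_a, size=100_000)
beta = rng.beta(1 + clicks_b, 1 + views_b - clicks_b, size=100_000)

# Monte Carlo estimate of the expected loss of sticking with each arm:
# L(A) = E[max(beta - alpha, 0)], L(B) = E[max(alpha - beta, 0)].
loss_a = np.maximum(beta - alpha, 0).mean()
loss_b = np.maximum(alpha - beta, 0).mean()
print(f"L(A) = {loss_a:.4f}, L(B) = {loss_b:.4f}")
```

Decide for the arm with the smaller expected loss, and stop the experiment once that loss drops below a threshold you are willing to tolerate.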
43. Expected Loss Framework
L(x) = E_{p(α,β)} Loss(α, β, x) → min over x

Figure: diff=0.1, p-val=0.24: L(A) = 0.122, L(B) = 0.019
Figure: diff=0.01, p-val=0.24: L(A) = 0.012, L(B) = 0.002
Figure: diff=0.02, p-val=0.08: L(A) = 0.02, L(B) = 0.0005

Without p-values we stop the first experiment earlier, as B is a clear winner
49. Real(!) Impact of prior
Do you believe in a CTR > 0.9? Or > 0.8? Or 0?
You have your p(ctr) in mind, so use it!
• You iterate quicker
• You can accept small improvements
• Fewer problems with continuous monitoring
Do not use an overconfident prior; choose a slightly pessimistic one
Choose a smarter loss (“accept improvements of at least δ”):

Loss(α, β, δ, x) = max((β − δ) − α, 0) if x = A; max(α − (β − δ), 0) if x = B

Convert the loss into $$$
Figure: informative prior vs non-informative prior
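The δ-version of the loss and the "convert into $$$" step can be sketched the same way; every number here (the posteriors, δ, the traffic, the value per click) is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

# Threshold delta: only count B as "better" when it beats A by at
# least delta, the minimum improvement worth shipping.
delta = 0.01
alpha = rng.beta(41, 361, size=100_000)  # assumed posterior CTR of A
beta = rng.beta(51, 351, size=100_000)   # assumed posterior CTR of B

loss_a = np.maximum((beta - delta) - alpha, 0).mean()  # cost of keeping A
loss_b = np.maximum(alpha - (beta - delta), 0).mean()  # cost of switching to B

# The loss is in CTR units, so scaling by traffic times value per
# click turns it into money at risk.
views_per_day, value_per_click = 100_000, 0.5
print("daily $ at risk if we keep A:", loss_a * views_per_day * value_per_click)
```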
55. Finance (Quantopian) case
A hedge fund in Boston:
• Crowd-sourcing of trading strategies, more than 700k of them
• People tend to overfit the leaderboard, and that’s OK
• But then there is no out-of-sample data left for them
• Data changes over time, and we should take care of that
How do we link two periods of algorithm evaluation to use more data?
58. Finance (Quantopian) case
Core ideas:
• Use a Gaussian process to capture volatility and returns changing over time
• Allow structural changes in the model
  • in mean returns
  • but not in volatility
• Use posterior model returns to optimize an expected risk/return objective
Figure: Risk vs Returns
59. Finance (Quantopian) case
Figure: Sharpe Ratio
65. Costs and Profits
Profit ∼ Stock | Demand    Profit ∼ Demand | Stock
How do we combine them?
66. Expected Loss Framework
Profits: Frequentist vs Bayesian
71. Takeaways
The Bayesian framework lets you:
• effectively use your prior knowledge
• take uncertainty into account
• understand your data / problem better
BUT you should really understand your problem
Tutorials/misc:
pymc3 docs: https://docs.pymc.io/nb_tutorials/index.html
case studies: https://twiecki.io