Applied Bayesian Inference with
PyMC
@MrSantoni
Which color will sell more?
[Figure: two mock product pages, Page A and Page B, each titled "A Tea Pot" with filler text and a BUY button]
#buy / N #buy / N
• What if N is small?
• How large must N be to have 90% confidence?
• What if N is different on A and B?
Bayesian Inference
Probability:
Claim: we think Bayesian
Frequentist
Bayesian
Frequency
Belief
test 1 test 2 test 3
Claim: we think Bayesian
no-bugs
confidence
Bayesian Inference =
update your beliefs
new evidence
prior belief
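The update rule can be made concrete with the "no bugs" claim from the previous slide. This is only a minimal sketch: the prior belief of 0.2 and the 0.5 chance that buggy code still passes a single test are assumed values for illustration, not numbers from the talk.

```python
def update_belief(prior, p_pass_if_buggy=0.5):
    """Return P(no bugs | test passed) using Bayes' rule.

    Bug-free code always passes, so P(pass | no bugs) = 1.
    p_pass_if_buggy is an assumed value, not taken from the slides.
    """
    evidence = 1.0 * prior + p_pass_if_buggy * (1.0 - prior)  # P(pass)
    return 1.0 * prior / evidence


belief = 0.2  # assumed prior belief that the code has no bugs
history = [belief]
for _ in range(3):  # test 1, test 2 and test 3 all pass
    belief = update_belief(belief)
    history.append(belief)

print(history)
```

Each passing test is new evidence, and the posterior after one test becomes the prior for the next.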
The Developer View
Statistical
Problem
def frequentist(): return 80%
def bayesian(): return [Figure: a probability distribution over the range 0% to 100%]
How to?
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Closed-form solution:
Realistic Cases
Toy Examples
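One toy example where a closed-form solution does exist is a Bernoulli outcome with a uniform prior: the posterior is then a Beta distribution, Beta(1 + C, 1 + N - C) for C successes in N trials. The sketch below assumes hypothetical counts (3 clicks in 20 visits) and is not part of the original talk.

```python
def beta_posterior(clicks, visitors, prior_alpha=1.0, prior_beta=1.0):
    """Closed-form posterior for a Bernoulli rate.

    A Uniform(0, 1) prior is Beta(1, 1); observing the data simply adds
    the number of successes and failures to the two Beta parameters.
    """
    alpha = prior_alpha + clicks
    beta = prior_beta + (visitors - clicks)
    return alpha, beta


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical data: 3 BUY clicks out of 20 visitors.
alpha, beta = beta_posterior(clicks=3, visitors=20)
print(alpha, beta, beta_mean(alpha, beta))
```

Realistic models rarely have such a conjugate form, which is where sampling-based tools come in.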
PyMC
PyMC
• Perform Bayesian Inference
• Markov Chain Monte Carlo techniques
• A.k.a. Probabilistic Programming
Show me the code!
Example A/B test
Only one difference between A and B
[Figure: Page A and Page B mock-ups side by side]
Assume there is:
– p_a: probability of clicking BUY when landing on A
– p_b: probability of clicking BUY when landing on B
How to compute p_a and p_b?
Page A
– N_a visitors
– C_a BUY clicks on page A
Page B
– N_b visitors
– C_b BUY clicks on page B
Frequentist:
C_a / N_a
BUT:
Observed frequency does not necessarily equal p_a
Bayesian:
Infer true frequency from observed data
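A small simulation illustrates why the observed frequency C_a / N_a need not equal p_a. This is a sketch using only the standard library; the true rate of 0.05, the sample sizes and the seed are assumptions chosen for illustration.

```python
import random


def observed_frequency(p_true, visitors, rng):
    """Simulate `visitors` independent visits and return clicks / visitors."""
    clicks = sum(1 for _ in range(visitors) if rng.random() < p_true)
    return clicks / visitors


rng = random.Random(42)  # fixed seed so the run is repeatable
p_true = 0.05            # assumed true click probability

# Five repeated experiments at each sample size.
small = [observed_frequency(p_true, 50, rng) for _ in range(5)]
large = [observed_frequency(p_true, 1500, rng) for _ in range(5)]

print('N = 50:  ', small)
print('N = 1500:', large)
```

The small-sample frequencies can only take multiples of 1/50 and scatter widely around the true rate, while the large-sample ones stay much closer to it.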
Bayesian Workflow
1. Define prior
2. Fit to observations
3. Get posteriors
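# PyMC 2.x API
from pymc import Uniform, rbernoulli, Bernoulli, MCMC
from matplotlib import pyplot as plt
import numpy as np
p_A_true = 0.05  # true click probability, unknown in practice
N = 1500         # number of simulated visitors
occurrences = rbernoulli(p_A_true, N)  # simulate N Bernoulli trials
print('Click-BUY:')
print(occurrences.sum())
print('Observed frequency:')
print(occurrences.sum() / float(N))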
Click-BUY:
68
Observed frequency:
0.0453333333333
Clicking BUY
Bernoulli distribution
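$$P(\text{click}) = \begin{cases} p & \text{if } \text{click} = 1 \\ 1 - p & \text{if } \text{click} = 0 \end{cases}$$
[Figure: bar chart of P(click) for click=1 and click=0]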
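The Bernoulli model also tells us how plausible a whole set of observations is for a given p. A minimal sketch, reusing the 68 BUY clicks among 1500 visitors from the simulated run above; the three candidate values of p are arbitrary choices for illustration.

```python
import math


def bernoulli_pmf(click, p):
    """P(click) under a Bernoulli distribution with parameter p."""
    return p if click == 1 else 1.0 - p


def log_likelihood(clicks, p):
    """Log-probability of independent observations for a given p."""
    return sum(math.log(bernoulli_pmf(c, p)) for c in clicks)


# 68 clicks and 1432 non-clicks, matching the counts printed above.
clicks = [1] * 68 + [0] * 1432

for p in (0.02, 0.045, 0.08):  # arbitrary candidate values
    print(p, log_likelihood(clicks, p))
```

Values of p close to the observed frequency make the data more likely; Bayesian inference combines this likelihood with the prior to obtain the posterior.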
p_A = Uniform('p_A', lower=0, upper=1)
[Figure: uniform prior density of p_A on the interval 0 to 1]
print(p_A.random())
print(p_A.value)
array(0.906086144982998)
array(0.906086144982998)
print(p_A.random())
print(p_A.value)
array(0.285313846133313)
array(0.285313846133313)
p_A = Uniform('p_A', lower=0, upper=1)
obs = Bernoulli('obs', p_A, value=occurrences, observed=True)
mcmc = MCMC([p_A, obs])
mcmc.sample(20000, 1000)  # 20000 iterations, the first 1000 discarded as burn-in
print(mcmc.trace('p_A')[:])
[------- 20% ] 4053 of 20000 complete in 0.5 sec
[------------- 36% ] 7315 of 20000 complete in 1.0 sec
[-----------------53% ] 10627 of 20000 complete in 1.5 sec
[-----------------69%------ ] 13939 of 20000 complete in 2.0 sec
[-----------------81%----------- ] 16376 of 20000 complete in 2.5 sec
[-----------------96%---------------- ] 19342 of 20000 complete in 3.0 sec
[-----------------100%-----------------] 20000 of 20000 complete in 3.1 sec
[ 0.04656576 0.04656576 0.04656576 ..., 0.03803667 0.03803667
0.03803667]
plt.figure(figsize=(8, 7))
plt.hist(mcmc.trace('p_A')[:], bins=35, histtype='stepfilled', density=True)  # `normed` was removed from matplotlib; `density` replaces it
plt.xlabel('Probability of clicking BUY')
plt.ylabel('Density')
plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A')
plt.legend()
plt.savefig('p_A_hist_N_%s.png' % N)
plt.show()
Can we be 90% confident that p_A lies between X and Y?
import numpy as np
p_A_samples = mcmc.trace('p_A')[:]
lower_bound = np.percentile(p_A_samples, 5)
upper_bound = np.percentile(p_A_samples, 95)
print('There is 90%% probability that p_A is between %s and %s' % (lower_bound, upper_bound))
There is 90% probability that p_A is between
0.0373019596856 and 0.0548052806892
What if N_a is lower?
# PyMC 2.x API
from pymc import Uniform, rbernoulli, Bernoulli, MCMC
from matplotlib import pyplot as plt
import numpy as np
p_A_true = 0.05
N = 50  # far fewer visitors than before
occurrences = rbernoulli(p_A_true, N)
print('Click-BUY:')
print(occurrences.sum())
print('Observed frequency:')
print(occurrences.sum() / float(N))
Click-BUY:
2
Observed frequency:
0.04
p_A = Uniform('p_A', lower=0, upper=1)
obs = Bernoulli('obs', p_A, value=occurrences, observed=True)
mcmc = MCMC([p_A, obs])
mcmc.sample(20000, 1000)
print(mcmc.trace('p_A')[:])
[----- 14% ] 2874 of 20000 complete in 0.5 sec
[----------- 30% ] 6035 of 20000 complete in 1.0 sec
[-----------------47% ] 9440 of 20000 complete in 1.5 sec
[-----------------63%---- ] 12775 of 20000 complete in 2.0 sec
[-----------------81%---------- ] 16203 of 20000 complete in 2.5 sec
[-----------------100%-----------------] 20000 of 20000 complete in 3.0 sec
[ 0.06240723 0.06240723 0.06240723 ..., 0.01864419 0.01864419
0.01864419]
plt.figure(figsize=(8, 7))
plt.hist(mcmc.trace('p_A')[:], bins=35, histtype='stepfilled', density=True)  # `normed` was removed from matplotlib; `density` replaces it
plt.xlabel('Probability of clicking BUY')
plt.ylabel('Density')
plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A')
plt.legend()
plt.savefig('p_A_hist_N_%s.png' % N)
plt.show()
Can we be 90% confident that p_A lies between X and Y?
import numpy as np
p_A_samples = mcmc.trace('p_A')[:]
lower_bound = np.percentile(p_A_samples, 5)
upper_bound = np.percentile(p_A_samples, 95)
print('There is 90%% probability that p_A is between %s and %s' % (lower_bound, upper_bound))
There is 90% probability that p_A is between
0.0160966147705 and 0.114655284797
[Figure: posterior of p_A for N_a = 1500 and for N_a = 50]
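The widening of the interval for small N can be cross-checked without MCMC, because a uniform prior with Bernoulli data has a known Beta posterior. The sketch below swaps in that closed form and standard-library sampling instead of PyMC; the click counts (68 of 1500 and 2 of 50) are the ones printed in the two runs above, while the helper name and seed are illustrative.

```python
import random


def credible_interval(clicks, visitors, mass=0.90, draws=20000, seed=0):
    """Approximate central credible interval of the Beta posterior.

    Uniform prior + Bernoulli data gives Beta(1 + C, 1 + N - C); the
    interval is read off the sorted random draws.
    """
    rng = random.Random(seed)
    alpha, beta = 1 + clicks, 1 + visitors - clicks
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    lower_index = int(draws * (1.0 - mass) / 2.0)
    upper_index = draws - 1 - lower_index
    return samples[lower_index], samples[upper_index]


large = credible_interval(clicks=68, visitors=1500)
small = credible_interval(clicks=2, visitors=50)

print('N = 1500: width %.4f' % (large[1] - large[0]))
print('N = 50:   width %.4f' % (small[1] - small[0]))
```

With fewer visitors the posterior is wider, so the same 90% of probability mass covers a much larger range of p_A.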
Does the red have a larger probability of being clicked?
# PyMC 2.x API
from pymc import Uniform, rbernoulli, Bernoulli, MCMC, deterministic
from matplotlib import pyplot as plt
p_A_true = 0.05
p_B_true = 0.04
N_A = 1500
N_B = 750  # the two pages may receive different numbers of visitors
occurrences_A = rbernoulli(p_A_true, N_A)
occurrences_B = rbernoulli(p_B_true, N_B)
print('Observed frequency:')
print('A')
print(occurrences_A.sum() / float(N_A))
print('B')
print(occurrences_B.sum() / float(N_B))
Observed frequency:
A
0.0533333333333
B
0.0413333333333
p_A = Uniform('p_A', lower=0, upper=1)
p_B = Uniform('p_B', lower=0, upper=1)
@deterministic
def delta(p_A=p_A, p_B=p_B):
    # difference between the two click probabilities
    return p_A - p_B
obs_A = Bernoulli('obs_A', p_A, value=occurrences_A, observed=True)
obs_B = Bernoulli('obs_B', p_B, value=occurrences_B, observed=True)
mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
mcmc.sample(25000, 5000)
[----- 14% ] 3561 of 25000 complete in 0.5 sec
[--------- 25% ] 6332 of 25000 complete in 1.0 sec
[------------ 33% ] 8454 of 25000 complete in 1.5 sec
[--------------- 41% ] 10499 of 25000 complete in 2.0 sec
[-----------------50% ] 12602 of 25000 complete in 2.5 sec
[-----------------59%-- ] 14780 of 25000 complete in 3.0 sec
[-----------------67%----- ] 16883 of 25000 complete in 3.5 sec
[-----------------75%-------- ] 18954 of 25000 complete in 4.0 sec
[-----------------83%----------- ] 20877 of 25000 complete in 4.5 sec
[-----------------91%-------------- ] 22924 of 25000 complete in 5.0 sec
[-----------------100%-----------------] 25000 of 25000 complete in 5.5 sec
p_A_samples = mcmc.trace('p_A')[:]
p_B_samples = mcmc.trace('p_B')[:]
delta_samples = mcmc.trace('delta')[:]
# `density=True` replaces the `normed=True` argument removed from matplotlib
plt.subplot(3, 1, 1)
plt.xlim(0, 0.1)
plt.hist(p_A_samples, bins=35, histtype='stepfilled', density=True, color='blue',
         label='Posterior of p_A')
plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A (unknown)')
plt.xlabel('Probability of clicking BUY via A')
plt.legend()
plt.subplot(3, 1, 2)
plt.xlim(0, 0.1)
plt.hist(p_B_samples, bins=35, histtype='stepfilled', density=True, color='green',
         label='Posterior of p_B')
plt.vlines(p_B_true, 0, 90, linestyle='--', label='True p_B (unknown)')
plt.xlabel('Probability of clicking BUY via B')
plt.legend()
plt.subplot(3, 1, 3)
plt.xlim(0, 0.1)
plt.hist(delta_samples, bins=35, histtype='stepfilled', density=True, color='red',
         label='Posterior of delta')
plt.vlines(p_A_true - p_B_true, 0, 90, linestyle='--', label='True delta (unknown)')
plt.xlabel('p_A - p_B')
plt.legend()
plt.savefig('A_and_B.png')
plt.show()
p_A > p_B
How confident are we?
print('Probability that p_A > p_B:')
print((delta_samples > 0).mean())
Probability that p_A > p_B:
0.8919
[Figure: posteriors for N_A = 1500, N_B = 750 compared with N_A = 1500, N_B = 200]
Repeating the computation for the second case:
Probability that p_A > p_B:
0.73455
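The probability that p_A > p_B can also be approximated without MCMC by drawing from the two closed-form Beta posteriors and counting how often A beats B. This is a sketch of that alternative, not the method used in the talk; the counts 80 of 1500 and 31 of 750 are the ones implied by the observed frequencies 0.0533 and 0.0413 printed earlier, and the function name and seed are illustrative.

```python
import random


def prob_a_beats_b(clicks_a, visitors_a, clicks_b, visitors_b,
                   draws=50000, seed=0):
    """Estimate P(p_A > p_B) from independent Beta posteriors.

    Each page has a Uniform(0, 1) prior, so its posterior is
    Beta(1 + clicks, 1 + visitors - clicks).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        p_a = rng.betavariate(1 + clicks_a, 1 + visitors_a - clicks_a)
        p_b = rng.betavariate(1 + clicks_b, 1 + visitors_b - clicks_b)
        if p_a > p_b:
            wins += 1
    return wins / draws


probability = prob_a_beats_b(80, 1500, 31, 750)
print('Probability that p_A > p_B:')
print(probability)
```

Because the two sample sizes enter only through the Beta parameters, nothing special is needed when N differs between A and B.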
MCMC
mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
mcmc.sample(25000, 5000)
Posterior P(p_A, p_B, delta | obs_A, obs_B) as samples
25000 iterations
5000 burn-in
Metropolis-Hastings algorithm
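To give a feel for what the sampler does, here is a minimal random-walk Metropolis sketch for the single-page model (uniform prior, Bernoulli clicks). It is a toy illustration written with the standard library, not PyMC's implementation; the step size, starting point and seed are assumptions.

```python
import math
import random


def log_posterior(p, clicks, visitors):
    """Unnormalised log-posterior: uniform prior times Bernoulli likelihood."""
    if not 0.0 < p < 1.0:
        return -math.inf  # outside the support of the prior
    return clicks * math.log(p) + (visitors - clicks) * math.log(1.0 - p)


def metropolis(clicks, visitors, iterations=20000, burn=1000,
               step=0.01, seed=0):
    """Random-walk Metropolis sampler; returns the samples kept after burn-in."""
    rng = random.Random(seed)
    current = 0.5  # arbitrary starting point
    current_lp = log_posterior(current, clicks, visitors)
    samples = []
    for i in range(iterations):
        proposal = current + rng.gauss(0.0, step)  # symmetric proposal
        proposal_lp = log_posterior(proposal, clicks, visitors)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        if i >= burn:
            samples.append(current)
    return samples


samples = metropolis(clicks=68, visitors=1500)
print(sum(samples) / len(samples))
```

Rejected proposals repeat the current value, which is why traces from this kind of sampler contain runs of identical numbers.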
Open the black box
mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
mcmc.sample(25000, 5000)
from pymc.Matplot import plot as mcplot
mcplot(mcmc)
PyMC
• Easy to interpret results
– confidence, no p-values!
• No crazy math
• Computationally expensive
Thank you
@MrSantoni
marcosantoni@hotmail.it
Back
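Serie A 13/14
Columns: FTHG/FTAG = full-time home/away goals, FTR = full-time result (H = home win, D = draw, A = away win); HTHG, HTAG, HTR = the same at half-time.
Date HomeTeam AwayTeam FTHG FTAG FTR HTHG HTAG HTR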
24/08/2013 Sampdoria Juventus 0 1 A 0 0 D
24/08/2013 Verona Milan 2 1 H 1 1 D
25/08/2013 Cagliari Atalanta 2 1 H 1 1 D
25/08/2013 Inter Genoa 2 0 H 0 0 D
25/08/2013 Lazio Udinese 2 1 H 2 0 H
25/08/2013 Livorno Roma 0 2 A 0 0 D
25/08/2013 Napoli Bologna 3 0 H 2 0 H
25/08/2013 Parma Chievo 0 0 D 0 0 D
25/08/2013 Torino Sassuolo 2 0 H 1 0 H
26/08/2013 Fiorentina Catania 2 1 H 2 1 H
31/08/2013 Chievo Napoli 2 4 A 2 2 D
31/08/2013 Juventus Lazio 4 1 H 2 1 H
01/09/2013 Atalanta Torino 2 0 H 0 0 D
01/09/2013 Bologna Sampdoria 2 2 D 1 1 D
01/09/2013 Catania Inter 0 3 A 0 1 A
01/09/2013 Genoa Fiorentina 2 5 A 0 3 A
01/09/2013 Milan Cagliari 3 1 H 2 1 H
01/09/2013 Roma Verona 3 0 H 0 0 D
01/09/2013 Sassuolo Livorno 1 4 A 0 1 A
01/09/2013 Udinese Parma 3 1 H 1 0 H
14/09/2013 Inter Juventus 1 1 D 0 0 D
14/09/2013 Napoli Atalanta 2 0 H 0 0 D
14/09/2013 Torino Milan 2 2 D 0 0 D
15/09/2013 Fiorentina Cagliari 1 1 D 0 0 D
https://datahub.io/dataset/italian-football-data-serie-a-b
Win-rate
Did it change?
Bayesian Workflow
1. Define Prior
2. Fit to observations
3. Get Posteriors
Winning a Match
Bernoulli distribution
$$P(w) = \begin{cases} p & \text{if } w = 1 \\ 1 - p & \text{if } w = 0 \end{cases}$$
[Figure: bar chart of P(w) for Win (w=1) and Lose (w=0)]
𝑝: switchpoint?
Model the switchpoint
$$p = \begin{cases} p_1 & \text{if } t < \tau \\ p_2 & \text{if } t \geq \tau \end{cases}$$
Goal: infer $p_1$, $p_2$, $\tau$, $p$
Bayesian Workflow
1. Define Prior
2. Fit to observations
3. Get Posteriors
Let’s model this
• Goal: infer the unknowns p1, p2 and TAU
• First step of Bayesian inference: assign a prior probability to the different possible values of p
• What would be a good prior for p1 and p2? Use a uniform one:
– p1 ~ Uniform(0,1)
– p2 ~ Uniform(0,1)
– TAU ~ DiscreteUniform(1, 38)
• P(TAU=k)=1/38 for all k
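With these uniform priors, p1 and p2 can be integrated out analytically, so the posterior over TAU can be enumerated directly. The sketch below uses that closed form rather than the MCMC approach of the talk, and it runs on a hypothetical season (10 losses followed by 28 wins), not on the Serie A data.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def tau_posterior(wins):
    """Posterior P(TAU = k | data) for k = 1..len(wins).

    The first k matches use p1 and the remaining ones use p2.
    Integrating a Uniform(0, 1) prior against w wins and l losses
    yields B(w + 1, l + 1) for each segment.
    """
    n = len(wins)
    log_weights = []
    for tau in range(1, n + 1):
        w1 = sum(wins[:tau])
        l1 = tau - w1
        w2 = sum(wins[tau:])
        l2 = (n - tau) - w2
        log_weights.append(log_beta(w1 + 1, l1 + 1) + log_beta(w2 + 1, l2 + 1))
    peak = max(log_weights)  # subtract the peak for numerical stability
    weights = [math.exp(lw - peak) for lw in log_weights]
    total = sum(weights)
    return {tau: w / total for tau, w in zip(range(1, n + 1), weights)}


# Hypothetical results: 1 = win, 0 = no win.
season = [0] * 10 + [1] * 28
posterior = tau_posterior(season)
most_likely_tau = max(posterior, key=posterior.get)
print(most_likely_tau, posterior[most_likely_tau])
```

On real, noisier results the posterior over TAU is typically spread over many matches, which is the uncertainty the sampled approach makes visible.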
# PyMC 2.x API
from pymc import Uniform, DiscreteUniform, deterministic, Bernoulli, Model, MCMC
import numpy as np
p_1 = Uniform('p_1', lower=0, upper=1)
p_2 = Uniform('p_2', lower=0, upper=1)
tau = DiscreteUniform('tau', lower=1, upper=38)
print('Random output: ', tau.random(), tau.random(), tau.random())
Random output: 14 24 33
@deterministic
def p_(tau=tau, p_1=p_1, p_2=p_2, num_matches=38):
    # concatenate p_1 and p_2 based on tau
    out = np.empty(num_matches)
    out[:tau] = p_1  # matches before the switchpoint
    out[tau:] = p_2  # matches from the switchpoint onwards
    return out
Load Data
import pandas as pd
df = pd.read_csv('serie_a.csv', parse_dates=['Date'], date_parser=parse_date)  # parse_date: helper not shown in the slides
matches = df[(df.HomeTeam == 'Milan') | (df.AwayTeam == 'Milan')]
matches = matches.set_index(['Date'])
matches = compute_extra_columns(matches, team)  # helper and `team` are not shown in the slides
# some pandas manipulations occur here
matches['Win'] = ...  # elided in the slides: 1 if Milan won, 0 otherwise
Fit the Model
observed_matches = Bernoulli('obs', p=p_, value=matches[['Win']], observed=True)
model = Model([observed_matches, p_1, p_2, tau])
mcmc = MCMC(model)
mcmc.sample(40000, 10000)
p_1_samples = mcmc.trace('p_1')[:]
p_2_samples = mcmc.trace('p_2')[:]
tau_samples = mcmc.trace('tau')[:]
print(p_1_samples[:10])
print(p_2_samples[:10])
print(tau_samples[:10])
[ 0.42067236 0.42067236 0.42067236 0.43900391 0.43900391 0.43900391
0.43900391 0.43900391 0.43900391 0.43900391]
[ 0.49213381 0.49213381 0.49213381 0.56072562 0.79863176 0.79863176
0.67416932 0.68382528 0.6069458 0.60062698]
[10 10 24 35 35 35 35 27 27 27]
# `density=True` replaces the `normed=True` argument removed from matplotlib
plt.figure(figsize=(14.5, 10))
ax = plt.subplot(311)
ax.set_autoscaley_on(False)
plt.hist(p_1_samples, histtype='stepfilled', alpha=0.85, label='posterior of p_1', color='#A60628', density=True, bins=30)
plt.legend(loc='upper left')
ax = plt.subplot(312)
plt.hist(p_2_samples, histtype='stepfilled', alpha=0.85, label='posterior of p_2', color='#7A68A6', density=True, bins=30)
plt.legend(loc='upper left')
ax = plt.subplot(313)
plt.hist(tau_samples, histtype='stepfilled', alpha=0.85, label='posterior of tau', color='#467821', density=True, bins=30)
plt.legend(loc='upper left')
plt.show()
Expected Win Probability
num_matches = 38
N = tau_samples.shape[0]
expected_p_per_match = np.zeros(num_matches)
for match in range(num_matches):
    ix = match < tau_samples  # samples in which this match precedes the switchpoint
    p_samples_match = np.concatenate([p_1_samples[ix], p_2_samples[~ix]])
    expected_p_per_match[match] = np.percentile(p_samples_match, 50)  # posterior median
Compute Confidence Bounds
lower_p_per_match = np.zeros(num_matches)
upper_p_per_match = np.zeros(num_matches)
for match in range(num_matches):
    ix = match < tau_samples  # samples in which this match precedes the switchpoint
    p_samples_match = np.concatenate([p_1_samples[ix], p_2_samples[~ix]])
    lower_p_per_match[match] = np.percentile(p_samples_match, 5)
    upper_p_per_match[match] = np.percentile(p_samples_match, 95)
Bayesian inference returns a distribution. What have we gained? We can see the uncertainty in our
estimates: the wider the distribution, the less certain our posterior belief should be.

Introduction Computer Science - Software Design.pdf
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 

Bayesian A/B Testing with PyMC

  • 1. Applied Bayesian Inference with PyMC @MrSantoni
  • 2. Which color will sell more? [Mock-ups of Page A and Page B: the same "A Tea Pot" product page with placeholder text and a BUY button]
  • 3. [Mock-ups of Page A and Page B] Compare #buy / N on each page
  • 4. • What if N is small? • How large must N be for 90% confidence? • What if N is different on A and B?
  • 6. Probability: Frequentist = frequency; Bayesian = belief. Claim: we think Bayesian
  • 7. Claim: we think Bayesian. Confidence in "no bugs" after test 1, test 2, test 3
  • 8. Bayesian Inference = update your prior belief with new evidence
  • 9. The Developer View of a statistical problem: def frequentist(): return 80%; def bayesian(): return a distribution over 0% to 100%
  • 11. How to? $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$. Closed-form solution: [scale from 0% to 100% contrasting realistic cases with toy examples]
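To make the formula concrete, here is a minimal sketch that applies Bayes' theorem to the "no bugs" example from the earlier slide. The numbers are assumptions chosen for illustration only: a prior belief of 0.5 that the code has no bugs, bug-free code always passes a test, and buggy code still passes a given test half of the time.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return P(A | B) from P(A), P(B | A) and P(B | not A)."""
    # P(B) by the law of total probability
    evidence = likelihood_if_true * prior + likelihood_if_false * (1.0 - prior)
    return likelihood_if_true * prior / evidence


belief = 0.5        # assumed prior P(no bugs)
history = [belief]
for _ in range(3):  # three passing tests
    # Assumed likelihoods: P(pass | no bugs) = 1.0, P(pass | bugs) = 0.5
    belief = bayes_update(belief, 1.0, 0.5)
    history.append(belief)

print(history)
```

Under these assumptions, each passing test raises the belief in "no bugs", but it never reaches exactly 1.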
  • 12. PyMC
  • 13. PyMC • Perform Bayesian Inference • Markov Chain Monte Carlo techniques • A.k.a. Probabilistic Programming
  • 14. Show me the code!
  • 16. Only one difference between A and B [Mock-ups of Page A and Page B]
  • 17. [Mock-ups of Page A and Page B]
  • 18. Assume there is: p_a = probability of clicking BUY when landing on A; p_b = probability of clicking BUY when landing on B. How to compute p_a and p_b?
  • 19. Page A – N_a visitors – C_a BUY clicks on page A. Page B – N_b visitors – C_b BUY clicks on page B
  • 20. Frequentist: C_a / N_a. BUT: the observed frequency does not necessarily equal p_a
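A small simulation can illustrate why C_a / N_a is only an estimate of p_a. This is a sketch using the standard library rather than PyMC; the true probability 0.05 and the visitor counts are illustrative assumptions.

```python
import random


def observed_frequency(p_true, n_visitors, rng):
    """Simulate n_visitors independent visits and return clicks / visitors."""
    clicks = sum(1 for _ in range(n_visitors) if rng.random() < p_true)
    return clicks / n_visitors


rng = random.Random(42)  # fixed seed so the run is repeatable
p_a_true = 0.05          # assumed true click probability
for n in (50, 1500, 100000):
    print(n, observed_frequency(p_a_true, n, rng))
```

For small N the observed frequency can land far from p_a; it settles near p_a only as N grows.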
  • 21. Bayesian: Infer true frequency from observed data
  • 22. [Mock-up of Page A]
  • 23. Bayesian Workflow 1. Define prior 2. Fit to observations 3. Get posteriors
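  • 24.
        from pymc import Uniform, rbernoulli, Bernoulli, MCMC
        from matplotlib import pyplot as plt

        p_A_true = 0.05  # true click probability (unknown in a real experiment)
        N = 1500         # number of visitors
        occurrences = rbernoulli(p_A_true, N)  # simulated outcomes, one per visitor

        print('Click-BUY:')
        print(occurrences.sum())
        print('Observed frequency:')
        print(occurrences.sum() / float(N))

        # Output:
        # Click-BUY:
        # 68
        # Observed frequency:
        # 0.0453333333333
  • 25. Clicking BUY: Bernoulli distribution. $P(\mathrm{click}) = \begin{cases} p, & \mathrm{click} = 1 \\ 1 - p, & \mathrm{click} = 0 \end{cases}$ [Bar chart: probability of click=1 and click=0 for a given p]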
  • 26.
        p_A = Uniform('p_A', lower=0, upper=1)  # prior: uniform density of p_A on 0 to 1

        print(p_A.random())
        print(p_A.value)
        # Output:
        # array(0.906086144982998)
        # array(0.906086144982998)

        print(p_A.random())
        print(p_A.value)
        # Output:
        # array(0.285313846133313)
        # array(0.285313846133313)
  • 27.
        p_A = Uniform('p_A', lower=0, upper=1)
        obs = Bernoulli('obs', p_A, value=occurrences, observed=True)
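For this particular model (uniform prior, Bernoulli observations) the posterior happens to be available in closed form, which gives a useful sanity check on any sampler. A short derivation, writing C for the number of BUY clicks among N visitors (notation assumed here, mirroring C_a and N_a above):

```latex
\begin{align*}
P(p) &= 1, \quad 0 \le p \le 1 && \text{(uniform prior)} \\
P(\mathrm{obs} \mid p) &= p^{C} (1 - p)^{N - C} && \text{(independent Bernoulli clicks)} \\
P(p \mid \mathrm{obs}) &\propto p^{C} (1 - p)^{N - C}
  && \Rightarrow \quad p \mid \mathrm{obs} \sim \mathrm{Beta}(C + 1,\; N - C + 1)
\end{align*}
```

With C = 68 and N = 1500 this is Beta(69, 1433), whose mean 69/1502 ≈ 0.0459 sits close to the observed frequency of about 0.0453.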
  • 28.
        p_A = Uniform('p_A', lower=0, upper=1)
        obs = Bernoulli('obs', p_A, value=occurrences, observed=True)

        mcmc = MCMC([p_A, obs])
        mcmc.sample(20000, 1000)  # 20000 iterations, the first 1000 discarded as burn-in
        print(mcmc.trace('p_A')[:])

        # Output:
        # [-----------------100%-----------------] 20000 of 20000 complete in 3.1 sec
        # [ 0.04656576 0.04656576 0.04656576 ..., 0.03803667 0.03803667 0.03803667]
  • 29.
        plt.figure(figsize=(8, 7))
        # density=True replaces the older normed=True argument of plt.hist
        plt.hist(mcmc.trace('p_A')[:], bins=35, histtype='stepfilled', density=True)
        plt.xlabel('Probability of clicking BUY')
        plt.ylabel('Density')
        plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A')
        plt.legend()
        plt.savefig('p_A_hist_N_%s.png' % N)
        plt.show()
  • 30. Confidence: 90% that p_A is between X and Y?
        import numpy as np

        p_A_samples = mcmc.trace('p_A')[:]
        lower_bound = np.percentile(p_A_samples, 5)
        upper_bound = np.percentile(p_A_samples, 95)
        print('There is 90%% probability that p_A is between %s and %s' % (lower_bound, upper_bound))

        # Output:
        # There is 90% probability that p_A is between 0.0373019596856 and 0.0548052806892
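The same 90% interval can be cross-checked without MCMC, because a uniform prior combined with Bernoulli observations gives a Beta(C + 1, N - C + 1) posterior. The sketch below is not the deck's method: it draws from that Beta distribution with the standard library, and the function name and draw count are illustrative assumptions.

```python
import random


def beta_credible_interval(clicks, visitors, mass=0.90, draws=100000, seed=0):
    """Equal-tailed credible interval for p under a Uniform(0, 1) prior."""
    rng = random.Random(seed)
    # Uniform prior + Bernoulli data -> Beta(clicks + 1, visitors - clicks + 1)
    samples = sorted(
        rng.betavariate(clicks + 1, visitors - clicks + 1) for _ in range(draws)
    )
    tail = (1.0 - mass) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


# 68 BUY clicks among 1500 visitors, as in the simulated data above
lower, upper = beta_credible_interval(clicks=68, visitors=1500)
print('90%% credible interval for p_A: %.4f to %.4f' % (lower, upper))
```

The bounds should land close to the MCMC-based interval reported on the slide, up to sampling noise.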
  • 31. What if N_a is lower?
  • 32.
        from pymc import Uniform, rbernoulli, Bernoulli, MCMC
        from matplotlib import pyplot as plt

        p_A_true = 0.05
        N = 50  # far fewer visitors than before
        occurrences = rbernoulli(p_A_true, N)

        print('Click-BUY:')
        print(occurrences.sum())
        print('Observed frequency:')
        print(occurrences.sum() / float(N))

        # Output:
        # Click-BUY:
        # 2
        # Observed frequency:
        # 0.04
  • 33.
        p_A = Uniform('p_A', lower=0, upper=1)
        obs = Bernoulli('obs', p_A, value=occurrences, observed=True)

        mcmc = MCMC([p_A, obs])
        mcmc.sample(20000, 1000)
        print(mcmc.trace('p_A')[:])

        # Output:
        # [-----------------100%-----------------] 20000 of 20000 complete in 3.0 sec
        # [ 0.06240723 0.06240723 0.06240723 ..., 0.01864419 0.01864419 0.01864419]
  • 34.
        plt.figure(figsize=(8, 7))
        # density=True replaces the older normed=True argument of plt.hist
        plt.hist(mcmc.trace('p_A')[:], bins=35, histtype='stepfilled', density=True)
        plt.xlabel('Probability of clicking BUY')
        plt.ylabel('Density')
        plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A')
        plt.legend()
        plt.savefig('p_A_hist_N_%s.png' % N)
        plt.show()
  • 35. Confidence: 90% that p_A is between X and Y?
        import numpy as np

        p_A_samples = mcmc.trace('p_A')[:]
        lower_bound = np.percentile(p_A_samples, 5)
        upper_bound = np.percentile(p_A_samples, 95)
        print('There is 90%% probability that p_A is between %s and %s' % (lower_bound, upper_bound))

        # Output:
        # There is 90% probability that p_A is between 0.0160966147705 and 0.114655284797
  • 36. [Comparison of the posterior of p_A for N_a = 1500 and N_a = 50]
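The effect of N on the width of the posterior can also be quantified directly. As a rough sketch, assuming the closed-form Beta(C + 1, N - C + 1) posterior that follows from a uniform prior and Bernoulli data, the posterior standard deviation for the two simulated datasets (68 of 1500 and 2 of 50 clicks) is:

```python
import math


def beta_posterior_sd(clicks, visitors):
    """Standard deviation of Beta(clicks + 1, visitors - clicks + 1)."""
    a = clicks + 1
    b = visitors - clicks + 1
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


sd_large = beta_posterior_sd(clicks=68, visitors=1500)  # N_a = 1500
sd_small = beta_posterior_sd(clicks=2, visitors=50)     # N_a = 50
print('N_a = 1500: posterior sd = %.4f' % sd_large)
print('N_a = 50:   posterior sd = %.4f' % sd_small)
```

The posterior for N_a = 50 is several times wider, which is the uncertainty the two histograms show visually.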
  • 37. Does the red have a larger probability of being clicked? [Mock-ups of Page A and Page B]
  • 38.
        from pymc import Uniform, rbernoulli, Bernoulli, MCMC, deterministic
        from matplotlib import pyplot as plt

        p_A_true = 0.05
        p_B_true = 0.04
        N_A = 1500
        N_B = 750
        occurrences_A = rbernoulli(p_A_true, N_A)
        occurrences_B = rbernoulli(p_B_true, N_B)

        print('Observed frequency:')
        print('A')
        print(occurrences_A.sum() / float(N_A))
        print('B')
        print(occurrences_B.sum() / float(N_B))

        # Output:
        # Observed frequency:
        # A
        # 0.0533333333333
        # B
        # 0.0413333333333
  • 39.
        p_A = Uniform('p_A', lower=0, upper=1)
        p_B = Uniform('p_B', lower=0, upper=1)

        @deterministic
        def delta(p_A=p_A, p_B=p_B):
            # difference between the two click probabilities
            return p_A - p_B

        obs_A = Bernoulli('obs_A', p_A, value=occurrences_A, observed=True)
        obs_B = Bernoulli('obs_B', p_B, value=occurrences_B, observed=True)

        mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
        mcmc.sample(25000, 5000)

        # Output:
        # [-----------------100%-----------------] 25000 of 25000 complete in 5.5 sec
  • 40.
        p_A_samples = mcmc.trace('p_A')[:]
        p_B_samples = mcmc.trace('p_B')[:]
        delta_samples = mcmc.trace('delta')[:]
  • 41.
        # density=True replaces the older normed=True argument of plt.hist
        plt.subplot(3, 1, 1)
        plt.xlim(0, 0.1)
        plt.hist(p_A_samples, bins=35, histtype='stepfilled', density=True,
                 color='blue', label='Posterior of p_A')
        plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A (unknown)')
        plt.xlabel('Probability of clicking BUY via A')
        plt.legend()

        plt.subplot(3, 1, 2)
        plt.xlim(0, 0.1)
        plt.hist(p_B_samples, bins=35, histtype='stepfilled', density=True,
                 color='green', label='Posterior of p_B')
        plt.vlines(p_B_true, 0, 90, linestyle='--', label='True p_B (unknown)')
        plt.xlabel('Probability of clicking BUY via B')
        plt.legend()

        plt.subplot(3, 1, 3)
        plt.xlim(0, 0.1)
        plt.hist(delta_samples, bins=35, histtype='stepfilled', density=True,
                 color='red', label='Posterior of delta')
        plt.vlines(p_A_true - p_B_true, 0, 90, linestyle='--', label='True delta (unknown)')
        plt.xlabel('p_A - p_B')
        plt.legend()

        plt.savefig('A_and_B.png')
        plt.show()
  • 42.
  • 43. p_A > p_B: how confident are we?
        print('Probability that p_A > p_B:')
        print((delta_samples > 0).mean())

        # Output:
        # Probability that p_A > p_B:
        # 0.8919
  • 44. [Comparison of the posteriors for N_A = 1500, N_B = 750 and for N_A = 1500, N_B = 200]
  • 45.
        print('Probability that p_A > p_B:')
        print((delta_samples > 0).mean())

        # Output:
        # Probability that p_A > p_B:
        # 0.73455
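As a cross-check on the sampler, the probability that p_A > p_B can be estimated without MCMC by drawing each click probability from its closed-form Beta posterior (uniform prior plus Bernoulli data). This is a sketch, not the deck's code. The click counts 80 of 1500 and 31 of 750 are inferred from the observed frequencies printed for the first A/B run; the function name and draw count are illustrative assumptions.

```python
import random


def prob_a_beats_b(clicks_a, n_a, clicks_b, n_b, draws=50000, seed=0):
    """Monte Carlo estimate of P(p_A > p_B) under independent uniform priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # One joint draw from the two independent Beta posteriors
        p_a = rng.betavariate(clicks_a + 1, n_a - clicks_a + 1)
        p_b = rng.betavariate(clicks_b + 1, n_b - clicks_b + 1)
        if p_a > p_b:
            wins += 1
    return wins / draws


prob = prob_a_beats_b(clicks_a=80, n_a=1500, clicks_b=31, n_b=750)
print('Probability that p_A > p_B: %.3f' % prob)
```

The estimate should come out close to the MCMC result for N_B = 750; with fewer visitors on B the posterior of p_B widens and the probability moves toward 0.5.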
  • 46. MCMC
  • 47.
        mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
        mcmc.sample(25000, 5000)
    Returns the posterior P(p_A, p_B, delta | obs_A, obs_B) as samples. 25000 iterations, 5000 burn-in. Metropolis-Hastings algorithm
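To give a feel for what such a sampler does, here is a minimal random-walk Metropolis sketch (the symmetric-proposal special case of Metropolis-Hastings) for a single click probability with a uniform prior. It is a teaching sketch under stated assumptions, not PyMC's implementation: the step size, starting point and function names are all illustrative choices.

```python
import math
import random


def log_posterior(p, clicks, visitors):
    """Log of Uniform(0, 1) prior times Bernoulli likelihood, up to a constant."""
    if p <= 0.0 or p >= 1.0:
        return float('-inf')  # outside the support of the prior
    return clicks * math.log(p) + (visitors - clicks) * math.log(1.0 - p)


def metropolis(clicks, visitors, iterations=25000, burn=5000, step=0.01, seed=0):
    rng = random.Random(seed)
    current = 0.5  # arbitrary starting point
    current_lp = log_posterior(current, clicks, visitors)
    trace = []
    for i in range(iterations):
        proposal = current + rng.gauss(0.0, step)  # symmetric random-walk proposal
        proposal_lp = log_posterior(proposal, clicks, visitors)
        # Accept with probability min(1, posterior(proposal) / posterior(current))
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if i >= burn:  # keep only the draws after burn-in
            trace.append(current)
    return trace


trace = metropolis(clicks=68, visitors=1500)
print('Posterior mean of p: %.4f' % (sum(trace) / len(trace)))
```

Rejected proposals repeat the current value in the trace, which is why consecutive samples in the printed traces above are often identical.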
  • 48. Open the black box
        mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
        mcmc.sample(25000, 5000)

        from pymc.Matplot import plot as mcplot
        mcplot(mcmc)  # diagnostic plots for each variable in the model
  • 49.
  • 50.
  • 51.
  • 52. PyMC • Easy to interpret results – confidence, no p-values! • No crazy math • Computationally expensive
  • 53.
  • 55. Back
  • 57. Match data (FTHG/FTAG = full-time home/away goals, FTR = full-time result, HTHG/HTAG = half-time home/away goals, HTR = half-time result; H = home win, A = away win, D = draw)
        Date        HomeTeam    AwayTeam   FTHG  FTAG  FTR  HTHG  HTAG  HTR
        24/08/2013  Sampdoria   Juventus   0     1     A    0     0     D
        24/08/2013  Verona      Milan      2     1     H    1     1     D
        25/08/2013  Cagliari    Atalanta   2     1     H    1     1     D
        25/08/2013  Inter       Genoa      2     0     H    0     0     D
        25/08/2013  Lazio       Udinese    2     1     H    2     0     H
        25/08/2013  Livorno     Roma       0     2     A    0     0     D
        25/08/2013  Napoli      Bologna    3     0     H    2     0     H
        25/08/2013  Parma       Chievo     0     0     D    0     0     D
        25/08/2013  Torino      Sassuolo   2     0     H    1     0     H
        26/08/2013  Fiorentina  Catania    2     1     H    2     1     H
        31/08/2013  Chievo      Napoli     2     4     A    2     2     D
        31/08/2013  Juventus    Lazio      4     1     H    2     1     H
        01/09/2013  Atalanta    Torino     2     0     H    0     0     D
        01/09/2013  Bologna     Sampdoria  2     2     D    1     1     D
        01/09/2013  Catania     Inter      0     3     A    0     1     A
        01/09/2013  Genoa       Fiorentina 2     5     A    0     3     A
        01/09/2013  Milan       Cagliari   3     1     H    2     1     H
        01/09/2013  Roma        Verona     3     0     H    0     0     D
        01/09/2013  Sassuolo    Livorno    1     4     A    0     1     A
        01/09/2013  Udinese     Parma      3     1     H    1     0     H
        14/09/2013  Inter       Juventus   1     1     D    0     0     D
        14/09/2013  Napoli      Atalanta   2     0     H    0     0     D
        14/09/2013  Torino      Milan      2     2     D    0     0     D
        15/09/2013  Fiorentina  Cagliari   1     1     D    0     0     D
    https://datahub.io/dataset/italian-football-data-serie-a-b
  • 59. Bayesian Workflow 1. Define Prior 2. Fit to observations 3. Get Posteriors
  • 60. Winning a Match: Bernoulli distribution. $P(w) = \begin{cases} p, & w = 1 \\ 1 - p, & w = 0 \end{cases}$ [Bar chart: probability of Win (w=1) and Lose (w=0) for a given p]
  • 62. Model the switchpoint: $p = \begin{cases} p_1, & t < \tau \\ p_2, & t \ge \tau \end{cases}$ Goal -> infer $p_1$, $p_2$, $\tau$, $p$
  • 63. Bayesian Workflow 1. Define Prior 2. Fit to observations 3. Get Posteriors
  • 64. Let’s model this • Goal: infer the unknown p1, p2, TAU • First step of Bayesian inference: assign a prior probability to the different possible values of each unknown • What would be a good prior for p1, p2? Use uniform: – p1 ~ Uniform(0,1) – p2 ~ Uniform(0,1) – TAU ~ DiscreteUniform(1, 38) • P(TAU=k) = 1/38 for all k
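One way to check that the model says what we intend is to run it forwards: draw p1, p2 and TAU from their priors and generate a season of 38 win/lose outcomes. The sketch below does this with the standard library only; the function name and seed are illustrative assumptions, and matches are indexed from 0 so that the first TAU matches use p1.

```python
import random


def simulate_season(num_matches=38, seed=0):
    """Draw the parameters from their priors and simulate one season."""
    rng = random.Random(seed)
    p_1 = rng.random()                 # p1 ~ Uniform(0, 1)
    p_2 = rng.random()                 # p2 ~ Uniform(0, 1)
    tau = rng.randint(1, num_matches)  # TAU ~ DiscreteUniform(1, 38), both ends included
    # Win probability per match: p_1 before the switchpoint, p_2 from it onwards
    p = [p_1 if t < tau else p_2 for t in range(num_matches)]
    wins = [1 if rng.random() < p_t else 0 for p_t in p]
    return p_1, p_2, tau, wins


p_1, p_2, tau, wins = simulate_season()
print('p_1 = %.2f, p_2 = %.2f, tau = %d' % (p_1, p_2, tau))
print(wins)
```

Inference runs this process in reverse: given the observed wins, it recovers plausible values of p1, p2 and TAU.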
  • 65.
        import numpy as np
        from pymc import Uniform, DiscreteUniform, deterministic, Bernoulli, Model, MCMC

        p_1 = Uniform('p_1', lower=0, upper=1)
        p_2 = Uniform('p_2', lower=0, upper=1)
        tau = DiscreteUniform('tau', lower=1, upper=38)

        print('Random output: %s %s %s' % (tau.random(), tau.random(), tau.random()))
        # Output:
        # Random output: 14 24 33

        @deterministic
        def p_(tau=tau, p_1=p_1, p_2=p_2, num_matches=38):
            # concatenate p_1 and p_2 based on tau
            out = np.empty(num_matches)
            out[:tau] = p_1  # matches before the switchpoint
            out[tau:] = p_2  # matches from the switchpoint onwards
            return out
  • 66. Load Data
        import pandas as pd

        # parse_date, compute_extra_columns and team are defined elsewhere (not shown on the slide)
        df = pd.read_csv('serie_a.csv', parse_dates=['Date'], date_parser=parse_date)
        matches = df[(df.HomeTeam == 'Milan') | (df.AwayTeam == 'Milan')]
        matches = matches.set_index(['Date'])
        matches = compute_extra_columns(matches, team)
        # some pandas manipulations occur here
        matches['Win'] = …  # 1 if Milan won, 0 otherwise
  • 67. Fit the Model
        observed_matches = Bernoulli('obs', p=p_, value=matches[['Win']], observed=True)
        model = Model([observed_matches, p_1, p_2, tau])
        mcmc = MCMC(model)
        mcmc.sample(40000, 10000)  # 40000 iterations, 10000 burn-in

        p_1_samples = mcmc.trace('p_1')[:]
        p_2_samples = mcmc.trace('p_2')[:]
        tau_samples = mcmc.trace('tau')[:]
        print(p_1_samples[:10])
        print(p_2_samples[:10])
        print(tau_samples[:10])

        # Output:
        # [ 0.42067236 0.42067236 0.42067236 0.43900391 0.43900391 0.43900391 0.43900391 0.43900391 0.43900391 0.43900391]
        # [ 0.49213381 0.49213381 0.49213381 0.56072562 0.79863176 0.79863176 0.67416932 0.68382528 0.6069458 0.60062698]
        # [10 10 24 35 35 35 35 27 27 27]
  • 68.
        # density=True replaces the older normed=True argument of plt.hist
        plt.figure(figsize=(14.5, 10))

        ax = plt.subplot(311)
        ax.set_autoscaley_on(False)
        plt.hist(p_1_samples, histtype='stepfilled', alpha=0.85,
                 label='posterior of p_1', color='#A60628', density=True, bins=30)
        plt.legend(loc='upper left')

        ax = plt.subplot(312)
        plt.hist(p_2_samples, histtype='stepfilled', alpha=0.85,
                 label='posterior of p_2', color='#7A68A6', density=True, bins=30)
        plt.legend(loc='upper left')

        ax = plt.subplot(313)
        plt.hist(tau_samples, histtype='stepfilled', alpha=0.85,
                 label='posterior of tau', color='#467821', density=True, bins=30)
        plt.legend(loc='upper left')

        plt.show()
  • 69.
  • 70. Expected Win Probability
        num_matches = 38
        N = tau_samples.shape[0]
        expected_p_per_match = np.zeros(num_matches)
        for match in range(num_matches):
            ix = match < tau_samples  # samples in which this match is before the switchpoint
            p_samples_match = np.concatenate([p_1_samples[ix], p_2_samples[~ix]])
            expected_p_per_match[match] = np.percentile(p_samples_match, 50)
  • 71.
  • 72. Compute Confidence Bounds
        lower_p_per_match = np.zeros(num_matches)
        upper_p_per_match = np.zeros(num_matches)
        for match in range(num_matches):
            ix = match < tau_samples
            p_samples_match = np.concatenate([p_1_samples[ix], p_2_samples[~ix]])
            lower_p_per_match[match] = np.percentile(p_samples_match, 5)
            upper_p_per_match[match] = np.percentile(p_samples_match, 95)
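The median and the two bounds can be computed in a single pass per match. The sketch below assumes NumPy and uses synthetic stand-ins for the MCMC traces (the Beta and integer draws are made up for illustration, not the Milan results); `np.where` selects, for each posterior sample, p_1 if the match falls before that sample's tau and p_2 otherwise, which yields the same set of values as the concatenation above.

```python
import numpy as np

rng = np.random.default_rng(0)
num_matches = 38
n_samples = 5000

# Hypothetical stand-ins for the traces of p_1, p_2 and tau
p_1_samples = rng.beta(12, 18, size=n_samples)
p_2_samples = rng.beta(20, 10, size=n_samples)
tau_samples = rng.integers(15, 25, size=n_samples)

lower = np.zeros(num_matches)
median = np.zeros(num_matches)
upper = np.zeros(num_matches)
for match in range(num_matches):
    # Per sample: p_1 before that sample's switchpoint, p_2 from it onwards
    p_match = np.where(match < tau_samples, p_1_samples, p_2_samples)
    lower[match], median[match], upper[match] = np.percentile(p_match, [5, 50, 95])

print(median[0], median[-1])
```

Plotting `median` with the band between `lower` and `upper` gives the per-match win probability together with its uncertainty.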
  • 73. What have we gained? Bayesian inference returns a distribution. We see the uncertainty in our estimates. The wider the distribution, the less certain our posterior belief should be.

Editor's notes

  1. Imagine building an e-commerce website and having to choose a color
  2. Set up an experiment
  3. Interpretation of probability. Frequentist: probability is the frequency of an event. This is difficult to apply to other scenarios, e.g. presidential elections (which happen only once). Bayesian: a measure of belief or confidence in an event occurring. Assigning a belief of 0 to an event means certainty that it will NOT occur
  4. You look for bugs in your code. You are starting to believe that there may be no bugs in this code. If you think this way, then congratulations: you are already thinking Bayesian!
  5. Bayesian inference is simply updating your beliefs after considering new evidence
  6. A Python library for performing Bayesian analysis that is undaunted by the mathematical monster we have created. The code is not random; it is probabilistic in the sense that we create probability models using programming variables as the model’s components.
  7. We go through a simple example to understand some basic features of PyMC
  8. Only one difference between A and B: any change in dynamics can be attributed to that change
  9. The number of visitors does not need to be the same on A and on B
  10. Observed frequency ≠ true frequency (probability). The two agree only for large numbers (law of large numbers)
  11. Only one difference between A and B: any change in dynamics can be attributed to that change
  12. Define a model (random variables) with prior probabilities, i.e. our prior belief. Fit it to the dataset. Compute the posterior probabilities
  13. A random variable which takes the value 1 with success probability p and the value 0 with failure probability 1-p. What is the value of p?
  14. A random value: the value is not determined
  15. obs: the observations of clicking BUY. It is a random variable but, unlike p_A, we have observed its value. Setting the argument observed to True means the value should not be changed
  16. Only one difference between A and B: any change in dynamics can be attributed to that change
  17. N_A > N_B, so the posterior of p_B is flatter. Most of the posterior of p_A – p_B is above 0, so we are confident that p_A > p_B
  18. If this probability is too low, one can try to get more samples from B (to make it less flat).
  19. Fitting a model means characterizing its posterior distribution somehow. The MCMC sampler randomly updates the values of p_A, p_B and delta over a specified number of iterations (iter). The burn parameter specifies a sufficiently large number of iterations for the algorithm to converge
  20. Recommend it. A nice intro to Bayesian inference and probabilistic programming; it assumes NO prior knowledge of Bayesian inference and probability. HOW TO: probability applied to real examples
  21. Was there a change in the win rate?
  22. Define a model (random variables) with prior probabilities, i.e. our prior belief. Fit it to the dataset. Compute the posterior probabilities
  23. A random variable which takes the value 1 with success probability p and the value 0 with failure probability 1-p. What is the value of p?
  24. What is the value of p? It seems to increase at some point during the observations
  25. Let’s assume that on some day TAU during the observation period the parameter p suddenly jumps to a higher value. So, we really have two p parameters: one for the period before TAU, and one for the rest of the observation period
  26. Define a model (random variables) with prior probabilities, i.e. our prior belief. Fit it to the dataset. Compute the posterior probabilities