Bandit Basics – A Different
Take on Online Optimization




            Conductrics twitter: @mgershoff
Who is this guy?

Matt Gershoff
CEO: Conductrics
Many Years in Database Marketing (New York
and Paris)
and a bit of Web Analytics

www.conductrics.com
twitter: @mgershoff
Email: matt@conductrics.com
Speak Up




What Are We Going to Hear?
• Optimization Basics
• Multi-Armed Bandit
  • It’s a Problem, Not a Method
  • Some Methods
    • AB Testing
    • Epsilon Greedy
    • Upper Confidence Bound (UCB)
• Some Results
[Diagram: Choices, Targeting, Learning, Optimization]
OPTIMIZATION


If THIS Then THAT
ITTT brings together:
1. Decision Rules
2. Predictive Analytics
3. Choice Optimization
OPTIMIZATION
Find and Apply the Rule with the most Value

[Grid of candidate ‘If THIS Then THAT’ rules]
OPTIMIZATION

THIS – variables whose values are given to you (the inputs): e.g.
Facebook, High Spend, Urban GEO, …, Home Page, App Use.

THAT – variables whose values you control (the outputs): e.g.
Offer A, Offer B, Offer C, …, Offer Y, Offer Z.

[Diagram: a predictive model maps the input features F1, F2, …, Fm
to an estimated Value for each offer]
But …
1. We Don’t Have Data on ‘THAT’
2. Need to Collect – Sample
3. How to Sample Efficiently?

[Diagram: the offers, A through Z, each with an unknown value –
“Offer A ?” … “Offer Z ?”]
Where
Marketing Applications:
• Websites
• Mobile
• Social Media Campaigns
• Banner Ads
Pharma: Clinical Trials
What is a Multi-Armed Bandit?
One-Armed Bandit –> Slot Machine
The problem:
How to pick between slot machines so that
you walk out of the casino with the most $$$ at the
end of the night?

[Image: two slot machines – A or B]
Objective

Pick so as to earn as much
return/profit as possible
over time
Technical term: Minimize Regret


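For reference, the standard formalization (my addition, not spelled out in the deck): after T plays, regret is the gap between always playing the best machine and what you actually earned,

    Regret(T) = T·μ* − Σ_{t=1..T} μ(a_t)

where μ* is the expected payoff of the best machine and a_t is the machine chosen at round t. Minimizing regret means keeping that gap growing as slowly as possible.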
Sequential Selection
… but how to Pick?

[Image: two slot machines – A or B]

Need to Sample, but do it efficiently
Explore – Collect Data

[Image: two slot machines – A or B]

• Data Collection is costly – an Investment
• Be Efficient – Balance the potential value of
  collecting new data with exploiting what
  you currently know.
Multi-Armed Bandits

“Bandit problems embody in essential
form a conflict evident in all human
action: choosing actions which yield
immediate reward vs. choosing actions
… whose benefit will come only later.”*
- Peter Whittle

 *Source: Qing Zhao, UC Davis. Plenary talk at SPAWC, June, 2010.

Exploration Exploitation


1) Explore/Learn – Try out different actions
to learn how they perform over time – This is
a data collection task.

2) Exploit/Earn – Take advantage of what
you have learned to get the highest payoff –
your current best guess
Not A New Problem
1933 – first work on competing
 options
1940 – WWII Problem Allies
 attempt to tackle
1953 – Bellman formulates as a
 Dynamic Programing problem
Source: http://www.lancs.ac.uk/~winterh/GRhist.html

Testing

• Explore First
  – All actions have an equal chance of
    selection (uniform random).
  – Use hypothesis testing to select a
    ‘Winner’.
• Then Exploit - Keep only ‘Winner’
  for selection

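To make the mechanics concrete, a minimal Python sketch (my illustration, not from the deck; the arm rates and sample size are made up): explore with uniform random assignment for a fixed sample, compute a two-proportion z-statistic, then keep only the ‘winner’:

    import math
    import random

    def z_stat(conv_a, n_a, conv_b, n_b):
        # Two-proportion z-statistic on conversion counts
        p_a, p_b = conv_a / n_a, conv_b / n_b
        pooled = (conv_a + conv_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        return (p_a - p_b) / se

    def ab_then_exploit(true_rates=(0.04, 0.05), explore_n=2000):
        conv, n = [0, 0], [0, 0]
        for _ in range(explore_n):        # Explore: every action equally likely
            arm = random.randrange(2)
            n[arm] += 1
            conv[arm] += random.random() < true_rates[arm]
        z = z_stat(conv[0], n[0], conv[1], n[1])
        winner = max(range(2), key=lambda a: conv[a] / n[a])
        return winner, z                  # Then Exploit: serve only the winner

    print(ab_then_exploit())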
Learn First
Data Collection/Sample first, then Apply Learning

[Timeline: an Explore/Learn phase, followed by an Exploit/Earn phase]
P-Values: A Digression

P-Values:
• NOT the probability that the Null is
  true – i.e. not P(Null = True | DATA)
• They are P(DATA (or more extreme) | Null = True)
• Not a great tool for deciding when to
  stop sampling
See:
http://andrewgelman.com/2010/09/noooooooooooooo_1/
http://www.stat.duke.edu/~berger/papers/02-01.pdf
A Couple Other Methods

1. Epsilon Greedy
  Nice and Simple

2. Upper Confidence Bound (UCB)
  Adapts to Uncertainty


1) Epsilon-Greedy




Greedy

What do you mean by ‘Greedy’?

Make whatever choice seems
best at the moment.
Epsilon Greedy

What do you mean by ‘Epsilon
Greedy’?
• Explore – randomly select an action
  ε percent of the time (say 20%)

• Exploit – Play greedy (pick the
  current best) 1 − ε of the time (say 80%)
Epsilon Greedy

[Flow diagram: each User is routed to Explore/Learn (20%) – Select
Randomly, like AB Testing – or to Exploit/Earn (80%) – Select the
Current Best (Be Greedy)]
Epsilon Greedy
20% Random / 80% Select Best

Action   Value
A        $5.00
B        $4.00
C        $3.00
D        $2.00
E        $1.00
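A minimal Python sketch (my illustration, not from the deck) of the rule above, using the table’s values and ε = 0.2; in practice the values would be running estimates updated after each observed reward:

    import random

    values = {"A": 5.00, "B": 4.00, "C": 3.00, "D": 2.00, "E": 1.00}
    EPSILON = 0.2  # fraction of traffic used to explore

    def epsilon_greedy_select():
        if random.random() < EPSILON:              # Explore (20%)
            return random.choice(list(values))     # uniform random, like AB testing
        return max(values, key=values.get)         # Exploit (80%): pick current best

    print(epsilon_greedy_select())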
Continuous Sampling

[Timeline: Explore/Learn and Exploit/Earn run side by side over time,
rather than as separate phases]
Epsilon Greedy

– Super Simple/low cost to implement
– Tends to be surprisingly effective
– Less affected by ‘Seasonality’

– Not optimal (hard to pick the best ε)
– Doesn’t use a measure of variance
– Should/How to decrease Exploration over
  time?
Upper Confidence Bound

Basic Idea:
1) Calculate both mean and a measure
   of uncertainty (variance) for each
   action.
2) Make Greedy selections based on
   mean + uncertainty bonus


Confidence Interval Review

Confidence Interval = mean +/- z*Std

[Diagram: the interval runs from Mean − 2*Std to Mean + 2*Std]
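A quick worked example (my numbers, not the deck’s): for a conversion rate of p = 0.10 measured over n = 1,000 visitors, Std (the standard error of the estimate) is sqrt(0.10 · 0.90 / 1000) ≈ 0.0095, so the 95% interval (z = 1.96) is roughly 0.10 +/- 0.019, i.e. about 8.1% to 11.9%.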
Upper Confidence

Score each option using the upper
portion of the interval as a Bonus


[Diagram: each option’s score = Mean + Bonus, where the Bonus is the
upper half of its interval]
Upper Confidence Bound
1) Use upper portion of CI as ‘Bonus’

2) Make Greedy Selections

[Chart: intervals for A, B, and C on an Estimated Reward axis ($0–$10);
A has the highest upper bound, so Select A]
Upper Confidence Bound
1) Selecting Action ‘A’ shrinks its uncertainty
bonus (because more data)

2) Action ‘C’ now has the highest score

[Chart: A’s interval has narrowed; C now has the highest upper bound,
so Select C]
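A minimal Python sketch (my own, with made-up counts) of the rule in the last two slides – score each action as mean + CI-style bonus and pick greedily. Lightly-sampled actions get a bigger bonus, so they still get explored:

    import math

    Z = 1.96  # width of the uncertainty bonus

    # Per-action running stats: (conversions, trials) – toy numbers
    stats = {"A": (48, 400), "B": (30, 500), "C": (8, 80)}

    def ucb_score(conversions, trials):
        p = conversions / trials                     # estimated mean reward
        bonus = Z * math.sqrt(p * (1 - p) / trials)  # upper portion of the CI
        return p + bonus

    best = max(stats, key=lambda a: ucb_score(*stats[a]))
    print(best)  # 'C': lower mean than A, but far less data, so a larger bonus wins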
Upper Confidence Bound


• Like A/B Test – uses variance
  measure
• Unlike A/B Test – no hypothesis
  test
• Automatically Balances
  Exploration with Exploitation
Case Study:

Treatment   Conversion Rate   Served
V2V3             9.9%         14,893
V2V2             9.7%          9,720
V2V1             8.0%          2,441
V1V3             3.3%          2,090
V2V3             2.6%          1,849
V2V2             2.0%          1,817
V1V1             1.8%          1,926
V3V1             1.8%          1,821
V1V2             1.5%          1,873
Case Study

Test Method      Conversion Rate
Adaptive         7%
Non-Adaptive     4.5%
AB Testing vs. Bandit

[Charts: Option A, Option B, and Option C compared under each approach]
Why Should I Care?

• More Efficient Learning

• Automation

• Changing World


Questions?




Thank You!



Matt Gershoff
      p) 646-384-5151
      e) matt@conductrics.com
      t) @mgershoff

