This document summarizes Yan Xu's presentation on practical applications of multi-armed bandits. Bandits can drive personalized recommendation, such as recommending news articles, by balancing exploration of new articles against exploitation of articles already known to perform well. Amazon's bandit algorithm enables real-time optimization of multiple variables by modeling the interactions between them; it increased website conversion by 21% after a single week of optimization.
2. OUTLINE
- Recap on Bandit Problem
- A Contextual-Bandit Approach to Personalized News Article Recommendation
  http://rob.schapire.net/papers/www10.pdf
- An Efficient Bandit Algorithm for Realtime Multivariate Optimization
  https://www.kdd.org/kdd2017/papers/view/an-efficient-bandit-algorithm-for-realtime-multivariate-optimization
4. DILEMMA: EXPLORATION VS. EXPLOITATION
The exploration/exploitation trade-off is a dilemma we frequently face in choosing between options.
Take the same route home, or try a new one?
Choose your favorite restaurant, or the new one?
Listen to your favorite music channel, or try a new artist?
Attend a new meetup?
5. HOW TO RESOLVE THE DILEMMA
https://pavlov.tech/2019/03/02/animated-multi-armed-bandit-policies/
Epsilon Greedy
UCB (Upper Confidence Bound)
Thompson Sampling
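The three policies above can be sketched side by side on a toy Bernoulli bandit. This is a minimal illustration, not code from the presentation; the arm click-through rates, epsilon value, and horizon are assumptions chosen for the demo.

```python
import math
import random

def epsilon_greedy(counts, values, epsilon=0.1):
    """With probability epsilon explore a random arm; otherwise exploit the best mean."""
    if random.random() < epsilon:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])

def ucb1(counts, values):
    """Pick the arm with the highest upper confidence bound (UCB1)."""
    for a, n in enumerate(counts):
        if n == 0:
            return a  # play each arm once before using the bound
    total = sum(counts)
    return max(range(len(values)),
               key=lambda a: values[a] + math.sqrt(2 * math.log(total) / counts[a]))

def thompson(successes, failures):
    """Sample a win-rate from each arm's Beta posterior; play the best sample."""
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])

def run(policy, true_probs, horizon=5000, seed=0):
    """Simulate one policy on Bernoulli arms; return the average reward."""
    random.seed(seed)
    k = len(true_probs)
    counts, values = [0] * k, [0.0] * k
    succ, fail = [0] * k, [0] * k
    total_reward = 0
    for _ in range(horizon):
        if policy == "epsilon":
            a = epsilon_greedy(counts, values)
        elif policy == "ucb":
            a = ucb1(counts, values)
        else:
            a = thompson(succ, fail)
        r = 1 if random.random() < true_probs[a] else 0
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean update
        succ[a] += r
        fail[a] += 1 - r
        total_reward += r
    return total_reward / horizon

if __name__ == "__main__":
    probs = [0.2, 0.5, 0.6]  # assumed per-arm click-through rates
    for name in ("epsilon", "ucb", "thompson"):
        print(f"{name}: average reward = {run(name, probs):.3f}")
```

All three policies converge toward the best arm (0.6 here); they differ in how they pay for exploration: epsilon-greedy explores at a fixed rate forever, UCB explores via a shrinking confidence bonus, and Thompson Sampling explores in proportion to posterior uncertainty.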