This document summarizes Yan Xu's presentation on practical applications of multi-armed bandits. Bandits can drive personalized recommendation, such as recommending news articles, by balancing exploration of new articles against exploitation of articles already known to perform well. Amazon's bandit algorithm enables real-time optimization of multiple variables by modeling the interactions between them; it increased website conversion by 21% after a single week of optimization.
2. OUTLINE
- Recap on Bandit Problem
- A Contextual-Bandit Approach to Personalized News Article Recommendation
  http://rob.schapire.net/papers/www10.pdf
- An Efficient Bandit Algorithm for Realtime Multivariate Optimization
  https://www.kdd.org/kdd2017/papers/view/an-efficient-bandit-algorithm-for-realtime-multivariate-optimization
4. DILEMMA: EXPLORATION VS. EXPLOITATION
The exploration/exploitation trade-off is a dilemma we frequently face in choosing between options.
Take the same route home, or try a new one?
Choose your favorite restaurant, or the new one?
Listen to your favorite music channel, or try a new artist?
Attend a new meetup?
5. HOW TO RESOLVE THE DILEMMA
https://pavlov.tech/2019/03/02/animated-multi-armed-bandit-policies/
Epsilon Greedy
UCB (Upper Confidence Bound)
Thompson Sampling
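The three policies above can be sketched side by side on a toy Bernoulli bandit. This is a minimal illustration, not code from the presentation; the arm click-through rates, epsilon value, and horizon are assumptions chosen for the demo.

```python
import math
import random

def epsilon_greedy(counts, values, epsilon=0.1):
    """With probability epsilon explore a random arm; otherwise exploit the best mean."""
    if random.random() < epsilon:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])

def ucb1(counts, values):
    """Pick the arm with the highest upper confidence bound (UCB1)."""
    for a, n in enumerate(counts):
        if n == 0:
            return a  # play each arm once before using the bound
    total = sum(counts)
    return max(range(len(values)),
               key=lambda a: values[a] + math.sqrt(2 * math.log(total) / counts[a]))

def thompson(successes, failures):
    """Sample a win-rate from each arm's Beta posterior; play the best sample."""
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])

def run(policy, true_probs, horizon=5000, seed=0):
    """Simulate one policy on Bernoulli arms; return the average reward."""
    random.seed(seed)
    k = len(true_probs)
    counts, values = [0] * k, [0.0] * k
    succ, fail = [0] * k, [0] * k
    total_reward = 0
    for _ in range(horizon):
        if policy == "epsilon":
            a = epsilon_greedy(counts, values)
        elif policy == "ucb":
            a = ucb1(counts, values)
        else:
            a = thompson(succ, fail)
        r = 1 if random.random() < true_probs[a] else 0
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean update
        succ[a] += r
        fail[a] += 1 - r
        total_reward += r
    return total_reward / horizon

if __name__ == "__main__":
    probs = [0.2, 0.5, 0.6]  # assumed per-arm click-through rates
    for name in ("epsilon", "ucb", "thompson"):
        print(f"{name}: average reward = {run(name, probs):.3f}")
```

All three policies converge toward the best arm (0.6 here); they differ in how they pay for exploration: epsilon-greedy explores at a fixed rate forever, UCB explores via a shrinking confidence bonus, and Thompson Sampling explores in proportion to posterior uncertainty.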