Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainable Recommendations with Multi-objective Contextual Bandits
1. - Introduction to Marketplaces
- Relevance vs Fairness trade-off
- Multi-objective Contextual Bandits
29 March 2019
Recommendations in a Marketplace
Rishabh Mehrotra
Research Scientist, Spotify Research
London, UK
rishabhm@spotify.com
11. Approaches for RecSys
Neural Embeddings
User Embedding … with Side Information
RecSys 2016: Meta-Prod2Vec - Product Embeddings Using Side-Information for Recommendation
12. Approaches for RecSys
Neural Embeddings
User Embedding … with Side Information Joint User-Item Embedding
WSDM 2017: Joint Deep Modeling of Users and Items Using Reviews for Recommendation
14. Approaches for RecSys
Variants of Recommendation Styles:
- Short vs long term
- Cold start or cohort based
- Multi-view & multi-interest models
- Multi-task recommendation
SIGIR 2012: Modeling the Impact of Short- and Long-Term Behavior on Search Personalization
15. Approaches for RecSys
Variants of Recommendation Styles:
- Short vs long term
- Cold start & cohort based
- Multi-view & multi-interest models
- Multi-task recommendation
SIGIR 2014: Cohort Modeling for Enhanced Personalized Search
16. Approaches for RecSys
Variants of Recommendation Styles:
- Short vs long term
- Cold start or cohort based
- Multi-view & multi-interest models
- Multi-task recommendation
RecSys 2013: Nonlinear Latent Factorization by Embedding Multiple User Interests
17. Approaches for RecSys
Variants of Recommendation Styles:
- Short vs long term
- Cold start or cohort based
- Multi-view & multi-interest models
- Multi-task recommendation
KDD 2018: Perceive Your Users in Depth: Learning Universal User Representations from Multiple E-commerce Tasks
21. Traditional RecSys: User Centric
● User centric nature of systems:
○ Recommendation models catered to users:
■ user needs
■ user interests
■ user behavior & interactions
■ personalization
○ Evaluation approaches for user satisfaction
■ Measuring user engagement
■ Optimizing for user satisfaction
■ User centric metrics
*WSDM 2018 Tutorial on metrics of user engagement; Mounia Lalmas, et al [link]
36. Disclaimer
● Multi-objective ML has been around for decades
● Past work on constrained optimization in industrial settings
○ WWW 2015: Constrained Optimization for Homepage Relevance (LinkedIn)
○ SIGIR 2012: Personalized Click Shaping through Lagrangian Duality for Online Recommendation
○ arXiv 2018: Joint Revenue Optimization at Etsy (Etsy)
○ SIGIR 2018: Turning Clicks into Purchases: Revenue Optimization for Product Search in
E-Commerce (Etsy)
○ KDD 2011: Click Shaping to Optimize Multiple Objectives (Yahoo!)
● Why this talk then?
○ Most past approaches work in the Learning to Rank setting
○ Relatively less work in interactive ML or RL, specifically the bandit setting
37. Today’s Talk
Phase I: User-centric RecSys
(Bandit: Explore, Exploit, Explain)
Phase II: Inject one competing objective
(Relevance vs Fairness)
Phase III: Multi-stakeholder
Bandits
User
centric
Multi-
Stakeholder
38. Phase II: Relevance - Fairness trade-off
Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance,
Fairness & Satisfaction in Recommendation Systems
Rishabh Mehrotra, James McInerney, Hugues Bouchard, Mounia Lalmas, Fernando Diaz
(CIKM 2018)
39. Pitfalls of User Centric RecSys
Recommendations based on predicted relevance result in Superstar Economics
Suppliers want a fair opportunity to be presented to users
Blindly optimizing for relevance might have a detrimental impact on supplier fairness
41. Key Definitions
Relevance:
We identify a recommendation as relevant if it closely resembles the user's interest
profile (embedding-based representations of users & tracks)
User Satisfaction:
Defined as a subjective measure of the utility of recommendations. We rely on
implicit feedback based on behavioral signals (e.g. number of tracks played)
42. Key Definitions
Fairness:
- numerous attempts to define fairness [FAT*’18, ICML’18]
- unlikely that there will be a universal definition appropriate across all applications
43. Key Definitions
Fairness*:
- numerous attempts to define fairness [FAT*’18, ICML’18]
- unlikely that there will be a universal definition appropriate across all applications
● Statistical bias
● Group fairness
○ Demographic parity
○ Equal Pos Pred. Value
○ Equal Neg Pred. Value
○ Equal False + Rate
○ Equal False - Rate
○ Accuracy equity
● Blindness
● Individual fairness
○ Equal thresholds
○ Similarity metric
● Process fairness (feature rating)
● Diversity (various definitions)
● Representational harms
○ Stereotype mirroring
○ Cross-dataset generalization
○ Bias in representation learning
○ Bias amplification
FAT* 2018 Tutorial: 21 definitions of fairness and their politics [link]
ICML 2018 Tutorial: Defining and Designing Fair Algorithms [link]
#algo-bias Confluence page [link]
44. Key Definitions
Fairness*:
Define group fairness: a set of tracks is fair if it contains tracks from artists that
belong to different groups (i.e. popularity bins/tiers).
Example: a set spanning groups with counts (2, 1, 1) is judged fairer than one with counts (4, 0, 0), since (√2 + √1 + √1) > (√4 + √0 + √0).
*Framework amenable to other interpretations and definitions of fairness
* Representative & Informative Query Selection for Learning to Rank using Submodular Functions
Rishabh Mehrotra, Emine Yilmaz, SIGIR 2015
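To make the square-root example above concrete, here is a minimal sketch of such a group-fairness score. The square-root aggregation follows the slide; the function name and the grouping into popularity tiers are illustrative, not the exact formulation used in production.

```python
import math

def group_fairness_score(group_counts):
    """Concave (square-root) aggregation over per-group track counts.
    Diminishing returns per group mean a set that spreads tracks across
    groups scores higher than one concentrated in a single group."""
    return sum(math.sqrt(c) for c in group_counts)

# Set A spans 3 popularity tiers (2, 1, 1); Set B has all 4 tracks in one tier.
print(group_fairness_score([2, 1, 1]))  # ~3.41
print(group_fairness_score([4, 0, 0]))  # 2.0  -> Set A is judged fairer
```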
50. Recommendation Policies
Policy V: Guaranteed Relevance
System designers are wary of negatively impacting user satisfaction → avoid showing less relevant content
This policy guarantees relevance to be above a certain threshold
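A minimal sketch of what such a guaranteed-relevance policy could look like, assuming per-item relevance and fairness scores are already available. The threshold value, fallback behaviour, and names are illustrative rather than the paper's exact formulation.

```python
def guaranteed_relevance_policy(candidates, relevance, fairness, tau=0.8, k=10):
    """Keep only items whose predicted relevance clears the threshold tau,
    then rank the eligible items by fairness. If too few items clear the
    bar, fall back to the top-k most relevant items before re-ranking."""
    eligible = [c for c in candidates if relevance[c] >= tau]
    if len(eligible) < k:
        eligible = sorted(candidates, key=lambda c: -relevance[c])[:k]
    return sorted(eligible, key=lambda c: -fairness[c])[:k]
```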
52. Recommendation Policies
Conjecture: Users have varying degrees of sensitivity towards fair content
● Some users are more flexible than others about the distribution of artists recommended
53. Recommendation Policies
Conjecture: Users have varying degrees of sensitivity towards fair content
● Some users are more flexible than others about the distribution of artists recommended
User Fairness Affinity:
Computed as: difference in user satisfaction when recommended relevant content,
versus when recommended fair content
54. Recommendation Policies
Policy VI: Adaptive Policy
Extreme case view:
● optimize for relevance for users with negative affinity scores
● optimize for fairness for users with a positive score
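A hedged sketch of the extreme-case adaptive policy, combining the affinity definition from the previous slide with the rule above. The sign convention (positive affinity means the user tolerates fair content well) and the function names are assumptions made for illustration.

```python
def fairness_affinity(sat_when_fair, sat_when_relevant):
    """User Fairness Affinity as a satisfaction difference between being served
    fair vs relevant content (sign convention assumed: positive -> the user
    tolerates fair content well)."""
    return sat_when_fair - sat_when_relevant

def adaptive_objective(affinity):
    """Extreme-case view: pick a single objective per user."""
    return "fairness" if affinity > 0 else "relevance"
```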
58. Experiments: Trade-off Analysis
● Optimizing for Fairness hurts satisfaction
○ 35% decline in SAT
○ Motivates the need for a trade-off
● Gradual improvement in SAT as we move from β=0 to β=1
○ 10% lift in SAT at the halfway point
○ Sharp increase in SAT beyond β=0.7
[Plot: SAT as β varies from Fairness (β=0) to Relevance (β=1)]
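For reference, hedged sketches of two ways the β trade-off above could be instantiated: a simple score interpolation, and a probabilistic policy that ranks by relevance with probability β (the "ProbPolicy" evaluated later). The exact forms in the paper may differ; here β=1 corresponds to pure relevance, consistent with the trend above.

```python
import random

def interpolation_score(rel, fair, beta):
    """Simple interpolation: blend relevance and fairness with weight beta
    (beta = 1 -> pure relevance, beta = 0 -> pure fairness)."""
    return beta * rel + (1.0 - beta) * fair

def prob_policy_pick(candidates, relevance, fairness, beta=0.7):
    """ProbPolicy sketch: with probability beta pick the most relevant item,
    otherwise pick the fairest one."""
    scores = relevance if random.random() < beta else fairness
    return max(candidates, key=lambda c: scores[c])
```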
59. Experiments: Impact of Guarantees
● Guaranteeing relevance helps improve SAT
○ Higher maximum SAT score (0.84 vs 0.64)
60. Experiments: Incorporating User Tolerance
Adaptive policies fare better than
● Only Fairness & only Relevance
● Interleaved (max SAT 0.65)
○ Over 12% improvement in SAT
61. Experiments: Incorporating User Tolerance
Adaptive policies fare better than
● Only Fairness & only Relevance
● Interleaved (max SAT 0.65)
○ Over 12% improvement in SAT
Adaptive policies: major gains in Fairness,
without severe losses in Relevance
63. Experiments: Holistic View
Cost vs Benefit analysis
Simple interpolation -- no good region (high SAT loss or high fairness loss)
ProbPolicy: balancing with β=0.7 gives best results
Guaranteed R: hurts fairness
Adaptive policy: best overall trade-off
64. Summary: Phase II
Relevance vs Fairness
- Trading off Relevance ← SAT → Fairness is better than blindly optimizing for
relevance
- User tolerance aware model helps!
- There is benefit in considering objectives beyond just User SAT
Motivates the need for considering multiple stakeholder
objectives beyond just User SAT
65. Today’s Talk
Phase I: User-centric RecSys
(Bandit: Explore, Exploit, Explain)
Phase II: Inject one competing objective
(Relevance vs Fairness)
Phase III: Multi-stakeholder
Bandits
User
centric
Multi-
Stakeholder
66. Phase III: Multi-objective Models for
Marketplaces
Multi-objective Linear Contextual Bandits via Generalised Gini Function
Niannan Xue, Rishabh Mehrotra, Mounia Lalmas
(under review)
69. Multi-objective Contextual Bandits
f(.): Generalized Gini Index
- Ordered weighted averaging (OWA)
- Respects the Pigou-Dalton transfer principle: prefers allocations that are more equitable
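A minimal sketch of the Generalized Gini Index as an ordered weighted average. The halving weights are a common choice and an assumption here, not necessarily the weights used in the paper.

```python
import numpy as np

def ggi(rewards, weights=None):
    """Generalized Gini Index: an ordered weighted average that places the
    largest weights on the worst-off objectives, so more equitable reward
    vectors score higher (Pigou-Dalton transfer principle)."""
    r = np.sort(np.asarray(rewards, dtype=float))               # ascending: worst objective first
    if weights is None:
        weights = np.array([2.0 ** -d for d in range(len(r))])  # non-increasing weights
    return float(np.dot(weights, r))

# Transferring reward from a better-off objective to a worse-off one raises the score:
print(ggi([0.2, 0.8]))   # 0.6
print(ggi([0.4, 0.6]))   # 0.7  -> more equitable, higher GGI
```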
70. Proposed: Multi-Objective Contextual Bandits
via GGI
● Goal: Find an arm selection strategy
○ probability distribution based on which an arm (i.e. recommendation) is selected
71. Proposed: Multi-Objective Contextual Bandits
via GGI
● Goal: Find an arm selection strategy
○ probability distribution based on which a recommendation is selected
● For a bandit instance at round t, we are given features with
72. Proposed: Multi-Objective Contextual Bandits
via GGI
● Goal: Find an arm selection strategy
○ probability distribution based on which a recommendation is selected
● For a bandit instance at round t, we are given features with
● If we choose arm k, we observe linear reward where
73. Proposed: Multi-Objective Contextual Bandits
via GGI
● Goal: Find an arm selection strategy
○ probability distribution based on which a recommendation is selected
● For a bandit instance at round t, we are given features with
● If we choose arm k, we observe linear reward where
● If vectorial mean feedback for each arm is known:
○ Find optimal arm via full sweep
74. Proposed: Multi-Objective Contextual Bandits
via GGI
● Goal: Find an arm selection strategy
○ probability distribution based on which a recommendation is selected
● For a bandit instance at round t, we are given features with
● If we choose arm k, we observe linear reward where
● If vectorial mean feedback for each arm is known:
○ Find optimal arm via full sweep
● But it is not known; it is context-dependent
○ Optimal policy given by:
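A hedged reconstruction of this optimal mixed policy, following the generalised-Gini bandit formulation; the notation is assumed rather than copied from the slide (Δ_K is the probability simplex over arms, μ_k(x_{t,k}) the mean reward vector of arm k given its context, and G_w the GGI with non-increasing weights w):

```latex
\alpha^{*}(x_t) \;=\; \operatorname*{arg\,max}_{\alpha \in \Delta_K}
  \; G_{\mathbf{w}}\!\Big( \textstyle\sum_{k=1}^{K} \alpha_k \, \mu_k(x_{t,k}) \Big),
\qquad
G_{\mathbf{w}}(r) \;=\; \sum_{d=1}^{D} w_d \, r_{\sigma(d)},
```

where σ sorts the components of r in increasing order and w_1 ≥ … ≥ w_D. The online gradient descent over a[t] on the next slides can then be read as searching for this maximiser when the mean rewards are only estimated.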
75. Proposed Multi-Objective Model
Problem setup:
➔ K = Number of arms
➔ D = Number of objectives
➔ Robustness of the algorithm
➔ Ridge regression regularisation
76. Proposed Multi-Objective Model
Params initialisation:
➔ Uniform strategy
➔ Auxiliary matrices for analytical solution to ridge regression
77. Proposed Multi-Objective Model
Linear realizability:
➔ Observe all contexts
➔ Estimate mean rewards
◆ via l2-regularised least-squares ridge regression
78. Proposed Multi-Objective Model
Online Gradient Descent:
➔ Non-vanishing step size
➔ Project a[t] back onto A
79. Proposed Multi-Objective Model
Action and Update:
- Sample arm kt based on the distribution a[t]
- Observe reward from user
- Update the model
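Putting slides 75–79 together, a compact sketch of the loop under stated assumptions: a shared ridge-regression design matrix with one response vector per objective, halving GGI weights, a numerical gradient in place of the exact GGI (sub)gradient, and no exploration/robustness bonus. Class and method names are illustrative, not the paper's.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v + (1.0 - css[rho]) / (rho + 1), 0.0)

def ggi(r, w):
    """Ordered weighted average with non-increasing weights w (worst objective first)."""
    return float(np.dot(w, np.sort(r)))

class MOGiniLinearBandit:
    """Multi-objective linear contextual bandit sketch: ridge-regression estimates
    of the D reward models plus online gradient ascent on the arm-selection
    distribution a[t] to maximise the GGI of the mixed expected rewards."""

    def __init__(self, n_features, n_objectives, lam=1.0, step=0.05):
        self.A = lam * np.eye(n_features)                # shared design matrix (ridge)
        self.B = np.zeros((n_features, n_objectives))    # per-objective responses
        self.w = np.array([2.0 ** -d for d in range(n_objectives)])
        self.step = step                                 # non-vanishing step size

    def theta(self):
        # Analytical ridge solution, one coefficient column per objective.
        return np.linalg.solve(self.A, self.B)

    def choose(self, contexts, a):
        """contexts: (K, n_features) for all arms. One OGD step on a, then sample."""
        mu = contexts @ self.theta()                     # (K, D) estimated mean rewards
        base = ggi(a @ mu, self.w)
        grad = np.zeros(len(a))
        for k in range(len(a)):                          # numerical gradient of GGI(a^T mu)
            e = a.copy(); e[k] += 1e-5
            grad[k] = (ggi(e @ mu, self.w) - base) / 1e-5
        a = project_to_simplex(a + self.step * grad)     # project a[t] back onto the simplex
        return np.random.choice(len(a), p=a), a          # sample arm k_t ~ a[t]

    def update(self, x, reward_vec):
        """Observe the reward vector for the played arm and update the ridge statistics."""
        self.A += np.outer(x, x)
        self.B += np.outer(x, reward_vec)
```

A typical round would call `choose` with the K context vectors and the current a[t] (initialised uniform, as on slide 76), play the returned arm, and pass that arm's context and observed reward vector to `update`.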
81. Is it going to work?
● Theoretically: Is the regret bounded?
● Regret bounds in past papers
○ ICML 2017: Provably Optimal Algorithms for Generalized Linear Contextual Bandits
○ ICML 2013: Thompson Sampling for Contextual Bandits with Linear Payoffs
○ NIPS 2011: Improved Algorithms for Linear Stochastic Bandits
○ AISTATS 2011: Contextual Bandits with Linear Payoff Functions
● We derive the regret bounds for multi-objective contextual bandits
82. Overall regret is bounded:
- Sublinear in T (i.e. no. of rounds)
- Increases with robustness
84. Experiments I: Multi- vs Single- Objectives
Use-case: all objectives are user interaction based metrics
(no competing business objective yet)
- Clicks
- Stream time
- Business streams
- Total number of songs played
85. Experiments I: Multi- vs Single- Objectives
Use-case: all objectives are user interaction based metrics
- Clicks
- Stream time
- Business streams
- Total number of songs played
● Optimizing for different objectives impacts other
objectives
○ If you want more clicks, optimize for clicks
86. Experiments I: Multi- vs Single- Objectives
Use-case: all objectives are user interaction based metrics
- Clicks
- Stream time
- Business streams
- Total number of songs played
● Optimizing for different objectives impacts other
objectives
○ If you want more clicks, optimize for clicks
● Multi-objective model performs much better
88. Experiments I: Multi- vs Single- Objectives
Use-case: all objectives are user interaction based metrics
- Clicks
- Stream time
- Business streams
- Total number of songs played
● Optimizing for different objectives impacts other
objectives
○ If you want more clicks, optimize for clicks
● Multi-objective model performs much better
Optimizing for multiple interaction metrics performs better for
each metric than directly optimizing that metric
89. Experiments II: Add Competing Objective
● Competing objectives:
○ User interaction objectives: clicks, streams,
no. of songs played, stream length
○ Add: a business objective, (say) gender
exposure
● Significant gains in business objective
90. Experiments II: Add Competing Objective
● Competing objectives:
○ User interaction objectives: clicks, streams, no.
of songs played, stream length
○ Add: a business objective, (say) gender
exposure
● Significant gains in business objective
… without loss in user centric metrics
91. Experiments II: Add Competing Objective
● Competing objectives:
○ User interaction objectives: clicks, streams, no.
of songs played, stream length
○ Add: a business objective, (say) gender
exposure
● Significant gains in business objective
… without loss in user centric metrics
Not necessarily a Zero-Sum Game
… perhaps we “can” get gains in business objectives without
loss in user centric objectives
93. Experiments III: Ways of doing Multi-Objective
● Naive multi-objective doesn’t work!
● Proposed multi-objective model
performs better than:
○ ε-greedy multi-objective
How we do multi-objective ML matters a lot!
94. Summary: Phase III
Multi-objective Models for Marketplaces
- Optimizing for multiple interaction metrics performs better for each metric than
directly optimizing that metric
- Not necessarily a Zero-Sum Game
perhaps we “can” get gains in business objectives without loss in
user centric objectives
- How we do multi-objective ML matters
95. Today’s Talk
Phase I: User-centric RecSys
(Bandit: Explore, Exploit, Explain)
Phase II: Inject one competing objective
(Relevance vs Fairness)
Phase III: Multi-stakeholder
Bandits
User
centric
Multi-
Stakeholder
96.
97. Thank you! Rishabh Mehrotra
Research Scientist, Spotify Research
London, UK
rishabhm@spotify.com