Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainable Recommendations with Multi-objective Contextual Bandits
1. - Introduction to Marketplaces
- Relevance vs Fairness trade-off
- Multi-objective Contextual Bandits
29 March 2019
Recommendations in a Marketplace
Rishabh Mehrotra
Research Scientist, Spotify Research
London, UK
rishabhm@spotify.com
11. Approaches for RecSys
Neural Embeddings
User Embedding … with Side Information
RecSys 2016: Meta-Prod2Vec - Product Embeddings Using Side-Information for Recommendation
12. Approaches for RecSys
Neural Embeddings
User Embedding … with Side Information Joint User-Item Embedding
WSDM 2017: Joint Deep Modeling of Users and Items Using Reviews for Recommendation
14. Approaches for RecSys
Variants of Recommendation Styles:
- Short vs long term
- Cold start or cohort based
- Multi-view & multi-interest models
- Multi-task recommendation
SIGIR 2012: Modeling the Impact of Short- and Long-Term Behavior on Search Personalization
15. Approaches for RecSys
Variants of Recommendation Styles:
- Short vs long term
- Cold start & cohort based
- Multi-view & multi-interest models
- Multi-task recommendation
SIGIR 2014: Cohort Modeling for Enhanced Personalized Search
16. Approaches for RecSys
Variants of Recommendation Styles:
- Short vs long term
- Cold start or cohort based
- Multi-view & multi-interest models
- Multi-task recommendation
RecSys 2013: Nonlinear Latent Factorization by Embedding Multiple User Interests
17. Approaches for RecSys
Variants of Recommendation Styles:
- Short vs long term
- Cold start or cohort based
- Multi-view & multi-interest models
- Multi-task recommendation
KDD 2018: Perceive Your Users in Depth: Learning Universal User Representations from Multiple E-commerce Tasks
21. Traditional RecSys: User Centric
● User centric nature of systems:
○ Recommendation models catered to users:
■ user needs
■ user interests
■ user behavior & interactions
■ personalization
○ Evaluation approaches for user satisfaction
■ Measuring user engagement
■ Optimizing for user satisfaction
■ User centric metrics
*WSDM 2018 Tutorial on metrics of user engagement; Mounia Lalmas, et al [link]
36. Disclaimer
● Multi-objective ML has been around for decades
● Past work on constrained optimization in industrial settings
○ WWW 2015: Constrained Optimization for Homepage Relevance (LinkedIn)
○ SIGIR 2012: Personalized Click Shaping through Lagrangian Duality for Online Recommendation
○ arXiv 2018: Joint Revenue Optimization at Etsy (Etsy)
○ SIGIR 2018: Turning Clicks into Purchases: Revenue Optimization for Product Search in
E-Commerce (Etsy)
○ KDD 2011: Click Shaping to Optimize Multiple Objectives (Yahoo!)
● Why this talk then?
○ Most past approaches work in the Learning to Rank setting
○ Relatively less work in interactive ML or RL, specifically the bandit setting
37. Today’s Talk
Phase I: User-centric RecSys
(Bandit: Explore, Exploit, Explain)
Phase II: Inject one competing objective
(Relevance vs Fairness)
Phase III: Multi-stakeholder
Bandits
User
centric
Multi-
Stakeholder
38. Phase II: Relevance - Fairness trade-off
Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance,
Fairness & Satisfaction in Recommendation Systems
Rishabh Mehrotra, James McInerney, Hugues Bouchard, Mounia Lalmas, Fernando Diaz
(CIKM 2018)
39. Pitfalls of User Centric RecSys
Recommendations based on predicted relevance result in Superstar Economics
Suppliers want a fair opportunity to be presented to users
Blindly optimizing for relevance might have a detrimental impact on supplier fairness
41. Key Definitions
Relevance:
We identify a recommendation as relevant if it closely resembles the user's interest
profile (embedding-based representations of users & tracks)
User Satisfaction:
Defined as a subjective measure of the utility of recommendations. We rely on
implicit feedback based on behavioral signals (e.g. number of tracks played)
42. Key Definitions
Fairness:
- numerous attempts to define fairness [FAT*’18, ICML’18]
- unlikely that there will be a universal definition appropriate across all applications
43. Key Definitions
Fairness*:
- numerous attempts to define fairness [FAT*’18, ICML’18]
- unlikely that there will be a universal definition appropriate across all applications
● Statistical bias
● Group fairness
○ Demographic parity
○ Equal Pos Pred. Value
○ Equal Neg Pred. Value
○ Equal False + Rate
○ Equal False - Rate
○ Accuracy equity
● Blindness
● Individual fairness
○ Equal thresholds
○ Similarity metric
● Process fairness (feature rating)
● Diversity (various definitions)
● Representational harms
○ Stereotype mirroring
○ Cross-dataset generalization
○ Bias in representation learning
○ Bias amplification
FAT* 2018 Tutorial: 21 definitions of fairness and their politics [link]
ICML 2018 Tutorial: Defining and Designing Fair Algorithms [link]
#algo-bias Confluence page [link]
44. Key Definitions
Fairness*:
Define group fairness: a set of tracks is fair if it contains tracks from artists that
belong to different groups (i.e. popularity bins/tiers).
Example: a set spanning groups with counts (2, 1, 1) is judged fairer than one with counts (4, 0, 0), since (√2 + √1 + √1) > (√4 + √0 + √0).
*Framework amenable to other interpretations and definitions of fairness
* Representative & Informative Query Selection for Learning to Rank using Submodular Functions
Rishabh Mehrotra, Emine Yilmaz, SIGIR 2015
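To make the square-root example above concrete, here is a minimal sketch of such a group-fairness score. The square-root aggregation follows the slide; the function name and the grouping into popularity tiers are illustrative, not the exact formulation used in production.

```python
import math

def group_fairness_score(group_counts):
    """Concave (square-root) aggregation over per-group track counts.
    Diminishing returns per group mean a set that spreads tracks across
    groups scores higher than one concentrated in a single group."""
    return sum(math.sqrt(c) for c in group_counts)

# Set A spans 3 popularity tiers (2, 1, 1); Set B has all 4 tracks in one tier.
print(group_fairness_score([2, 1, 1]))  # ~3.41
print(group_fairness_score([4, 0, 0]))  # 2.0  -> Set A is judged fairer
```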
50. Recommendation Policies
Policy V: Guaranteed Relevance
System designers are wary of negatively impacting user satisfaction → avoid showing less relevant content
This policy guarantees relevance to be above a certain threshold
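A minimal sketch of what such a guaranteed-relevance policy could look like, assuming per-item relevance and fairness scores are already available. The threshold value, fallback behaviour, and names are illustrative rather than the paper's exact formulation.

```python
def guaranteed_relevance_policy(candidates, relevance, fairness, tau=0.8, k=10):
    """Keep only items whose predicted relevance clears the threshold tau,
    then rank the eligible items by fairness. If too few items clear the
    bar, fall back to the top-k most relevant items before re-ranking."""
    eligible = [c for c in candidates if relevance[c] >= tau]
    if len(eligible) < k:
        eligible = sorted(candidates, key=lambda c: -relevance[c])[:k]
    return sorted(eligible, key=lambda c: -fairness[c])[:k]
```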
52. Recommendation Policies
Conjecture: Users have varying degrees of sensitivity towards fair content
● Some users are more flexible than others about the distribution of artists recommended
53. Recommendation Policies
Conjecture: Users have varying degrees of sensitivity towards fair content
● Some users are more flexible than others about the distribution of artists recommended
User Fairness Affinity:
Computed as: difference in user satisfaction when recommended relevant content,
versus when recommended fair content
54. Recommendation Policies
Policy VI: Adaptive Policy
Extreme case view:
● optimize for relevance for users with negative affinity scores
● optimize for fairness for users with a positive score
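A hedged sketch of the extreme-case adaptive policy, combining the affinity definition from the previous slide with the rule above. The sign convention (positive affinity means the user tolerates fair content well) and the function names are assumptions made for illustration.

```python
def fairness_affinity(sat_when_fair, sat_when_relevant):
    """User Fairness Affinity as a satisfaction difference between being served
    fair vs relevant content (sign convention assumed: positive -> the user
    tolerates fair content well)."""
    return sat_when_fair - sat_when_relevant

def adaptive_objective(affinity):
    """Extreme-case view: pick a single objective per user."""
    return "fairness" if affinity > 0 else "relevance"
```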
58. Experiments: Trade-off Analysis
● Optimizing for Fairness hurts satisfaction
○ 35% decline in SAT
○ Motivates the need for a trade-off
● Gradual improvement in SAT as we move from β=0 to β=1
○ 10% lift in SAT at the halfway point
○ Sharp increase in SAT beyond β=0.7
[Plot: SAT as β varies from Fairness (β=0) to Relevance (β=1)]
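For reference, hedged sketches of two ways the β trade-off above could be instantiated: a simple score interpolation, and a probabilistic policy that ranks by relevance with probability β (the "ProbPolicy" evaluated later). The exact forms in the paper may differ; here β=1 corresponds to pure relevance, consistent with the trend above.

```python
import random

def interpolation_score(rel, fair, beta):
    """Simple interpolation: blend relevance and fairness with weight beta
    (beta = 1 -> pure relevance, beta = 0 -> pure fairness)."""
    return beta * rel + (1.0 - beta) * fair

def prob_policy_pick(candidates, relevance, fairness, beta=0.7):
    """ProbPolicy sketch: with probability beta pick the most relevant item,
    otherwise pick the fairest one."""
    scores = relevance if random.random() < beta else fairness
    return max(candidates, key=lambda c: scores[c])
```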
59. Experiments: Impact of Guarantees
● Guaranteeing relevance helps improve SAT
○ Higher maximum SAT score (0.84 vs 0.64)
60. Experiments: Incorporating User Tolerance
Adaptive policies fare better than
● Only Fairness & only Relevance
● Interleaved (max SAT 0.65)
○ Over 12% improvement in SAT
61. Experiments: Incorporating User Tolerance
Adaptive policies fare better than
● Only Fairness & only Relevance
● Interleaved (max SAT 0.65)
○ Over 12% improvement in SAT
Adaptive policies: major gains in Fairness,
without severe losses in Relevance
63. Experiments: Holistic View
Cost vs Benefit analysis
Simple interpolation -- no good region (high SAT loss or high fairness loss)
ProbPolicy: balancing with β=0.7 gives best results
Guaranteed R: hurts fairness
Adaptive policy: best overall trade-off
64. Summary: Phase II
Relevance vs Fairness
- Trading off Relevance ← SAT → Fairness is better than blindly optimizing for
relevance
- User tolerance aware model helps!
- There is benefit in considering objectives beyond just User SAT
Motivates the need for considering multiple stakeholder
objectives beyond just User SAT
65. Today’s Talk
Phase I: User-centric RecSys
(Bandit: Explore, Exploit, Explain)
Phase II: Inject one competing objective
(Relevance vs Fairness)
Phase III: Multi-stakeholder
Bandits
User
centric
Multi-
Stakeholder
66. Phase III: Multi-objective Models for
Marketplaces
Multi-objective Linear Contextual Bandits via Generalised Gini Function
Niannan Xue, Rishabh Mehrotra, Mounia Lalmas
(under review)
69. Multi-objective Contextual Bandits
f(.): Generalized Gini Index
- Ordered weighted averaging (OWA)
- Respects the Pigou-Dalton transfer principle: prefers allocations that are more equitable
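A minimal sketch of the Generalized Gini Index as an ordered weighted average. The halving weights are a common choice and an assumption here, not necessarily the weights used in the paper.

```python
import numpy as np

def ggi(rewards, weights=None):
    """Generalized Gini Index: an ordered weighted average that places the
    largest weights on the worst-off objectives, so more equitable reward
    vectors score higher (Pigou-Dalton transfer principle)."""
    r = np.sort(np.asarray(rewards, dtype=float))               # ascending: worst objective first
    if weights is None:
        weights = np.array([2.0 ** -d for d in range(len(r))])  # non-increasing weights
    return float(np.dot(weights, r))

# Transferring reward from a better-off objective to a worse-off one raises the score:
print(ggi([0.2, 0.8]))   # 0.6
print(ggi([0.4, 0.6]))   # 0.7  -> more equitable, higher GGI
```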
70. Proposed: Multi-Objective Contextual Bandits
via GGI
● Goal: Find an arm selection strategy
○ probability distribution based on which an arm (i.e. recommendation) is selected
71. Proposed: Multi-Objective Contextual Bandits
via GGI
● Goal: Find an arm selection strategy
○ probability distribution based on which a recommendation is selected
● For a bandit instance at round t, we are given features with
72. Proposed: Multi-Objective Contextual Bandits
via GGI
● Goal: Find an arm selection strategy
○ probability distribution based on which a recommendation is selected
● For a bandit instance at round t, we are given features with
● If we choose arm k, we observe linear reward where
73. Proposed: Multi-Objective Contextual Bandits
via GGI
● Goal: Find an arm selection strategy
○ probability distribution based on which a recommendation is selected
● For a bandit instance at round t, we are given features with
● If we choose arm k, we observe linear reward where
● If vectorial mean feedback for each arm is known:
○ Find optimal arm via full sweep
74. Proposed: Multi-Objective Contextual Bandits
via GGI
● Goal: Find an arm selection strategy
○ probability distribution based on which a recommendation is selected
● For a bandit instance at round t, we are given features with
● If we choose arm k, we observe linear reward where
● If vectorial mean feedback for each arm is known:
○ Find optimal arm via full sweep
● But it is not known; it is context-dependent
○ Optimal policy given by:
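A hedged reconstruction of this optimal mixed policy, following the generalised-Gini bandit formulation; the notation is assumed rather than copied from the slide (Δ_K is the probability simplex over arms, μ_k(x_{t,k}) the mean reward vector of arm k given its context, and G_w the GGI with non-increasing weights w):

```latex
\alpha^{*}(x_t) \;=\; \operatorname*{arg\,max}_{\alpha \in \Delta_K}
  \; G_{\mathbf{w}}\!\Big( \textstyle\sum_{k=1}^{K} \alpha_k \, \mu_k(x_{t,k}) \Big),
\qquad
G_{\mathbf{w}}(r) \;=\; \sum_{d=1}^{D} w_d \, r_{\sigma(d)},
```

where σ sorts the components of r in increasing order and w_1 ≥ … ≥ w_D. The online gradient descent over a[t] on the next slides can then be read as searching for this maximiser when the mean rewards are only estimated.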
75. Proposed Multi-Objective Model
Problem setup:
➔ K = Number of arms
➔ D = Number of objectives
➔ Robustness of the algorithm
➔ Ridge regression regularisation
76. Proposed Multi-Objective Model
Params initialisation:
➔ Uniform strategy
➔ Auxiliary matrices for analytical solution to ridge regression
77. Proposed Multi-Objective Model
Linear realizability:
➔ Observe all contexts
➔ Estimate mean rewards
◆ via l2-regularised least-squares ridge regression
78. Proposed Multi-Objective Model
Online Gradient Descent:
➔ Non-vanishing step size
➔ Project a[t] back onto A
79. Proposed Multi-Objective Model
Action and Update:
- Sample arm kt based on the distribution a[t]
- Observe reward from user
- Update the model
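Putting slides 75–79 together, a compact sketch of the loop under stated assumptions: a shared ridge-regression design matrix with one response vector per objective, halving GGI weights, a numerical gradient in place of the exact GGI (sub)gradient, and no exploration/robustness bonus. Class and method names are illustrative, not the paper's.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v + (1.0 - css[rho]) / (rho + 1), 0.0)

def ggi(r, w):
    """Ordered weighted average with non-increasing weights w (worst objective first)."""
    return float(np.dot(w, np.sort(r)))

class MOGiniLinearBandit:
    """Multi-objective linear contextual bandit sketch: ridge-regression estimates
    of the D reward models plus online gradient ascent on the arm-selection
    distribution a[t] to maximise the GGI of the mixed expected rewards."""

    def __init__(self, n_features, n_objectives, lam=1.0, step=0.05):
        self.A = lam * np.eye(n_features)                # shared design matrix (ridge)
        self.B = np.zeros((n_features, n_objectives))    # per-objective responses
        self.w = np.array([2.0 ** -d for d in range(n_objectives)])
        self.step = step                                 # non-vanishing step size

    def theta(self):
        # Analytical ridge solution, one coefficient column per objective.
        return np.linalg.solve(self.A, self.B)

    def choose(self, contexts, a):
        """contexts: (K, n_features) for all arms. One OGD step on a, then sample."""
        mu = contexts @ self.theta()                     # (K, D) estimated mean rewards
        base = ggi(a @ mu, self.w)
        grad = np.zeros(len(a))
        for k in range(len(a)):                          # numerical gradient of GGI(a^T mu)
            e = a.copy(); e[k] += 1e-5
            grad[k] = (ggi(e @ mu, self.w) - base) / 1e-5
        a = project_to_simplex(a + self.step * grad)     # project a[t] back onto the simplex
        return np.random.choice(len(a), p=a), a          # sample arm k_t ~ a[t]

    def update(self, x, reward_vec):
        """Observe the reward vector for the played arm and update the ridge statistics."""
        self.A += np.outer(x, x)
        self.B += np.outer(x, reward_vec)
```

A typical round would call `choose` with the K context vectors and the current a[t] (initialised uniform, as on slide 76), play the returned arm, and pass that arm's context and observed reward vector to `update`.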
81. Is it going to work?
● Theoretically: Is the regret bounded?
● Regret bounds in past papers
○ ICML 2017: Provably Optimal Algorithms for Generalized Linear Contextual Bandits
○ ICML 2013: Thompson Sampling for Contextual Bandits with Linear Payoffs
○ NIPS 2011: Improved Algorithms for Linear Stochastic Bandits
○ AISTATS 2011: Contextual Bandits with Linear Payoff Functions
● We derive the regret bounds for multi-objective contextual bandits
82. Overall regret is bounded:
- Sublinear in T (i.e. no. of rounds)
- Increases with robustness
84. Experiments I: Multi- vs Single- Objectives
Use-case: all objectives are user interaction based metrics
(no competing business objective yet)
- Clicks
- Stream time
- Business streams
- Total number of songs played
85. Experiments I: Multi- vs Single- Objectives
Use-case: all objectives are user interaction based metrics
- Clicks
- Stream time
- Business streams
- Total number of songs played
● Optimizing for different objectives impacts other
objectives
○ If you want more clicks, optimize for clicks
86. Experiments I: Multi- vs Single- Objectives
Use-case: all objectives are user interaction based metrics
- Clicks
- Stream time
- Business streams
- Total number of songs played
● Optimizing for different objectives impacts other
objectives
○ If you want more clicks, optimize for clicks
● Multi-objective model performs much better
88. Experiments I: Multi- vs Single- Objectives
Use-case: all objectives are user interaction based metrics
- Clicks
- Stream time
- Business streams
- Total number of songs played
● Optimizing for different objectives impacts other
objectives
○ If you want more clicks, optimize for clicks
● Multi-objective model performs much better
Optimizing for multiple interaction metrics performs better for
each metric than directly optimizing that metric
89. Experiments II: Add Competing Objective
● Competing objectives:
○ User interaction objectives: clicks, streams,
no. of songs played, stream length
○ Add: a business objective, (say) gender
exposure
● Significant gains in business objective
90. Experiments II: Add Competing Objective
● Competing objectives:
○ User interaction objectives: clicks, streams, no.
of songs played, stream length
○ Add: a business objective, (say) gender
exposure
● Significant gains in business objective
… without loss in user centric metrics
91. Experiments II: Add Competing Objective
● Competing objectives:
○ User interaction objectives: clicks, streams, no.
of songs played, stream length
○ Add: a business objective, (say) gender
exposure
● Significant gains in business objective
… without loss in user centric metrics
Not necessarily a Zero-Sum Game
… perhaps we “can” get gains in business objectives without
loss in user centric objectives
93. Experiments III: Ways of doing Multi-Objective
● Naive multi-objective doesn’t work!
● Proposed multi-objective model
performs better than:
○ ε-greedy multi-objective
How we do multi-objective ML matters a lot!
94. Summary: Phase III
Multi-objective Models for Marketplaces
- Optimizing for multiple interaction metrics performs better for each metric than
directly optimizing that metric
- Not necessarily a Zero-Sum Game
perhaps we “can” get gains in business objectives without loss in
user centric objectives
- How we do multi-objective ML matters
95. Today’s Talk
Phase I: User-centric RecSys
(Bandit: Explore, Exploit, Explain)
Phase II: Inject one competing objective
(Relevance vs Fairness)
Phase III: Multi-stakeholder
Bandits
User
centric
Multi-
Stakeholder
96.
97. Thank you! Rishabh Mehrotra
Research Scientist, Spotify Research
London, UK
rishabhm@spotify.com