In this presentation, we survey latent models for personalization and recommendations, starting with shallow models and progressing towards deep ones. After an overview of the Netflix recommender system, we discuss research at the intersection of deep learning, natural language processing, and recommender systems, and how it relates to traditional collaborative filtering techniques. We then present case studies of deep latent variable models applied to recommender systems.
Shallow and Deep Latent Models for Recommender Systems
1. Shallow & Deep Latent Models for Recommender Systems
Anoop Deoras, Dawen Liang
PRS Workshop, Netflix
06/08/2018
@adeoras, @dawen_liang
2. Theme of the talk
● Personalization and Recommendations at Netflix
● Discuss the evolution of latent models in the Recommender System space
● Showcase some experimental results and interesting findings
● Takeaway points
3. Personalization
● Recommender Systems are a means to an end.
● Our primary goal:
○ Maximize Netflix members’ enjoyment of the selected shows
■ Enjoyment integrated over time
○ Minimize the time it takes to find them
■ Interaction cost integrated over time
4. [Homepage screenshots] Everything, from what shows to recommend to the ordering of the titles in each row, is personalized.
7. Personalization
● When the catalog size is very large, recommendations are the only saving grace.
● A good Recommender System should consider:
○ What is recommended
○ How it is recommended
○ When it is recommended
○ Where it is recommended
8. Personalization
● We try to model
○ User’s taste
○ Context
■ Time
■ Device
■ Country
■ Language
■ …
○ Difference in local tastes
■ What is popular in the US may not be popular in India
■ Not available != Not Popular
20. Why Multinomial?
● Commonly used in Language Models and Economics
● Close proxy to the top-N ranking loss
○ The likelihood (cross-entropy) rewards the model for putting probability mass on the non-zero entries
○ The items have to compete for a limited budget of probability mass (since Σ_i π_i = 1; see the sketch below)
● Effectively ranks the non-zero entries higher
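To make the "limited budget" argument concrete, here is a minimal NumPy sketch (illustrative only; the function and variable names are ours, not from the talk) of the multinomial log-likelihood for a single user's click vector:

```python
import numpy as np

def multinomial_log_likelihood(x_u, pi):
    """Multinomial log-likelihood for one user.

    x_u: binary (or count) vector of clicks over the item vocabulary.
    pi:  predicted item probabilities; they sum to 1, which is what
         forces items to compete for probability mass.
    """
    return float(np.sum(x_u * np.log(pi + 1e-10)))

x_u = np.array([1, 0, 1, 0])
focused = np.array([0.45, 0.05, 0.45, 0.05])  # mass on the clicked items
diffuse = np.array([0.05, 0.45, 0.05, 0.45])  # mass on unclicked items
print(multinomial_log_likelihood(x_u, focused))  # approx. -1.60 (higher)
print(multinomial_log_likelihood(x_u, diffuse))  # approx. -5.99 (lower)
```

Because any mass put on unclicked items is mass taken away from clicked ones, maximizing this likelihood effectively pushes the non-zero entries up the ranking.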
21. Why VAEs (or rather, Bayesian)?
● Generalized linear latent factor models:
○ Recover LDA as a special linear case
● No ‘Fold-In’ necessary
○ Only evaluate inference and generative functions (amortized inference; see the sketch below)
● Per user, RecSys is more of a “small data” than a “big data” problem
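A minimal PyTorch sketch in the spirit of Mult-VAE (the layer sizes, names, and single-layer encoder/decoder are our assumptions, not the model from the talk). Because the encoder is shared across users, scoring a new user is a single forward pass; no per-user ‘fold-in’ optimization is needed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultVAE(nn.Module):
    """Sketch of a VAE with a multinomial likelihood over items."""

    def __init__(self, n_items, latent_dim=200):
        super().__init__()
        self.encoder = nn.Linear(n_items, 2 * latent_dim)  # -> mean, log-variance
        self.decoder = nn.Linear(latent_dim, n_items)

    def forward(self, x):
        h = F.normalize(x)                                 # scale the click vector
        mu, logvar = self.encoder(h).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.decoder(z), mu, logvar                 # logits over items

def elbo_loss(logits, x, mu, logvar, beta=0.2):
    # Multinomial NLL (cross-entropy on the clicked entries) + annealed KL
    nll = -(F.log_softmax(logits, dim=-1) * x).sum(-1).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return nll + beta * kl
```

With an identity decoder activation and a linear encoder this collapses toward the generalized linear latent factor models above, which is one way to read the "nonlinear LDA" connection.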
23. Neural Multi Class Models
[Architecture diagrams: two feed-forward next-play models. N-GRAM takes the ordered inputs play(t-n), ..., play(t-1) and cntxt; BoW-n takes the same plays without order, plus cntxt. Each feeds a feed-forward network with a soft-max over the entire vocabulary, modeling P(next-video | <user, cntxt>).]
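A sketch of the BoW-n variant (hypothetical sizes and names; an N-GRAM variant would concatenate the n play embeddings in order rather than averaging them):

```python
import torch
import torch.nn as nn

class BoWNextPlay(nn.Module):
    """Bag-of-plays feed-forward model for P(next-video | <user, cntxt>)."""

    def __init__(self, n_videos, n_contexts, dim=128):
        super().__init__()
        self.video_emb = nn.Embedding(n_videos, dim)
        self.cntxt_emb = nn.Embedding(n_contexts, dim)
        self.hidden = nn.Linear(2 * dim, dim)
        self.out = nn.Linear(dim, n_videos)  # soft-max over the entire vocabulary

    def forward(self, recent_plays, cntxt):
        # recent_plays: (batch, n) video ids; cntxt: (batch,) context ids
        plays = self.video_emb(recent_plays).mean(dim=1)   # order-free pooling
        h = torch.cat([plays, self.cntxt_emb(cntxt)], dim=-1)
        return self.out(torch.relu(self.hidden(h)))        # next-video logits
```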
24. Neural Multi Class Models
[Architecture diagrams: two sequential next-play models. In the RNN family, a recurrent state state(t) summarizes the plays play(t-n), ..., play(t-1) together with cntxt; in the CNN family, convolutions over the play sequence fill that role. Each is topped with a soft-max over the entire vocabulary, modeling P(next-video | <user, cntxt>).]
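A matching RNN-family sketch (sizes again assumed; a CNN-family model would swap the GRU for 1-D convolutions over the embedded play sequence):

```python
import torch.nn as nn

class GRUNextPlay(nn.Module):
    """Recurrent model: the state summarizes the play sequence."""

    def __init__(self, n_videos, dim=128):
        super().__init__()
        self.video_emb = nn.Embedding(n_videos, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, n_videos)  # soft-max over the entire vocabulary

    def forward(self, play_sequence):
        # play_sequence: (batch, t) video ids, oldest first
        _, state = self.gru(self.video_emb(play_sequence))
        return self.out(state.squeeze(0))    # logits for the next video
```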
25. Why Conditional Models?
● Directly maximizes the likelihood of the user’s next play
● No ‘Fold-In’ necessary
○ Only need to evaluate the forward graph (see the sketch below)
● Enables seamless encoding of temporal and sequential information
● Rich literature on model adaptation and bootstrapping
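A sketch of what "directly" means here, reusing the hypothetical GRUNextPlay defined above: training minimizes the cross-entropy of the observed next play, and serving needs only the same forward graph:

```python
import torch
import torch.nn.functional as F

model = GRUNextPlay(n_videos=10_000)         # hypothetical vocabulary size
plays = torch.randint(0, 10_000, (32, 20))   # batch of play sequences
next_play = torch.randint(0, 10_000, (32,))  # observed next videos

logits = model(plays)                        # forward graph only; no fold-in
loss = F.cross_entropy(logits, next_play)    # -log P(next-video | user, cntxt)
loss.backward()
```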
28. Interpreting a CNN CF Model
● Successive CNN layers discover increasingly high-level features in images:
○ Edges
○ Faces, etc.
● What would a CNN learn if trained on a user-item interaction dataset?
○ Can it discover semantic topics? (a probe sketch follows the visualizations below)
29. Interpreting a CNN CF Model
[Filter visualizations: a Horror filter, a Kids filter, and a Narcotics filter. Thanks to Ko-Jen Hsiao for the CNN viz.]
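One way such filters could be probed (a hypothetical sketch, not necessarily the visualization method used for the slide above): score every video embedding against a convolutional filter and check whether the top-activating videos share a theme such as horror or kids content:

```python
import torch

def top_videos_for_filter(video_emb, conv_weight, filter_idx, k=10):
    """Return the k videos whose embeddings most activate one conv filter.

    video_emb:   (n_videos, dim) item embedding matrix.
    conv_weight: (n_filters, dim, kernel_size) Conv1d weights.
    """
    w = conv_weight[filter_idx].mean(dim=-1)  # average over kernel positions
    scores = video_emb @ w                    # activation score per video
    return torch.topk(scores, k).indices
```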
31. Takeaway Points
● Shallow models
○ Presented a unified view of various latent factor models
○ Discussed their limited modeling capacity ⇒ inferior prediction power
● Deep models
○ Encode rich nonlinear user-item interactions ⇒ superior prediction power
○ Discussed how VAEs can be thought of as nonlinear LDA
○ Showcased how ‘Next Play’ models directly model the task at hand