The Big Idea
Universal Recommender
RECOMMENDATIONS
REQUIRED
A LITTLE HISTORY:
MOTIVATION
• Cooccurrence: Mahout 2012
• Factorized ALS: Mahout, then Spark’s MLlib
• Experience with then-current recommender tech
• Evaluation and experiments
• Could only use “purchase” data; threw out 100x as much view data
• No “realtime”
• Too many edge cases: users who had no recommendations
• Didn’t adapt to metadata/content of items
• Lots of discussions with Ted Dunning, Sean Owen, Sebastian
Schelter, Pat Ferrel (me)
• Cooccurrence and cross-cooccurrence led to many innovations
ANATOMY OF A RECOMMENDATION
PERSONALIZED
r = recommendations
hp = a user’s history of some primary action
(purchase, for instance)
P = the history of all users’ primary action;
rows are users, columns are items
(PtP) = P transposed times P; compares column to column using a
log-likelihood based correlation test
r = (PtP)hp
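A minimal sketch of r = (PtP)hp on toy data may help make the notation concrete. The matrix values and variable names below are assumptions for illustration, and the sketch uses raw cooccurrence counts rather than the log-likelihood test described on this slide.

```python
import numpy as np

# Hypothetical purchase history P: rows are users, columns are items.
P = np.array([
    [1, 1, 0, 0],  # user 0 bought items 0 and 1
    [1, 1, 1, 0],  # user 1 bought items 0, 1 and 2
    [0, 0, 1, 1],  # user 2 bought items 2 and 3
])

# (PtP): item-by-item cooccurrence counts (no LLR filtering in this sketch).
cooccurrence = P.T @ P
np.fill_diagonal(cooccurrence, 0)  # an item should not recommend itself

# hp: one user's purchase history as an item vector (this user bought item 0).
h_p = np.array([1, 0, 0, 0])

# r = (PtP)hp: score every item against the user's history.
r = cooccurrence @ h_p

# Keep only items with a non-zero score, strongest first.
ranked_items = [int(i) for i in np.argsort(-r, kind="stable") if r[i] > 0]
```

Here item 1 ranks first because it was bought together with item 0 by two users, and item 2 follows with one shared buyer.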
COOCCURRENCE WITH LLR
• Let’s call (PtP) an indicator matrix for some primary action like
purchase
• Rows = items, columns = items, element =
similarity/correlation score
• The score is row compared to column using a “similarity” or
“correlation” metric
• Log-Likelihood Ratio (LLR) finds important/correlating
cooccurrences and filters out the rest—a major improvement
in quality over simple cooccurrence or other similarity metrics.
• Experiments on real-world data show LLR is significantly
better than other similarity metrics
* http://ssc.io/wp-content/uploads/2011/12/rec11-schelter.pdf
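The LLR score for a pair of items can be computed from a 2x2 table of counts. The sketch below uses the entropy-based form commonly given for Dunning's log-likelihood ratio; the function names and example counts are assumptions for illustration and do not come from the slides.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR for a 2x2 contingency table.

    k11: users who acted on both item A and item B
    k12: users who acted on A but not B
    k21: users who acted on B but not A
    k22: users who acted on neither
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


independent = log_likelihood_ratio(10, 10, 10, 10)  # no association
correlated = log_likelihood_ratio(10, 0, 0, 10)     # always seen together
```

A pair that cooccurs no more often than chance scores near zero and can be filtered out, while a strongly associated pair scores high and is kept as an indicator.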
LLR AND SIMILARITY METRICS
[Chart: Mean Average Precision (MAP@1 through MAP@10, higher is better) for the Mahout Cooccurrence Recommender with e-commerce data, comparing the Cosine, Tanimoto and Log-likelihood similarity metrics.]
FROM COOCCURRENCE TO
RECOMMENDATION
• This means taking the user’s
history hp and comparing it to the rows of the
cooccurrence matrix (PtP)
• TF-IDF weighting of cooccurrence would
be nice to mitigate the undue influence
of popular items
• Find items nearest to the user’s history
• Sort these by similarity strength and
keep only the highest
—you have recommendations
• Sound familiar? Find the k-nearest
neighbors using cosine and TF-IDF?
• That’s exactly what a search engine
does!
r = (PtP)hp
Example:
hp
  user1: [item2, item3]
(PtP)
  item1: [item2, item3]
  item2: [item1, item3, item95]
  item3: […]
Find the item that most closely matches the user’s history: item1!
USER HISTORY + COOCCURRENCES
+ SEARCH = RECOMMENDATIONS
• The final calculation uses hp as the query on the cooccurrence
matrix (PtP) and returns a ranked set of items
• The query is a “similarity” query, not a relational or key-based fetch
• Uses the search engine as a cosine-based K-Nearest Neighbor
(KNN) engine with norms and TF-IDF weighting
• Highly optimized for serving these queries in realtime
• Several (Solr, Elasticsearch) have high availability and massively
scalable, clustered auto-sharding features, like the best of the
NoSQL DBs.
r = (PtP)hp
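The idea of using the user's history as a search query can be sketched without a search engine. The code below treats each item's indicator list as a small document and scores it against the history with a simplified IDF weight and length norm. This is an assumed stand-in for illustration, not Lucene's actual scoring formula; item3 is left out because its indicator list is elided in the slide's example.

```python
import math

# Indicator "documents": each item lists the items that correlate with it.
# item1 and item2 follow the slide's toy example.
indicators = {
    "item1": ["item2", "item3"],
    "item2": ["item1", "item3", "item95"],
}


def idf(term, docs):
    # Terms that appear in many indicator lists (popular items) weigh less.
    df = sum(1 for terms in docs.values() if term in terms)
    return math.log(1 + len(docs) / (1 + df))


def score(history, terms, docs):
    # Sum the weights of matching terms, normalized by document length.
    matched = set(history) & set(terms)
    return sum(idf(t, docs) for t in matched) / math.sqrt(len(terms))


def recommend(history, docs, k=10):
    scored = [(item, score(history, terms, docs)) for item, terms in docs.items()]
    scored = [(item, s) for item, s in scored if s > 0]
    return sorted(scored, key=lambda pair: -pair[1])[:k]


user_history = ["item2", "item3"]
recommendations = recommend(user_history, indicators)
```

item1 ranks first because both items in the user's history appear in its indicator list. Note that item2 is also returned even though the user already has it; a real deployment would filter such items out at query time.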
THE UNIVERSAL RECOMMENDER:
THE BREAKTHROUGH IDEA
• Virtually all existing collaborative filtering type recommenders
use only one indicator of preference
• The theory doesn’t stop there!
• Virtually anything we know about the user can be used to
improve recommendations—purchase, view, category-
preference, location-preference, device-preference, gender…
r = (PtP)hp
r = (PtP)hp + (PtV)hv + (PtC)hc + …
THE UNIVERSAL RECOMMENDER:
CORRELATED CROSS-OCCURRENCE
CROSS-OCCURRENCE
r = (PtP)hp
r = (PtP)hp + (PtV)hv + (PtC)hc + …
• Comparing the history of the primary action to other actions finds
actions that lead to the one you want to recommend
• Given strong data about user preferences on a general population
we can also use
• items clicked
• terms searched
• categories viewed
• items shared
• people followed
• items disliked (yes dislikes may predict likes)
• location
• device preference
• gender
• age bracket
• Virtually anything we know about the population can be
tested for correlation and used to predict a particular user’s
preferences
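A small sketch of the cross-occurrence term (PtV)hv, assuming toy purchase and view matrices. All values and names are hypothetical, and raw counts stand in for the LLR-filtered indicators. It shows how a user with no purchases can still be scored from views alone.

```python
import numpy as np

# Hypothetical data for 3 users and 3 items.
P = np.array([  # purchases (primary action)
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
])
V = np.array([  # views (secondary action)
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
])

PtP = P.T @ P
np.fill_diagonal(PtP, 0)  # ignore self-cooccurrence
PtV = P.T @ V             # rows: purchased items, columns: viewed items

# A new user with no purchases who has viewed item 1.
h_p = np.array([0, 0, 0])
h_v = np.array([0, 1, 0])

# r = (PtP)hp + (PtV)hv
r = PtP @ h_p + PtV @ h_v
```

The purchase term contributes nothing for this user, yet the view term still produces scores, with item 0 highest because two buyers of item 0 also viewed item 1.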
CORRELATED CROSS-OCCURRENCE:
SO WHAT?
CORRELATED CROSS-OCCURRENCE:
ADDING CONTENT MODELS
• Collaborative Topic Filtering
• Use Latent Dirichlet Allocation (LDA) to model topics directly from the
textual content
• Calculate based on Word2Vec type word vectors instead of bag-of-
words analysis to boost quality
• Create cross-occurrence indicators from topics the user has preferred
• Repeat periodically
• Entity Preferences:
• Use a Named Entity Recognition (NER) system to find entities in
textual content
• Create cross-occurrence indicators for these entities
• Entities and topics are long-lived and richly describe user
interests, which makes them very good inputs for the Universal
Recommender.
THE UNIVERSAL RECOMMENDER
ADDING CONTENT-BASED RECS
Indicators can also be based on content
similarity
(TTt) is a calculation that compares every 2
documents to each other and finds the most
similar—based upon content alone
r = (TTt)ht + l*L …
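As a rough illustration of (TTt), the sketch below builds an item-by-item content similarity matrix from a hypothetical item-by-term matrix T using cosine similarity. The data and names are assumptions; the slides do not specify the similarity measure used for content.

```python
import numpy as np

# Hypothetical item-by-term matrix T: rows are items, columns are terms or tags.
T = np.array([
    [1, 1, 0, 0],  # item 0
    [1, 1, 0, 0],  # item 1 shares all terms with item 0
    [0, 0, 1, 1],  # item 2 shares none
], dtype=float)

# Normalize rows so that the dot product of two rows is their cosine similarity.
T_unit = T / np.linalg.norm(T, axis=1, keepdims=True)

# (TTt): every item compared with every other item on content alone.
TTt = T_unit @ T_unit.T
np.fill_diagonal(TTt, 0.0)  # an item is trivially similar to itself

# ht: the user has shown a preference for item 0.
h_t = np.array([1.0, 0.0, 0.0])
r = TTt @ h_t
most_similar = int(np.argmax(r))
```

No usage data is involved here, which is why content indicators can score items that have no interaction history yet.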
INDICATOR TYPES
• Cooccurrence
• Find the best indicator of a user preference for the item type to be recommended: examples are “buy”,
“read”, “video_watch”, “share”, “follow”, “like”.
• Cross-occurrence
• Item metadata as “user” preference, for example: treat an item’s category as a user’s category-preference
• Calculated from user actions on any data that may give information about the user: category-preferences,
search terms, gender, location
• Create with Mahout-Samsara SimilarityAnalysis.cooccurrence
• Content or metadata
• Content text, tags, categories, description text, anything describing an item
• Create with Mahout-Samsara SimilarityAnalysis.rowSimilarity
• Intrinsic
• Popularity rank, geo-location, anything describing an item
• Some may be derived from usage data like popularity rank, or hotness
• Is a known or specially calculated property of the item
THE UNIVERSAL RECOMMENDER
AKA THE WHOLE ENCHILADA
“Universal” means one query on all indicators at once
Unified query:
purchase-correlator: users-history-of-purchases
view-correlator: users-history-of-views
category-correlator: users-history-of-categories-viewed
tags-correlator: users-history-of-purchases
geo-location-correlator: users-location
…
r = (PtP)hp + (PtV)hv + (PtC)hc + …
(TTt)ht + l*L …
Once indicators are indexed as search fields this entire
equation is a single query
Fast!
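One way to picture the single unified query is as a function that turns a user's per-action history into one search request. The sketch below builds a plain dictionary shaped like an Elasticsearch bool query with one "terms" clause per indicator field. The field names and the helper build_query are hypothetical, and nothing is sent to a server.

```python
import json


def build_query(history_by_field, size=20):
    # One "should" clause per indicator field that has any history.
    should = [
        {"terms": {field: values}}
        for field, values in history_by_field.items()
        if values
    ]
    return {"size": size, "query": {"bool": {"should": should}}}


# Hypothetical user history, keyed by the indexed indicator field.
history = {
    "purchase": ["item2", "item3"],
    "view": ["item7"],
    "category": [],  # no history for this indicator, so no clause is emitted
}

query = build_query(history)
print(json.dumps(query, indent=2))
```

Each clause corresponds to one term of the equation, and the search engine sums the clause scores, so adding an indicator means adding a field and a clause rather than a new model.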
THE UNIVERSAL RECOMMENDER:
BETTER USER COVERAGE
• Any number of user actions—entire user clickstream
• Metadata—from user profile or items
• Context—on-site, time, location
• Content—unstructured text or semi-structured
categorical
• Mixes any number of “indicators” to increase quality
or tune to specific context
• Solution to the “cold-start” problem—items with too
short a lifespan or new users with no history
• Can recommend to new users using
realtime history
• Can use new interaction data from
any user in realtime
• 95% implemented in Universal Recommender
v0.3.0—most current release
[Diagram: user coverage, with regions labeled All Users, Universal Recommender, and ALS or 1-action Recommenders.]
POLISH THE APPLE
• Dithering for auto-optimize via explore-exploit:
Randomize some returned recs, if they are acted upon they become
part of the new training data and are more likely to be recommended
in the future
• Visibility control:
• Don’t show duplicates; blacklist items already shown
• Filter items the user has already seen
• Zero-downtime Deployment: deploy prediction server
once then hot-swap new index when ready.
• Generate some intrinsic indicators like hot, popular—
helps solve the “cold-start” problem
• Asymmetric train vs query—query with most recent user
data, train on all historical data
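Dithering can be sketched as re-sorting a ranked list with a little random noise, so that lower-ranked items occasionally surface and get a chance to be acted upon. The formulation below (log of rank plus Gaussian noise) is one commonly described approach and is an assumption here; the slides do not say how the Universal Recommender implements it, and the epsilon parameter is hypothetical.

```python
import math
import random


def dither(ranked_items, epsilon=1.5, rng=None):
    """Reorder a ranked list with noise; epsilon=1.0 leaves the order unchanged."""
    rng = rng or random.Random()
    # Larger epsilon means more noise and more reshuffling.
    sigma = math.sqrt(math.log(epsilon)) if epsilon > 1 else 0.0
    noisy = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    return [item for _, item in sorted(noisy, key=lambda pair: pair[0])]


recs = ["a", "b", "c", "d", "e", "f"]
unchanged = dither(recs, epsilon=1.0)
shuffled = dither(recs, epsilon=2.0, rng=random.Random(0))
```

Because the noise is added to the log of the rank, items near the top tend to stay near the top while the tail is mixed more freely.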
Architecture Based on
PredictionIO
Universal Recommender
UNIVERSAL RECOMMENDER
LAMBDA ARCHITECTURE
[Architecture diagram. Components: Application; PredictionIO SDK or REST; PredictionIO EventServer; HBase; Spark; Spark-Mahout’s Correlation Engine; Elasticsearch; Universal Recommender Engine with PredictionIO REST Serving Component. Flows: DATA IN (events and item metadata); MODEL CREATION and MODEL UPDATE (background); RECOMMENDATION SERVING (realtime: query and recommendations, user history, itemProperties).]
Appendix
TECH STACK
• HBase 1.X
• Postgres, MySQL, or other JDBC possible
• Spark 1.6.X
• Fast, massively scalable, seems like the “winner”
• HDFS 2.6 (Hadoop Distributed File System)
• Reliable, massively scalable, the de facto standard
• Spray
• Supplies REST endpoints, multi-threaded via Akka actors
• Elasticsearch 1.7.X or 2.X
• Reliable, massively scalable, fast
• Scala & Java 8
• Fits functional and OOP programming styles for productivity
• Stable, Scalable, High Availability, Well Supported
The ES JSON query looks like this:
{
  "size": 20,
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "rate": ["0", "67", "4"]
          }
        },
        {
          "terms": {
            "buy": ["0", "32"],
            "boost": 2
          }
        },
        { // categorical boosts
          "terms": {
            "category": ["cat1"],
            "boost": 1.05
          }
        }
      ],
      "must": [ // categorical filters
        {
          "terms": {
            "category": ["cat1"],
            "boost": 0
          }
        },
        {
          "must_not": [ // blacklisted items
            {
              "ids": {
                "values": ["items-id1", "item-id2", ...]
              }
            },
            {
              "constant_score": { // date in query must fall between the expire and available dates of an item
                "filter": {
                  "range": {
                    "availabledate": {
                      "lte": "2015-08-30T12:24:41-07:00"
                    }
                  }
                },
                "boost": 0
              }
            },
            {
              "constant_score": { // date range filter in query must be between these item property values
                "filter": {
                  "range": {
                    "expiredate": {
                      "gte": "2015-08-15T11:28:45.114-07:00",
                      "lt": "2015-08-20T11:28:45.114-07:00"
                    }
                  }
                },
                "boost": 0
              }
            },
            {
              "constant_score": { // this orders popular items for backfill
                "filter": {
                  "match_all": {}
                },
                "boost": 0.000001 // must have at least a small number to be boostable
              }
            }
          ]
        }
      ]
    }
  }
}
An example Elasticsearch query on a multi-
field index created from the output of the CCO
engine. The index includes about 90% of the
data in the “whole enchilada” equation.
This executes in 50ms on a non-cached
cluster and ~26ms on an unoptimized cluster.

Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 

The Universal Recommender

  • 3. A LITTLE HISTORY: MOTIVATION
      • Cooccurrence: Mahout 2012
      • Factorized ALS: Mahout, then Spark’s MLlib
      • Experience with then-current recommender tech
      • Evaluation and experiments
      • Could only use “purchase” data; threw out 100x as much view data
      • No “realtime”
      • Too many edge cases: users who had no recommendations
      • Didn’t adapt to metadata/content of items
      • Lots of discussions with Ted Dunning, Sean Owen, Sebastian Schelter, Pat Ferrel (me)
      • Cooccurrence and cross-cooccurrence led to many innovations
  • 4. ANATOMY OF A RECOMMENDATION: PERSONALIZED
      r = (PtP)hp
      r = recommendations
      hp = a user’s history of some action (purchase, for instance)
      P = the history of all users’ primary action; rows are users, columns are items
      (PtP) = P-transpose times P; compares column to column using a log-likelihood based correlation test
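The formula r = (PtP)hp can be sketched on a toy matrix, reading Pt as the transpose of P. This is only an illustration under stated assumptions: the matrix values and helper names are made up, and raw cooccurrence counts stand in for the log-likelihood based test the slide describes.

```python
# Toy sketch of r = (PtP)hp using raw cooccurrence counts.
# Assumption: plain counts replace the LLR-based correlation test.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# P: rows are users, columns are items (1 = the user purchased the item).
P = [
    [1, 1, 0],  # user 0 bought items 0 and 1
    [1, 1, 1],  # user 1 bought all three items
    [0, 1, 1],  # user 2 bought items 1 and 2
]

PtP = matmul(transpose(P), P)  # item-by-item cooccurrence counts
hp = [1, 0, 0]                 # a user whose history is item 0 only
r = matvec(PtP, hp)            # one score per item
```

Here `r` also scores item 0 itself; a serving layer would normally drop items already in the user's history before ranking.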
  • 5. COOCCURRENCE WITH LLR
      • Let’s call (PtP) an indicator matrix for some primary action like purchase
      • Rows = items, columns = items, element = similarity/correlation score
      • The score is row compared to column using a “similarity” or “correlation” metric
      • Log-Likelihood Ratio (LLR) finds important/correlating cooccurrences and filters out the rest, a major improvement in quality over simple cooccurrence or other similarity metrics
      • Experiments on real-world data show LLR is significantly better than other similarity metrics*
      * http://ssc.io/wp-content/uploads/2011/12/rec11-schelter.pdf
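The LLR score for a pair of items can be computed from a 2x2 table of counts. The sketch below uses the entropy-based form of the log-likelihood ratio test; the function names and the argument layout (k11 through k22) are assumptions for illustration, not taken from the slides.

```python
import math

def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # N * H(counts / N): Shannon entropy scaled by the total count.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: users with both A and B    k12: users with A but not B
    k21: users with B but not A     k22: users with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))
```

A score near zero means the two events look independent; larger scores mean the cooccurrence is unlikely to be chance. A cooccurrence would be kept as an indicator only when its score is high enough, with the cutoff left as a tuning choice.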
  • 6. LLR AND SIMILARITY METRICS
      [Chart: precision (MAP@K), higher is better. Mean Average Precision from MAP@1 through MAP@10 for the Mahout Cooccurrence Recommender with e-commerce data, comparing the Cosine, Tanimoto, and Log-likelihood similarity metrics.]
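For readers unfamiliar with the metric on this chart, MAP@K can be computed roughly as follows. Definitions differ in how they normalize; the sketch assumes division by min(k, number of relevant items), which may not be the variant used to produce the chart. All names are illustrative.

```python
def average_precision_at_k(recommended, relevant, k):
    """Average of precision@rank over the ranks (<= k) that hold a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    denom = min(k, len(relevant))  # assumed normalization
    return precision_sum / denom if denom else 0.0

def map_at_k(all_recommended, all_relevant, k):
    """Mean of AP@k over users; inputs are parallel lists, one entry per user."""
    scores = [
        average_precision_at_k(rec, rel, k)
        for rec, rel in zip(all_recommended, all_relevant)
    ]
    return sum(scores) / len(scores) if scores else 0.0
```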
  • 8. FROM COOCCURRENCE TO RECOMMENDATION
      r = (PtP)hp
      • This actually means taking the user’s history hp and comparing it to rows of the cooccurrence matrix (PtP)
      • TF-IDF weighting of cooccurrence would be nice to mitigate the undue influence of popular items
      • Find items nearest to the user’s history
      • Sort these by similarity strength and keep only the highest: you have recommendations
      • Sound familiar? Find the k-nearest neighbors using cosine and TF-IDF?
      • That’s exactly what a search engine does!
      Example:
          hp
              user1: [item2, item3]
          (PtP)
              item1: [item2, item3]
              item2: [item1, item3, item95]
              item3: […]
          Find the item that most closely matches the user’s history: item1!
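The search-engine analogy can be sketched by treating each item's indicator list as a document and the user's history as the query. The scoring below is a simplified TF-IDF with a length norm, not the formula of any particular search engine; the function names are assumptions, and the row for item3, which the slide elides, is left empty.

```python
import math

# Each item is a "document" whose terms are the items it correlates with.
indicators = {
    "item1": ["item2", "item3"],
    "item2": ["item1", "item3", "item95"],
    "item3": [],  # elided on the slide; left empty here
}

def idf(term, docs):
    # Rare terms weigh more, which damps the influence of popular items.
    df = sum(1 for terms in docs.values() if term in terms)
    return math.log(1 + len(docs) / df) if df else 0.0

def recommend(history, docs, k=10):
    scores = {}
    for item, terms in docs.items():
        if item in history or not terms:
            continue  # skip items the user already has and empty documents
        matched = [t for t in history if t in terms]
        if matched:
            # Every term occurs once per document, so the score is summed IDF
            # divided by a simple document-length norm.
            scores[item] = sum(idf(t, docs) for t in matched) / math.sqrt(len(terms))
    return sorted(scores, key=scores.get, reverse=True)[:k]

recs = recommend(["item2", "item3"], indicators)
```

With the slide's data, `recs` contains only item1, matching the slide's conclusion.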
  • 9. USER HISTORY + COOCCURRENCES + SEARCH = RECOMMENDATIONS
      r = (PtP)hp
      • The final calculation uses hp as the query on the cooccurrence matrix (PtP) and returns a ranked set of items
      • The query is a “similarity” query, not a relational or key-based fetch
      • Uses the search engine as a cosine-based K-Nearest Neighbor (KNN) engine with norms and TF-IDF weighting
      • Highly optimized for serving these queries in realtime
      • Several (Solr, Elasticsearch) have high-availability, massively scalable, clustered auto-sharding features like the best of NoSQL DBs
  • 10. THE UNIVERSAL RECOMMENDER: THE BREAKTHROUGH IDEA
      • Virtually all existing collaborative filtering type recommenders use only one indicator of preference
      • The theory doesn’t stop there!
      • Virtually anything we know about the user can be used to improve recommendations: purchase, view, category-preference, location-preference, device-preference, gender…
      r = (PtP)hp
      r = (PtP)hp + (PtV)hv + (PtC)hc + …
  • 11. THE UNIVERSAL RECOMMENDER: CORRELATED CROSS-OCCURRENCE
      (Repeats the bullets and equations of slide 10, adding the label CROSS-OCCURRENCE.)
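A toy version of the extended formula shows how a cross-occurrence term such as (PtV)hv can produce recommendations for a user with no purchase history. The matrices, the variable names, and the use of raw counts instead of LLR-filtered scores are assumptions made for illustration.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# P: purchases (3 users x 2 purchasable items).
P = [[1, 0], [1, 1], [0, 1]]
# V: views (same 3 users x 3 viewable items; the item space may differ from P's).
V = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]

PtP = matmul(transpose(P), P)  # purchase-to-purchase cooccurrence, 2 x 2
PtV = matmul(transpose(P), V)  # purchase-to-view cross-occurrence, 2 x 3

hp = [0, 0]     # the user has purchased nothing
hv = [0, 0, 1]  # but has viewed item 2

# r = (PtP)hp + (PtV)hv: one score per purchasable item.
r = [a + b for a, b in zip(matvec(PtP, hp), matvec(PtV, hv))]
```

The purchase term contributes nothing for this user, yet the view term still ranks purchasable item 1 above item 0.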
  • 12. CORRELATED CROSS-OCCURRENCE: SO WHAT?
      • Comparing the history of the primary action to other actions finds actions that lead to the one you want to recommend
      • Given strong data about user preferences on a general population we can also use
          • items clicked
          • terms searched
          • categories viewed
          • items shared
          • people followed
          • items disliked (yes, dislikes may predict likes)
          • location
          • device preference
          • gender
          • age bracket
      • Virtually anything we know about the population can be tested for correlation and used to predict a particular user’s preferences
  • 13. CORRELATED CROSS-OCCURRENCE: ADDING CONTENT MODELS
      • Collaborative Topic Filtering
          • Use Latent Dirichlet Allocation (LDA) to model topics directly from the textual content
          • Calculate based on Word2Vec-type word vectors instead of bag-of-words analysis to boost quality
          • Create cross-occurrence indicators from topics the user has preferred
          • Repeat periodically
      • Entity Preferences
          • Use a Named Entity Recognition (NER) system to find entities in textual content
          • Create cross-occurrence indicators for these entities
      • Entities and topics are long-lived and richly describe user interests; these are very good for use in the Universal Recommender
  • 14. THE UNIVERSAL RECOMMENDER: ADDING CONTENT-BASED RECS
      • Indicators can also be based on content similarity
      • (TTt) is a calculation that compares every 2 documents to each other and finds the most similar, based upon content alone
      r = (TTt)ht + l*L …
  • 15. INDICATOR TYPES
      • Cooccurrence
          • Find the best indicator of a user preference for the item type to be recommended: examples are “buy”, “read”, “video_watch”, “share”, “follow”, “like”
      • Cross-occurrence
          • Item metadata as “user” preference, for example: treat item category as a user category-preference
          • Calculated from user actions on any data that may give information about the user: category-preferences, search terms, gender, location
          • Create with Mahout-Samsara SimilarityAnalysis.cooccurrence
      • Content or metadata
          • Content text, tags, categories, description text, anything describing an item
          • Create with Mahout-Samsara SimilarityAnalysis.rowSimilarity
      • Intrinsic
          • Popularity rank, geo-location, anything describing an item
          • Some may be derived from usage data, like popularity rank or hotness
          • Is a known or specially calculated property of the item
  • 16. THE UNIVERSAL RECOMMENDER, AKA THE WHOLE ENCHILADA
      “Universal” means one query on all indicators at once
      r = (PtP)hp + (PtV)hv + (PtC)hc + … (TTt)ht + l*L …
      Unified query:
          purchase-correlator: users-history-of-purchases
          view-correlator: users-history-of-views
          category-correlator: users-history-of-categories-viewed
          tags-correlator: users-history-of-purchases
          geo-location-correlator: users-location
          …
  • 17. THE UNIVERSAL RECOMMENDER, AKA THE WHOLE ENCHILADA
      “Universal” means one query on all correlators at once
      r = (PtP)hp + (PtV)hv + (PtC)hc + … (TTt)ht + l*L …
      Once indicators are indexed as search fields, this entire equation is a single query
      Fast!
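One way to picture the single query is a small builder that turns per-indicator user histories into one boolean query. This is a sketch, not the Universal Recommender's code: `build_query`, the field names, and the boost values are hypothetical, and the clause shape simply mirrors the Elasticsearch 1.x/2.x-era example on slide 26, so newer Elasticsearch versions may expect something different.

```python
import json

def build_query(histories, boosts=None, size=20):
    """Build one bool/should query from several user histories.

    histories: {field_name: [ids in the user's history for that indicator]}
    boosts:    optional {field_name: boost} to weight some indicators more
    """
    boosts = boosts or {}
    should = []
    for field, ids in histories.items():
        if not ids:
            continue  # no history for this indicator, so no clause
        clause = {"terms": {field: list(ids)}}
        if field in boosts:
            clause["terms"]["boost"] = boosts[field]
        should.append(clause)
    return {"size": size, "query": {"bool": {"should": should}}}

query = build_query(
    {"purchase": ["item2", "item3"], "view": ["item7"], "category": []},
    boosts={"purchase": 2},
)
body = json.dumps(query)  # request body that would be sent to the search engine
```

Each field with history adds one `should` clause, so adding an indicator changes the query's size but not the number of round trips.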
  • 18. THE UNIVERSAL RECOMMENDER: BETTER USER COVERAGE
      • Any number of user actions: the entire user clickstream
      • Metadata: from user profile or items
      • Context: on-site, time, location
      • Content: unstructured text or semi-structured categorical
      • Mixes any number of “indicators” to increase quality or tune to a specific context
      • Solution to the “cold-start” problem: items with too short a lifespan or new users with no history
      • Can recommend to new users using realtime history
      • Can use new interaction data from any user in realtime
      • 95% implemented in Universal Recommender v0.3.0, the most current release
      [Diagram: user coverage, with labels All Users, Universal Recommender, and ALS or 1-action Recommenders]
  • 19. POLISH THE APPLE
      • Dithering for auto-optimization via explore-exploit: randomize some returned recs; if they are acted upon, they become part of the new training data and are more likely to be recommended in the future
      • Visibility control:
          • Don’t show duplicates; blacklist items already shown
          • Filter items the user has already seen
      • Zero-downtime deployment: deploy the prediction server once, then hot-swap the new index when ready
      • Generate some intrinsic indicators like hot and popular, which helps solve the “cold-start” problem
      • Asymmetric train vs. query: query with the most recent user data, train on all historical data
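Dithering can be sketched as re-sorting a ranked list by a noisy score. The slides do not say which scheme is used; the version below, which adds Gaussian noise to the log of each rank, is one commonly described approach and is offered only as an assumption. The function name and the `sigma` default are illustrative.

```python
import math
import random

def dither(ranked_items, sigma=0.5, rng=None):
    """Reorder a ranked list by log(rank) plus Gaussian noise.

    Top results usually stay near the top, while lower-ranked items
    occasionally surface so they can collect feedback.
    sigma = 0 leaves the order unchanged.
    """
    rng = rng or random.Random()
    noisy = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    return [item for _, item in sorted(noisy, key=lambda pair: pair[0])]

shuffled = dither(["a", "b", "c", "d", "e"], sigma=0.5, rng=random.Random(42))
```

Because the gaps between log(rank) values shrink further down the list, the tail is reshuffled more often than the head.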
  • 21-23. UNIVERSAL RECOMMENDER LAMBDA ARCHITECTURE
      [Architecture diagram, repeated across three builds that differ in which of the realtime and background paths is highlighted]
      • Stages: DATA IN, MODEL CREATION, MODEL UPDATE, RECOMMENDATION SERVING
      • Components: Application, PredictionIO SDK or REST, PredictionIO EventServer, HBase, Universal Recommender Engine, Spark-Mahout’s Correlation Engine, Spark, Elasticsearch, PredictionIO REST Serving Component
      • Data labels: events & item metadata, user history, itemProperties, query and recommendations
  • 25. TECH STACK
      • HBase 1.X
          • Postgres, MySQL, or other JDBC possible
      • Spark 1.6.X
          • Fast, massively scalable, seems like the “winner”
      • HDFS 2.6 (Hadoop Distributed File System)
          • Reliable, massively scalable, the de facto standard
      • Spray
          • Supplies REST endpoints, multi-threaded via Akka actors
      • Elasticsearch 1.7.X or 2.X
          • Reliable, massively scalable, fast
      • Scala & Java 8
          • Fits functional and OOP programming styles for productivity
      • Stable, scalable, high availability, well supported
  • 26. The ES JSON query looks like this:

      {
        "size": 20,
        "query": {
          "bool": {
            "should": [
              {
                "terms": {
                  "rate": ["0", "67", "4"]
                }
              },
              {
                "terms": {
                  "buy": ["0", "32"],
                  "boost": 2
                }
              },
              { // categorical boosts
                "terms": {
                  "category": ["cat1"],
                  "boost": 1.05
                }
              },
              {
                "constant_score": { // this orders popular items for backfill
                  "filter": {
                    "match_all": {}
                  },
                  "boost": 0.000001 // must have at least a small number to be boostable
                }
              }
            ],
            "must": [ // categorical filters
              {
                "terms": {
                  "category": ["cat1"],
                  "boost": 0
                }
              },
              {
                "constant_score": { // date in query must fall between the expire and available dates of an item
                  "filter": {
                    "range": {
                      "availabledate": {
                        "lte": "2015-08-30T12:24:41-07:00"
                      }
                    }
                  },
                  "boost": 0
                }
              },
              {
                "constant_score": { // date range filter in query must be between these item property values
                  "filter": {
                    "range": {
                      "expiredate": {
                        "gte": "2015-08-15T11:28:45.114-07:00",
                        "lt": "2015-08-20T11:28:45.114-07:00"
                      }
                    }
                  },
                  "boost": 0
                }
              }
            ],
            "must_not": [ // blacklisted items
              {
                "ids": {
                  "values": ["items-id1", "item-id2", ...]
                }
              }
            ]
          }
        }
      }

      An example Elasticsearch query on a multi-field index created from the output of the CCO engine. The index includes about 90% of the data in the “whole enchilada” equation. This executes in 50ms on a non-cached cluster and ~26ms on an unoptimized cluster.