| 0
Maya Hristakeva (@mayahhf)
Beyond Collaborative Filtering:
Learning to Rank Research Articles
8th November 2018
| 1
The Team
Data Science, Engineering & Product
| 2
What do we do?
| 3
| 4
Our Users
We combine content and data with analytics and technology to help:
RESEARCHERS
to make new discoveries and
have more impact on society
CLINICIANS
to treat patients better
and save more lives
NURSES
throughout their careers
and to help save lives
| 5
Researcher’s Journey
• Help me stay up to date
• Help me showcase my work
• Help me organise my writing
• Help me make peer review more rewarding
• Help me publish faster
• Help me manage research data
• Help me with my editorial decisions
• Help me connect with the right people
• Help me secure funding
• Help me read and evaluate articles
| 6
Being the best researcher you can be!
• Good researchers are on top of their game
• Large amount of research produced
• Takes time to get what you need
• Help researchers by recommending relevant content
| 7
Recommenders @ Elsevier
| 8
Mendeley Suggest – personalized article and people
recommenders
| 9
ScienceDirect – personalized and related article recommenders
| 10
Mendeley Funding & Institutional Recommenders
| 11
ScienceDirect Related Articles
| 12
ScienceDirect – related article recommender
• Scientific publication database
• 15 million articles
• 14 million monthly visitors
| 13
ScienceDirect V1 Recommender
• Goal
- Present users with related articles based on the article they are
reading
• Start simple & iterate
- Browsing logs to generate item-to-item CF recommendations
- Article content as business logic filtering based on recency, article
type
| 14
Item-based kNN Collaborative Filtering
Recommend articles that are similar to the ones you browsed
- Similarity is based on article co-occurrences in users’ browsing sessions
- “Users who read x also read y”
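Identify similar articles using cosine similarity: $\cos(a_i, a_j) = \dfrac{\mathbf{v}_i \cdot \mathbf{v}_j}{\lVert \mathbf{v}_i \rVert \times \lVert \mathbf{v}_j \rVert}$, where $\mathbf{v}_i$ is the session co-occurrence vector of article $a_i$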
Why do we use it?
- Gives good results
- Scales relatively well
- Relatively simple to implement
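A minimal sketch of the session co-occurrence cosine described on this slide, assuming each session is simply a list of article IDs and each article is represented as a binary vector over sessions. The function name and the toy sessions are illustrative, not taken from the talk.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_cosine_similarities(sessions):
    """Cosine similarity between articles from binary session co-occurrence."""
    session_count = defaultdict(int)  # article -> number of sessions containing it
    co_count = defaultdict(int)       # (article_a, article_b) -> shared sessions
    for articles in sessions:
        unique = sorted(set(articles))  # count each article once per session
        for article in unique:
            session_count[article] += 1
        for a, b in combinations(unique, 2):
            co_count[(a, b)] += 1
    # For binary vectors the dot product is the number of shared sessions
    # and the norm of an article vector is sqrt(number of its sessions).
    return {
        (a, b): shared / (sqrt(session_count[a]) * sqrt(session_count[b]))
        for (a, b), shared in co_count.items()
    }


# Hypothetical browsing sessions: "users who read x also read y".
sessions = [["x", "y"], ["x", "y", "z"], ["x"], ["z"]]
sims = item_cosine_similarities(sessions)
```

Pairs are keyed in sorted order, so `sims[("x", "y")]` holds the similarity for both directions. A production system would keep only the top-k neighbours per article.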
| 15
Evaluation: Session prediction task
• Article browsing logs: <sessionId, articleId, accessTime>
• Predict what users would browse next
• Time-split evaluation
[Figure: timeline of user interactions over time, split into a training period (train model) and a test period (query, ground truth)]
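The time-split protocol can be sketched as follows. This is an illustration under stated assumptions: events are (session ID, article ID, access time) tuples, the first article of each test session acts as the query, and the rest of that session is the ground truth. The slide does not say how the query is chosen or which metric is used, so `hit_rate_at_k` is a hypothetical stand-in.

```python
from collections import defaultdict


def time_split(events, split_time):
    """Split (session_id, article_id, access_time) events at split_time."""
    train = [e for e in events if e[2] < split_time]
    test_sessions = defaultdict(list)
    for session_id, article_id, access_time in sorted(events, key=lambda e: e[2]):
        if access_time >= split_time:
            test_sessions[session_id].append(article_id)
    # Assumption: first article in a test session is the query,
    # the remaining articles of that session are the ground truth.
    test_cases = [
        (articles[0], set(articles[1:]))
        for articles in test_sessions.values()
        if len(articles) > 1
    ]
    return train, test_cases


def hit_rate_at_k(recommend, test_cases, k=3):
    """Fraction of test cases where a top-k recommendation is in the ground truth."""
    if not test_cases:
        return 0.0
    hits = sum(1 for query, truth in test_cases if truth & set(recommend(query)[:k]))
    return hits / len(test_cases)


events = [
    ("s1", "a", 1), ("s1", "b", 2),
    ("s2", "a", 10), ("s2", "b", 11),
    ("s3", "c", 12),
]
train, test_cases = time_split(events, split_time=5)
```

Splitting on time rather than at random keeps future interactions out of the training data, which is closer to how the model is used online.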
| 16
CF & Significance Weighting
• Scale down cosine similarity with significance weighting
• Preference is given to high co-occurrence neighbors
- k – min # sessions in common to get original cosine similarity
• Alternative – minimum co-occurrence threshold
- Significantly reduces the catalogue coverage
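$\mathrm{score}(a_i, a_j) = \min\left(1, \dfrac{|S_i \cap S_j|}{k}\right) \times \mathrm{cosine}(a_i, a_j)$, where $S_i$ is the set of sessions containing article $a_i$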
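As a small worked sketch of significance weighting: the cosine score is scaled down linearly until a pair of articles shares at least k sessions. The default `k=5` is an arbitrary illustrative value; the talk does not state the value used in production.

```python
def significance_weighted(cosine, shared_sessions, k=5):
    """Scale cosine similarity by min(1, shared_sessions / k)."""
    return min(1.0, shared_sessions / k) * cosine


# A pair seen together in only 2 sessions is penalised when k = 4 ...
low_support = significance_weighted(0.8, shared_sessions=2, k=4)
# ... while a pair with 10 shared sessions keeps its original cosine score.
high_support = significance_weighted(0.8, shared_sessions=10, k=4)
```

Unlike a hard minimum co-occurrence threshold, low-support pairs are kept with a reduced score, so catalogue coverage is preserved.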
| 17
Other CF Improvements
• Min/max filters for # articles per user-session & # users-sessions per article
• ~ 12 months of browsing logs
- gives good coverage
- removes cyclical nature of academic year
- focuses on more “current” interactions
• Bias for recent activity using time decay functions (e.g. exponential)
• Using article content as business logic filters for recency and article types
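One common way to bias towards recent activity is to weight each interaction by an exponential decay of its age. The sketch below assumes a half-life parameterisation; the 90-day half-life is a made-up value, not one reported in the talk.

```python
from math import exp, log


def decay_weight(age_days, half_life_days=90.0):
    """Exponential time decay: the weight halves every half_life_days."""
    return exp(-log(2) * age_days / half_life_days)


# An interaction from today counts fully, one from 90 days ago counts half.
today = decay_weight(0)
one_half_life = decay_weight(90)
```

These weights could replace the unit counts when accumulating co-occurrences, so that older sessions contribute less to the similarity.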
| 18
Collaborative Filtering in production
[Figure: production pipeline with article views/downloads feeding IBCF, which produces recommendations per article]
| 19
Can we do any better?
| 20
A Wealth of Data
• Usage Data
- Logged-in activity
- Alt-metrics,
popularity, trending
• Social Features
- User profiles
- Social network
- Collaboration groups
• > 60 million records: journals, conferences,
books, patents …
• The most accurate and complete citation & co-author graphs
• Reputation metrics for articles, authors and
journals
• > 15 million full text articles
• Article browsing logs
• Recommender impression and click logs!!!
| 21
Learning to Rank (LtR)
[Figure: pipeline from CF candidates, to candidates enriched with features, to recs re-ranked by the LtR model]
Use CF as candidate selection
Enrich with item and user features
Re-rank results based on learnt model optimised for CTR
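The three steps above can be sketched as a small function. Everything here is illustrative: `score_fn` stands in for the learnt LtR model, and the feature names (`cf_score`, `age_years`) are hypothetical.

```python
def rerank(cf_candidates, item_features, score_fn, top_n=5):
    """Enrich CF candidates with features and re-rank them with score_fn."""
    enriched = []
    for article_id, cf_score in cf_candidates:
        # Keep the CF similarity as one feature next to the item features.
        features = {"cf_score": cf_score, **item_features.get(article_id, {})}
        enriched.append((article_id, features))
    ranked = sorted(enriched, key=lambda c: score_fn(c[1]), reverse=True)
    return [article_id for article_id, _ in ranked[:top_n]]


# Toy stand-in for a learnt model: prefer high CF score, penalise old articles.
def toy_model(features):
    return features["cf_score"] - 0.1 * features.get("age_years", 0)


candidates = [("a", 0.9), ("b", 0.5)]
features = {"a": {"age_years": 10}, "b": {"age_years": 1}}
reranked = rerank(candidates, features, toy_model)
```

Using CF only for candidate selection keeps the expensive model scoring limited to a short list per query article.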
| 22
LtR Features
• Reputation & Alt-Metrics
• Text
• Topics
• Temporal
• CF similarity: $\mathrm{score}(a_i, a_j) = \min\left(1, \dfrac{|S_i \cap S_j|}{k}\right) \times \mathrm{cosine}(a_i, a_j)$
• Citation Network
Images: wsj, alamy, bookedelic
| 23
LtR Models
• Set of labelled query documents and their associated recommended documents with feature vectors and relevance judgements: <queryDoc_i, recDocWithFeatures_j, relScore_ij>
• Different optimization objectives – point-wise, pair-wise & list-wise
• RankLib – Java-based LtR package
- RankNet – pair-wise neural network algorithm
- LambdaRank – extension of RankNet optimizing list-wise IR metrics such as NDCG
- LambdaMART – list-wise approach combining LambdaRank and MART
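Training examples of this shape (a query document, a recommended document with its feature vector, and a relevance judgement) are commonly serialised in the LETOR/SVMlight-style text format that RankLib reads: one line per candidate, `<label> qid:<id> <index>:<value> ... # comment`. The helper below is a sketch with hypothetical feature values; the talk does not show the team's actual feature layout.

```python
def to_letor_line(relevance, query_id, features, comment=""):
    """Format one (query, recommendation) example as a LETOR-style line."""
    parts = [str(relevance), f"qid:{query_id}"]
    # Feature indices are 1-based and written in ascending order.
    parts += [f"{i}:{value:g}" for i, value in enumerate(features, start=1)]
    line = " ".join(parts)
    return f"{line} # {comment}" if comment else line


# Hypothetical example: relevance grade 2 for query 17, three features.
line = to_letor_line(2, 17, [0.83, 12, 0.5], comment="rec=B")
```

All lines sharing a `qid` form one ranked list, which is what the pair-wise and list-wise objectives operate on.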
| 24
Recommender Logs
LtR requires labelled training data that represents user preferences
in relation to the recommendation lists
Recommender Logs
- Impressions – recs shown to the user
- Clicks & conversions – recs the user engaged with
- Timestamp – when the event happened
- Page-load ID – groups recs that were shown at the same time
| 25
Training data for LtR Models
• Query-recommendations pairs with relevance labels inferred from
recommender logs
• For each query article
- Aggregate the recommended articles across all user sessions
- Count # impressions & clicks for each recommendation
- Compute graded relevance scores based on CTR
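A minimal sketch of this aggregation, assuming each log event is a (query article, recommended article, clicked) tuple. The CTR thresholds that map to grades 0 to 3 are invented for illustration; the talk does not specify how CTR is bucketed.

```python
from collections import Counter


def graded_relevance(log_events, thresholds=(0.02, 0.05, 0.10), min_impressions=1):
    """Map (query_id, rec_id, clicked) events to graded relevance labels."""
    impressions, clicks = Counter(), Counter()
    for query_id, rec_id, clicked in log_events:
        impressions[(query_id, rec_id)] += 1
        clicks[(query_id, rec_id)] += int(clicked)
    labels = {}
    for pair, shown in impressions.items():
        if shown < min_impressions:
            continue  # too few impressions for a reliable CTR estimate
        ctr = clicks[pair] / shown
        # Grade = number of thresholds the CTR reaches (0 .. len(thresholds)).
        labels[pair] = sum(ctr >= t for t in thresholds)
    return labels


events = [("q", "a", True), ("q", "a", False), ("q", "b", False)]
labels = graded_relevance(events)
```

Raising `min_impressions` trades coverage for less noisy labels, since CTR estimates from a handful of impressions are unstable.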
| 26
Explore/Exploit via Dithering
Slightly shuffle the list of recommendations
• Allows for the exploration of the list
• Gives the impression of freshness
• Reduces some of the bias in LtR training data
$\mathrm{score}_{\mathrm{dithered}} = \log(\mathrm{rank}) + N(0, \log \varepsilon)$
where $\varepsilon = \dfrac{\Delta \mathrm{rank}}{\mathrm{rank}}$ and typically $\varepsilon \in [1.5, 2]$
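Dithering adds Gaussian noise to the log of each item's rank and re-sorts by the noisy score, so items near the top move little while the tail is shuffled more. The sketch below assumes log ε is used as the standard deviation of the noise; the slide's notation leaves open whether it denotes the standard deviation or the variance.

```python
import math
import random


def dither(ranked_items, epsilon=1.5, rng=None):
    """Re-order a ranked list by log(rank) plus Gaussian noise."""
    rng = rng or random.Random()
    sigma = math.log(epsilon)  # assumption: log(epsilon) is the standard deviation
    scored = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    scored.sort(key=lambda pair: pair[0])  # lower dithered score = higher position
    return [item for _, item in scored]


items = ["a", "b", "c", "d", "e"]
shuffled = dither(items, epsilon=2.0, rng=random.Random(0))
unchanged = dither(items, epsilon=1.0)  # log(1) = 0, so no noise is added
```

Passing a seeded `random.Random` makes the shuffle reproducible, which helps when the same dithered list must be shown on repeated page loads.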
| 27
Evaluation: Click prediction task
• Data: <queryArticleId, recArticleWithFeatures, relLabel>
• Rank higher the recommendations users engage with
• Time-split evaluation
[Figure: timeline of user interactions over time, split into a training period (train model), a validation set, and a test set]
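One way to score how well a ranking places engaged-with recommendations near the top is NDCG, which the deck names as the metric LambdaRank optimises. The talk does not state which metric was used for this click prediction task, so treat the sketch below as an assumed choice; it uses the common exponential-gain form of DCG.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of relevance labels in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(position + 1)
        for position, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


perfect = ndcg_at_k([3, 2, 1, 0], k=4)  # already in ideal order
late_hit = ndcg_at_k([0, 0, 1], k=3)    # the only relevant item is ranked last
```

Here `relevances` are the relevance labels of one query article's recommendations, listed in the order the model ranked them.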
| 28
Results
• LtR improved the quality of recommendations
- 9-10% improvement in user engagement
- Winner is LambdaMART - GBDT with list-wise optimization
• LtR increased journal diversity in recommendation lists
• LtR promotes articles published within the last year
• Best ranking model combines usage data with rich structured
network and meta data
| 29
Offline evaluation should match the online challenge
• Candidate generation – Collaborative Filtering – session prediction task
• Re-ranking candidates – Learning-to-Rank – click prediction task
| 30
LtR in Production
[Figure: production pipeline. Components: article views/downloads, IBCF, LtR rescoring, recommendation clicks, training data, LtR model]
| 31
Next Steps & Future Directions
| 32
Alternative Approaches
Graph-based approaches
- Random walks for candidate generation
Deep Learning
- Learn more complex features for LtR
- Neural embeddings for candidate
generation
- Hybrid systems for ranking
| 33
Evaluation – correcting for bias & confounding
• Algorithm confounding
- How algorithmic confounding in recommendation systems increases homogeneity and
decreases utility. Allison J. B. Chaney, Brandon M. Stewart, and Barbara E. Engelhardt
(RecSys '18).
• Explore/exploit – multi-armed bandits
- Explore, exploit, and explain: personalizing explainable recommendations with bandits.
James McInerney, et al. (RecSys '18).
• Counterfactuals
- Counterfactual reasoning and learning systems: The example of computational advertising.
Bottou, Léon, et al. (JMLR 2013).
| 34
Qualitative & Quantitative Evaluation
https://github.com/jeanigarcia/recsys2018-evaluation-tutorial
| 35
Challenges
| 36
Recommender Team Publications
Hristakeva, M., Kershaw, D., Pettit, B., Vargas, S., & Jack, K. (2019). Academic recommendations:
The Mendeley case. In Collaborative Recommendations: Algorithms, Practical Challenges and
Applications.
Pettit, B., Hristakeva, M., Kershaw, D. & Jack, K. (2018). Learning to Rank Research Articles: A case
study of collaborative filtering and learning to rank in Science Direct.
Hristakeva, M., Kershaw, D., Rossetti, M., Knoth, P., Pettit, B., Vargas, S., & Jack, K. (2017). Building
recommender systems for scholarly information. WSDM2017.
Rossetti, M., Vargas, S., Pettit, B., Kershaw, D., Hristakeva, M., & Jack, K. (2017). Effectively
identifying users’ research interests for scholarly reference management and discovery. WSDM2017.
Vargas, S., Hristakeva, M., & Jack, K. (2016). Mendeley: Recommendations for
Researchers. RecSys ’16
| 37
References
From RankNet to LambdaRank to LambdaMART: An Overview (2010). Christopher J. C. Burges.
On Application of Learning to Rank for E-Commerce Search by Shubhra Kanti Karmaker Santu,
Parikshit Sondhi, and ChengXiang Zhai (SIGIR 2017).
Recommender Systems Handbook (2010). Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul
B. Kantor.
Practical Machine Learning: Innovations in Recommendation (2014).
Ted Dunning and Ellen Friedman. O'Reilly Media, Inc.
Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time by Chantat
Eksombatchai, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark
Ulrich, and Jure Leskovec (WWW 2018).
Getting Deep Recommenders Fit: Bloom Embeddings for Sparse Binary Input/Output Networks by
Joan Serrà and Alexandros Karatzoglou (RecSys 2017)
We're hiring, come speak to
us!
https://www.elsevier.com/about/careers/technology-careers
| 39
www.elsevier.com/rd-solutions
Thank you

Learning to Rank Research Articles for Personalized Recommendations

  • 1. | 0 Maya Hristakeva (@mayahhf) Beyond Collaborative Filtering: Learning to Rank Research Articles 8th November 2018
  • 2. | 1 The Team Data Science, Engineering & Product
  • 3. | 2 What do we do?
  • 4. | 3
  • 5. | 4 Our Users We combine content and data with analytics and technology to help: RESEARCHERS to make new discoveries and have more impact on society CLINICIANS to treat patients better and save more lives NURSES throughout their careers and to help save lives
  • 6. | 5 Researcher’s Journey Help me stay up to date Help me showcase my work Help me organise my writing Help me make peer review more rewarding Help me publish faster Help me manage research data Help me with my editorial decisions Help me connect with the right people Help me secure funding Help me read and evaluate articles
  • 7. | 6 Being the best researcher you can be! • Good researchers are on top of their game • Large amount of research produced • Takes time to get what you need • Help researchers by recommending relevant content
  • 9. | 8 Mendeley Suggest – personalized article and people recommenders
  • 10. | 9 Science Direct – personalized and related article recommenders
  • 11. | 10 Mendeley Funding & Institutional Recommenders
  • 12. | 11 Science Direct Related Articles
  • 13. | 12 ScienceDirect – related article recommender • Scientific publication database • 15 million articles • 14 million monthly visitors
  • 14. | 13 Science Direct V1 Recommender • Goal - Present users with related articles based on the article they are reading • Start simple & iterate - Browsing logs to generate item-to-item CF recommendations - Article content as business logic filtering based on recency, article type
  • 15. | 14 Item-based kNN Collaborative Filtering Recommend articles that are similar to the ones you browsed - Similarity is based on article co-occurrences in users’ browsing sessions - “Users who read x also read y” Identify similar articles using cosine similarity: $\cos(a_i, a_j) = \frac{\vec{a}_i \cdot \vec{a}_j}{\lVert \vec{a}_i \rVert \times \lVert \vec{a}_j \rVert}$ Why do we use it? - Gives good results - Scales relatively well - Relatively simple to implement
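A minimal sketch of the session co-occurrence similarity described on this slide, assuming each article is represented as a binary vector over browsing sessions. Under that assumption the dot product is the number of shared sessions and each norm is the square root of the article's session count. The function name and the input format are hypothetical, not taken from the talk.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_cosine_similarities(sessions):
    """Cosine similarity between articles from binary session co-occurrence.

    `sessions` is an iterable of article-id collections, one per browsing
    session (a hypothetical input format chosen for illustration).
    """
    sessions_per_article = defaultdict(int)  # sessions containing each article
    co_occurrences = defaultdict(int)        # sessions containing both articles
    for session in sessions:
        articles = sorted(set(session))      # ignore repeat views in a session
        for article in articles:
            sessions_per_article[article] += 1
        for a, b in combinations(articles, 2):
            co_occurrences[(a, b)] += 1
    # Binary vectors: dot product = co-occurrence count,
    # norm = sqrt(number of sessions containing the article).
    return {
        (a, b): count / sqrt(sessions_per_article[a] * sessions_per_article[b])
        for (a, b), count in co_occurrences.items()
    }


sessions = [["x", "y"], ["x", "y", "z"], ["x"], ["z"]]
sims = item_cosine_similarities(sessions)
```

Pairs are keyed in sorted order, so `sims[("x", "y")]` holds the similarity between articles x and y.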
  • 16. | 15 Evaluation: Session prediction task • Article browsing logs: <sessionId, articleId, accessTime> • Predict what users would browse next • Time-split evaluation [Diagram: user interactions over time, split into a Train model period and a Test period; the Test period is divided into Query and Ground truth]
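A time-split evaluation of this kind could be set up roughly as follows. This is a sketch under stated assumptions: log events are (session id, article id, access time) tuples, and each test session uses its first article as the query and the remaining articles as ground truth. The slide only names Query and Ground truth, so that particular split is an illustrative choice, as are all names below.

```python
from collections import defaultdict


def time_split(events, split_time):
    """Split browsing events into training events and per-session test cases.

    `events` is a list of (session_id, article_id, access_time) tuples.
    Events before `split_time` are used for training. For sessions after the
    split, the first article is the query and the rest is the ground truth
    (an assumption made for this sketch).
    """
    train = [event for event in events if event[2] < split_time]
    test_sessions = defaultdict(list)
    for session_id, article_id, access_time in sorted(events, key=lambda e: e[2]):
        if access_time >= split_time:
            test_sessions[session_id].append(article_id)
    # Sessions with a single article have nothing left to predict.
    test_cases = {
        session_id: (articles[0], articles[1:])
        for session_id, articles in test_sessions.items()
        if len(articles) > 1
    }
    return train, test_cases


events = [
    ("s1", "a", 1), ("s1", "b", 2),
    ("s2", "c", 10), ("s2", "d", 11), ("s2", "e", 12),
    ("s3", "f", 13),
]
train, test_cases = time_split(events, split_time=5)
```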
  • 17. | 16 CF & Significance Weighting • Scale down cosine similarity with significance weighting: $\mathrm{score}(a_i, a_j) = \min\left(1, \frac{|S_i \cap S_j|}{k}\right) \times \mathrm{cosine}(a_i, a_j)$, where $S_i$ is the set of sessions containing article $a_i$ • Preference is given to high co-occurrence neighbors - k – min # sessions in common to get original cosine similarity • Alternative – minimum co-occurrence threshold - Significantly reduces the catalogue coverage
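As a small illustration of significance weighting: the cosine similarity is multiplied by min(1, shared sessions / k), so pairs with fewer than k sessions in common are scaled down and pairs with at least k keep their original similarity. The default value of `k` below is an arbitrary placeholder, not a figure from the talk.

```python
def significance_weighted(cosine, n_common_sessions, k=50):
    """Scale a cosine similarity by min(1, |common sessions| / k).

    `k` is the minimum number of shared sessions needed to keep the original
    similarity; the default of 50 is a hypothetical value.
    """
    return min(1.0, n_common_sessions / k) * cosine


weak_pair = significance_weighted(0.8, n_common_sessions=5, k=10)
strong_pair = significance_weighted(0.8, n_common_sessions=20, k=10)
```

Unlike a hard minimum co-occurrence threshold, low-evidence pairs are kept with a reduced score, which is why this variant does not shrink catalogue coverage in the same way.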
  • 18. | 17 Other CF Improvements • Min/max filters for # articles per user-session & # users-sessions per article • ~ 12 months of browsing logs - gives good coverage - removes cyclical nature of academic year - focuses on more “current” interactions • Bias for recent activity using time decay functions (e.g. exponential) • Using article content as business logic filters for recency and article types
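One way to bias towards recent activity is to weight each interaction by an exponential decay of its age before counting co-occurrences. The sketch below uses a half-life parameterisation, which is an assumption; the talk only mentions exponential decay as an example, and the 90-day default is a placeholder.

```python
def decay_weight(age_days, half_life_days=90.0):
    """Exponential time-decay weight: 1.0 for a brand-new interaction,
    0.5 after one half-life, 0.25 after two, and so on.

    `half_life_days` is a hypothetical tuning parameter.
    """
    return 0.5 ** (age_days / half_life_days)


weights = [decay_weight(age) for age in (0, 90, 180)]
```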
  • 19. | 18 Collaborative Filtering in production [Diagram components: Article views/downloads, IBCF, Recommendations per article]
  • 20. | 19 Can we do any better?
  • 21. | 20 A Wealth of Data • Usage Data - Logged-in activity - Alt-metrics, popularity, trending • Social Features - User profiles - Social network - Collaboration groups • > 60 million records: journals, conferences, books, patents … • The most accurate and complete citation & co- author graphs • Reputation metrics for articles, authors and journals • > 15 million full text articles • Article browsing logs • Recommender impression and click logs!!!
  • 22. | 21 Learning to Rank (LtR) [Diagram components: CF candidates, Enriched candidates, Re-ranked recs, Features, LtR model] • Use CF as candidate selection • Enrich with item and user features • Re-rank results based on learnt model optimised for CTR
  • 23. | 22 LtR Features • CF similarity: $\mathrm{score}(a_i, a_j) = \min\left(1, \frac{|S_i \cap S_j|}{k}\right) \times \mathrm{cosine}(a_i, a_j)$ • Citation Network • Reputation & Alt-Metrics • Text • Topics • Temporal (Images: wsj, alamy, bookedelic)
  • 24. | 23 LtR Models • Set of labelled query documents and their associated recommended documents with feature vectors and relevance judgements: <queryDoc_i, recDocWithFeatures_j, relScore_ij> • Different optimization objectives – point-wise, pair-wise & list-wise • RankLib, a Java-based LtR package - RankNet – pair-wise neural network algorithm - LambdaRank – extension of RankNet optimizing list-wise IR metrics such as NDCG - LambdaMART – list-wise approach combining LambdaRank and MART
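Since the list-wise models on this slide optimise IR metrics such as NDCG, a short sketch of the metric may help. It uses the common exponential-gain form, (2^rel - 1) / log2(rank + 1), which is one of several variants and not necessarily the exact configuration used in the talk.

```python
from math import log2


def dcg(relevances, k=None):
    """Discounted cumulative gain of graded relevance labels in ranked order."""
    ranked = relevances if k is None else relevances[:k]
    # enumerate() is 0-based, so rank + 2 gives log2(position + 1) for 1-based positions.
    return sum((2 ** rel - 1) / log2(rank + 2) for rank, rel in enumerate(ranked))


def ndcg(relevances, k=None):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


perfect = ndcg([3, 2, 1, 0])   # already in ideal order
swapped = ndcg([0, 1])         # relevant item ranked second
```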
  • 25. | 24 Recommender Logs LtR requires labelled training data that represents user preferences in relation to the recommendation lists Recommender Logs - Impressions – recs shown to the user - Clicks & conversions – recs the user engaged with - Timestamp – when the event happened - Page-load ID – groups recs that were shown at the same time
  • 26. | 25 Training data for LtR Models • Query-recommendations pairs with relevance labels inferred from recommender logs • For each query article - Aggregate the recommended articles across all user sessions - Count # impressions & clicks for each recommendation - Compute graded relevance scores based on CTR
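The aggregation steps above can be sketched as follows. The log format (one tuple per impression with a clicked flag) and the CTR thresholds used to derive graded labels are illustrative assumptions; the talk does not state how CTR was mapped to grades.

```python
from collections import defaultdict


def grade(ctr, thresholds=(0.0, 0.05, 0.15)):
    """Map a CTR to a graded relevance label 0..3.

    The thresholds are hypothetical values chosen for illustration.
    """
    return sum(ctr > threshold for threshold in thresholds)


def build_training_rows(log_events, min_impressions=1):
    """Aggregate impression logs into labelled query-recommendation pairs.

    `log_events` is an iterable of (query_article, rec_article, clicked)
    tuples, one per impression, pooled across all user sessions.
    Returns rows of (query, rec, impressions, clicks, ctr, label).
    """
    counts = defaultdict(lambda: [0, 0])  # (query, rec) -> [impressions, clicks]
    for query, rec, clicked in log_events:
        counts[(query, rec)][0] += 1
        counts[(query, rec)][1] += int(clicked)
    rows = []
    for (query, rec), (impressions, clicks) in sorted(counts.items()):
        if impressions < min_impressions:
            continue  # too little evidence to trust the CTR estimate
        ctr = clicks / impressions
        rows.append((query, rec, impressions, clicks, ctr, grade(ctr)))
    return rows


events = [("q", "a", True), ("q", "a", False), ("q", "b", False)]
rows = build_training_rows(events)
```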
  • 27. | 26 Explore/Exploit via Dithering Slightly shuffle the list of recommendations • Allows for the exploration of the list • Gives the impression of freshness • Reduces some of the bias in LtR training data $\mathrm{score}_{\mathrm{dithered}} = \log(\mathrm{rank}) + N(0, \log \varepsilon)$, where $\varepsilon = \frac{\Delta \mathrm{rank}}{\mathrm{rank}}$ and typically $\varepsilon \in [1.5, 2]$
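A minimal sketch of dithering: each item gets a score of log(rank) plus Gaussian noise, and the list is re-sorted by that score, so top items mostly stay near the top while lower items are occasionally promoted. Treating log ε as the standard deviation of the noise is an assumption made here, since the notation N(0, log ε) does not say whether it denotes a standard deviation or a variance.

```python
import math
import random


def dither(recommendations, epsilon=1.5, rng=None):
    """Return a slightly shuffled copy of a ranked recommendation list.

    Each item is scored as log(rank) + Gaussian noise and the list is sorted
    by that score in ascending order. `epsilon` controls the noise level;
    log(epsilon) is used as the standard deviation (an assumption).
    """
    rng = rng or random.Random()
    sigma = math.log(epsilon)
    scored = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(recommendations, start=1)
    ]
    scored.sort(key=lambda pair: pair[0])
    return [item for _, item in scored]


shuffled = dither(list("abcde"), epsilon=1.5, rng=random.Random(0))
unchanged = dither(list("abcde"), epsilon=1.0)  # log(1) = 0, so no noise
```

Passing a seeded `random.Random` makes the shuffle reproducible, which can help when logged impressions need to be replayed.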
  • 28. | 27 Evaluation: Click prediction task • Data: <queryArticleId, recArticleWithFeatures, relLabel> • Rank higher the recommendations users engage with • Time-split evaluation [Diagram: user interactions over time, split into Train model, Validation Set and Test Set]
  • 29. | 28 Results • LtR improved the quality of recommendations - 9-10% improvement in user engagement - Winner is LambdaMART - GBDT with list-wise optimization • LtR increased journal diversity in recommendation lists • LtR promotes articles published in the last year • Best ranking model combines usage data with rich structured network and meta data
  • 30. | 29 Offline evaluation should match the online challenge • Candidate generation – Collaborative Filtering – session prediction task • Re-ranking candidates – Learning-to-Rank – click prediction task
  • 31. | 30 LtR in Production [Diagram components: Article views/downloads, IBCF, LtR rescoring, Recommendation clicks, Training data, LtR model]
  • 32. | 31 Next Steps & Future Directions
  • 33. | 32 Alternative Approaches Graph-based approaches - Random walks for candidate generation Deep Learning - Learn more complex features for LtR - Neural embeddings for candidate generation - Hybrid systems for ranking
  • 34. | 33 Evaluation – correcting for bias & confounding • Algorithm confounding - How algorithmic confounding in recommendation systems increases homogeneity and decreases utility. Allison J. B. Chaney, Brandon M. Stewart, and Barbara E. Engelhardt (RecSys '18). • Explore/exploit – multi-armed bandits - Explore, exploit, and explain: personalizing explainable recommendations with bandits. James McInerney, et al. (RecSys '18). • Counterfactuals - Counterfactual reasoning and learning systems: The example of computational advertising. Bottou, Léon, et al. (JMLR 2013).
  • 35. | 34 Qualitative & Quantitative Evaluation https://github.com/jeanigarcia/recsys2018-evaluation-tutorial
  • 37. | 36 Recommender Team Publications Hristakeva, M., Kershaw, D., Pettit, B., Vargas, S., & Jack, K. (2019). Academic recommendations: The Mendeley case. In Collaborative Recommendations: Algorithms, Practical Challenges and Applications. Pettit, B., Hristakeva, M., Kershaw, D. & Jack, K. (2018). Learning to Rank Research Articles: A case study of collaborative filtering and learning to rank in Science Direct. Hristakeva, M., Kershaw, D., Rossetti, M., Knoth, P., Pettit, B., Vargas, S., & Jack, K. (2017). Building recommender systems for scholarly information. WSDM2017. Rossetti, M., Vargas, S., Pettit, B., Kershaw, D., Hristakeva, M., & Jack, K. (2017). Effectively identifying users’ research interests for scholarly reference management and discovery. WSDM2017. Vargas, S., Hristakeva, M., & Jack, K. (2016). Mendeley: Recommendations for Researchers. RecSys ’16
  • 38. | 37 References From RankNet to LambdaRank to LambdaMART: An Overview (2010). Christopher J. C. Burges. On Application of Learning to Rank for E-Commerce Search by Shubhra Kanti Karmaker Santu, Parikshit Sondhi, and ChengXiang Zhai (SIGIR 2017). Recommender Systems Handbook (2010). Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor. Practical Machine Learning: Innovations in Recommendation (2014). Ted Dunning and Ellen Friedman. O'Reilly Media, Inc. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time by Chantat Eksombatchai, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark Ulrich, and Jure Leskovec (WWW 2018). Getting Deep Recommenders Fit: Bloom Embeddings for Sparse Binary Input/Output Networks by Joan Serrà and Alexandros Karatzoglou (RecSys 2017)
  • 39. We're hiring, come speak to us! https://www.elsevier.com/about/careers/technology-careers