*** Please check out our LinkedIn Engineering blog post: https://engineering.linkedin.com/blog/2019/04/ai-behind-linkedin-recruiter-search-and-recommendation-systems ***
The LinkedIn Talent Solutions business contributes around 65% of LinkedIn’s annual revenue, and provides tools for job providers to reach out to potential candidates and for job seekers to find suitable career opportunities. LinkedIn’s job ecosystem has been designed as a platform to connect job providers and job seekers, and to serve as a marketplace for efficient matching between potential candidates and job openings. A key mechanism to help achieve these goals is the LinkedIn Recruiter product, which enables recruiters to search for relevant candidates and obtain candidate recommendations for their job postings.
We highlight a few unique information retrieval, system, and modeling challenges associated with talent search and recommendation systems.
In this talk, we will present how we formulated and addressed the problems, the overall system design and architecture, the challenges encountered in practice, and the lessons learned from the production deployment of these systems at LinkedIn. By presenting our experiences of applying techniques at the intersection of recommender systems, information retrieval, machine learning, and statistical modeling in a large-scale industrial setting and highlighting the open problems, we hope to stimulate further research and collaborations within the SIGIR community.
Talent Search and Recommendation Systems at LinkedIn: Practical Challenges and Lessons Learned
1. Talent Search and Recommendation Systems at LinkedIn
Practical Challenges and Lessons Learned
Qi Guo, Sahin Cem Geyik, Bo Hu, Cagri Ozcaglar,
Ketan Thakkar, Xianren Wu, Krishnaram Kenthapadi
AI @ LinkedIn
SIGIR 2018
2. The Team
Qi Guo, Sahin Cem Geyik, Bo Hu, Cagri Ozcaglar,
Ketan Thakkar, Xianren Wu, Krishnaram Kenthapadi
7. Recruiter Search
• Criteria-Based Search
• A recruiter has specific requisitions to fill
• Candidate Recommendation System
• A recruiter may want many qualified candidates and goes through pages of results
• Considers Both Sides of the Talent Marketplace
• Talents are limited resources
10. Number of InMail Accepts Per Seat: 30% YoY
OVERALL IMPROVEMENT
11. Go Non-Linear with Tree Model
• Before: Linear Model optimized for NDCG with Coordinate Ascent
• After: XGBoost Tree Model
• Captures feature interactions
• XGBoost: gradient boosting tree models for richer model complexity
• Online Results:
METRIC  | PRECISION@5 | PRECISION@25 | OVERALL ACCEPT
Lift    | +7.5%       | +7.4%        | +5.1%
P-Value | 2.1e-4      | 4.8e-4       | 0.01
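To make the metrics on this slide concrete, here is a minimal sketch of precision@k and NDCG@k for a single ranked result list. It assumes binary relevance labels (1 = the recruiter's InMail was accepted, 0 = otherwise); the function names and the sample labels are illustrative and do not come from the talk.

```python
import math

def precision_at_k(labels_in_rank_order, k):
    """Fraction of the top-k ranked candidates that carry a positive label."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Divide by k (not by the list length) so short result lists are penalized.
    return sum(labels_in_rank_order[:k]) / k

def dcg_at_k(labels_in_rank_order, k):
    """Discounted cumulative gain with the usual log2 position discount."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(labels_in_rank_order[:k]))

def ndcg_at_k(labels_in_rank_order, k):
    """DCG normalized by the DCG of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(labels_in_rank_order, reverse=True), k)
    return dcg_at_k(labels_in_rank_order, k) / ideal if ideal > 0 else 0.0

# Hypothetical labels for one search, listed in ranked order.
ranked_labels = [1, 0, 1, 1, 0, 0, 1, 0]
p5 = precision_at_k(ranked_labels, 5)
ndcg5 = ndcg_at_k(ranked_labels, 5)
```

A lift such as the one reported above would then be the relative change of the averaged metric between the treatment and control models.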
13. Search for “Dentist”, a Software Engineer ranks high
PROBLEM OBSERVED
• Focused too much on promoting active job-seeking candidates
• We want our ranking to be more context-aware
f(Recruiter Context, Query Context, Candidate) => Accept? Reject?
14. Context-Aware Ranking – Pairwise Training
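• Pair up two candidates from the same search request:
f(Recruiter Context, Query Context, Candidate 1) - f(Recruiter Context, Query Context, Candidate 2) => Accept 1 > Accept 2?
(Recruiter Context and Query Context are shared by both candidates in the pair.)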
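The pairing step can be sketched as follows. This is an illustration under stated assumptions, not the production training code: it assumes a linear scoring function, binary accept labels, and a logistic loss on the score difference. The names `make_pairs` and `pairwise_logistic_loss` and the sample features are hypothetical.

```python
import math
from itertools import combinations

def make_pairs(candidates):
    """Build training pairs from ONE search request.

    candidates: list of (feature_vector, accepted) tuples.
    Returns (feature_difference, label) tuples, label 1 if the first
    candidate of the pair is preferred, 0 otherwise.
    """
    pairs = []
    for (x1, y1), (x2, y2) in combinations(candidates, 2):
        if y1 == y2:
            continue  # same outcome: no preference to learn from
        diff = [a - b for a, b in zip(x1, x2)]
        pairs.append((diff, 1 if y1 > y2 else 0))
    return pairs

def pairwise_logistic_loss(weights, pairs):
    """Mean logistic loss on f(candidate 1) - f(candidate 2) for a linear f."""
    if not pairs:
        return 0.0
    total = 0.0
    for diff, label in pairs:
        margin = sum(w * d for w, d in zip(weights, diff))
        p_first_preferred = 1.0 / (1.0 + math.exp(-margin))
        total -= math.log(p_first_preferred if label == 1 else 1.0 - p_first_preferred)
    return total / len(pairs)

# Hypothetical request: feature 0 is a context feature shared by the whole
# request, feature 1 is candidate-specific.
candidates = [([1.0, 0.2], 1), ([1.0, 0.9], 0), ([1.0, 0.5], 0)]
pairs = make_pairs(candidates)
```

With a linear scorer, features that are identical for both candidates cancel in the difference, so the loss only depends on how candidates compare within the same recruiter and query context.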
16. Search for “Machine Learning Engineer”, desirable to include some Data Scientists
PROBLEM OBSERVED
17. Representation Learning
• Fuzzy semantic match on title ids, skill ids, company ids, etc.
• Unsupervised Graph Embedding
• Co-Occurrence Graph based on profile data
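As an illustration of the co-occurrence graph, the sketch below counts how often two entity ids appear on the same member profile. The entity id strings, the profile data, and the function name are made up for the example; the talk does not specify how edges are weighted or which embedding algorithm is then trained on the graph.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence_graph(profiles):
    """Count, for each unordered pair of entity ids, how many profiles contain both.

    profiles: iterable of sets of entity ids (titles, skills, companies, ...).
    Returns a Counter keyed by (id_a, id_b) with id_a < id_b.
    """
    edge_weights = Counter()
    for entities in profiles:
        for a, b in combinations(sorted(set(entities)), 2):
            edge_weights[(a, b)] += 1
    return edge_weights

# Hypothetical profile data.
profiles = [
    {"title:ml_engineer", "skill:python", "skill:tensorflow"},
    {"title:data_scientist", "skill:python", "skill:statistics"},
    {"title:ml_engineer", "skill:python"},
]
graph = build_cooccurrence_graph(profiles)
```

An unsupervised graph embedding method would take these weighted edges as input, so that entities sharing many neighbors (here, the two titles that both co-occur with `skill:python`) end up close in the embedding space.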
18. Representation Learning
• Before: XGBoost
• After: XGBoost with Title Similarity Feature
• Based on unsupervised graph embedding
• Online Results:
METRIC  | PRECISION@5 | PRECISION@25 | OVERALL ACCEPT
Lift    | +2%         | +1.8%        | +3%
P-Value | 0.2         | 0.25         | 0.11
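A title similarity feature of this kind could be computed as sketched below. The sketch assumes cosine similarity between title embeddings, which the slides do not confirm, and the three-dimensional vectors are invented purely for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Made-up embeddings; real ones would come from the graph embedding model.
title_embeddings = {
    "machine learning engineer": [0.9, 0.8, 0.1],
    "data scientist": [0.8, 0.9, 0.2],
    "dentist": [0.1, 0.0, 0.9],
}

def title_similarity(query_title, candidate_title):
    """Feature value passed to the ranker for a (query, candidate) title pair."""
    return cosine_similarity(title_embeddings[query_title],
                             title_embeddings[candidate_title])

sim_ds = title_similarity("machine learning engineer", "data scientist")
sim_dentist = title_similarity("machine learning engineer", "dentist")
```

With these toy vectors, a Data Scientist scores much closer to a "Machine Learning Engineer" query than a Dentist does, which is the fuzzy match the feature is meant to supply to the tree model.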
19. Deep Learning?
• Differentiable Programming with TensorFlow
• Flexible for model engineering
• Offline result does not justify the effort yet.
• Offline Results (Pairwise NN vs. Pointwise XGBoost):
METRIC | PRECISION@1 | PRECISION@5 | PRECISION@25
Lift   | +5.3%       | +2.8%       | +1.7%
21. Entity-Level Personalization with GLMix
• GLMix: Generalized Linear Mixed Models
• GLMix: global model + per-entity models
• We added per-recruiter model and per-contract/company model
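The "global model + per-entity models" structure can be sketched as a sum of linear terms. This is a simplified illustration that assumes a logistic link and that every model sees the same feature vector; the coefficient values, ids, and function names are hypothetical.

```python
import math

def dot(weights, features):
    return sum(w * x for w, x in zip(weights, features))

def glmix_score(features, recruiter_id, contract_id,
                global_weights, per_recruiter_weights, per_contract_weights):
    """Probability from the global model plus per-recruiter and per-contract offsets.

    Entities without a fitted model contribute nothing, so the score falls
    back to the global model alone.
    """
    zero = [0.0] * len(features)
    logit = (dot(global_weights, features)
             + dot(per_recruiter_weights.get(recruiter_id, zero), features)
             + dot(per_contract_weights.get(contract_id, zero), features))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical coefficients.
global_weights = [0.5, -0.2]
per_recruiter_weights = {"recruiter_1": [0.3, 0.0]}
per_contract_weights = {"contract_1": [0.0, -0.1]}
features = [1.0, 2.0]

base = glmix_score(features, "unknown", "unknown",
                   global_weights, per_recruiter_weights, per_contract_weights)
personalized = glmix_score(features, "recruiter_1", "unknown",
                           global_weights, per_recruiter_weights, per_contract_weights)
```

The per-entity terms let the same candidate features score differently for different recruiters or contracts while still sharing the global coefficients.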
22. Entity-Level Personalization with GLMix
• Model Ensemble
• Nonlinearity via tree interaction features
• Each leaf node is a feature
• Offline Results (GLMix vs. Pairwise XGBoost):
METRIC | PRECISION@1 | PRECISION@5 | PRECISION@25
Lift   | +8.5%       | +4.7%       | +2.0%
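"Each leaf node is a feature" can be illustrated with a small sketch: every tree maps an input to exactly one leaf, and the one-hot encoding of that leaf becomes a block of binary features for the linear (GLMix) model. The tuple-based tree format and the two toy trees below are assumptions made for the example, not the XGBoost model format.

```python
def leaf_index(tree, x):
    """Walk one tree and return the id of the leaf that x lands in.

    A node is ("split", feature_index, threshold, left, right) or ("leaf", leaf_id).
    """
    node = tree
    while node[0] == "split":
        _, feature, threshold, left, right = node
        node = left if x[feature] < threshold else right
    return node[1]

def leaf_features(trees, leaves_per_tree, x):
    """Concatenate one one-hot block per tree, marking the leaf x falls into."""
    features = []
    for tree, n_leaves in zip(trees, leaves_per_tree):
        one_hot = [0] * n_leaves
        one_hot[leaf_index(tree, x)] = 1
        features.extend(one_hot)
    return features

# Two hypothetical trees with 3 and 2 leaves.
tree_a = ("split", 0, 0.5, ("leaf", 0),
          ("split", 1, 2.0, ("leaf", 1), ("leaf", 2)))
tree_b = ("split", 1, 1.0, ("leaf", 0), ("leaf", 1))

encoded = leaf_features([tree_a, tree_b], [3, 2], [0.7, 3.0])
```

Because a leaf corresponds to a conjunction of split conditions, a linear model over these features picks up feature interactions. With the XGBoost Python package, leaf indices of a trained model can be obtained via `Booster.predict(..., pred_leaf=True)`.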
28. Search and Retrieval Architecture
• LinkedIn’s Galene is built on top of Lucene.
• Three main components:
• Search index on the searcher,
• Query fan-out through the broker, and
• Live updates to the index using the live-updater.
• Query language is similar to Lucene with OR, AND, NOT.
• The search index contains two types of fields:
• Inverted Fields
• Forward Fields
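The two field types can be illustrated with a toy in-memory index: inverted fields map a term to the documents containing it (used for retrieval with AND, OR, NOT), while forward fields map a document to its stored attributes (typically read at scoring time). This is a generic sketch, not Galene's API; the class name, terms, and attributes are hypothetical.

```python
class TinyIndex:
    """Toy index with inverted fields for retrieval and forward fields for scoring."""

    def __init__(self):
        self.inverted = {}  # term -> set of doc ids containing the term
        self.forward = {}   # doc id -> dict of stored attributes

    def add(self, doc_id, terms, attributes):
        for term in terms:
            self.inverted.setdefault(term, set()).add(doc_id)
        self.forward[doc_id] = dict(attributes)

    def postings(self, term):
        return self.inverted.get(term, set())

index = TinyIndex()
index.add(1, {"title:engineer", "skill:java"}, {"years_experience": 5})
index.add(2, {"title:engineer", "skill:python"}, {"years_experience": 2})
index.add(3, {"title:dentist"}, {"years_experience": 9})

# title:engineer AND (skill:java OR skill:python)
engineers = index.postings("title:engineer") & (
    index.postings("skill:java") | index.postings("skill:python"))

# title:engineer AND NOT skill:java
non_java = index.postings("title:engineer") - index.postings("skill:java")

# Forward-field lookup for the retrieved documents.
experience = {doc_id: index.forward[doc_id]["years_experience"] for doc_id in non_java}
```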
29. Search and Retrieval Architecture
• Static Rank
• An auxiliary rank for members to help with retrieving at scale
• Based on member profile and activity
• Early termination
• Index partitioned into N shards; each shard retrieves and scores candidates
• Not all members in a shard can be retrieved, so the query is terminated early on the basis of static rank.
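Early termination on static rank can be sketched as follows: if each shard keeps its documents ordered by static rank, a query can stop scanning once it has collected enough matches, and the matches it did collect are the ones with the highest static rank. The function name, the `limit` parameter, and the sample shard are assumptions for illustration.

```python
def search_shard(shard_docs, matches, limit):
    """Scan one shard in static-rank order and stop after `limit` matches.

    shard_docs: list of (doc_id, static_rank), sorted by static_rank descending.
    matches: callable deciding whether a doc id satisfies the query.
    """
    retrieved = []
    for doc_id, _static_rank in shard_docs:
        if matches(doc_id):
            retrieved.append(doc_id)
            if len(retrieved) >= limit:
                break  # early termination: lower static ranks are never visited
    return retrieved

# Hypothetical shard, already sorted by static rank.
shard = [(10, 0.9), (11, 0.8), (12, 0.7), (13, 0.6), (14, 0.5)]
top_matches = search_shard(shard, lambda doc_id: doc_id % 2 == 0, limit=2)
```

In this example document 14 also matches the query but is never retrieved, which is the trade-off early termination accepts in exchange for bounded per-shard work.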
• Galene Facet Counting:
• Galene supports facet counting (such as region, titles, etc.) for any given query.
• Uses statistical counting approximation based on sample in each shard
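One simple way to approximate facet counts from a sample is sketched below: each shard counts facet values over a random sample of its matching documents and scales the counts by the inverse sampling fraction, and the per-shard estimates are then summed. The slides do not describe the actual estimator, so uniform sampling and all names here are assumptions.

```python
import random
from collections import Counter

def estimate_facet_counts(matched_facet_values, sample_size, rng):
    """Estimate facet counts in one shard from a uniform sample of its matches.

    matched_facet_values: facet value (e.g. region) of every matching document.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    total = len(matched_facet_values)
    if total == 0:
        return {}
    if sample_size >= total:
        sample = matched_facet_values  # small result set: count exactly
    else:
        sample = rng.sample(matched_facet_values, sample_size)
    scale = total / len(sample)
    return {value: count * scale for value, count in Counter(sample).items()}

def merge_shard_estimates(per_shard_estimates):
    """Sum the per-shard estimates into global facet counts."""
    merged = {}
    for estimate in per_shard_estimates:
        for value, count in estimate.items():
            merged[value] = merged.get(value, 0.0) + count
    return merged

# Hypothetical shard: 60 matches in one region, 40 in another.
shard_values = ["sf_bay_area"] * 60 + ["new_york"] * 40
rng = random.Random(0)
approx = estimate_facet_counts(shard_values, 20, rng)
exact = estimate_facet_counts(shard_values, 100, rng)
```

The scaled estimates always sum to the shard's total match count, while the split between facet values carries sampling error that shrinks as the sample grows.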
30. Layered Ranking Architecture
• L1: Scoops deep into the talent pool to score and rank more candidates.
• L2: Refines the short-listed candidates by applying more dynamic features from an external cache.
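The two layers can be sketched as a cheap first pass over many candidates followed by a costlier second pass over the shortlist. The scoring functions, the cache contents, and the cut-off sizes are hypothetical and stand in for whatever models and features the production layers use.

```python
def layered_rank(candidates, l1_score, l2_score, l1_keep, page_size):
    """Rank with a cheap L1 scorer, then re-rank the L1 shortlist with L2."""
    shortlist = sorted(candidates, key=l1_score, reverse=True)[:l1_keep]
    return sorted(shortlist, key=l2_score, reverse=True)[:page_size]

# Hypothetical candidates with a cheap match score available in the index.
candidates = [
    {"id": "a", "match": 0.9},
    {"id": "b", "match": 0.8},
    {"id": "c", "match": 0.7},
    {"id": "d", "match": 0.1},
]

# Hypothetical external cache holding dynamic features that L1 cannot see.
dynamic_feature_cache = {"b": 0.5, "d": 5.0}

def l1_score(candidate):
    return candidate["match"]

def l2_score(candidate):
    return candidate["match"] + dynamic_feature_cache.get(candidate["id"], 0.0)

page = layered_rank(candidates, l1_score, l2_score, l1_keep=3, page_size=2)
page_ids = [candidate["id"] for candidate in page]
```

Candidate "d" has a strong dynamic feature but never reaches L2 because L1 filters it out, which shows why letting L1 score and rank more candidates matters for the quality of the final page.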