SIGIR Tutorial, July 7th 2014
Grace Hui Yang
Marc Sloan
Jun Wang
Guest Speaker: Emine Yilmaz
Dynamic Information Retrieval Modeling
Dynamic Information Retrieval ModelingTutorial 20142
Age of Empire
Dynamic Information Retrieval ModelingTutorial 20143
Dynamic Information Retrieval
Dynamic Information Retrieval ModelingTutorial 20144
[Figure: a user with an information need explores an information space of documents, observing some of them.]
Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren't, and satisfy their information need.
Evolving IR
Dynamic Information Retrieval ModelingTutorial 20145
 Paradigm shifts in IR as new models emerge
 e.g. VSM → BM25 → Language Model
 Different ways of defining relationship between
query and document
 Static → Interactive → Dynamic
 Evolution in modeling user interaction with search
engine
Outline
Dynamic Information Retrieval ModelingTutorial 20146
 Introduction
 Static IR
 Interactive IR
 Dynamic IR
 Theory and Models
 Session Search
 Reranking
 Guest Talk: Evaluation
Conceptual Model – Static IR
Dynamic Information Retrieval ModelingTutorial 20147
Static IR
Interactive
IR
Dynamic
IR
 No feedback
Characteristics of Static IR
Dynamic Information Retrieval ModelingTutorial 20148
 Does not learn directly from user
 Parameters updated periodically
Static Information Retrieval
Model
Dynamic Information Retrieval ModelingTutorial 20149
Learning to
Rank
Dynamic Information Retrieval ModelingTutorial 201410
Commonly Used Static IR Models
BM25
PageRank
Language
Model
Feedback in IR
Dynamic Information Retrieval ModelingTutorial 201411
Outline
Dynamic Information Retrieval ModelingTutorial 201412
 Introduction
 Static IR
 Interactive IR
 Dynamic IR
 Theory and Models
 Session Search
 Reranking
 Guest Talk: Evaluation
Conceptual Model – Interactive IR
Dynamic Information Retrieval ModelingTutorial 201413
Static IR
Interactive
IR
Dynamic
IR
 Exploit Feedback
Interactive User Feedback
Dynamic Information Retrieval ModelingTutorial 201414
Like, dislike,
pause, skip
Learn the user’s taste
interactively!
At the same time, provide good
recommendations!
Dynamic Information Retrieval ModelingTutorial 201415
Interactive Recommender
Systems
Example - Multi Page Search
Dynamic Information Retrieval ModelingTutorial 201416
Ambiguous
Query
Example - Multi Page Search
Dynamic Information Retrieval ModelingTutorial 201417
Topic: Car
Example - Multi Page Search
Dynamic Information Retrieval ModelingTutorial 201418
Topic: Animal
Example – Interactive Search
Dynamic Information Retrieval ModelingTutorial 201419
Click on ‘car’
webpage
Example – Interactive Search
Dynamic Information Retrieval ModelingTutorial 201420
Click on ‘Next
Page’
Example – Interactive Search
Dynamic Information Retrieval ModelingTutorial 201421
Page 2 results:
Cars
Example – Interactive Search
Dynamic Information Retrieval ModelingTutorial 201422
Click on ‘animal’
webpage
Example – Interactive Search
Dynamic Information Retrieval ModelingTutorial 201423
Page 2 results:
Animals
Example – Dynamic Search
Dynamic Information Retrieval ModelingTutorial 201424
Topic: Guitar
Example – Dynamic Search
Dynamic Information Retrieval ModelingTutorial 201425
Diversified Page
1
Topics: Cars,
animals, guitars
Toy Example
Dynamic Information Retrieval ModelingTutorial 201426
 Multi-Page search scenario
 User image searches for “jaguar”
 Rank two of the four results over two pages; the four images have relevance estimates
r = 0.5, r = 0.51, r = 0.9, r = 0.49
Toy Example – Static Ranking
Dynamic Information Retrieval ModelingTutorial 201427
 Ranked according to PRP
Page 1: 1. r = 0.9, 2. r = 0.51
Page 2: 1. r = 0.5, 2. r = 0.49
Toy Example – Relevance
Feedback
Dynamic Information Retrieval ModelingTutorial 201428
 Interactive Search
 Improve 2nd page based on feedback from 1st page
 Use clicks as relevance feedback
 Rocchio¹ algorithm on the terms in each image's webpage:
$w_q' = \alpha w_q + \frac{\beta}{|D_r|}\sum_{d \in D_r} w_d - \frac{\gamma}{|D_n|}\sum_{d \in D_n} w_d$
 The new query is closer to the relevant documents and further from the non-relevant documents
¹Rocchio, J. J., '71; Baeza-Yates & Ribeiro-Neto '99
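As a rough sketch (not the tutorial's own implementation), the update can be written over bag-of-words term-weight dictionaries; the α, β, γ values and the toy vectors below are illustrative assumptions, not numbers from the tutorial.

```python
from collections import defaultdict

def rocchio(query_vec, relevant_docs, nonrelevant_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """w_q' = alpha*w_q + beta*mean(relevant) - gamma*mean(non-relevant)."""
    new_q = defaultdict(float)
    for t, w in query_vec.items():
        new_q[t] += alpha * w
    for d in relevant_docs:
        for t, w in d.items():
            new_q[t] += beta * w / len(relevant_docs)
    for d in nonrelevant_docs:
        for t, w in d.items():
            new_q[t] -= gamma * w / len(nonrelevant_docs)
    return {t: w for t, w in new_q.items() if w > 0}   # negative weights clipped to zero

# Toy usage: a click on the 'car' page pulls the ambiguous query toward car terms.
q = {"jaguar": 1.0}
clicked = [{"jaguar": 0.6, "car": 0.8, "speed": 0.3}]
skipped = [{"jaguar": 0.5, "animal": 0.9, "wildlife": 0.4}]
print(rocchio(q, clicked, skipped))
```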
Toy Example – Relevance
Feedback
Dynamic Information Retrieval ModelingTutorial 201429
 Ranked according to PRP and Rocchio
Page 1: 1. r = 0.9 (* = clicked), 2. r = 0.51
Page 2 (re-ranked using the click as feedback): 1. r = 0.5, 2. r = 0.49
Toy Example – Relevance
Feedback
Dynamic Information Retrieval ModelingTutorial 201430
 No click when searching for animals
Page 1: 1. r = 0.9, 2. r = 0.51 (no clicks)
Page 2: ? ? — with no clicks there is no feedback to re-rank the second page
Toy Example – Value Function
Dynamic Information Retrieval ModelingTutorial 201431
 Optimize both pages using dynamic IR
 Bellman equation for value function
 Simplified example:
 $V^t(\theta^t, \Sigma^t) = \max_{s^t}\Big[\theta_{s}^{t} + E\big(V^{t+1}(\theta^{t+1}, \Sigma^{t+1}) \mid C^t\big)\Big]$
 $\theta^t, \Sigma^t$ = relevance and covariance of the documents for page $t$
 $C^t$ = clicks on page $t$
 $V^t$ = 'value' of the ranking on page $t$
 Maximize value over all pages based on estimating feedback
Toy Example - Covariance
Dynamic Information Retrieval ModelingTutorial 201432
 Covariance matrix represents similarity between the four images:
$\Sigma = \begin{pmatrix} 1 & 0.8 & 0.1 & 0 \\ 0.8 & 1 & 0.1 & 0 \\ 0.1 & 0.1 & 1 & 0.95 \\ 0 & 0 & 0.95 & 1 \end{pmatrix}$
Toy Example – Myopic Value
Dynamic Information Retrieval ModelingTutorial 201433
 For the myopic ranking, $V^2 = 16.380$
Toy Example – Myopic Ranking
Dynamic Information Retrieval ModelingTutorial 201434
 Page 2 ranking stays the same regardless of clicks
Toy Example – Optimal Value
Dynamic Information Retrieval ModelingTutorial 201435
 For the optimal ranking, $V^2 = 16.528$
Toy Example – Optimal Ranking
Dynamic Information Retrieval ModelingTutorial 201436
 If car clicked, Jaguar logo is more relevant on next page
Toy Example – Optimal Ranking
Dynamic Information Retrieval ModelingTutorial 201437
 In all other scenarios, rank animal first on next page
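A rough sketch of the lookahead idea behind this toy example follows. The click model (independent Bernoulli clicks with probability equal to the relevance estimate), the covariance-based belief update, and the DCG-style page value are simplifying assumptions made here for illustration, so the numbers will not match the V² values on the slides.

```python
import numpy as np
from itertools import permutations, product

theta = np.array([0.50, 0.51, 0.90, 0.49])           # prior relevance of the 4 "jaguar" images
Sigma = np.array([[1.0, 0.8, 0.1, 0.0],              # covariance from the slides
                  [0.8, 1.0, 0.1, 0.0],
                  [0.1, 0.1, 1.0, 0.95],
                  [0.0, 0.0, 0.95, 1.0]])
W = np.array([1.0, 1.0 / np.log2(3)])                # DCG-style weights for a 2-doc page

def page_value(rels):
    return float(np.dot(rels, W))

def update(shown, clicks):
    """Crude belief update: shift the unseen docs by their covariance with the click residual."""
    rest = [i for i in range(4) if i not in shown]
    resid = np.array(clicks, float) - theta[list(shown)]
    return np.clip(theta[rest] + Sigma[np.ix_(rest, list(shown))] @ resid, 0.0, 1.0)

def value(page1, lookahead=True):
    """Expected value of a page-1 ranking, optionally planning the best page 2 per click pattern."""
    v1 = page_value(theta[list(page1)])
    if not lookahead:
        return v1
    exp2 = 0.0
    for clicks in product([0, 1], repeat=2):          # enumerate the page-1 click patterns
        p = np.prod([theta[d] if c else 1 - theta[d] for d, c in zip(page1, clicks)])
        post = update(page1, clicks)
        exp2 += p * max(page_value(post[list(pair)]) for pair in permutations(range(2), 2))
    return v1 + exp2

myopic = max(permutations(range(4), 2), key=lambda p: value(p, lookahead=False))
optimal = max(permutations(range(4), 2), key=value)
print("myopic page 1:", myopic, " optimal page 1:", optimal)
```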
Interactive vs Dynamic IR
Dynamic Information Retrieval ModelingTutorial 201438
Interactive:
• Treats interactions independently
• Responds to immediate feedback
• Static IR used before feedback is received
Dynamic:
• Optimizes over all interactions
• Long term gains
• Models future user feedback
• Also used at the beginning of the interaction
Outline
Dynamic Information Retrieval ModelingTutorial 201439
 Introduction
 Static IR
 Interactive IR
 Dynamic IR
 Theory and Models
 Session Search
 Reranking
 Guest Talk: Evaluation
Conceptual Model – Dynamic IR
Dynamic Information Retrieval ModelingTutorial 201440
Static IR
Interactive
IR
Dynamic
IR
 Explore and exploit Feedback
Characteristics of Dynamic IR
Dynamic Information Retrieval ModelingTutorial 201441
Rich interactions
 Query formulation
 Document clicks
 Document examination
 eye movement
 mouse movements
 etc.
Characteristics of Dynamic IR
Dynamic Information Retrieval ModelingTutorial 201442
Temporal dependency
[Diagram: an information need I drives a sequence of iterations; at iteration i the query q_i produces ranked documents D_i and clicked documents C_i, and each iteration depends on the previous one (q1, D1, C1 → q2, D2, C2 → … → qn, Dn, Cn).]
Characteristics of Dynamic IR
Dynamic Information Retrieval ModelingTutorial 201443
Overall goal
Optimize over all iterations for goal
IR metric or user satisfaction
Optimal policy
Dynamic IR
Dynamic Information Retrieval ModelingTutorial 201444
 Dynamic IR explores actions
 Dynamic IR learns from user and adjusts its
actions
 May hurt performance in a single stage, but
improves over all stages
Applications to IR
Dynamic Information Retrieval ModelingTutorial 201445
 Dynamics found in lots of different aspects of IR
 Dynamic Users
 Users change behaviour over time, user history
 Dynamic Documents
 Information Filtering, document content change
 Dynamic Queries
 Changing query definition i.e.‘Twitter’
 Dynamic Information Needs
 Topic ontologies evolve over time
 Dynamic Relevance
 Seasonal/time of day change in relevance
User Interactivity in DIR
Dynamic Information Retrieval ModelingTutorial 201446
 Modern IR interfaces
 Facets
 Verticals
 Personalization
 Responsive to particular user
 Complex log data
 Mobile
 Richer user interactions
 Ads
 Adaptive targeting
Big Data
Dynamic Information Retrieval ModelingTutorial 201447
 Data set sizes are always increasing
 Computational footprint of learning to rank
 Rich, sequential data
1Yin He et. al, ’11
 Complex user model behaviour found in data, takes into
account reading, skipping and re-reading behaviours1
 Uses a POMDP
Example
Online Learning to Rank
Dynamic Information Retrieval ModelingTutorial 201448
 Learning to rank iteratively on sequential data
 Clicks as implicit user feedback/preference
 Often uses multi-armed bandit techniques
1Katja Hofmann et. al., ’11
2YisongYue et. al.,‘09
 Uses click models to interpret clicks and a contextual
bandit to improve learning1
 Pairwise comparison of rankings using duelling bandits
formulation2
Example
Evaluation
Dynamic Information Retrieval ModelingTutorial 201449
 Use complex user interaction data to assess rankings
 Compare ranking techniques in online testing
 Minimise user dissatisfaction
1Jeff Huang et. al.,‘11
2Olivier Chapelle et. al.,‘12
 Modelled cursor activity and correlated with eye tracking to
validate good or bad abandonment1
 Interleave search results from two ranking algorithms to
determine which is better2
Example
Filtering and News
Dynamic Information Retrieval ModelingTutorial 201450
 Adaptive techniques to personalize information filtering
or news recommendation
 Understand the complex dynamics of real world events
in search logs
 Capture temporal document change1
1Dennis Fetterly et. al.,‘03
2Stephen Robertson,‘02
3Jure Leskovec et. al.,‘09
 Uses relevance feedback to adapt threshold sensitivity over
time in information filtering to maximise overal utility1
 Detected patterns and memes in news cycles and modeled
how information spreads2
Example
Advertising
Dynamic Information Retrieval ModelingTutorial 201451
 Behavioural targeting and personalized ads
 Learn when to display new ads
 Maximise profit from available ads
1ShuaiYuan et. al.,‘12
2ZeyuanAllen Zhu et. al.,‘10
 Uses a POMDP and ad correlation to find the optimal ad to
display to a user1
 Dynamic click model that can interpret complex user
behaviour in logs and apply results to tail queries and unseen
ads2
Example
Outline
Dynamic Information Retrieval ModelingTutorial 201452
 Introduction
 Theory and Models
 Session Search
 Reranking
 Guest Talk: Evaluation
Outline
Dynamic Information Retrieval ModelingTutorial 201453
 Introduction
 Theory and Models
 Why not use supervised learning
 Markov Models
 Session Search
 Reranking
 Evaluation
Why not use Supervised Learning
for Dynamic IR Modeling?
Dynamic Information Retrieval ModelingTutorial 201454
 Lack of enough training data
 Dynamic IR problems contain a sequence of dynamic interactions
 E.g. a series of queries in session
 Rare to find repeated sequences (close to zero)
 Even in large query logs (WSCD 2013 & 2014, query logs fromYandex)
 Chance of finding repeated adjacent query pairs is
also low
Dataset | Repeated Adjacent Query Pairs | Total Adjacent Query Pairs | Repeated Percentage
WSCD 2013 | 476,390 | 17,784,583 | 2.68%
WSCD 2014 | 1,959,440 | 35,376,008 | 5.54%
Our Solution
Dynamic Information Retrieval ModelingTutorial 201455
Try to find an optimal solution through a
sequence of dynamic interactions
Trial and Error:
learn from repeated, varied attempts which
are continued until success
No Supervised Learning
Trial and Error
Dynamic Information Retrieval ModelingTutorial 201456
 q1 – "dulles hotels"
 q2 – "dulles airport"
 q3 – "dulles airport location"
 q4 – "dulles metrostop"
Dynamic Information Retrieval ModelingTutorial 201457
 Rich interactions
Query formulation, Document clicks, Document examination,
eye movement, mouse movements, etc.
 Temporal dependency
 Overall goal
Recap – Characteristics of
Dynamic IR
Dynamic Information Retrieval ModelingTutorial 201458
 Model interactions, which means it needs to have place holders for
actions;
 Model information need hidden behind user queries and other
interactions;
 Set up a reward mechanism to guide the entire search algorithm to adjust
its retrieval strategies;
 Represent Markov properties to handle the temporal dependency.
What is a Desirable Model for
Dynamic IR
A model for a trial-and-error setting will do!
A Markov model will do!
Outline
Dynamic Information Retrieval ModelingTutorial 201459
 Introduction
 Theory and Models
 Why not use supervised learning
 Markov Models
 Session Search
 Reranking
 Evaluation
Markov Process
 Markov Property¹ (the "memoryless" property): for a system, its next state depends only on its current state:
Pr(S_{i+1} | S_i, …, S_0) = Pr(S_{i+1} | S_i)
 Markov Process: a stochastic process with the Markov property, e.g. the chain s0 → s1 → … → si → si+1 → …
Dynamic Information Retrieval ModelingTutorial 201460   ¹A. A. Markov, '06
Dynamic Information Retrieval ModelingTutorial 201461
 Markov Chain
 Hidden Markov Model
 Markov Decision Process
 Partially Observable Markov Decision Process
 Multi-armed Bandit
Family of Markov Models
Markov Chain (S, M)
 Discrete-time Markov process
 State S – a web page; M – the transition probability matrix
 Example: Google PageRank¹ — how likely a random web surfer is to land on a page
$Pagerank(S) = \frac{1-\alpha}{N} + \alpha \sum_{Y \in \Pi} \frac{Pagerank(Y)}{L(Y)}$
where N is the number of pages, L(Y) the number of outlinks of Y, Π the set of pages linking to S, and (1−α)/N the random jump factor.
 The stable state distribution of such a Markov chain is PageRank
[Diagram: pages A–E linking to each other, each labelled with its PageRank.]
Dynamic Information Retrieval ModelingTutorial 201462
¹L. Page et al., '99
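A small power-iteration sketch of this computation; the toy link graph and the damping value are illustrative assumptions.

```python
import numpy as np

def pagerank(adj, alpha=0.85, tol=1e-10):
    """Stable state distribution of the PageRank Markov chain by power iteration.
    adj[i][j] = 1 if page i links to page j."""
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    A[A.sum(axis=1) == 0] = 1.0                      # dangling pages jump uniformly
    M = A / A.sum(axis=1, keepdims=True)             # row-stochastic transition matrix
    pr = np.full(n, 1.0 / n)
    while True:
        new = (1 - alpha) / n + alpha * (pr @ M)     # random jump + follow a link
        if np.abs(new - pr).sum() < tol:
            return new
        pr = new

# Toy graph over pages A..D (the links are made up for illustration).
print(pagerank([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [1, 0, 0, 0],
                [0, 0, 1, 0]]))
```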
Hidden Markov Model (S, M, O, e)
 A Markov chain whose states are hidden; observable symbols are emitted with some probability according to the current state¹.
[Diagram: hidden states s0 → s1 → s2 → … with transition probabilities p_i; each state emits an observation o_i with emission probability e_i.]
s_i – hidden state, p_i – transition probability, o_i – observation, e_i – observation (emission) probability
Dynamic Information Retrieval ModelingTutorial 201463
¹Leonard E. Baum et al., '66
An HMM example for IR
Construct an HMM for each document¹
Dynamic Information Retrieval ModelingTutorial 201464
[Diagram: hidden states s0, s1, s2, … emit the query terms t0, t1, t2, …]
s_i – "Document" or "General English"; p_i – a0 or a1; t_i – query term; e_i – P(t|D) or P(t|GE)
Document-to-query relevance: $P(D \mid q) \propto \prod_{t \in q}\big(a_0 P(t \mid GE) + a_1 P(t \mid D)\big)$
¹Miller et al. '99
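A minimal sketch of scoring one document with this two-state mixture; the smoothing weights a0, a1 and the toy counts are assumptions.

```python
def hmm_score(query_terms, doc_tf, doc_len, collection_p, a0=0.2, a1=0.8):
    """P(D|q) ∝ Π_{t∈q} (a0*P(t|GE) + a1*P(t|D)) for the per-document HMM above."""
    score = 1.0
    for t in query_terms:
        p_t_d = doc_tf.get(t, 0) / doc_len            # P(t|D): maximum-likelihood estimate
        p_t_ge = collection_p.get(t, 1e-6)            # P(t|GE): general-English model
        score *= a0 * p_t_ge + a1 * p_t_d
    return score

# Toy usage with made-up statistics.
collection_p = {"dulles": 0.001, "airport": 0.002, "hotels": 0.003}
doc = {"dulles": 5, "airport": 3}
print(hmm_score(["dulles", "airport"], doc, doc_len=200, collection_p=collection_p))
```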
 MDP extends a Markov chain with actions and rewards¹
s_i – state, a_i – action, r_i – reward, p_i – transition probability
[Diagram: s0 —(a0, r0)→ s1 —(a1, r1)→ s2 —(a2, r2)→ s3 → …]
Markov Decision Process (S, M, A, R, γ)
Dynamic Information Retrieval ModelingTutorial 201465
¹R. Bellman, '57
Definition of MDP
 A tuple (S, M, A, R, γ)
 S : state space
 M: transition matrix
Ma(s, s') = P(s'|s, a)
 A: action space
 R: reward function
R(s,a) = immediate reward taking action a at state s
 γ: discount factor, 0< γ ≤1
 policy π
π(s) = the action taken at state s
 Goal is to find an optimal policy π* maximizing the expected
total rewards.
Dynamic Information Retrieval ModelingTutorial 201466
Policy
Policy: π(s) = a — according to which the agent selects action a at state s.
π(s0) = move right and up
π(s1) = move right and up
π(s2) = move right
Dynamic Information Retrieval ModelingTutorial 201467 [Slide altered from Carlos Guestrin's ML lecture]
Value of Policy
Value: V(s) — the expected long-term reward starting from s.
Starting from s0 and following π:
$V(s_0) = E[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \gamma^3 R(s_3) + \gamma^4 R(s_4) + \cdots]$
Future rewards are discounted by γ ∈ [0, 1).
[Diagram: from s0, action π(s0) may lead to s1, s1' or s1''; from each of those, the policy leads on to s2, s2', s2'', and so on.]
Dynamic Information Retrieval ModelingTutorial 201470 [Slide altered from Carlos Guestrin's ML lecture]
Computing the value of a policy
Dynamic Information Retrieval ModelingTutorial 201471
$V(s_0) = E_\pi\big[R(s_0,a) + \gamma R(s_1,a) + \gamma^2 R(s_2,a) + \gamma^3 R(s_3,a) + \cdots\big]$
$\quad = E_\pi\big[R(s_0,a) + \gamma \sum_{t=1}^{\infty}\gamma^{t-1} R(s_t,a)\big]$
$\quad = R(s_0,a) + \gamma\, E_\pi\big[\sum_{t=1}^{\infty}\gamma^{t-1} R(s_t,a)\big]$
$\quad = R(s_0,a) + \gamma \sum_{s'} M_{\pi(s)}(s,s')\, V(s')$
(the value function of the current state is written in terms of every possible next state s')
Optimality — Bellman Equation
 The Bellman equation¹ for an MDP is a recursive definition of the optimal state-value function V*(·):
$V^*(s) = \max_a\Big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V^*(s')\Big]$
Dynamic Information Retrieval ModelingTutorial 201472
 Optimal Policy
$\pi^*(s) = \arg\max_a\Big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V^*(s')\Big]$
¹R. Bellman, '57
Optimality — Bellman Equation
 The Bellman equation can be rewritten in terms of the action-value function Q(s, a), which also gives the relationship between V and Q:
$V^*(s) = \max_a Q(s,a)$
$Q(s,a) = R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V^*(s')$
Dynamic Information Retrieval ModelingTutorial 201473
 Optimal Policy
$\pi^*(s) = \arg\max_a Q(s,a)$
MDP algorithms
Dynamic Information Retrieval ModelingTutorial 201474
 Value Iteration
 Policy Iteration
 Modified Policy Iteration
 Prioritized Sweeping
 Temporal Difference (TD) Learning
 Q-Learning
Model free
approaches
Model-based
approaches
[Bellman, ’57, Howard,‘60, Puterman and Shin,‘78, Singh & Sutton,‘96, Sutton & Barto,‘98,
Richard Sutton,‘88,Watkins,‘92]
Solve the Bellman equation → optimal value V*(s) → optimal policy π*(s)
[Slide altered from Carlos Guestrin's ML lecture]
Value Iteration¹
 Initialization: initialize $V_0(s)$ arbitrarily
 Loop (iteration):
$V_{i+1}(s) \leftarrow \max_a\big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_i(s')\big]$
$\pi(s) \leftarrow \arg\max_a\big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_i(s')\big]$
 Stopping criterion: π(s) is good enough
Dynamic Information Retrieval ModelingTutorial 201475
¹Bellman, '57
Greedy Value Iteration¹
 Initialization: initialize $V_0(s)$ arbitrarily
 Iteration:
$V_{i+1}(s) \leftarrow \max_a\big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_i(s')\big]$
 Stopping criterion: $\forall s\;\; |V_{i+1}(s) - V_i(s)| < \varepsilon$
 Optimal policy:
$\pi(s) \leftarrow \arg\max_a\big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_i(s')\big]$
Dynamic Information Retrieval ModelingTutorial 201476
¹Bellman, '57
Greedy Value Iteration
Algorithm
1. For each state s ∈ S: initialize $V_0(s)$ arbitrarily
2. $i \leftarrow 0$
3. Repeat
   3.1 $i \leftarrow i + 1$
   3.2 For each $s \in S$: $V_i(s) \leftarrow \max_a\big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_{i-1}(s')\big]$
   until $\forall s\;\; |V_i(s) - V_{i-1}(s)| < \varepsilon$
4. For each $s \in S$: $\pi(s) \leftarrow \arg\max_a\big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_i(s')\big]$
Dynamic Information Retrieval ModelingTutorial 201477
Greedy Value Iteration — worked example
$V(s) = \max_a\big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V(s')\big]$, with γ = 0.96 and actions a1, a2:
$M_{a_1} = \begin{pmatrix} 0.3 & 0.7 & 0 \\ 1.0 & 0 & 0 \\ 0.8 & 0.2 & 0 \end{pmatrix}$,  $M_{a_2} = \begin{pmatrix} 0 & 0 & 1.0 \\ 0 & 0.2 & 0.8 \\ 0 & 1.0 & 0 \end{pmatrix}$
V(0)(S1) = max{R(S1,a1), R(S1,a2)} = 6
V(0)(S2) = max{R(S2,a1), R(S2,a2)} = 4
V(0)(S3) = max{R(S3,a1), R(S3,a2)} = 8
V(1)(S1) = max{3 + 0.96·(0.3·6 + 0.7·4), 6 + 0.96·(1.0·8)} = max{7.416, 13.68} = 13.68
Dynamic Information Retrieval ModelingTutorial 201478
Greedy Value Iteration — worked example (continued)
$V(s) = \max_a\big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V(s')\big]$
Dynamic Information Retrieval ModelingTutorial 201479
i | V(i)(S1) | V(i)(S2) | V(i)(S3)
0 | 6 | 4 | 8
1 | 13.680 | 9.760 | 13.376
2 | 18.841 | 17.133 | 20.380
3 | 25.565 | 22.087 | 25.759
… | … | … | …
200 | 168.039 | 165.316 | 168.793
Resulting greedy policy: π(S1) = a2, π(S2) = a1, π(S3) = a1
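The worked example can be reproduced with a few lines of numpy. The slides do not show R(S2, a2) or R(S3, a2); they are set to 0 here (an assumption), which is consistent with the iterates in the table above.

```python
import numpy as np

gamma = 0.96
R = np.array([[3.0, 6.0],          # R(S1,a1), R(S1,a2)
              [4.0, 0.0],          # R(S2,a1), R(S2,a2) <- a2 reward assumed
              [8.0, 0.0]])         # R(S3,a1), R(S3,a2) <- a2 reward assumed
M = np.array([[[0.3, 0.7, 0.0],    # M_a1
               [1.0, 0.0, 0.0],
               [0.8, 0.2, 0.0]],
              [[0.0, 0.0, 1.0],    # M_a2
               [0.0, 0.2, 0.8],
               [0.0, 1.0, 0.0]]])

V = R.max(axis=1)                  # V(0)(s) = max_a R(s,a) = (6, 4, 8)
for i in range(1, 201):
    Q = R + gamma * np.stack([M[a] @ V for a in range(2)], axis=1)
    V = Q.max(axis=1)
    if i <= 3:
        print(f"i={i}:", np.round(V, 3))      # 13.68, 9.76, 13.376; then 18.841, ...
print("pi =", ["a1" if a == 0 else "a2" for a in Q.argmax(axis=1)])   # expect a2, a1, a1
```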
Policy Iteration¹
 Initialization: $V_{\pi_0}(s) \leftarrow 0$, $\pi_0(s) \leftarrow$ arbitrary policy
 Iteration (over i):
 Policy Evaluation (iterate to convergence):
$V_{\pi_i}(s) \leftarrow R(s, \pi_i(s)) + \gamma \sum_{s'} M_{\pi_i(s)}(s,s')\, V_{\pi_i}(s')$
 Policy Improvement:
$\pi_{i+1}(s) \leftarrow \arg\max_a\big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_{\pi_i}(s')\big]$
 Stopping criterion: the policy stops changing
Dynamic Information Retrieval ModelingTutorial 201480
¹Howard, '60
Policy Iteration
Algorithm
1. For each state s ∈ S: $V(s) \leftarrow 0$, $\pi_0(s) \leftarrow$ arbitrary policy, $i \leftarrow 0$
2. Repeat
   2.1 Repeat (policy evaluation)
       For each $s \in S$: $V'(s) \leftarrow V(s)$; $V(s) \leftarrow R(s, \pi_i(s)) + \gamma \sum_{s'} M_{\pi_i(s)}(s,s')\, V(s')$
       until $\forall s\;\; |V(s) - V'(s)| < \varepsilon$
   2.2 For each $s \in S$: $\pi_{i+1}(s) \leftarrow \arg\max_a\big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V(s')\big]$
   2.3 $i \leftarrow i + 1$
   until $\pi_i = \pi_{i-1}$
Dynamic Information Retrieval ModelingTutorial 201481
Modified Policy Iteration
 The "Policy Evaluation" step in Policy Iteration is time-consuming, especially when the state space is large.
 Modified Policy Iteration computes an approximate policy evaluation by running just a few (k) evaluation sweeps:
k = 1 gives Greedy Value Iteration, k = ∞ gives Policy Iteration, and Modified Policy Iteration sits in between.
Dynamic Information Retrieval ModelingTutorial 201482
Modified Policy Iteration
Algorithm
1. For each state s ∈ S: $V(s) \leftarrow 0$, $\pi_0(s) \leftarrow$ arbitrary policy, $i \leftarrow 0$
2. Repeat
   2.1 Repeat k times: for each $s \in S$: $V(s) \leftarrow R(s, \pi_i(s)) + \gamma \sum_{s'} M_{\pi_i(s)}(s,s')\, V(s')$
   2.2 For each $s \in S$: $\pi_{i+1}(s) \leftarrow \arg\max_a\big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V(s')\big]$
   2.3 $i \leftarrow i + 1$
   until $\pi_i = \pi_{i-1}$
Dynamic Information Retrieval ModelingTutorial 201483
MDP algorithms
Dynamic Information Retrieval ModelingTutorial 201484
 Value Iteration
 Policy Iteration
 Modified Policy Iteration
 Prioritized Sweeping
 Temporal Difference (TD) Learning
 Q-Learning
Model free
approaches
Model-based
approaches
[Bellman, ’57, Howard,‘60, Puterman and Shin,‘78, Singh & Sutton,‘96, Sutton & Barto,‘98,
Richard Sutton,‘88,Watkins,‘92]
Solve the Bellman equation → optimal value V*(s) → optimal policy π*(s)
[Slide altered from Carlos Guestrin's ML lecture]
Temporal Difference Learning
Dynamic Information Retrieval ModelingTutorial 201485
 Monte Carlo sampling can be used for model-free policy iteration
 Estimate $V^\pi(s)$ in "Policy Evaluation" by the average reward of trajectories starting from s
 However, parts of these trajectories can be reused
 So we estimate by an expectation over the next state:
$V^\pi(s) \leftarrow r + \gamma\, E[V^\pi(s') \mid s, a]$
 The simplest estimate: $V^\pi(s) \leftarrow r + \gamma V^\pi(s')$
 A smoothed version: $V^\pi(s) \leftarrow \alpha\,(r + \gamma V^\pi(s')) + (1-\alpha)\, V^\pi(s)$
 TD-Learning rule: $V^\pi(s) \leftarrow V^\pi(s) + \alpha\,\big(r + \gamma V^\pi(s') - V^\pi(s)\big)$, where the bracketed term is the temporal difference
 r is the immediate reward, α is the learning rate
Richard Sutton, '88; Singh & Sutton, '96; Sutton & Barto, '98
Dynamic Information Retrieval ModelingTutorial 201486
Temporal Difference Learning
Algorithm
1. For each state s ∈ S: initialize $V^\pi(s)$ arbitrarily
2. For each step in the state sequence
   2.1 Initialize s
   2.2 Repeat
       2.2.1 take action a at state s according to π
       2.2.2 observe the immediate reward r and the next state s'
       2.2.3 $V^\pi(s) \leftarrow V^\pi(s) + \alpha\,\big(r + \gamma V^\pi(s') - V^\pi(s)\big)$
       2.2.4 $s \leftarrow s'$
       until s is a terminal state
Q-Learning
Dynamic Information Retrieval ModelingTutorial 201487
 TD-Learning rule:
$V^\pi(s) \leftarrow V^\pi(s) + \alpha\,\big(r + \gamma V^\pi(s') - V^\pi(s)\big)$
 Q-learning rule:
$Q(s,a) \leftarrow Q(s,a) + \alpha\,\big(r + \gamma \max_{a'} Q(s',a') - Q(s,a)\big)$
 With $V(s) = \max_a Q(s,a)$:
$\pi^*(s) = \arg\max_a Q^*(s,a)$
$Q^*(s,a) = R(s,a) + \gamma \sum_{s'} M_a(s,s') \max_{a'} Q^*(s',a')$
Q-Learning
Dynamic Information Retrieval ModelingTutorial 201488
Algorithm
1. For each state s ∈ S and a ∈ A: initialize $Q_0(s,a)$ arbitrarily
2. $i \leftarrow 0$
3. For each step in the state sequence
   3.1 Initialize s
   3.2 Repeat
       3.2.1 $i \leftarrow i + 1$
       3.2.2 select an action a at state s according to $Q_{i-1}$
       3.2.3 take action a, observe the immediate reward r and the next state s'
       3.2.4 $Q_i(s,a) \leftarrow Q_{i-1}(s,a) + \alpha\,\big(r + \gamma \max_{a'} Q_{i-1}(s',a') - Q_{i-1}(s,a)\big)$
       3.2.5 $s \leftarrow s'$
       until s is a terminal state
4. For each $s \in S$: $\pi(s) \leftarrow \arg\max_a Q_i(s,a)$
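A minimal tabular sketch of this algorithm; the toy environment (a three-state chain with a terminal reward), the ε-greedy action selection, and all parameter values are assumptions made for illustration.

```python
import random

def q_learning(states, actions, step, episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning; step(s, a) must return (reward, next_state, done)."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s, done = random.choice(states), False
        while not done:
            # epsilon-greedy action selection from the current Q estimates
            a = random.choice(actions) if random.random() < eps else \
                max(actions, key=lambda a_: Q[(s, a_)])
            r, s2, done = step(s, a)
            target = r if done else r + gamma * max(Q[(s2, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

# Toy chain: action 1 moves right; reaching the end of the chain pays reward 1.
def step(s, a):
    if s == 2 and a == 1:
        return 1.0, 2, True
    return 0.0, min(s + a, 2), False

Q = q_learning(states=[0, 1, 2], actions=[0, 1], step=step)
print(max([0, 1], key=lambda a: Q[(0, a)]))   # the learned greedy action at state 0
```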
Apply an MDP to an IR Problem
Dynamic Information Retrieval ModelingTutorial 201489
 We can model IR systems using a Markov Decision
Process
 Is there a temporal component?
 States –What changes with each time step?
 Actions – How does your system change the state?
 Rewards – How do you measure feedback or
effectiveness in your problem at each time step?
 Transition Probability – Can you determine this?
 If not, then model free approach is more suitable
Apply an MDP to an IR Problem -
Example
Dynamic Information Retrieval ModelingTutorial 201490
 User agent in session search
 States – user’s relevance judgement
 Action – new query
 Reward – information gained
Apply an MDP to an IR Problem -
Example
Dynamic Information Retrieval ModelingTutorial 201491
 Search engine’s perspective
 What if we can’t directly observe user’s relevance
judgement?
 Click ≠ relevance
? ? ? ?
Dynamic Information Retrieval ModelingTutorial 201492
 Markov Chain
 Hidden Markov Model
 Markov Decision Process
 Partially Observable Markov Decision Process
 Multi-armed Bandit
Family of Markov Models
POMDP Model
Dynamic Information Retrieval ModelingTutorial 201493
[Diagram: as in the MDP, s0 —(a0, r0)→ s1 —(a1, r1)→ s2 —(a2, r2)→ s3 → …, but the states are hidden and each visited state emits an observation o_i.]
 Hidden states
 Observations
 Belief
¹R. D. Smallwood et al., '73
POMDP Definition
Dynamic Information Retrieval ModelingTutorial 201494
 A tuple (S, M,A, R, γ, O, Θ, B)
 S : state space
 M: transition matrix
 A: action space
 R: reward function
 γ: discount factor, 0< γ ≤1
 O: observation set
an observation is a symbol emitted according to a hidden state.
 Θ: observation function
Θ(s,a,o) is the probability that o is observed when the system transitions
into state s after taking action a, i.e. P(o|s,a).
 B: belief space
Belief is a probability distribution over hidden states.
POMDP → Belief Update
Dynamic Information Retrieval ModelingTutorial 201495
 The agent uses a state estimator to update its belief about the hidden states:
$b' = SE(b, a, o')$
$b'(s') = P(s' \mid o', a, b) = \frac{P(s', o' \mid a, b)}{P(o' \mid a, b)} = \frac{\Theta(s', a, o') \sum_s M(s, a, s')\, b(s)}{P(o' \mid a, b)}$
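A small sketch of this state estimator; the two-state example (relevant vs. non-relevant), the array layouts M[a][s][s'] and Θ[a][s'][o], and all probabilities are assumptions.

```python
import numpy as np

def belief_update(b, a, o, M, Theta):
    """b'(s') ∝ Θ(s', a, o) * Σ_s M(s, a, s') b(s); normalising divides by P(o | a, b)."""
    unnorm = Theta[a][:, o] * (b @ M[a])     # predict with the transition, weight by the observation
    return unnorm / unnorm.sum()

# Toy 2-state session: s ∈ {relevant, non-relevant}; o ∈ {click, skip}.
M = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])                 # one action
Theta = np.array([[[0.7, 0.3],
                   [0.1, 0.9]]])             # P(o | s', a)
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, o=0, M=M, Theta=Theta))   # belief after observing a click
```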
POMDP → Bellman Equation
Dynamic Information Retrieval ModelingTutorial 201496
 The Bellman equation for a POMDP:
$V(b) = \max_a\Big[r(b,a) + \gamma \sum_{o'} P(o' \mid a, b)\, V(b')\Big]$
 A POMDP can be transformed into a continuous belief MDP (B, M', A, r, γ):
 B: the continuous belief space
 M': transition function $M'_a(b, b') = \sum_{o' \in O} 1_{a,o'}(b', b)\, \Pr(o' \mid a, b)$, where $1_{a,o'}(b', b) = 1$ if $SE(b, a, o') = b'$ and 0 otherwise
 A: action space
 r: reward function $r(b, a) = \sum_{s \in S} b(s)\, R(s, a)$
Solving POMDPs – The Witness Algorithm¹
Dynamic Information Retrieval ModelingTutorial 201497
The optimal policy of a POMDP = the optimal policy of its belief MDP
A variation of the value iteration algorithm
¹L. Kaelbling et al., '98
Policy Tree
Dynamic Information Retrieval ModelingTutorial 201498
• A policy tree of depth i is an i-step non-stationary policy
• As if we run value iteration until the i-th iteration
[Diagram: with i steps to go, the root specifies action a(h); each possible observation o1 … ol branches to an (i−1)-step subtree, and so on down to 1 step to go.]
Value of a Policy Tree
Dynamic Information Retrieval ModelingTutorial 201499
 The value of a policy tree h can only be determined from some belief state b, because the agent never knows the exact state:
$V_h(b) = \sum_{s \in S} b(s)\, V_h(s)$
$V_h(s) = R(s, a(h)) + \gamma \sum_{s' \in S} M_{a(h)}(s, s') \sum_{o_k \in O} \Theta(s', a(h), o_k)\, V_{o_k(h)}(s')$
where a(h) is the action at the root node of h, and $o_k(h)$ is the (i−1)-step subtree associated with $o_k$ under the root node of h.
Idea of the Witness Algorithm
Dynamic Information Retrieval ModelingTutorial 2014100
 For each action a, compute $\Gamma_i^a$, the set of candidate i-step policy trees with action a at their roots
 The optimal value function at the i-th step, $V_i^*(b)$, is the upper surface of the value functions of all i-step policy trees:
$V_i^*(b) = \max_{h \in H} V_h(b)$
Optimal value function
Dynamic Information Retrieval ModelingTutorial 2014101
 Geometrically, $V_i^*(b)$ is piecewise linear and convex.
[Figure: for a two-state POMDP the belief space is one-dimensional (simplex constraint b(s1) + b(s2) = 1); the lines $V_{h_1}(b), \dots, V_{h_5}(b)$ are the values of individual policy trees, their upper surface is $V_i^*(b)$, and dominated trees can be pruned from the set of policy trees.]
Outlines of the Witness Algorithm
Dynamic Information Retrieval ModelingTutorial 2014102
Algorithm
1. $H_1 \leftarrow \{\}$
2. $i \leftarrow 1$
3. Repeat
   3.1 $i \leftarrow i + 1$
   3.2 For each a in A: $\Gamma_i^a \leftarrow$ witness($H_{i-1}$, a)   (the inner loop)
   3.3 Prune $\bigcup_a \Gamma_i^a$ to get $H_i$
   until $\sup_b |V_i(b) - V_{i-1}(b)| < \varepsilon$
Inner Loop of the Witness
Algorithm
Dynamic Information Retrieval ModelingTutorial 2014103
Inner loop of the witness algorithm
1. Select a belief b arbitrarily. Generate a best i-step policy tree hi. Add
ℎi to an agenda.
2. In each iteration
2.1 Select a policy tree ℎ 𝑛𝑒𝑤 from the agenda.
2.2 Look for a witness point b using Za and ℎ 𝑛𝑒𝑤.
2.3 If find such a witness point b,
2.3.1 Calculate the best policy tree ℎ 𝑏𝑒𝑠𝑡 for b.
2.3.2 Add ℎ 𝑏𝑒𝑠𝑡 to Za.
2.3.3 Add all the alternative trees of ℎ 𝑏𝑒𝑠𝑡 to the agenda.
2.4 Else remove ℎ 𝑛𝑒𝑤 from the agenda.
3. Repeat the above iteration until the agenda is empty.
Other Solutions
Dynamic Information Retrieval ModelingTutorial 2014104
 QMDP1
 MC-POMDP (Monte Carlo POMDP)2
 Grid BasedApproximation3
 Belief Compression4
……
1 Thrun et. al.,‘06
2 Thrun et. al.,‘05
3 Lovejoy,‘91
4 Roy,‘03
Dynamic Information Retrieval ModelingTutorial 2014105
Applying POMDP to Dynamic IR
POMDP | Dynamic IR
Environment | Documents
Agents | User, search engine
States | Queries, user's decision-making status, relevance of documents, etc.
Actions | Provide a ranking of documents; weigh terms in the query; add/remove/keep query terms; switch a search technology on or off; adjust parameters of a search technology
Observations | Queries, clicks, document lists, snippets, terms, etc.
Rewards | Evaluation measures (such as DCG, NDCG or MAP); clicking information
Transition matrix | Given in advance or estimated from training data
Observation function | Problem dependent; estimated from sample datasets
Session Search Example - States
Starting from q0, the user moves among four hidden decision states:
S_RT – Relevant & Exploitation, S_RR – Relevant & Exploration, S_NRT – Non-Relevant & Exploitation, S_NRR – Non-Relevant & Exploration
Example query transitions: scooter price ⟶ scooter stores; Hartford visitors ⟶ Hartford Connecticut tourism; Philadelphia NYC travel ⟶ Philadelphia NYC train; distance New York Boston ⟶ maps.bing.com
106 [J. Luo et al., '14]
Session Search Example - Actions
(Au, Ase)
 User Action(Au)
 Add query terms (+Δq)
 Remove query terms (-Δq)
 keep query terms (qtheme)
 clicked documents
 SAT clicked documents
 Search Engine Action(Ase)
 increase/decrease/keep term weights,
 Switch on or switch off query expansion
 Adjust the number of top documents used in PRF
 etc.
107 [ J. Luo et al., ’14]
Multi Page Search Example -
States & Actions
Dynamic Information Retrieval ModelingTutorial 2014108
State:
Relevance
of
document
Action:
Ranking of
documents
Observation:
Clicks
Belief: Multivariate Gaussian
Reward: DCG over 2
pages
[Xiaoran Jin et. al., ’13]
SIGIR Tutorial, July 7th 2014
Grace Hui Yang
Marc Sloan
Jun Wang
Guest Speaker: Emine Yilmaz
Dynamic Information Retrieval Modeling
Exercise
Dynamic Information Retrieval ModelingTutorial 2014110
 Markov Chain
 Hidden Markov Model
 Markov Decision Process
 Partially Observable Markov Decision Process
 Multi-Armed Bandit
Family of Markov Models
Multi Armed Bandits (MAB)
Dynamic Information Retrieval ModelingTutorial 2014111
……
……
Which slot
machine should
I select in this
round?
Reward
Multi Armed Bandits (MAB)
Dynamic Information Retrieval ModelingTutorial 2014112
I won! Is this
the best slot
machine?
Reward
MAB Definition
Dynamic Information Retrieval ModelingTutorial 2014113
 A tuple (S,A, R, B)
S : hidden reward distribution of each bandit
A: choose which bandit to play
R: reward for playing bandit
B: belief space, our estimate of each bandit’s
distribution
Comparison with Markov Models
Dynamic Information Retrieval ModelingTutorial 2014114
 Single state Markov Decision Process
No transition probability
 Similar to POMDP in that we maintain a belief
state
 Action = choose a bandit, does not affect state
 Does not‘plan ahead’ but intelligently adapts
 Somewhere between interactive and dynamic IR
Markov Multi Armed Bandits
Dynamic Information Retrieval ModelingTutorial 2014115
……
……
Markov
Process 1
Markov
Process 2
Markov
Process k
Which slot
machine should
I select in this
round?
Reward
Markov Multi Armed Bandits
Dynamic Information Retrieval ModelingTutorial 2014116
……
……
Markov
Process 1
Markov
Process 2
Markov
Process k
Markov
Process
Action
Which slot
machine should
I select in this
round?
Reward
MAB Policy Reward
Dynamic Information Retrieval ModelingTutorial 2014117
 MAB algorithm describes a policy 𝜋 for choosing
bandits
 Maximise rewards from chosen bandits over all
time steps
 Minimize regret:
$\text{Regret} = \sum_{t=1}^{T}\big[\text{Reward}(a^*) - \text{Reward}(a_{\pi(t)})\big]$
 The cumulative difference between the optimal reward and the actual reward
Exploration vs Exploitation
Dynamic Information Retrieval ModelingTutorial 2014118
 Exploration
 Try out bandits to find which has highest average reward
 Exploitation
 Too much exploration leads to poor performance
 Play bandits that are known to pay out higher reward on average
 MAB algorithms balance exploration and exploitation
 Start by exploring more to find best bandits
 Exploit more as best bandits become known
Exploration vs Exploitation
Dynamic Information Retrieval ModelingTutorial 2014119
MAB – Index Algorithms
Dynamic Information Retrieval ModelingTutorial 2014120
 Gittins index¹
 Play the bandit with the highest 'Dynamic Allocation Index'
 Modelled using an MDP but suffers the 'curse of dimensionality'
 ε-greedy²
 Play the highest-reward bandit with probability 1 − ε
 Play a random bandit with probability ε
 UCB (Upper Confidence Bound)³
 Play the bandit i with the highest $\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$
 The chance of playing infrequently played bandits increases over time
¹J. C. Gittins, '89
²Nicolò Cesa-Bianchi et al., '98
³P. Auer et al., '02
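A short sketch of the UCB index policy in an ad-selection-style setting; the Bernoulli arms, their click-through rates, and the horizon are made-up assumptions.

```python
import math, random

def ucb1(pull, n_arms, horizon=2000):
    """Play the arm maximising x̄_i + sqrt(2 ln t / T_i); pull(i) returns a stochastic reward."""
    counts, sums = [0] * n_arms, [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:                      # play every arm once first
            arm = t - 1
        else:
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]))
        counts[arm] += 1
        sums[arm] += pull(arm)
    return counts

# Each "ad" is an arm whose reward is a click drawn from its (unknown) click-through rate.
ctr = [0.02, 0.05, 0.04]
print(ucb1(lambda i: 1.0 if random.random() < ctr[i] else 0.0, n_arms=3))
```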
MAB use in IR
Dynamic Information Retrieval ModelingTutorial 2014121
 Choosing ads to display to users1
 Each ad is a bandit
 User click through rate is reward
 Recommending news articles2
 News article is a bandit
 Similar to Information Filtering case
 Diversifying search results3
 Each rank position is an MAB dependent on higher ranks
 Documents are bandits chosen by each rank
1Deepayan Chakrabarti et. al. ,‘09
2Lihong Li et. al., ’10
3Radlinski et. al.,‘08
MAB Variations
Dynamic Information Retrieval ModelingTutorial 2014122
 Contextual Bandits1
 World has some context 𝑥 ∈ 𝑋 (i.e. user location)
 Learn policy 𝜋: 𝑋 → 𝐴 that maps context to arms (online or
offline)
 Duelling Bandits2
 Play two (or more) bandits at each time step
 Observe relative reward rather than absolute
 Learn order of bandits
 Mortal Bandits3
 Value of bandits decays over time
 Exploitation > exploration
1Lihong Li et. al.,‘10
2YisongYue et. al.,‘09
3Deepayan Chakrabarti et. al. ,‘09
Comparison of Markov Models
Dynamic Information Retrieval ModelingTutorial 2014123
 MC – a fully observable stochastic process
 HMM – a partially observable stochastic process
 MDP – a fully observable decision process
 MAB – a decision process, either fully or partially observable
 POMDP – a partially observable decision process
Model | actions | rewards | states
MC | No | No | Observable
HMM | No | No | Unobservable
MDP | Yes | Yes | Observable
POMDP | Yes | Yes | Unobservable
MAB | Yes | Yes | Fixed
SIGIR Tutorial, July 7th 2014
Grace Hui Yang
Marc Sloan
Jun Wang
Guest Speaker: Emine Yilmaz
Dynamic Information Retrieval Modeling
Exercise
Outline
Dynamic Information Retrieval ModelingTutorial 2014125
 Introduction
 Theory and Models
 Session Search
 Reranking
 Guest Talk: Evaluation
TREC Session Tracks (2010-2012)
 Given a series of queries {q1,q2,…,qn}, top 10 retrieval
results {D1, … Di-1 } for q1 to qi-1, and click information
 The task is to retrieve a list of documents for the current/last
query, qn
 Relevance judgment is made based on how relevant the
documents are for qn, and how relevant they are for information
needs for the entire session (in topic description)
 no need to segment the sessions
126
1.pocono mountains pennsylvania
2.pocono mountains pennsylvania hotels
3.pocono mountains pennsylvania things to do
4.pocono mountains pennsylvania hotels
5.pocono mountains camelbeach
6.pocono mountains camelbeach hotel
7.pocono mountains chateau resort
8.pocono mountains chateau resort attractions
9.pocono mountains chateau resort getting to
10.chateau resort getting to
11.pocono mountains chateau resort directions
TREC 2012 Session 6
127
Information needs:
You are planning a winter vacation to the Pocono Mountains region in Pennsylvania in the US. Where will you stay? What will you do while there? How will you get there?
In a session, queries change
constantly
Query change is an important
form of feedback
 We define query change as the syntactic editing changes between two adjacent queries:
$\Delta q_i = q_i - q_{i-1}$
 $\Delta q_i$ includes $+\Delta q_i$, the added terms, and $-\Delta q_i$, the removed terms
 The unchanged/shared terms are called $q_{theme}$, the theme terms
128
q1 = "bollywood legislation"
q2 = "bollywood law"
---------------------------------------
Theme term = "bollywood"
Added (+Δq) = "law"
Removed (-Δq) = "legislation"
Where do these query changes come
from?
 GivenTREC Session settings, we consider two sources of
query change:
 the previous search results that a user viewed/read/examined
 the information need
 Example:
 Kurosawa  Kurosawa wife
 `wife’ is not in any previous results, but in the topic description
 However, knowing information needs before search is
difficult to achieve
129
Previous search results could influence
query change in quite complex ways
 Merck lobbyists  Merck lobbying US policy
 D1 contains several mentions of‘policy’, such as
 “A lobbyist who until 2004 worked as senior policy advisor to
Canadian Prime Minister Stephen Harper was hired last month by
Merck …”
 These mentions are about Canadian policies; while the user adds
US policy in q2
 Our guess is that the user might be inspired by‘policy’, but
he/she prefers a different sub-concept other than `Canadian
policy’
 Therefore, for the added terms `US policy’,‘US’ is the novel term
here, and‘policy’ is not since it appeared in D1.
 The two terms should be treated differently
130
 We propose to model session search as a Markov decision process (MDP)
 Two agents: the User and the Search Engine
Dynamic Information Retrieval ModelingTutorial 2014131
 Environments
Search results
 States Queries
 Actions
 User actions:
Add/remove/unchange
the query terms
 Search Engine actions:
Increase/ decrease
/remain term weights
Applying MDP to Session Search
Search Engine Agent's Actions
Term | ∈ D_{i−1}? | Action | Example
q_theme | Y | increase | "pocono mountain" in s6
q_theme | N | increase | "france world cup 98 reaction" in s28: france world cup 98 reaction stock market → france world cup 98 reaction
+Δq | Y | decrease | 'policy' in s37: Merck lobbyists → Merck lobbyists US policy
+Δq | N | increase | 'US' in s37: Merck lobbyists → Merck lobbyists US policy
−Δq | Y | decrease | 'reaction' in s28: france world cup 98 reaction → france world cup 98
−Δq | N | no change | 'legislation' in s32: bollywood legislation → bollywood law
132
Query Change Retrieval Model (QCM)
 The Bellman equation gives the optimal value for an MDP:
$V^*(s) = \max_a\Big[R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^*(s')\Big]$
 The reward function is used as the document relevance score and is derived backwards from the Bellman equation:
$Score(q_i, d) = P(q_i \mid d) + \gamma \sum_a P(q_i \mid q_{i-1}, D_{i-1}, a)\, \max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$
i.e. the document relevance score is the current reward (relevance to $q_i$) plus the query transition model weighted by the maximum past relevance.
133
Calculating the Transition Model
• Expanding the transition term according to the query change and the search engine actions gives the QCM scoring function:
$Score(q_i, d) = \log P(q_i \mid d) + \alpha \sum_{t \in q_{theme}} \big[1 - P(t \mid d^*_{i-1})\big]\log P(t \mid d) - \beta \sum_{t \in +\Delta q,\, t \in d^*_{i-1}} P(t \mid d^*_{i-1})\log P(t \mid d) + \epsilon \sum_{t \in +\Delta q,\, t \notin d^*_{i-1}} idf(t)\log P(t \mid d) - \delta \sum_{t \in -\Delta q} P(t \mid d^*_{i-1})\log P(t \mid d)$
• In words: on top of the current reward/relevance score, increase weights for theme terms, increase weights for novel added terms, decrease weights for old added terms, and decrease weights for removed terms.
134
Maximizing the Reward Function
 Generate a maximum-rewarded document, denoted $d^*_{i-1}$, from $D_{i-1}$
 That is the document most relevant to $q_{i-1}$
 The relevance score can be calculated as
$P(q_{i-1} \mid d_{i-1}) = 1 - \prod_{t \in q_{i-1}}\big\{1 - P(t \mid d_{i-1})\big\}$, with $P(t \mid d_{i-1}) = \frac{\#(t,\, d_{i-1})}{|d_{i-1}|}$
 From several options, we choose to use only the document with top relevance: $\max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$
135
Scoring the Entire Session
 The overall relevance score for a session of queries is aggregated recursively:
$Score_{session}(q_n, d) = Score(q_n, d) + \gamma\, Score_{session}(q_{n-1}, d)$
$= Score(q_n, d) + \gamma\big[Score(q_{n-1}, d) + \gamma\, Score_{session}(q_{n-2}, d)\big]$
$= \sum_{i=1}^{n} \gamma^{\,n-i}\, Score(q_i, d)$
136
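A sketch of this recursive aggregation; the per-query Score(q_i, d) below is a plain Dirichlet-smoothed query log-likelihood used as a stand-in for the full QCM term weighting, and γ, μ and the toy statistics are assumptions.

```python
import math

def query_score(query, doc_tf, doc_len, mu=2000, p_bg=1e-4):
    """Stand-in for Score(q_i, d): Dirichlet-smoothed log P(q_i | d)."""
    return sum(math.log((doc_tf.get(t, 0) + mu * p_bg) / (doc_len + mu))
               for t in query.split())

def session_score(queries, doc_tf, doc_len, gamma=0.9):
    """Score_session(q_n, d) = Σ_i γ^(n-i) · Score(q_i, d)."""
    n = len(queries)
    return sum(gamma ** (n - i) * query_score(q, doc_tf, doc_len)
               for i, q in enumerate(queries, start=1))

session = ["dulles hotels", "dulles airport", "dulles airport location", "dulles metro stop"]
doc = {"dulles": 12, "airport": 7, "metro": 3, "location": 2}
print(session_score(session, doc, doc_len=400))
```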
Experiments
 TREC 2011-2012 query sets, datasets
 ClueWeb09 Category B
137
Search Accuracy (TREC 2012)
 nDCG@10 (official metric used inTREC)
Approach | nDCG@10 | %chg | MAP | %chg
Lemur | 0.2474 | -21.54% | 0.1274 | -18.28%
TREC'12 median | 0.2608 | -17.29% | 0.1440 | -7.63%
Our TREC'12 submission | 0.3021 | -4.19% | 0.1490 | -4.43%
TREC'12 best | 0.3221 | 0.00% | 0.1559 | 0.00%
QCM | 0.3353 | 4.10%† | 0.1529 | -1.92%
QCM+Dup | 0.3368 | 4.56%† | 0.1537 | -1.41%
138
Search Accuracy (TREC 2011)
 nDCG@10 (official metric used inTREC)
Approach | nDCG@10 | %chg | MAP | %chg
Lemur | 0.3378 | -23.38% | 0.1118 | -25.86%
TREC'11 median | 0.3544 | -19.62% | 0.1143 | -24.20%
TREC'11 best | 0.4409 | 0.00% | 0.1508 | 0.00%
QCM | 0.4728 | 7.24%† | 0.1713 | 13.59%†
QCM+Dup | 0.4821 | 9.34%† | 0.1714 | 13.66%†
Our TREC'12 submission | 0.4836 | 9.68%† | 0.1724 | 14.32%†
139
Search Accuracy for Different
Session Types
 TREC 2012 Sessions are classified into:
 Product: Factual / Intellectual
 Goal quality: Specific / Amorphous
Approach | Intellectual | %chg | Amorphous | %chg | Specific | %chg | Factual | %chg
TREC best | 0.3369 | 0.00% | 0.3495 | 0.00% | 0.3007 | 0.00% | 0.3138 | 0.00%
Nugget | 0.3305 | -1.90% | 0.3397 | -2.80% | 0.2736 | -9.01% | 0.2871 | -8.51%
QCM | 0.3870 | 14.87% | 0.3689 | 5.55% | 0.3091 | 2.79% | 0.3066 | -2.29%
QCM+DUP | 0.3900 | 15.76% | 0.3692 | 5.64% | 0.3114 | 3.56% | 0.3072 | -2.10%
140
- QCM better handles sessions that demonstrate evolution and exploration, because it treats a session as a continuous process by studying changes among query transitions and modeling the dynamics.
Outline
Dynamic Information Retrieval ModelingTutorial 2014141
 Introduction
 Theory and Models
 Session Search
 Reranking
 Guest Talk: Evaluation
Multi Page Search
Dynamic Information Retrieval ModelingTutorial 2014142
Multi Page Search
Dynamic Information Retrieval ModelingTutorial 2014143
Page 1 Page 2
2.
1.
2.
1.
Relevance Feedback
Dynamic Information Retrieval ModelingTutorial 2014144
 No UI Changes
 Interactivity is Hidden
 Private, performed in browser
Relevance Feedback
Dynamic Information Retrieval ModelingTutorial 2014145
Page 1
• Diverse Ranking
• Maximise
learning
potential
• Exploration vs
Exploitation
Page 2
• Clickthroughs or
explicit ratings
• Respond to
feedback from
page 1
• Personalized
Model
Dynamic Information Retrieval ModelingTutorial 2014146
Model
Dynamic Information Retrieval ModelingTutorial 2014147
 Prior belief over document relevance: $N(\theta^1, \Sigma^1)$
 $\theta^1$ – prior estimate of relevance
 $\Sigma^1$ – prior estimate of covariance
 Document similarity
 Topic clustering
Model
Dynamic Information Retrieval ModelingTutorial 2014148
 Rank action for page 1
Model
Dynamic Information Retrieval ModelingTutorial 2014149
Model
Dynamic Information Retrieval ModelingTutorial 2014150
 Feedback from page 1:
$\mathbf{r} \sim N(\theta_{\mathbf{s}}^1, \Sigma_{\mathbf{s}}^1)$
Model
Dynamic Information Retrieval ModelingTutorial 2014151
 Update the estimates using $\mathbf{r}^1$. Partition the prior over the page-1 documents $\mathbf{s}$ and the remaining documents $\mathbf{s}'$:
$\theta^1 = \begin{pmatrix}\theta_{\mathbf{s}'} \\ \theta_{\mathbf{s}}\end{pmatrix}$, $\Sigma^1 = \begin{pmatrix}\Sigma_{\mathbf{s}'} & \Sigma_{\mathbf{s}'\mathbf{s}} \\ \Sigma_{\mathbf{s}\mathbf{s}'} & \Sigma_{\mathbf{s}}\end{pmatrix}$
$\theta^2 = \theta_{\mathbf{s}'} + \Sigma_{\mathbf{s}'\mathbf{s}}\,\Sigma_{\mathbf{s}}^{-1}(\mathbf{r}^1 - \theta_{\mathbf{s}})$
$\Sigma^2 = \Sigma_{\mathbf{s}'} - \Sigma_{\mathbf{s}'\mathbf{s}}\,\Sigma_{\mathbf{s}}^{-1}\Sigma_{\mathbf{s}\mathbf{s}'}$
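A compact sketch of this conditioning step; the index layout (shown vs. unshown documents), the scaling of the covariance, and the feedback values are assumptions made for illustration.

```python
import numpy as np

def gaussian_feedback_update(theta, Sigma, shown, r):
    """theta2 = theta_u + S_us S_ss^-1 (r - theta_s);  Sigma2 = S_uu - S_us S_ss^-1 S_su."""
    unshown = [i for i in range(len(theta)) if i not in shown]
    S_ss = Sigma[np.ix_(shown, shown)]
    S_us = Sigma[np.ix_(unshown, shown)]
    K = S_us @ np.linalg.inv(S_ss)
    theta2 = theta[unshown] + K @ (np.asarray(r, float) - theta[shown])
    Sigma2 = Sigma[np.ix_(unshown, unshown)] - K @ S_us.T
    return unshown, theta2, Sigma2

# Toy "jaguar" example: documents 2 and 0 were shown on page 1; doc 2 was clicked, doc 0 was not.
theta = np.array([0.50, 0.51, 0.90, 0.49])
Sigma = 0.05 * np.array([[1.0, 0.8, 0.1, 0.0],
                         [0.8, 1.0, 0.1, 0.0],
                         [0.1, 0.1, 1.0, 0.95],
                         [0.0, 0.0, 0.95, 1.0]])
unshown, theta2, _ = gaussian_feedback_update(theta, Sigma, shown=[2, 0], r=[1.0, 0.0])
print(unshown, np.round(theta2, 3))   # the doc similar to the clicked one rises, the other falls
```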
Model
Dynamic Information Retrieval ModelingTutorial 2014152
 Rank using PRP
Model
Dynamic Information Retrieval ModelingTutorial 2014153
 Utility of a ranking (DCG over the two pages of M results each):
$U = \lambda \sum_{j=1}^{M} \frac{\theta_{s_j}^1}{\log_2(j+1)} + (1-\lambda)\sum_{j=M+1}^{2M} \frac{\theta_{s_j}^2}{\log_2(j+1)}$
Model – Bellman Equation
Dynamic Information Retrieval ModelingTutorial 2014154
 Optimize $\mathbf{s}^1$ to improve $U_{\mathbf{s}}^2$:
$V(\theta^1, \Sigma^1, 1) = \max_{\mathbf{s}^1}\Big[\lambda\,\theta_{\mathbf{s}}^1 \cdot \mathbf{W}^1 + \int_{\mathbf{r}} \max_{\mathbf{s}^2}\big[(1-\lambda)\,\theta_{\mathbf{s}}^2 \cdot \mathbf{W}^2\big]\, P(\mathbf{r})\, d\mathbf{r}\Big]$
𝜆
Dynamic Information Retrieval ModelingTutorial 2014155
 Balances exploration and exploitation in page 1
 Tuned for different queries
 Navigational
 Informational
 𝜆 = 1 for non-ambiguous search
Approximation
Dynamic Information Retrieval ModelingTutorial 2014156
 Monte Carlo sampling:
$V \approx \max_{\mathbf{s}^1}\Big[\lambda\,\theta_{\mathbf{s}}^1 \cdot \mathbf{W}^1 + \max_{\mathbf{s}^2}\,(1-\lambda)\,\frac{1}{S}\sum_{\mathbf{r} \in O} \theta_{\mathbf{s}}^2 \cdot \mathbf{W}^2\, P(\mathbf{r})\Big]$
 Sequential Ranking Decision
Experiment Data
Dynamic Information Retrieval ModelingTutorial 2014157
 Difficult to evaluate without access to live users
 Simulated using 3TREC collections and relevance
judgements
 WT10G – Explicit Ratings
 TREC8 – Clickthroughs
 Robust – Difficult (ambiguous) search
User Simulation
Dynamic Information Retrieval ModelingTutorial 2014158
 Rank M documents
 Simulated user clicks according to relevance judgements
 Update page 2 ranking
 Measure at page 1 and 2
 Recall
 Precision
 nDCG
 MRR
 BM25 – prior ranking model
Investigating λ
Dynamic Information Retrieval ModelingTutorial 2014159
Baselines
Dynamic Information Retrieval ModelingTutorial 2014160
 𝜆 determined experimentally
 BM25
 BM25 with conditional update (𝜆 = 1)
 Maximum Marginal Relevance (MMR)
 Diversification
 MMR with conditional update
 Rocchio
 Relevance Feedback
Results
Dynamic Information Retrieval ModelingTutorial 2014161
[Result charts for the three collections and metrics omitted.]
Results
Dynamic Information Retrieval ModelingTutorial 2014165
 Similar results across data sets and metrics
 2nd page gain outweighs 1st page losses
 Outperformed Maximum Marginal Relevance using MRR to
measure diversity
 BM25-U is simply the no-exploration case
 Similar results when 𝑀 = 5
Outline
Dynamic Information Retrieval ModelingTutorial 2014167
 Introduction
 Theory and Models
 Session Search
 Reranking
 Guest Talk: Evaluation
Dynamic Information Retrieval
Evaluation
Emine Yilmaz
University College London
Emine.Yilmaz@ucl.ac.uk
Information Retrieval Systems
Match information seekers with
the information they seek
Retrieval Evaluation: Traditional
View
Retrieval Evaluation: Dynamic
View
Different Approaches to
Evaluation
 Online Evaluation
 Design interactive experiments
 Use users’ actions to evaluate the quality
 Inherently dynamic in nature
 Offline Evaluation
 Controlled laboratory experiments
 The users’ interaction with the engine is only simulated
 Recent work focused on dynamic IR evaluation
Online Evaluation
 Standard click metrics
 Clickthrough rate
 Probability user skips over results they have considered (pSkip)
 Most recently: Result interleaving
[Figure: the user's clicks / no-clicks on the returned results are used to evaluate.]
175
What is result interleaving?
 A way to compare rankers online
 Given the two rankings produced by two methods
 Present a combination of the rankings to users
 Team Draft Interleaving (Radlinski et al., 2008)
 Interleaving two rankings
 Input:Two rankings (“can be seen as teams who pick players”)
 Repeat:
o Toss a coin to see which team (ranking) picks next
o Winner picks their best remaining player (document)
o Loser picks their best remaining player (document)
 Output: One ranking (2 teams of 5)
 Credit assignment
 Ranking providing more of the clicked results wins
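A small sketch of the procedure described above; the document identifiers and the clicked result are placeholders, and the coin toss is simulated per round.

```python
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Teams A and B alternately pick their best remaining document; a coin toss
    decides who picks first in each round. Returns the combined list and team labels."""
    interleaved, team = [], {}
    a, b = list(ranking_a), list(ranking_b)
    while a or b:
        for name, ranking in sorted([("A", a), ("B", b)], key=lambda _: random.random()):
            while ranking and ranking[0] in interleaved:
                ranking.pop(0)                         # skip documents already placed
            if ranking:
                doc = ranking.pop(0)
                interleaved.append(doc)
                team[doc] = name
    return interleaved, team

def winner(team, clicked_docs):
    """Credit assignment: the ranking that contributed more of the clicked results wins."""
    score = {"A": 0, "B": 0}
    for doc in clicked_docs:
        score[team[doc]] += 1
    return max(score, key=score.get)

shown, team = team_draft_interleave(["d1", "d2", "d3"], ["d6", "d1", "d4"])
print(shown, winner(team, clicked_docs=[shown[1]]))
```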
Team Draft Interleaving
Ranking A
1. Napa Valley – The authority for lodging...
www.napavalley.com
2. Napa Valley Wineries - Plan your wine...
www.napavalley.com/wineries
3. Napa Valley College
www.napavalley.edu/homex.asp
4. Been There | Tips | Napa Valley
www.ivebeenthere.co.uk/tips/16681
5. Napa Valley Wineries and Wine
www.napavintners.com
6. Napa Country, California – Wikipedia
en.wikipedia.org/wiki/Napa_Valley
Ranking B
1. Napa Country, California – Wikipedia
en.wikipedia.org/wiki/Napa_Valley
2. Napa Valley – The authority for lodging...
www.napavalley.com
3. Napa: The Story of an American Eden...
books.google.co.uk/books?isbn=...
4. Napa Valley Hotels – Bed and Breakfast...
www.napalinks.com
5. NapaValley.org
www.napavalley.org
6. The Napa Valley Marathon
www.napavalleymarathon.org
Presented Ranking
1. Napa Valley – The authority for lodging...
www.napavalley.com
2. Napa Country, California – Wikipedia
en.wikipedia.org/wiki/Napa_Valley
3. Napa: The Story of an American Eden...
books.google.co.uk/books?isbn=...
4. Napa Valley Wineries – Plan your wine...
www.napavalley.com/wineries
5. Napa Valley Hotels – Bed and Breakfast...
www.napalinks.com
6. Napa Valley College
www.napavalley.edu/homex.asp
7 NapaValley.org
www.napavalley.org
B wins!
Repeat Over Many Different
Queries!
Offline Evaluation
 Controlled laboratory experiments
 The user’s interaction with the engine is
only simulated
 Ask experts to judge each query result
 Predict how users behave when they search
 Aggregate judgments to evaluate
180
Offline Evaluation
 Until recently: metrics assumed that the user's information need was not affected by the documents read
 E.g. Average Precision, NDCG, …
• Users are more likely to stop searching when they see a highly relevant document
• Lately: metrics that incorporate the effect of the relevance of documents seen by the user on user behavior
 Based on devising more realistic user models
 EBU, ERR [Yilmaz et al. CIKM10, Chapelle et al. CIKM09]
181
Modeling User Behavior
Cascade-based models
[Figure: the top-10 ranked results for the query "black powder ammunition".]
• The user views search results from top to bottom
• At each rank i, the user has a certain probability of being satisfied
• The probability of satisfaction is proportional to the relevance grade of the document at rank i
• Once the user is satisfied with a document, he terminates the search.
Rank Biased Precision
[Figure: at each rank the user either stops or views the next item; query "black powder ammunition".]
Rank Biased Precision
$\text{Total utility} = \sum_{i=1}^{\infty} rel_i\, p^{\,i-1}$
$\text{Num. docs examined} = \sum_{i=1}^{\infty} p^{\,i-1} = 1/(1-p)$
$\text{RBP} = \text{Total utility} \,/\, \text{Num. docs examined} = (1-p)\sum_{i=1}^{\infty} rel_i\, p^{\,i-1}$
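A one-function sketch of RBP; p = 0.8 is a commonly used persistence value, not one fixed by the slides, and the toy relevance list is made up.

```python
def rbp(relevances, p=0.8):
    """RBP = (1 - p) * Σ_i rel_i * p^(i-1), where p is the probability of viewing the next item."""
    return (1 - p) * sum(rel * p ** i for i, rel in enumerate(relevances))

# Toy binary relevance for the top 10 results of "black powder ammunition".
print(round(rbp([1, 0, 1, 1, 0, 0, 1, 0, 0, 0]), 3))
```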
Expected Reciprocal Rank [Chapelle et al CIKM09]
[Figure: at each rank the user checks the document; if it is relevant enough (no / somewhat / highly) the user stops, otherwise views the next item; query "black powder ammunition".]
Expected Reciprocal Rank [Chapelle et al CIKM09]
$\varphi(r) = 1/r$ : utility of finding "the perfect document" at rank r
$\text{ERR} = \sum_{r=1}^{n} \frac{1}{r}\, P(\text{user stops at position } r) = \sum_{r=1}^{n} \frac{1}{r}\, R_r \prod_{i=1}^{r-1}(1 - R_i)$
$R_i = \frac{2^{g_i} - 1}{2^{g_{max}}}$ — the probability that the document at rank i satisfies the user, where $g_i$ is the relevance grade of the i-th document.
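A direct sketch of the ERR computation; g_max = 4 assumes a five-grade relevance scale, and the toy grades are made up.

```python
def err(grades, g_max=4):
    """ERR = Σ_r (1/r) * R_r * Π_{i<r} (1 - R_i), with R_i = (2^g_i - 1) / 2^g_max."""
    not_stopped, score = 1.0, 0.0
    for r, g in enumerate(grades, start=1):
        R = (2 ** g - 1) / 2 ** g_max            # probability the user is satisfied at rank r
        score += not_stopped * R / r
        not_stopped *= 1 - R
    return score

# Toy graded judgements (0 = not, 2 = somewhat, 4 = highly relevant).
print(round(err([0, 4, 2, 0, 1]), 3))
```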
Session Evaluation
Example queries in one session: "Paris Luxurious Hotels", "Paris Hilton", "J Lo"
What is a good system?
Measuring "goodness":
The user steps down a ranked list of documents and observes each one of them until a decision point, and either
a) abandons the search, or
b) reformulates
While stepping down or sideways, the user accumulates utility
Evaluation over a single ranked list
[Figure: the top-10 ranked results for the session reformulations "kenya cooking traditional swahili", "kenya cooking traditional", and "kenya swahili traditional food recipes".]
Session DCG [Järvelin et al ECIR 2008]
Each reformulation's ranked list $RL_j$ is scored with DCG and discounted by its position in the session:
$DCG(RL) = \sum_{r=1}^{k} \frac{2^{rel(r)} - 1}{\log_b(r + b - 1)}$
$sDCG = \frac{1}{\log_c(1 + c - 1)}\, DCG(RL_1) + \frac{1}{\log_c(2 + c - 1)}\, DCG(RL_2) + \cdots$
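A short sketch of session DCG over per-reformulation relevance grades; the log bases b and c and the toy session are assumptions.

```python
import math

def dcg(rels, b=4):
    """Within-list gain: Σ_r (2^rel(r) - 1) / log_b(r + b - 1)."""
    return sum((2 ** rel - 1) / math.log(r + b - 1, b) for r, rel in enumerate(rels, start=1))

def session_dcg(ranked_lists, b=4, c=4):
    """sDCG: the j-th reformulation's DCG is discounted by 1 / log_c(j + c - 1)."""
    return sum(dcg(rels, b) / math.log(j + c - 1, c)
               for j, rels in enumerate(ranked_lists, start=1))

# Toy session with three reformulations and graded top results.
print(round(session_dcg([[0, 0, 1], [0, 2, 1], [3, 2, 0]]), 3))
```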
Model-based measures [Yang and Lad ICTIR 2009, Kanoulas et al. SIGIR 2011]
Probabilistic space of users following different paths:
 Ω is the space of all paths
 P(ω) is the probability of a user following a path ω in Ω
 M_ω is a measure over a path ω
Probability of a path = (1) the probability of reformulating at a given rank × (2) the probability of abandoning at a given reformulation
[Figure: a grid of relevance labels (N/R) over the ranked lists for queries Q1, Q2, Q3, with an example user path: reformulate at rank 3, abandon at reformulation 2.]
Expected Global Utility [Yang and Lad ICTIR 2009]
1. The user steps down the ranked results one by one
2. Stops browsing documents based on a stochastic process that defines a stopping probability distribution over ranks, and reformulates
3. Gains something from relevant documents, accumulating utility
(1) Probability of abandoning the session at reformulation i: geometric with parameter p_reform
(2) Probability of reformulating at rank j: geometric with parameter p_down
[Figure: relevance grids over Q1, Q2, Q3 illustrating the two geometric distributions.]
Expected Global Utility [Yang and Lad ICTIR 2009]
 The probability of a user following a path ω:
P(ω) = P(r1, r2, ..., rK), where $r_i$ is the stopping and reformulation point in list i
 Assumption: stopping positions in each list are independent
P(r1, r2, ..., rK) = P(r1) P(r2) ... P(rK)
 Use a geometric distribution (as in RBP) to model the stopping and reformulation behaviour:
$P(r_i = r) = (1 - p)^{\,r-1}\, p$
Conclusions
 Recent focus on evaluating the dynamic nature of the search
process
 Interleaving
 New offline evaluation metrics
 ERR, RBU
 Session evaluation metrics
Outline
Dynamic Information Retrieval ModelingTutorial 2014200
 Introduction
 Theory and Models
 Session Search
 Reranking
 Guest Talk: Evaluation
 Conclusion
Conclusions
Dynamic Information Retrieval ModelingTutorial 2014201
 Dynamic IR describes a new class of interactive model
 Incorporates rich feedback, temporal dependency and is goal
oriented.
 Family of Markov models and Multi Armed Bandit theory
useful in building DIR models
 Applicable to a range of IR problems
 Useful in applications such as session search and evaluation
Dynamic IR Book
Dynamic Information Retrieval ModelingTutorial 2014202
 Published by Morgan & Claypool
 ‘Synthesis Lectures on Information Concepts, Retrieval, and
Services’
 Due March/April 2015 (in time for SIGIR 2015)
Acknowledgment
Dynamic Information Retrieval ModelingTutorial 2014203
 We thank Dr. Emine Yilmaz for giving the guest talk.
 We sincerely thank Dr. Xuchu Dong for his help in
preparation of the tutorial
 We also thank comments and suggestions from the following
colleagues:
 Dr. Jamie Callan
 Dr. Ophir Frieder
 Dr. Fernando Diaz
 Dr Filip Radlinski
Dynamic Information Retrieval ModelingTutorial 2014204
Thank You
Dynamic Information Retrieval ModelingTutorial 2014205
References
Dynamic Information Retrieval ModelingTutorial 2014206
Static IR
 Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. Addison-Wesley, 1999.
 The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. 1999
 Implicit User Modeling for Personalized Search, Xuehua Shen et.
al, CIKM, 2005
 A Short Introduction to Learning to Rank. Hang Li, IEICE
Transactions 94-D(10): 1854-1862, 2011.
References
Dynamic Information Retrieval ModelingTutorial 2014207
Interactive IR
 Relevance Feedback in Information Retrieval. Rocchio, J. J. The SMART Retrieval System (pp. 313-23), 1971.
 A study in interface support mechanisms for interactive information retrieval. Ryen W. White et al. JASIST, 2006.
 Visualizing stages during an exploratory search session. Bill Kules et al. HCIR, 2011.
 Dynamic Ranked Retrieval. Cristina Brandt et al. WSDM, 2011.
 Structured Learning of Two-level Dynamic Rankings. Karthik Raman et al. CIKM, 2011.
References
Dynamic Information Retrieval ModelingTutorial 2014208
Dynamic IR
 A hidden Markov model information retrieval system. D. R. H. Miller, T. Leek, and R. M. Schwartz. In SIGIR '99, pages 214-221.
 Threshold setting and performance optimization in adaptive filtering. Stephen Robertson. JIR, 2002.
 A large-scale study of the evolution of web pages. Dennis Fetterly et al. WWW, 2003.
 Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg, Thorsten Joachims. ICML, 2008.
 Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem. Yisong Yue et al. ICML, 2009.
 Meme-tracking and the dynamics of the news cycle. Jure Leskovec et al. KDD, 2009.
References
Dynamic Information Retrieval ModelingTutorial 2014209
Dynamic IR
 Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS, 2009.
 A Novel Click Model and Its Applications to Online Advertising. Zeyuan Allen Zhu et al. WSDM, 2010.
 A contextual-bandit approach to personalized news article recommendation. Lihong Li, Wei Chu, John Langford, Robert E. Schapire. WWW, 2010.
 Inferring search behaviors using partially observable Markov model with duration (POMD). Yin He et al. WSDM, 2011.
 No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search. Jeff Huang et al. CHI, 2011.
 Balancing Exploration and Exploitation in Learning to Rank Online. Katja Hofmann et al. ECIR, 2011.
 Large-Scale Validation and Analysis of Interleaved Search Evaluation. Olivier Chapelle et al. TOIS, 2012.
References
Dynamic Information Retrieval ModelingTutorial 2014210
Dynamic IR
 Using Control Theory for Stable and Efficient Recommender Systems. T. Jambor, J. Wang, N. Lathia. In: WWW '12, pages 11-20.
 Sequential selection of correlated ads by POMDPs. Shuai Yuan et al. CIKM, 2012.
 Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. In SIGIR '13, pages 453–462.
 Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H. Yang. In SIGIR 2013.
 Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J. Wang. In WWW '13.
 Interactive Collaborative Filtering. X. Zhao, W. Zhang, J. Wang. In: CIKM 2013, pages 1411-1420.
 Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H. Yang. In SIGIR '14.
References
Dynamic Information Retrieval ModelingTutorial 2014211
Markov Processes
 A Markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679–684, 1957.
 Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957.
 Dynamic Programming and Markov Processes. R. A. Howard. MIT Press, 1960.
 Linear Programming and Sequential Decisions. Alan S. Manne. Management Science, 1960.
 Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Leonard E. Baum, Ted Petrie. The Annals of Mathematical Statistics 37, 1966.
References
Dynamic Information Retrieval ModelingTutorial 2014212
Markov Processes
 Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3, 1988.
 Computationally feasible bounds for partially observed Markov decision processes. W. Lovejoy. Operations Research 39: 162–175, 1991.
 Q-Learning. Christopher J. C. H. Watkins, Peter Dayan. Machine Learning, 1992.
 Reinforcement learning with replacing eligibility traces. S. P. Singh and R. S. Sutton. Machine Learning, 22, pages 123-158, 1996.
 Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998.
 Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra. Artificial Intelligence, 101(1-2):99–134, 1998.
References
Dynamic Information Retrieval ModelingTutorial 2014213
Markov Processes
 Finding approximate POMDP solutions through belief compression. N. Roy. PhD Thesis, Carnegie Mellon, 2003.
 VDCBPI: an approximate scalable algorithm for large scale POMDPs. P. Poupart and C. Boutilier. In NIPS 2004, pages 1081–1088.
 Finding Approximate POMDP Solutions Through Belief Compression. N. Roy, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 23:1-40, 2005.
 Probabilistic Robotics. S. Thrun, W. Burgard, D. Fox. MIT Press, Cambridge, 2005.
 Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, Volume 27, pages 335-380, 2006.
 Probabilistic Robotics. S. Thrun, W. Burgard, D. Fox. The MIT Press, 2006.
References
Dynamic Information Retrieval ModelingTutorial 2014214
Markov Processes
 The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood, E. J. Sondik. Operations Research, 1973.
 Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. M. L. Puterman and M. C. Shin. Management Science 24, 1978.
 An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. A. A. Markov. Science in Context, 19:591–600, 2006.
 Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer Science & Business Media, 2011.
 Finite-Time Regret Bounds for the Multiarmed Bandit Problem. Nicolò Cesa-Bianchi, Paul Fischer. ICML, pages 100-108, 1998.
 Multi-armed bandit allocation indices. J. C. Gittins. Wiley, 1989.
 Finite-time Analysis of the Multiarmed Bandit Problem. Peter Auer et al. Machine Learning 47, Issue 2-3, 2002.
Dynamic Information Retrieval Modeling Tutorial

  • 1. SIGIRTutorial July 7th 2014 Grace Hui Yang Marc Sloan JunWang Guest Speaker: EmineYilmaz Dynamic Information Retrieval Modeling
  • 2. Dynamic Information Retrieval ModelingTutorial 20142
  • 3. Age of Empire Dynamic Information Retrieval ModelingTutorial 20143
  • 4. Dynamic Information Retrieval Dynamic Information Retrieval ModelingTutorial 20144 Documents to explore Information need Observed documents User Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren’t, and satisfy their information need.
  • 5. Evolving IR Dynamic Information Retrieval ModelingTutorial 20145  Paradigm shifts in IR as new models emerge  e.g.VSM → BM25 → Language Model  Different ways of defining relationship between query and document  Static → Interactive → Dynamic  Evolution in modeling user interaction with search engine
  • 6. Outline Dynamic Information Retrieval ModelingTutorial 20146  Introduction  Static IR  Interactive IR  Dynamic IR  Theory and Models  Session Search  Reranking  GuestTalk: Evaluation
  • 7. Conceptual Model – Static IR Dynamic Information Retrieval ModelingTutorial 20147 Static IR Interactive IR Dynamic IR  No feedback
  • 8. Characteristics of Static IR Dynamic Information Retrieval ModelingTutorial 20148  Does not learn directly from user  Parameters updated periodically
  • 9. Static Information Retrieval Model Dynamic Information Retrieval ModelingTutorial 20149 Learning to Rank
  • 10. Dynamic Information Retrieval ModelingTutorial 201410 Commonly Used Static IR Models BM25 PageRank Language Model
  • 11. Feedback in IR Dynamic Information Retrieval ModelingTutorial 201411
  • 12. Outline Dynamic Information Retrieval ModelingTutorial 201412  Introduction  Static IR  Interactive IR  Dynamic IR  Theory and Models  Session Search  Reranking  GuestTalk: Evaluation
  • 13. Conceptual Model – Interactive IR Dynamic Information Retrieval ModelingTutorial 201413 Static IR Interactive IR Dynamic IR  Exploit Feedback
  • 14. Interactive User Feedback Dynamic Information Retrieval ModelingTutorial 201414 Like, dislike, pause, skip
  • 15. Learn the user’s taste interactively! At the same time, provide good recommendations! Dynamic Information Retrieval ModelingTutorial 201415 Interactive Recommender Systems
  • 16. Example - Multi Page Search Dynamic Information Retrieval ModelingTutorial 201416 Ambiguous Query
  • 17. Example - Multi Page Search Dynamic Information Retrieval ModelingTutorial 201417 Topic: Car
  • 18. Example - Multi Page Search Dynamic Information Retrieval ModelingTutorial 201418 Topic:Animal
  • 19. Example – Interactive Search Dynamic Information Retrieval ModelingTutorial 201419 Click on ‘car’ webpage
  • 20. Example – Interactive Search Dynamic Information Retrieval ModelingTutorial 201420 Click on ‘Next Page’
  • 21. Example – Interactive Search Dynamic Information Retrieval ModelingTutorial 201421 Page 2 results: Cars
  • 22. Example – Interactive Search Dynamic Information Retrieval ModelingTutorial 201422 Click on ‘animal’ webpage
  • 23. Example – Interactive Search Dynamic Information Retrieval ModelingTutorial 201423 Page 2 results: Animals
  • 24. Example – Dynamic Search Dynamic Information Retrieval ModelingTutorial 201424 Topic: Guitar
  • 25. Example – Dynamic Search Dynamic Information Retrieval ModelingTutorial 201425 Diversified Page 1 Topics: Cars, animals, guitars
  • 26. Toy Example Dynamic Information Retrieval ModelingTutorial 201426  Multi-Page search scenario  User image searches for “jaguar”  Rank two of the four results over two pages: 𝑟 = 0.5 𝑟 = 0.51 𝑟 = 0.9𝑟 = 0.49
  • 27. Toy Example – Static Ranking Dynamic Information Retrieval ModelingTutorial 201427  Ranked according to PRP Page 1 Page 2 1. 2. 𝑟 = 0.9 𝑟 = 0.51 1. 2. 𝑟 = 0.5 𝑟 = 0.49
  • 28. Toy Example – Relevance Feedback Dynamic Information Retrieval ModelingTutorial 201428  Interactive Search  Improve 2nd page based on feedback from 1st page  Use clicks as relevance feedback  Rocchio1 algorithm on terms in image webpage  𝑤 𝑞 ′ = 𝛼𝑤 𝑞 + 𝛽 |𝐷 𝑟| 𝑤 𝑑𝑑∈𝐷 𝑟 − 𝛾 𝐷 𝑛 𝑤 𝑑𝑑∈𝐷 𝑛  New query closer to relevant documents and different to non-relevant documents 1Rocchio, J. J., ’71, Baeza-Yates & Ribeiro-Neto‘99
  • 29. Toy Example – Relevance Feedback Dynamic Information Retrieval ModelingTutorial 201429  Ranked according to PRP and Rocchio Page 1 Page 2 2. 𝑟 = 0.9 𝑟 = 0.51 1. 2. 𝑟 = 0.5 𝑟 = 0.49 1. * * Click
  • 30. Toy Example – Relevance Feedback Dynamic Information Retrieval ModelingTutorial 201430  No click when searching for animals Page 1 Page 2 2. 𝑟 = 0.9 𝑟 = 0.51 1. 2. 1. ? ?
  • 31. Toy Example – Value Function Dynamic Information Retrieval ModelingTutorial 201431  Optimize both pages using dynamic IR  Bellman equation for value function  Simplified example:  𝑉 𝑡 𝜃 𝑡 , Σ 𝑡 = max 𝑠 𝑡 𝜃𝑠 𝑡 + 𝐸(𝑉 𝑡+1 𝜃 𝑡+1 , Σ 𝑡+1 𝐶 𝑡 )  𝜃 𝑡 , Σ 𝑡 = relevance and covariance of documents for page 𝑡  𝐶 𝑡 = clicks on page 𝑡  𝑉 𝑡 =‘value’ of ranking on page 𝑡  Maximize value over all pages based on estimating feedback
  • 32. 1 0.8 0.1 0 0.8 1 0.1 0 0.1 0.1 1 0.95 0 0 0.95 1 Toy Example - Covariance Dynamic Information Retrieval ModelingTutorial 201432  Covariance matrix represents similarity between images
  • 33. Toy Example – Myopic Value Dynamic Information Retrieval ModelingTutorial 201433  For myopic ranking, 𝑉2 = 16.380 Page 1 2. 1.
  • 34. Toy Example – Myopic Ranking Dynamic Information Retrieval ModelingTutorial 201434  Page 2 ranking stays the same regardless of clicks Page 1 Page 2 2. 1. 2. 1.
  • 35. Toy Example – Optimal Value Dynamic Information Retrieval ModelingTutorial 201435  For optimal ranking, 𝑉2 = 16.528 Page 1 2. 1.
  • 36. Toy Example – Optimal Ranking Dynamic Information Retrieval ModelingTutorial 201436  If car clicked, Jaguar logo is more relevant on next page Page 1 Page 2 2. 1. 2. 1.
  • 37. Toy Example – Optimal Ranking Dynamic Information Retrieval ModelingTutorial 201437  In all other scenarios, rank animal first on next page Page 1 Page 2 2. 1. 2. 1.
  • 38. Interactive vs Dynamic IR Dynamic Information Retrieval ModelingTutorial 201438 • Treats interactions independently • Responds to immediate feedback • Static IR used before feedback received • Optimizes over all interaction • Long term gains • Models future user feedback • Also used at beginning of interaction Interactive Dynamic
  • 39. Outline Dynamic Information Retrieval ModelingTutorial 201439  Introduction  Static IR  Interactive IR  Dynamic IR  Theory and Models  Session Search  Reranking  GuestTalk: Evaluation
  • 40. Conceptual Model – Dynamic IR Dynamic Information Retrieval ModelingTutorial 201440 Static IR Interactive IR Dynamic IR  Explore and exploit Feedback
  • 41. Characteristics of Dynamic IR Dynamic Information Retrieval ModelingTutorial 201441 Rich interactions  Query formulation  Document clicks  Document examination  eye movement  mouse movements  etc.
  • 42. Characteristics of Dynamic IR Dynamic Information Retrieval ModelingTutorial 201442 Temporal dependency clicked documentsquery D1 ranked documents q1 C1 D2 q2 C2 …… …… Dn qn Cn I information need iteration 1 iteration 2 iteration n
  • 43. Characteristics of Dynamic IR Dynamic Information Retrieval ModelingTutorial 201443 Overall goal Optimize over all iterations for goal IR metric or user satisfaction Optimal policy
  • 44. Dynamic IR Dynamic Information Retrieval ModelingTutorial 201444  Dynamic IR explores actions  Dynamic IR learns from user and adjusts its actions  May hurt performance in a single stage, but improves over all stages
  • 45. Applications to IR Dynamic Information Retrieval ModelingTutorial 201445  Dynamics found in lots of different aspects of IR  Dynamic Users  Users change behaviour over time, user history  Dynamic Documents  Information Filtering, document content change  Dynamic Queries  Changing query definition i.e.‘Twitter’  Dynamic Information Needs  Topic ontologies evolve over time  Dynamic Relevance  Seasonal/time of day change in relevance
  • 46. User Interactivity in DIR Dynamic Information Retrieval ModelingTutorial 201446  Modern IR interfaces  Facets  Verticals  Personalization  Responsive to particular user  Complex log data  Mobile  Richer user interactions  Ads  Adaptive targeting
  • 47. Big Data Dynamic Information Retrieval ModelingTutorial 201447  Data set sizes are always increasing  Computational footprint of learning to rank  Rich, sequential data 1Yin He et. al, ’11  Complex user model behaviour found in data, takes into account reading, skipping and re-reading behaviours1  Uses a POMDP Example
  • 48. Online Learning to Rank Dynamic Information Retrieval ModelingTutorial 201448  Learning to rank iteratively on sequential data  Clicks as implicit user feedback/preference  Often uses multi-armed bandit techniques 1Katja Hofmann et. al., ’11 2YisongYue et. al.,‘09  Uses click models to interpret clicks and a contextual bandit to improve learning1  Pairwise comparison of rankings using duelling bandits formulation2 Example
  • 49. Evaluation Dynamic Information Retrieval ModelingTutorial 201449  Use complex user interaction data to assess rankings  Compare ranking techniques in online testing  Minimise user dissatisfaction 1Jeff Huang et. al.,‘11 2Olivier Chapelle et. al.,‘12  Modelled cursor activity and correlated with eye tracking to validate good or bad abandonment1  Interleave search results from two ranking algorithms to determine which is better2 Example
  • 50. Filtering and News Dynamic Information Retrieval ModelingTutorial 201450  Adaptive techniques to personalize information filtering or news recommendation  Understand the complex dynamics of real world events in search logs  Capture temporal document change1 1Dennis Fetterly et. al.,‘03 2Stephen Robertson,‘02 3Jure Leskovec et. al.,‘09  Uses relevance feedback to adapt threshold sensitivity over time in information filtering to maximise overal utility1  Detected patterns and memes in news cycles and modeled how information spreads2 Example
  • 51. Advertising Dynamic Information Retrieval ModelingTutorial 201451  Behavioural targeting and personalized ads  Learn when to display new ads  Maximise profit from available ads 1ShuaiYuan et. al.,‘12 2ZeyuanAllen Zhu et. al.,‘10  Uses a POMDP and ad correlation to find the optimal ad to display to a user1  Dynamic click model that can interpret complex user behaviour in logs and apply results to tail queries and unseen ads2 Example
  • 52. Outline Dynamic Information Retrieval ModelingTutorial 201452  Introduction  Theory and Models  Session Search  Reranking  GuestTalk: Evaluation
  • 53. Outline Dynamic Information Retrieval ModelingTutorial 201453  Introduction  Theory and Models  Why not use supervised learning  Markov Models  Session Search  Reranking  Evaluation
  • 54. Why not use Supervised Learning for Dynamic IR Modeling? Dynamic Information Retrieval ModelingTutorial 201454  Lack of enough training data  Dynamic IR problems contain a sequence of dynamic interactions  E.g. a series of queries in session  Rare to find repeated sequences (close to zero)  Even in large query logs (WSCD 2013 & 2014, query logs fromYandex)  Chance of finding repeated adjacent query pairs is also low Dataset Repeated Adjacent Query Pairs Total Adjacent Query Pairs Repeated Percentage WSCD 2013 476,390 17,784,583 2.68% WSCD 2014 1,959,440 35,376,008 5.54%
  • 55. Our Solution Dynamic Information Retrieval ModelingTutorial 201455 Try to find an optimal solution through a sequence of dynamic interactions Trial and Error: learn from repeated, varied attempts which are continued until success No Supervised Learning
  • 56. Trial and Error Dynamic Information Retrieval ModelingTutorial 201456  q1 – "dulles hotels"  q2 – "dulles airport"  q3 – "dulles airport location"  q4 – "dulles metrostop"
  • 57. Dynamic Information Retrieval ModelingTutorial 201457  Rich interactions Query formulation, Document clicks, Document examination, eye movement, mouse movements, etc.  Temporal dependency  Overall goal Recap – Characteristics of Dynamic IR
  • 58. Dynamic Information Retrieval ModelingTutorial 201458  Model interactions, which means it needs to have place holders for actions;  Model information need hidden behind user queries and other interactions;  Set up a reward mechanism to guide the entire search algorithm to adjust its retrieval strategies;  Represent Markov properties to handle the temporal dependency. What is a Desirable Model for Dynamic IR A model inTrial and Error setting will do! A Markov Model will do!
  • 59. Outline Dynamic Information Retrieval ModelingTutorial 201459  Introduction  Theory and Models  Why not use supervised learning  Markov Models  Session Search  Reranking  Evaluation
  • 60. Markov Process  Markov Property1 (the “memoryless” property) for a system, its next state depends on its current state. Pr(Si+1|Si,…,S0)=Pr(Si+1|Si)  Markov Process a stochastic process with Markov property. e.g. Dynamic Information Retrieval ModelingTutorial 201460 1A.A. Markov,‘06 s0 s1 …… si ……si+1
  • 61. Dynamic Information Retrieval ModelingTutorial 201461  Markov Chain  Hidden Markov Model  Markov Decision Process  Partially Observable Markov Decision Process  Multi-armed Bandit Family of Markov Models
  • 62. A Pagerank(A)  Discrete-time Markov process  Example: Google PageRank1 Markov Chain B Pagerank(B) 𝑃𝑎𝑔𝑒𝑟𝑎𝑛𝑘 𝑆 = 1 − 𝛼 𝑁 + 𝛼 𝑃𝑎𝑔𝑒𝑟𝑎𝑛𝑘(𝑌) 𝐿(𝑌) 𝑌∈Π # of pages # of outlinks pages linked to S Dynamic Information Retrieval ModelingTutorial 201462 D Pagerank(D) C Pagerank(C) E Pagerank(E) Random jump factor 1L. Page et. al.,‘99 The stable state distribution of such an MC is PageRank  State S – web page  Transition probability M  PageRank: how likely a random web surfer will land on a page (S, M)
  • 63. Hidden Markov Model  A Markov chain that states are hidden and observable symbols are emitted with some probability according to its states1. Dynamic Information Retrieval ModelingTutorial 201463 s0 s1 s2 …… o0 o1 o2 p0 𝑒0 p1 p2 𝑒1 𝑒2 Si– hidden state pi -- transition probability oi --observation ei --observation probability (emission probability) 1Leonard E. Baum et. al.,‘66 (S, M, O, e)
  • 64. An HMM example for IR Construct an HMM for each document1 Dynamic Information Retrieval ModelingTutorial 201464 s0 s1 s2 …… t0 t1 t2 p0 𝑒0 p1 p2 𝑒1 𝑒2 Si– “Document” or “General English” pi –a0 or a1 ti – query term ei – Pr(t|D) or Pr(t|GE) P(D|q)∝ (𝑎0 𝑃 𝑡 𝐺𝐸 + 𝑎1 𝑃(𝑡|𝐷))𝑡∈𝑞 Document-to-query relevance 1Miller et. al.‘99 query
  • 65.  MDP extends MC with actions and rewards1 si– state ai – action ri – reward pi – transition probability p0 p1 p2 Markov Decision Process Dynamic Information Retrieval ModelingTutorial 201465 ……s0 s1 r0 a0 s2 r1 a1 s3 r2 a2 1R. Bellman,‘57 (S, M, A, R, γ)
  • 66. Definition of MDP  A tuple (S, M, A, R, γ)  S : state space  M: transition matrix Ma(s, s') = P(s'|s, a)  A: action space  R: reward function R(s,a) = immediate reward taking action a at state s  γ: discount factor, 0< γ ≤1  policy π π(s) = the action taken at state s  Goal is to find an optimal policy π* maximizing the expected total rewards. Dynamic Information Retrieval ModelingTutorial 201466
  • 67. Policy Policy: (s) = a According to which, select an action a at state s. (s0) =move right and ups0 (s1) =move right and ups1 (s2) = move rights2 Dynamic Information Retrieval ModelingTutorial 201467 [Slide altered from Carlos Guestrin’s ML lecture]
  • 68. Value of Policy Value:V(s) Expected long-term reward starting from s Start from s0 s0 R(s0) (s0) V(s0) = E[R(s0) +  R(s1) + 2 R(s2) + 3 R(s3) + 4 R(s4) + ] Future rewards discounted by   [0,1) Dynamic Information Retrieval ModelingTutorial 201468 [Slide altered from Carlos Guestrin’s ML lecture]
  • 69. Value of Policy Value:V(s) Expected long-term reward starting from s Start from s0 s0 R(s0) (s0) V(s0) = E[R(s0) +  R(s1) + 2 R(s2) + 3 R(s3) + 4 R(s4) + ] Future rewards discounted by   [0,1) s1 R(s1) s1’’ s1’ R(s1’) R(s1’’) Dynamic Information Retrieval ModelingTutorial 201469 [Slide altered from Carlos Guestrin’s ML lecture]
  • 70. Value of Policy Value:V(s) Expected long-term reward starting from s Start from s0 s0 R(s0) (s0) V(s0) = E[R(s0) +  R(s1) + 2 R(s2) + 3 R(s3) + 4 R(s4) + ] Future rewards discounted by   [0,1) s1 R(s1) s1’’ s1’ R(s1’) R(s1’’) (s1) R(s2) s2 (s1’) (s1’’) s2’’ s2’ R(s2’) R(s2’’) Dynamic Information Retrieval ModelingTutorial 201470 [Slide altered from Carlos Guestrin’s ML lecture]
  • 71. Computing the value of a policy Dynamic Information Retrieval ModelingTutorial 201471 V(s0) = 𝐸 𝜋 [𝑅 𝑠0, 𝑎 + 𝛾𝑅 𝑠1, 𝑎 + 𝛾2 𝑅 𝑠2, 𝑎 + 𝛾3 𝑅 𝑠3, 𝑎 + ⋯ ] =𝐸 𝜋[𝑅 𝑠0, 𝑎 + 𝛾 𝛾 𝑡−1 𝑅(𝑠𝑡, 𝑎)∞ 𝑡=1 ] =𝑅 𝑠0, 𝑎 + 𝛾𝐸 𝜋 [ 𝛾 𝑡−1 𝑅(𝑠𝑡, 𝑎)∞ 𝑡=1 ] =𝑅 𝑠0, 𝑎 + 𝛾 𝑀 𝜋 𝑠 (𝑠, 𝑠′) 𝑉(𝑠′)𝑠′ Value function A possible next state The current state
  • 72. Optimality — Bellman Equation  The Bellman equation1 to MDP is a recursive definition of the optimal value function V*(.) 𝑉∗ s = max 𝑎 𝑅 𝑠, 𝑎 + 𝛾 𝑀 𝑎(𝑠, 𝑠′)𝑉∗(𝑠′) 𝑠′ Dynamic Information Retrieval ModelingTutorial 201472  Optimal Policy π∗ s = arg 𝑚𝑎𝑥 𝑎 𝑅 𝑠, 𝑎 + 𝛾 𝑀 𝑎 𝑠, 𝑠′ 𝑉∗(𝑠′) 𝑠′ 1R. Bellman,‘57 state-value function
  • 73. Optimality — Bellman Equation  The Bellman equation can be rewritten as 𝑉∗ 𝑠 = max a 𝑄(𝑠, 𝑎) 𝑄(𝑠, 𝑎) = 𝑅 𝑠, 𝑎 + 𝛾 𝑀 𝑎(𝑠, 𝑠′)𝑉∗(𝑠′) 𝑠′ Dynamic Information Retrieval ModelingTutorial 201473  Optimal Policy π∗ s = arg 𝑚𝑎𝑥 𝑎 𝑄 𝑠, 𝑎 action-value function Relationship betweenV and Q
  • 74. MDP algorithms Dynamic Information Retrieval ModelingTutorial 201474  Value Iteration  Policy Iteration  Modified Policy Iteration  Prioritized Sweeping  Temporal Difference (TD) Learning  Q-Learning Model free approaches Model-based approaches [Bellman, ’57, Howard,‘60, Puterman and Shin,‘78, Singh & Sutton,‘96, Sutton & Barto,‘98, Richard Sutton,‘88,Watkins,‘92] Solve Bellman equation Optimal valueV*(s) Optimal policy *(s) [Slide altered from Carlos Guestrin’s ML lecture]
  • 75. Value Iteration  Initialization Initialize 𝑉0 𝑠 arbitrarily  Loop  Iteration 𝑉𝑖+1 𝑠 ← max 𝑎 𝑅 𝑠, 𝑎 + 𝛾 𝑀 𝑎(𝑠, 𝑠′)𝑉𝑖(𝑠′)𝑠′ π s ← arg 𝑚𝑎𝑥 𝑎 𝑅 𝑠, 𝑎 + 𝛾 𝑀 𝑎(𝑠, 𝑠′)𝑉𝑖(𝑠′)𝑠′  Stopping criteria  π s is good enough Dynamic Information Retrieval ModelingTutorial 201475 1Bellman,‘57
  • 76. Greedy Value Iteration  Initialization Initialize 𝑉0 𝑠 arbitrarily  Iteration 𝑉𝑖+1 𝑠 ← max 𝑎 𝑅 𝑠, 𝑎 + 𝛾 𝑀 𝑎(𝑠, 𝑠′)𝑉𝑖(𝑠′)𝑠′  Stopping criteria ∀𝑠 𝑉𝑖+1 𝑠 − 𝑉𝑖 𝑠 < ε  Optimal policy π s ← arg 𝑚𝑎𝑥 𝑎 𝑅 𝑠, 𝑎 + 𝛾 𝑀 𝑎(𝑠, 𝑠′)𝑉𝑖(𝑠′) 𝑠′ Dynamic Information Retrieval ModelingTutorial 201476 1Bellman,‘57
  • 77. Greedy Value Iteration 1. For each state s∈S Initialize V0(s) arbitrarily End for 2. 𝑖 ← 0 3. Repeat 3.1 𝑖 ← 𝑖 + 1 3.2 For each 𝑠 ∈ 𝑆 𝑉𝑖 𝑠 ← max 𝑎 𝑅 𝑠, 𝑎 + 𝛾 𝑀 𝑎(𝑠, 𝑠′)𝑉𝑖−1(𝑠′)𝑠′ end for until ∀𝑠 𝑉𝑖 𝑠 − 𝑉𝑖−1 𝑠 < ε 4. For each 𝑠 ∈ 𝑆 π s ← arg 𝑚𝑎𝑥 𝑎 𝑅 𝑠, 𝑎 + 𝛾 𝑀 𝑎(𝑠, 𝑠′)𝑉𝑖(𝑠′)𝑠′ end for Algorithm Dynamic Information Retrieval ModelingTutorial 201477
  • 78. V(0)(S1)=max{R(S1,a1), R(S1,a2)}=6 V(1)(S1)=max{ 3+0.96*(0.3*6+0.7*4), 6+0.96*(1.0*8) } =max{3+0.96*4.6, 6+0.96*8.0} =max{7.416, 13.68} =13.68 Greedy Value Iteration 𝑉 s = max 𝑎 𝑅 𝑠, 𝑎 + 𝛾 𝑀 𝑎(𝑠, 𝑠′)𝑉(𝑠′) 𝑠′ V(0)(S2)=max{R(S2,a1), R(S2,a2)}=4 V(0)(S3)=max{R(S3,a1), R(S3,a2)}=8 Dynamic Information Retrieval ModelingTutorial 201478 Ma1 = 0.3 0.7 0 1.0 0 0 0.8 0.2 0 Ma2 = 0 0 1.0 0 0.2 0.8 0 1.0 0 a1 a2
  • 79. Greedy Value Iteration 𝑉 s = max 𝑎 𝑅 𝑠, 𝑎 + 𝛾 𝑀 𝑎(𝑠, 𝑠′)𝑉(𝑠′) 𝑠′ Dynamic Information Retrieval ModelingTutorial 201479 i V(i)(S1) V(i)(S2) V(i)(S3) 0 6 4 8 1 13.680 9.760 13.376 2 18.841 17.133 20.380 3 25.565 22.087 25.759 … … … … 200 168.039 165.316 168.793 Ma1 = 0.3 0.7 0 1.0 0 0 0.8 0.2 0 Ma2 = 0 0 1.0 0 0.2 0.8 0 1.0 0 a1a2 a1 π S1 π S 𝟐 π S 𝟑 a2 a1 a1
  • 80. Policy Iteration  Initialization 𝑉π0 𝑠 ←0, π0 s ← 𝑎𝑟𝑏𝑖𝑡𝑟𝑎𝑟𝑦 𝑝𝑜𝑙𝑖𝑐𝑦  Iteration (over i )  Policy Evaluation 𝑉π 𝑖 𝑠 ∞ ← 𝑅 𝑠, π𝑖 s + 𝛾 𝑀 𝑎(𝑠, 𝑠′)𝑉π 𝑖 (𝑠′) 𝑠′  Policy Improvement π𝑖+1 s ← arg 𝑚𝑎𝑥 𝑎 𝑅 𝑠, 𝑎 + 𝛾 𝑀 𝑎(𝑠, 𝑠′)𝑉π 𝑖 (𝑠′)𝑠′  Stop criteria Policy stops changing Dynamic Information Retrieval ModelingTutorial 201480 1Howard ,‘60
  • 81. Policy Iteration — Algorithm 1. For each state s∈S: V(s) ← 0, π0(s) ← arbitrary policy, i ← 0. 2. Repeat 2.1 Repeat: for each s∈S: V′(s) ← V(s); V(s) ← R(s, πi(s)) + γ Σ_{s′} M_{πi(s)}(s,s′) V(s′), until ∀s |V(s) − V′(s)| < ε 2.2 For each s∈S: π_{i+1}(s) ← argmax_a [R(s,a) + γ Σ_{s′} M_a(s,s′) V(s′)] 2.3 i ← i + 1 Until πi = π_{i−1}
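For comparison, a short policy-iteration sketch under the same assumptions; it reuses the R, M and gamma arrays from the value-iteration sketch above and converges to the same policy.

    # Policy iteration sketch; reuses R, M, gamma from the value-iteration sketch.
    nS, nA, eps = 3, 2, 1e-6
    policy = [0] * nS                      # arbitrary initial policy
    V = [0.0] * nS
    while True:
        # policy evaluation: iterate the fixed-policy Bellman backup to convergence
        while True:
            V_new = [R[s][policy[s]] + gamma *
                     sum(M[policy[s]][s][sp] * V[sp] for sp in range(nS))
                     for s in range(nS)]
            delta = max(abs(V_new[s] - V[s]) for s in range(nS))
            V = V_new
            if delta < eps:
                break
        # policy improvement: greedy with respect to the evaluated V
        new_policy = [max(range(nA), key=lambda a: R[s][a] + gamma *
                          sum(M[a][s][sp] * V[sp] for sp in range(nS)))
                      for s in range(nS)]
        if new_policy == policy:
            break
        policy = new_policy
    print(policy)   # [1, 0, 0]: pi(S1)=a2, pi(S2)=a1, pi(S3)=a1 for the toy example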
  • 82. Modified Policy Iteration  The “Policy Evaluation” step in Policy Iteration is time-consuming, especially when the state space is large.  Modified Policy Iteration computes an approximate policy evaluation by running just a few (k) evaluation sweeps.  (Spectrum: k = 1 corresponds to Greedy Value Iteration, k = ∞ to Policy Iteration; Modified Policy Iteration lies in between.)
  • 83. Modified Policy Iteration — Algorithm 1. For each state s∈S: V(s) ← 0, π0(s) ← arbitrary policy, i ← 0. 2. Repeat 2.1 Repeat k times: for each s∈S: V(s) ← R(s, πi(s)) + γ Σ_{s′} M_{πi(s)}(s,s′) V(s′) 2.2 For each s∈S: π_{i+1}(s) ← argmax_a [R(s,a) + γ Σ_{s′} M_a(s,s′) V(s′)] 2.3 i ← i + 1 Until πi = π_{i−1}
  • 84. MDP algorithms Dynamic Information Retrieval ModelingTutorial 201484  Value Iteration  Policy Iteration  Modified Policy Iteration  Prioritized Sweeping  Temporal Difference (TD) Learning  Q-Learning Model free approaches Model-based approaches [Bellman, ’57, Howard,‘60, Puterman and Shin,‘78, Singh & Sutton,‘96, Sutton & Barto,‘98, Richard Sutton,‘88,Watkins,‘92] Solve Bellman equation Optimal valueV*(s) Optimal policy *(s) [Slide altered from Carlos Guestrin’s ML lecture]
  • 85. Temporal Difference Learning  Monte Carlo sampling can be used for model-free policy iteration: estimate V^π(s) in the “Policy Evaluation” step by the average reward of trajectories starting from s.  However, some trajectories can be reused, so we estimate by an expectation over the next state: V^π(s) ← r + γ E[V^π(s′) | s, a].  The simplest estimate: V^π(s) ← r + γ V^π(s′).  A smoothed version: V^π(s) ← α(r + γ V^π(s′)) + (1 − α) V^π(s).  TD-learning rule: V^π(s) ← V^π(s) + α(r + γ V^π(s′) − V^π(s)), where r + γ V^π(s′) − V^π(s) is the temporal difference, r is the immediate reward, and α is the learning rate.  [Richard Sutton ’88, Singh & Sutton ’96, Sutton & Barto ’98]
  • 86. Dynamic Information Retrieval ModelingTutorial 201486 1. For each state s∈S Initialize V 𝜋(s) arbitrarily End for 2. For each step in the state sequence 2.1 Initialize s 2.2 repeat 2.2.1 take action a at state s according to 𝜋 2.2.2 observe immediate reward r and the next state 𝑠′ 2.2.3 𝑉 𝜋 s ← 𝑉 𝜋 𝑠 + 𝛼 𝑟 + 𝛾𝑉 𝜋 𝑠′ − 𝑉 𝜋(𝑠) 2.2.4 𝑠 ← 𝑠′ Until s is a terminal state End for Algorithm Temporal Difference Learning
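A minimal TD(0) sketch of the update rule above; env_step and pi are placeholders you would supply for your own environment and fixed policy, and are not part of the original slides.

    import random

    # TD(0) sketch: V(s) <- V(s) + alpha * (r + gamma*V(s') - V(s)).
    # env_step(s, a) -> (reward, next_state, done) and pi(s) -> action are placeholders.
    def td0_evaluate(pi, env_step, start_states, n_episodes=1000,
                     alpha=0.1, gamma=0.9):
        V = {}                                    # value estimates, 0 by default
        for _ in range(n_episodes):
            s = random.choice(start_states)
            done = False
            while not done:
                a = pi(s)                         # act according to the fixed policy
                r, s_next, done = env_step(s, a)  # observe reward and next state
                td_target = r + gamma * V.get(s_next, 0.0)
                V[s] = V.get(s, 0.0) + alpha * (td_target - V.get(s, 0.0))
                s = s_next
        return V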
  • 87. Q-Learning  TD-learning rule: V^π(s) ← V^π(s) + α(r + γ V^π(s′) − V^π(s))  Q-learning rule: Q(s,a) ← Q(s,a) + α(r + γ max_{a′} Q(s′,a′) − Q(s,a))  with V(s) = max_a Q(s,a), π*(s) = argmax_a Q*(s,a), and Q*(s,a) = R(s,a) + γ Σ_{s′} M_a(s,s′) max_{a′} Q*(s′,a′)
  • 88. Q-Learning — Algorithm 1. For each state s∈S and action a∈A: initialize Q0(s,a) arbitrarily. 2. i ← 0 3. For each step in the state sequence 3.1 Initialize s 3.2 Repeat 3.2.1 i ← i + 1 3.2.2 select an action a at state s according to Q_{i−1} 3.2.3 take action a, observe the immediate reward r and the next state s′ 3.2.4 Q_i(s,a) ← Q_{i−1}(s,a) + α(r + γ max_{a′} Q_{i−1}(s′,a′) − Q_{i−1}(s,a)) 3.2.5 s ← s′ Until s is a terminal state End for 4. For each s∈S: π(s) ← argmax_a Q_i(s,a)
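A compact Q-learning sketch with ε-greedy action selection standing in for step 3.2.2; env_step is again a placeholder environment, not anything defined in the slides.

    import random

    # Q-learning sketch with epsilon-greedy action selection.
    # env_step(s, a) -> (reward, next_state, done) is a placeholder environment.
    def q_learning(actions, env_step, start_states, n_episodes=1000,
                   alpha=0.1, gamma=0.9, epsilon=0.1):
        Q = {}                                         # Q[(s, a)], 0 by default
        def best_action(s):
            return max(actions, key=lambda a: Q.get((s, a), 0.0))
        for _ in range(n_episodes):
            s = random.choice(start_states)
            done = False
            while not done:
                # explore with probability epsilon, otherwise act greedily
                a = random.choice(actions) if random.random() < epsilon else best_action(s)
                r, s_next, done = env_step(s, a)
                target = r + gamma * max(Q.get((s_next, b), 0.0) for b in actions)
                Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
                s = s_next
        return {s: best_action(s) for s, _ in Q}       # greedy policy per visited state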
  • 89. Apply an MDP to an IR Problem Dynamic Information Retrieval ModelingTutorial 201489  We can model IR systems using a Markov Decision Process  Is there a temporal component?  States –What changes with each time step?  Actions – How does your system change the state?  Rewards – How do you measure feedback or effectiveness in your problem at each time step?  Transition Probability – Can you determine this?  If not, then model free approach is more suitable
  • 90. Apply an MDP to an IR Problem - Example Dynamic Information Retrieval ModelingTutorial 201490  User agent in session search  States – user’s relevance judgement  Action – new query  Reward – information gained
  • 91. Apply an MDP to an IR Problem - Example Dynamic Information Retrieval ModelingTutorial 201491  Search engine’s perspective  What if we can’t directly observe user’s relevance judgement?  Click ≠ relevance ? ? ? ?
  • 92. Dynamic Information Retrieval ModelingTutorial 201492  Markov Chain  Hidden Markov Model  Markov Decision Process  Partially Observable Markov Decision Process  Multi-armed Bandit Family of Markov Models
  • 93. POMDP Model  (Diagram: hidden states s0 → s1 → s2 → s3 → …, with action a_t and reward r_t at each step, and observations o1, o2, o3 emitted along the way.)  Hidden states  Observations  Belief  ¹R. D. Smallwood et al., ’73
  • 94. POMDP Definition Dynamic Information Retrieval ModelingTutorial 201494  A tuple (S, M,A, R, γ, O, Θ, B)  S : state space  M: transition matrix  A: action space  R: reward function  γ: discount factor, 0< γ ≤1  O: observation set an observation is a symbol emitted according to a hidden state.  Θ: observation function Θ(s,a,o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o|s,a).  B: belief space Belief is a probability distribution over hidden states.
  • 95. POMDP → Belief Update  The agent uses a state estimator to update its belief about the hidden states: b′ = SE(b, a, o′), where b′(s′) = P(s′ | o′, a, b) = P(s′, o′ | a, b) / P(o′ | a, b) = Θ(s′, a, o′) Σ_s M(s, a, s′) b(s) / P(o′ | a, b)
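A small sketch of the state-estimator update above; M(s, a, s′) and Theta(s′, a, o) are placeholder functions for the POMDP's transition and observation probabilities.

    # Belief update b' = SE(b, a, o'): b'(s') ∝ Theta(s',a,o') * sum_s M(s,a,s') * b(s).
    # M(s, a, s_next) and Theta(s_next, a, obs) are placeholders returning probabilities.
    def belief_update(b, a, obs, states, M, Theta):
        unnormalised = {}
        for s_next in states:
            pred = sum(M(s, a, s_next) * b[s] for s in states)   # prediction step
            unnormalised[s_next] = Theta(s_next, a, obs) * pred  # correction step
        p_obs = sum(unnormalised.values())                       # P(o' | a, b)
        if p_obs == 0.0:
            raise ValueError("observation has zero probability under this belief")
        return {s: v / p_obs for s, v in unnormalised.items()}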
  • 96. POMDP → Bellman Equation  The Bellman equation for a POMDP: V(b) = max_a [r(b,a) + γ Σ_{o′} P(o′ | a, b) V(b′)]  A POMDP can be transformed into a continuous belief MDP (B, M′, A, r, γ):  B: the continuous belief space  M′: transition function, M′_a(b, b′) = Σ_{o′∈O} 1_{a,o′}(b′, b) P(o′ | a, b), where 1_{a,o′}(b′, b) = 1 if SE(b, a, o′) = b′ and 0 otherwise  A: action space  r: reward function, r(b, a) = Σ_{s∈S} b(s) R(s, a)
  • 97. Solving POMDPs – The Witness Algorithm¹  The optimal policy of a POMDP is the optimal policy of its belief MDP.  The witness algorithm is a variation of the value iteration algorithm.  ¹L. Kaelbling et al., ’98
  • 98. Policy Tree  A policy tree of depth i is an i-step non-stationary policy, as if we ran value iteration until the i-th iteration.  (Diagram: root action a(h) with i steps to go; each observation o1 … ol leads to a subtree whose root action has i−1 steps to go, and so on down to 1 step to go.)
  • 99. Value of a Policy Tree  The value of a policy tree h can only be determined from some belief state b, because the agent never knows the exact state: V_h(b) = Σ_{s∈S} b(s) V_h(s), where V_h(s) = R(s, a(h)) + γ Σ_{s′∈S} M_{a(h)}(s, s′) Σ_{o_k∈O} Θ(s′, a(h), o_k) V_{o_k(h)}(s′); here a(h) is the action at the root node of h and o_k(h) is the (i−1)-step subtree associated with observation o_k under the root node of h.
  • 100. Idea of the Witness Algorithm Dynamic Information Retrieval ModelingTutorial 2014100  For each action a, compute Γ𝑖 𝑎 , the set of candidate i-step policy trees with action a at their roots  The optimal value function at the ith step, 𝑉𝑖 ∗ (b), is the upper surface of the value functions of all i-step policy trees.
  • 101. Optimal Value Function — Pruning the Set of Policy Trees  V_i*(b) = max_{h∈H} V_h(b)  Geometrically, V_i*(b) is piecewise linear and convex.  (Figure: a two-state POMDP, where the simplex constraint b(s1) + b(s2) = 1 makes the belief space one-dimensional; the value functions V_h1(b), …, V_h5(b) of individual policy trees are lines, and V_i* is their upper surface.)
  • 102. Outlines of the Witness Algorithm Dynamic Information Retrieval ModelingTutorial 2014102 Algorithm 1.𝐻1 ←{} 2. i ← 1 3. Repeat 3.1 i ← i+1 3.2 For each a in A Γ𝑖 𝑎 ← witness(𝐻i−1, a) end for 3.3 Prune Γ𝑖 𝑎 𝑎 to get 𝐻i until 𝑠𝑢𝑝 𝑏|Vi(b) − Vi−1(b)| < 𝜀 the inner loop
  • 103. Inner Loop of the Witness Algorithm Dynamic Information Retrieval ModelingTutorial 2014103 Inner loop of the witness algorithm 1. Select a belief b arbitrarily. Generate a best i-step policy tree hi. Add ℎi to an agenda. 2. In each iteration 2.1 Select a policy tree ℎ 𝑛𝑒𝑤 from the agenda. 2.2 Look for a witness point b using Za and ℎ 𝑛𝑒𝑤. 2.3 If find such a witness point b, 2.3.1 Calculate the best policy tree ℎ 𝑏𝑒𝑠𝑡 for b. 2.3.2 Add ℎ 𝑏𝑒𝑠𝑡 to Za. 2.3.3 Add all the alternative trees of ℎ 𝑏𝑒𝑠𝑡 to the agenda. 2.4 Else remove ℎ 𝑛𝑒𝑤 from the agenda. 3. Repeat the above iteration until the agenda is empty.
  • 104. Other Solutions Dynamic Information Retrieval ModelingTutorial 2014104  QMDP1  MC-POMDP (Monte Carlo POMDP)2  Grid BasedApproximation3  Belief Compression4 …… 1 Thrun et. al.,‘06 2 Thrun et. al.,‘05 3 Lovejoy,‘91 4 Roy,‘03
  • 105. Applying POMDP to Dynamic IR
    POMDP                | Dynamic IR
    Environment          | Documents
    Agents               | User, search engine
    States               | Queries, user's decision-making status, relevance of documents, etc.
    Actions              | Provide a ranking of documents; weigh terms in the query; add/remove/keep query terms; switch a search technology on or off; adjust parameters for a search technology
    Observations         | Queries, clicks, document lists, snippets, terms, etc.
    Rewards              | Evaluation measures (such as DCG, NDCG or MAP); clicking information
    Transition matrix    | Given in advance or estimated from training data
    Observation function | Problem dependent; estimated from sample datasets
  • 106. Session Search Example - States SRT Relevant & Exploitation SRR Relevant & Exploration SNRT Non-Relevant & Exploitation SNRR Non-Relevant & Exploration  scooter price ⟶ scooter stores  Hartford visitors ⟶ Hartford Connecticut tourism  Philadelphia NYC travel ⟶ Philadelphia NYC train  distance NewYork Boston ⟶ maps.bing.com q0 106 [ J. Luo ,et al., ’14]
  • 107. Session Search Example - Actions (Au, Ase)  User Action(Au)  Add query terms (+Δq)  Remove query terms (-Δq)  keep query terms (qtheme)  clicked documents  SAT clicked documents  Search Engine Action(Ase)  increase/decrease/keep term weights,  Switch on or switch off query expansion  Adjust the number of top documents used in PRF  etc. 107 [ J. Luo et al., ’14]
  • 108. Multi Page Search Example - States & Actions  State: relevance of documents  Action: ranking of documents  Observation: clicks  Belief: multivariate Gaussian  Reward: DCG over 2 pages  [Xiaoran Jin et al., ’13]
  • 109. SIGIRTutorial July 7th 2014 Grace Hui Yang Marc Sloan JunWang Guest Speaker: EmineYilmaz Dynamic Information Retrieval Modeling Exercise
  • 110. Dynamic Information Retrieval ModelingTutorial 2014110  Markov Chain  Hidden Markov Model  Markov Decision Process  Partially Observable Markov Decision Process  Multi-Armed Bandit Family of Markov Models
  • 111. Multi Armed Bandits (MAB) Dynamic Information Retrieval ModelingTutorial 2014111 …… …… Which slot machine should I select in this round? Reward
  • 112. Multi Armed Bandits (MAB) Dynamic Information Retrieval ModelingTutorial 2014112 I won! Is this the best slot machine? Reward
  • 113. MAB Definition Dynamic Information Retrieval ModelingTutorial 2014113  A tuple (S,A, R, B) S : hidden reward distribution of each bandit A: choose which bandit to play R: reward for playing bandit B: belief space, our estimate of each bandit’s distribution
  • 114. Comparison with Markov Models Dynamic Information Retrieval ModelingTutorial 2014114  Single state Markov Decision Process No transition probability  Similar to POMDP in that we maintain a belief state  Action = choose a bandit, does not affect state  Does not‘plan ahead’ but intelligently adapts  Somewhere between interactive and dynamic IR
  • 115. Markov Multi Armed Bandits Dynamic Information Retrieval ModelingTutorial 2014115 …… …… Markov Process 1 Markov Process 2 Markov Process k Which slot machine should I select in this round? Reward
  • 116. Markov Multi Armed Bandits Dynamic Information Retrieval ModelingTutorial 2014116 …… …… Markov Process 1 Markov Process 2 Markov Process k Markov Process Action Which slot machine should I select in this round? Reward
  • 117. MAB Policy Reward  An MAB algorithm describes a policy π for choosing bandits  Maximise rewards from the chosen bandits over all time steps  Minimize regret: Σ_{t=1}^T [Reward(a*) − Reward(a_{π(t)})]  The cumulative difference between the optimal reward and the actual reward
  • 118. Exploration vs Exploitation Dynamic Information Retrieval ModelingTutorial 2014118  Exploration  Try out bandits to find which has highest average reward  Exploitation  Too much exploration leads to poor performance  Play bandits that are known to pay out higher reward on average  MAB algorithms balance exploration and exploitation  Start by exploring more to find best bandits  Exploit more as best bandits become known
  • 119. Exploration vs Exploitation Dynamic Information Retrieval ModelingTutorial 2014119
  • 120. MAB – Index Algorithms  Gittins index¹  Play the bandit with the highest ‘Dynamic Allocation Index’  Modelled using an MDP but suffers the ‘curse of dimensionality’  ε-greedy²  Play the highest-reward bandit with probability 1 − ε  Play a random bandit with probability ε  UCB (Upper Confidence Bound)³  Play bandit i with the highest x̄_i + √(2 ln t / T_i), where x̄_i is the average reward of bandit i so far and T_i the number of times it has been played  The chance of playing infrequently played bandits increases over time  ¹J. C. Gittins ’89  ²Nicolò Cesa-Bianchi et al. ’98  ³P. Auer et al. ’02
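A short UCB1 sketch matching the index formula above; pull(i) is a hypothetical function returning the observed reward of bandit i, and the Bernoulli usage example is purely illustrative.

    import math
    import random

    # UCB1 sketch: play the bandit with the highest mean + sqrt(2 ln t / T_i).
    # pull(i) is a placeholder that returns the observed reward of bandit i.
    def ucb1(n_bandits, pull, horizon=1000):
        counts = [0] * n_bandits
        means = [0.0] * n_bandits
        for t in range(1, horizon + 1):
            if t <= n_bandits:
                i = t - 1                      # play each bandit once to initialise
            else:
                i = max(range(n_bandits),
                        key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
            r = pull(i)
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]   # incremental mean update
        return means, counts

    # usage with hypothetical Bernoulli bandits:
    # true_p = [0.2, 0.5, 0.7]
    # ucb1(3, lambda i: 1.0 if random.random() < true_p[i] else 0.0)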
  • 121. MAB use in IR Dynamic Information Retrieval ModelingTutorial 2014121  Choosing ads to display to users1  Each ad is a bandit  User click through rate is reward  Recommending news articles2  News article is a bandit  Similar to Information Filtering case  Diversifying search results3  Each rank position is an MAB dependent on higher ranks  Documents are bandits chosen by each rank 1Deepayan Chakrabarti et. al. ,‘09 2Lihong Li et. al., ’10 3Radlinski et. al.,‘08
  • 122. MAB Variations Dynamic Information Retrieval ModelingTutorial 2014122  Contextual Bandits1  World has some context 𝑥 ∈ 𝑋 (i.e. user location)  Learn policy 𝜋: 𝑋 → 𝐴 that maps context to arms (online or offline)  Duelling Bandits2  Play two (or more) bandits at each time step  Observe relative reward rather than absolute  Learn order of bandits  Mortal Bandits3  Value of bandits decays over time  Exploitation > exploration 1Lihong Li et. al.,‘10 2YisongYue et. al.,‘09 3Deepayan Chakrabarti et. al. ,‘09
  • 123. Comparison of Markov Models  MC – a fully observable stochastic process  HMM – a partially observable stochastic process  MDP – a fully observable decision process  MAB – a decision process, either fully or partially observable  POMDP – a partially observable decision process
           | actions | rewards | states
    MC     | No      | No      | Observable
    HMM    | No      | No      | Unobservable
    MDP    | Yes     | Yes     | Observable
    POMDP  | Yes     | Yes     | Unobservable
    MAB    | Yes     | Yes     | Fixed
  • 124. SIGIRTutorial July 7th 2014 Grace Hui Yang Marc Sloan JunWang Guest Speaker: EmineYilmaz Dynamic Information Retrieval Modeling Exercise
  • 125. Outline Dynamic Information Retrieval ModelingTutorial 2014125  Introduction  Theory and Models  Session Search  Reranking  GuestTalk: Evaluation
  • 126. TREC Session Tracks (2010-2012)  Given a series of queries {q1,q2,…,qn}, top 10 retrieval results {D1, … Di-1 } for q1 to qi-1, and click information  The task is to retrieve a list of documents for the current/last query, qn  Relevance judgment is made based on how relevant the documents are for qn, and how relevant they are for information needs for the entire session (in topic description)  no need to segment the sessions 126
  • 127. 1.pocono mountains pennsylvania 2.pocono mountains pennsylvania hotels 3.pocono mountains pennsylvania things to do 4.pocono mountains pennsylvania hotels 5.pocono mountains camelbeach 6.pocono mountains camelbeach hotel 7.pocono mountains chateau resort 8.pocono mountains chateau resort attractions 9.pocono mountains chateau resort getting to 10.chateau resort getting to 11.pocono mountains chateau resort directions TREC 2012 Session 6 127 Information needs: You are planning a winter vacation to the Pocono Mountains region in Pennsylvania in the US.Where will you stay?What will you do while there? How will you get there? In a session, queries change constantly
  • 128. Query change is an important form of feedback  We define query change as the syntactic editing change between two adjacent queries: Δq_i = q_i − q_{i−1}, which includes +Δq_i, the added terms, and −Δq_i, the removed terms.  The unchanged/shared terms are called the theme terms, q_theme.  Example: q1 = “bollywood legislation”, q2 = “bollywood law” — theme term = “bollywood”, added (+Δq) = “law”, removed (−Δq) = “legislation”
  • 129. Where do these query changes come from?  GivenTREC Session settings, we consider two sources of query change:  the previous search results that a user viewed/read/examined  the information need  Example:  Kurosawa  Kurosawa wife  `wife’ is not in any previous results, but in the topic description  However, knowing information needs before search is difficult to achieve 129
  • 130. Previous search results could influence query change in quite complex ways  Merck lobbyists  Merck lobbying US policy  D1 contains several mentions of‘policy’, such as  “A lobbyist who until 2004 worked as senior policy advisor to Canadian Prime Minister Stephen Harper was hired last month by Merck …”  These mentions are about Canadian policies; while the user adds US policy in q2  Our guess is that the user might be inspired by‘policy’, but he/she prefers a different sub-concept other than `Canadian policy’  Therefore, for the added terms `US policy’,‘US’ is the novel term here, and‘policy’ is not since it appeared in D1.  The two terms should be treated differently 130
  • 131.  We propose to model session search as a Markov decision process (MDP)  Two agents: the User and the Search Engine Dynamic Information Retrieval ModelingTutorial 2014131  Environments Search results  States Queries  Actions  User actions: Add/remove/unchange the query terms  Search Engine actions: Increase/ decrease /remain term weights Applying MDP to Session Search
  • 132. Search Engine Agent’s Actions
    term    | ∈ D_{i−1} | action    | example
    q_theme | Y         | increase  | “pocono mountain” in s6
    q_theme | N         | increase  | “france world cup 98 reaction” in s28, france world cup 98 reaction stock market → france world cup 98 reaction
    +Δq     | Y         | decrease  | ‘policy’ in s37, Merck lobbyists → Merck lobbyists US policy
    +Δq     | N         | increase  | ‘US’ in s37, Merck lobbyists → Merck lobbyists US policy
    −Δq     | Y         | decrease  | ‘reaction’ in s28, france world cup 98 reaction → france world cup 98
    −Δq     | N         | no change | ‘legislation’ in s32, bollywood legislation → bollywood law
  • 133. Query Change retrieval Model (QCM)  The Bellman equation gives the optimal value for an MDP: V*(s) = max_a [R(s,a) + γ Σ_{s′} P(s′|s,a) V*(s′)]  The reward function is used as the document relevance score and is derived backwards from the Bellman equation: Score(q_i, d) = P(q_i|d) + γ Σ_a P(q_i | q_{i−1}, D_{i−1}, a) · max_{D_{i−1}} P(q_{i−1}|D_{i−1}), where P(q_i|d) is the current reward/relevance score, P(q_i | q_{i−1}, D_{i−1}, a) is the query transition model, and max_{D_{i−1}} P(q_{i−1}|D_{i−1}) is the maximum past relevance.
  • 134. Calculating the Transition Model  According to the query changes and the search engine actions, the score decomposes as: Score(q_i, d) = log P(q_i|d) + α Σ_{t∈q_theme} [1 − P(t|d*_{i−1})] log P(t|d) − β Σ_{t∈+Δq, t∈d*_{i−1}} P(t|d*_{i−1}) log P(t|d) + ε Σ_{t∈+Δq, t∉d*_{i−1}} idf(t) log P(t|d) − δ Σ_{t∈−Δq} P(t|d*_{i−1}) log P(t|d)  • log P(q_i|d): current reward/relevance score  • theme terms: weights increased  • old added terms (already in d*_{i−1}): weights decreased  • novel added terms: weights increased  • removed terms: weights decreased
  • 135. Maximizing the Reward Function  Generate a maximum-rewarded document, denoted d*_{i−1}, from D_{i−1}: the document(s) most relevant to q_{i−1}.  The relevance score can be calculated as P(q_{i−1}|d_{i−1}) = 1 − Π_{t∈q_{i−1}} [1 − P(t|d_{i−1})], with P(t|d_{i−1}) = #(t, d_{i−1}) / |d_{i−1}|.  From several options, we choose to use only the document with top relevance: max_{D_{i−1}} P(q_{i−1} | D_{i−1})
  • 136. Scoring the Entire Session  The overall relevance score for a session of queries is aggregated recursively: Score_session(q_n, d) = Score(q_n, d) + γ Score_session(q_{n−1}, d) = Score(q_n, d) + γ[Score(q_{n−1}, d) + γ Score_session(q_{n−2}, d)] = Σ_{i=1}^n γ^{n−i} Score(q_i, d)
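A minimal sketch of the recursive session aggregation above; score(q, d) stands in for the per-query QCM score, and gamma is the session discount (no particular value is implied by the slide).

    # Session-level score: Score_session(q_n, d) = sum_i gamma^(n-i) * Score(q_i, d).
    # score(q, d) is a placeholder for the per-query QCM relevance score.
    def session_score(queries, d, score, gamma):
        n = len(queries)
        return sum(gamma ** (n - i) * score(q, d)
                   for i, q in enumerate(queries, start=1))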
  • 137. Experiments  TREC 2011–2012 query sets and datasets  ClueWeb09 Category B
  • 138. Search Accuracy (TREC 2012)  nDCG@10 (official metric used in TREC)
    Approach               | nDCG@10 | %chg    | MAP    | %chg
    Lemur                  | 0.2474  | -21.54% | 0.1274 | -18.28%
    TREC’12 median         | 0.2608  | -17.29% | 0.1440 | -7.63%
    Our TREC’12 submission | 0.3021  | -4.19%  | 0.1490 | -4.43%
    TREC’12 best           | 0.3221  | 0.00%   | 0.1559 | 0.00%
    QCM                    | 0.3353  | 4.10%†  | 0.1529 | -1.92%
    QCM+Dup                | 0.3368  | 4.56%†  | 0.1537 | -1.41%
  • 139. Search Accuracy (TREC 2011)  nDCG@10 (official metric used in TREC)
    Approach               | nDCG@10 | %chg    | MAP    | %chg
    Lemur                  | 0.3378  | -23.38% | 0.1118 | -25.86%
    TREC’11 median         | 0.3544  | -19.62% | 0.1143 | -24.20%
    TREC’11 best           | 0.4409  | 0.00%   | 0.1508 | 0.00%
    QCM                    | 0.4728  | 7.24%†  | 0.1713 | 13.59%†
    QCM+Dup                | 0.4821  | 9.34%†  | 0.1714 | 13.66%†
    Our TREC’12 submission | 0.4836  | 9.68%†  | 0.1724 | 14.32%†
  • 140. Search Accuracy for Different Session Types  TREC 2012 sessions are classified by product (factual / intellectual) and goal quality (specific / amorphous)
    Approach  | Intellectual | %chg   | Amorphous | %chg   | Specific | %chg   | Factual | %chg
    TREC best | 0.3369       | 0.00%  | 0.3495    | 0.00%  | 0.3007   | 0.00%  | 0.3138  | 0.00%
    Nugget    | 0.3305       | -1.90% | 0.3397    | -2.80% | 0.2736   | -9.01% | 0.2871  | -8.51%
    QCM       | 0.3870       | 14.87% | 0.3689    | 5.55%  | 0.3091   | 2.79%  | 0.3066  | -2.29%
    QCM+DUP   | 0.3900       | 15.76% | 0.3692    | 5.64%  | 0.3114   | 3.56%  | 0.3072  | -2.10%
    QCM better handles sessions that demonstrate evolution and exploration, because it treats a session as a continuous process, studying changes across query transitions and modeling the dynamics.
  • 141. Outline Dynamic Information Retrieval ModelingTutorial 2014141  Introduction  Theory and Models  Session Search  Reranking  GuestTalk: Evaluation
  • 142. Multi Page Search Dynamic Information Retrieval ModelingTutorial 2014142
  • 143. Multi Page Search Dynamic Information Retrieval ModelingTutorial 2014143 Page 1 Page 2 2. 1. 2. 1.
  • 144. Relevance Feedback Dynamic Information Retrieval ModelingTutorial 2014144  No UI Changes  Interactivity is Hidden  Private, performed in browser
  • 145. Relevance Feedback Dynamic Information Retrieval ModelingTutorial 2014145 Page 1 • Diverse Ranking • Maximise learning potential • Exploration vs Exploitation Page 2 • Clickthroughs or explicit ratings • Respond to feedback from page 1 • Personalized
  • 146. Model Dynamic Information Retrieval ModelingTutorial 2014146
  • 147. Model Dynamic Information Retrieval ModelingTutorial 2014147  𝑁 𝜃1, Σ1  𝜃1 -prior estimate of relevance  Σ1 - prior estimate of covariance  Document similarity  Topic Clustering
  • 148. Model Dynamic Information Retrieval ModelingTutorial 2014148  Rank action for page 1
  • 149. Model Dynamic Information Retrieval ModelingTutorial 2014149
  • 150. Model Dynamic Information Retrieval ModelingTutorial 2014150  Feedback from page 1  𝒓 ~ 𝑁(𝜃𝒔 1 , Σ 𝒔 1 )
  • 151. Model  Update the estimates using the feedback r¹ observed on the page-1 documents.  Partition the prior over the shown documents s and the remaining documents s̄: θ¹ = (θ_s̄, θ_s), Σ¹ = [[Σ_s̄, Σ_s̄s], [Σ_ss̄, Σ_s]]  Conditional (posterior) estimates for the remaining documents: θ² = θ_s̄ + Σ_s̄s Σ_s⁻¹ (r¹ − θ_s), Σ² = Σ_s̄ − Σ_s̄s Σ_s⁻¹ Σ_ss̄
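Assuming the update above is the standard conditional-Gaussian rule over the documents shown on page 1 and the remaining candidates, a NumPy sketch looks as follows; the variable names are mine, not the paper's.

    import numpy as np

    # Conditional Gaussian update sketch: after observing feedback r1 on the
    # documents shown on page 1 (indices `shown`), update the relevance estimates
    # of the remaining documents (indices `rest`). theta1 is the prior mean vector
    # and Sigma1 the prior covariance matrix over all candidate documents.
    def update_belief(theta1, Sigma1, shown, rest, r1):
        t_s, t_r = theta1[shown], theta1[rest]
        S_ss = Sigma1[np.ix_(shown, shown)]
        S_rs = Sigma1[np.ix_(rest, shown)]
        S_rr = Sigma1[np.ix_(rest, rest)]
        K = S_rs @ np.linalg.inv(S_ss)        # gain mapping page-1 feedback to the rest
        theta2 = t_r + K @ (r1 - t_s)         # posterior mean
        Sigma2 = S_rr - K @ S_rs.T            # posterior covariance
        return theta2, Sigma2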
  • 152. Model Dynamic Information Retrieval ModelingTutorial 2014152  Rank using PRP
  • 153. Model  Utility of a ranking (DCG over the two pages): U = λ Σ_{j=1}^{M} θ¹_{s_j} / log₂(j+1) + (1 − λ) Σ_{j=M+1}^{2M} θ²_{s_j} / log₂(j+1)
  • 154. Model – Bellman Equation  Optimize s¹ to improve U(s²): V(θ¹, Σ¹, 1) = max_{s¹} { λ θ¹_{s¹} · W¹ + ∫_r P(r) max_{s²} [(1 − λ) θ²_{s²} · W²] dr }, where W¹ and W² are the position-discount weight vectors from the utility above
  • 155. 𝜆 Dynamic Information Retrieval ModelingTutorial 2014155  Balances exploration and exploitation in page 1  Tuned for different queries  Navigational  Informational  𝜆 = 1 for non-ambiguous search
  • 156. Approximation  Monte Carlo sampling over a set O of S sampled feedback vectors: ≈ max_{s¹} { λ θ¹_{s¹} · W¹ + max_{s²} (1 − λ) (1/S) Σ_{r∈O} θ²_{s²} · W² P(r) }  Sequential Ranking Decision
  • 157. Experiment Data Dynamic Information Retrieval ModelingTutorial 2014157  Difficult to evaluate without access to live users  Simulated using 3TREC collections and relevance judgements  WT10G – Explicit Ratings  TREC8 – Clickthroughs  Robust – Difficult (ambiguous) search
  • 158. User Simulation Dynamic Information Retrieval ModelingTutorial 2014158  Rank M documents  Simulated user clicks according to relevance judgements  Update page 2 ranking  Measure at page 1 and 2  Recall  Precision  nDCG  MRR  BM25 – prior ranking model
  • 159. Investigating λ Dynamic Information Retrieval ModelingTutorial 2014159
  • 160. Baselines Dynamic Information Retrieval ModelingTutorial 2014160  𝜆 determined experimentally  BM25  BM25 with conditional update (𝜆 = 1)  Maximum Marginal Relevance (MMR)  Diversification  MMR with conditional update  Rocchio  Relevance Feedback
  • 161. Results Dynamic Information Retrieval ModelingTutorial 2014161
  • 162. Results Dynamic Information Retrieval ModelingTutorial 2014162
  • 163. Results Dynamic Information Retrieval ModelingTutorial 2014163
  • 164. Results Dynamic Information Retrieval ModelingTutorial 2014164
  • 165. Results Dynamic Information Retrieval ModelingTutorial 2014165  Similar results across data sets and metrics  2nd page gain outweighs 1st page losses  Outperformed Maximum Marginal Relevance using MRR to measure diversity  BM25-U simply no exploration case  Similar results when 𝑀 = 5
  • 166. Results Dynamic Information Retrieval ModelingTutorial 2014166
  • 167. Outline Dynamic Information Retrieval ModelingTutorial 2014167  Introduction  Theory and Models  Session Search  Reranking  GuestTalk: Evaluation
  • 168. Dynamic Information Retrieval Evaluation EmineYilmaz University College London Emine.Yilmaz@ucl.ac.uk
  • 169. Information Retrieval Systems Match information seekers with the information they seek
  • 174. Different Approaches to Evaluation  Online Evaluation  Design interactive experiments  Use users’ actions to evaluate the quality  Inherently dynamic in nature  Offline Evaluation  Controlled laboratory experiments  The users’ interaction with the engine is only simulated  Recent work focused on dynamic IR evaluation
  • 175. Online Evaluation  Standard click metrics  Clickthrough rate  Probability user skips over results they have considered (pSkip)  Most recently: Result interleaving    Click/Noclick Evaluate 175
  • 176. What is result interleaving?  A way to compare rankers online  Given the two rankings produced by two methods  Present a combination of the rankings to users  Team Draft Interleaving (Radlinski et al., 2008)  Interleaving two rankings  Input:Two rankings (“can be seen as teams who pick players”)  Repeat: o Toss a coin to see which team (ranking) picks next o Winner picks their best remaining player (document) o Loser picks their best remaining player (document)  Output: One ranking (2 teams of 5)  Credit assignment  Ranking providing more of the clicked results wins
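A sketch of the team-draft procedure just described; the document identifiers and the click-based credit function are illustrative rather than taken from the slides.

    import random

    # Team-draft interleaving sketch: merge rankings A and B; in each round a coin
    # toss decides which team picks first, and each team adds its highest-ranked
    # document not yet selected.
    def team_draft_interleave(ranking_a, ranking_b, length):
        interleaved, team, used = [], [], set()   # team[i]: which ranking supplied position i
        while len(interleaved) < length:
            first = random.randint(0, 1)          # toss a coin for pick order this round
            added = False
            for t in (first, 1 - first):
                if len(interleaved) >= length:
                    break
                ranking = ranking_a if t == 0 else ranking_b
                doc = next((d for d in ranking if d not in used), None)
                if doc is None:
                    continue
                used.add(doc)
                interleaved.append(doc)
                team.append(t)
                added = True
            if not added:
                break                             # both rankings exhausted
        return interleaved, team

    # credit assignment: the ranking contributing more clicked documents wins
    def winner(team, clicked_positions):
        credit = [0, 0]
        for pos in clicked_positions:
            credit[team[pos]] += 1
        return "A" if credit[0] > credit[1] else "B" if credit[1] > credit[0] else "tie"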
  • 177. Team Draft InterleavingRanking A 1. Napa Valley – The authority for lodging... www.napavalley.com 2. Napa Valley Wineries - Plan your wine... www.napavalley.com/wineries 3. Napa Valley College www.napavalley.edu/homex.asp 4. Been There | Tips | Napa Valley www.ivebeenthere.co.uk/tips/16681 5. Napa Valley Wineries and Wine www.napavintners.com 6. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley Ranking B 1. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley 2. Napa Valley – The authority for lodging... www.napavalley.com 3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=... 4. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com 5. NapaValley.org www.napavalley.org 6. The Napa Valley Marathon www.napavalleymarathon.org Presented Ranking 1. Napa Valley – The authority for lodging... www.napavalley.com 2. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley 3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=... 4. Napa Valley Wineries – Plan your wine... www.napavalley.com/wineries 5. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com 6. Napa Valley College www.napavalley.edu/homex.asp 7 NapaValley.org www.napavalley.org AB
  • 178. Team Draft InterleavingRanking A 1. Napa Valley – The authority for lodging... www.napavalley.com 2. Napa Valley Wineries - Plan your wine... www.napavalley.com/wineries 3. Napa Valley College www.napavalley.edu/homex.asp 4. Been There | Tips | Napa Valley www.ivebeenthere.co.uk/tips/16681 5. Napa Valley Wineries and Wine www.napavintners.com 6. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley Ranking B 1. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley 2. Napa Valley – The authority for lodging... www.napavalley.com 3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=... 4. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com 5. NapaValley.org www.napavalley.org 6. The Napa Valley Marathon www.napavalleymarathon.org Presented Ranking 1. Napa Valley – The authority for lodging... www.napavalley.com 2. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley 3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=... 4. Napa Valley Wineries – Plan your wine... www.napavalley.com/wineries 5. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com 6. Napa Valley College www.napavalley.edu/homex.asp 7 NapaValley.org www.napavalley.org B wins!
  • 179. Team Draft InterleavingRanking A 1. Napa Valley – The authority for lodging... www.napavalley.com 2. Napa Valley Wineries - Plan your wine... www.napavalley.com/wineries 3. Napa Valley College www.napavalley.edu/homex.asp 4. Been There | Tips | Napa Valley www.ivebeenthere.co.uk/tips/16681 5. Napa Valley Wineries and Wine www.napavintners.com 6. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley Ranking B 1. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley 2. Napa Valley – The authority for lodging... www.napavalley.com 3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=... 4. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com 5. NapaValley.org www.napavalley.org 6. The Napa Valley Marathon www.napavalleymarathon.org Presented Ranking 1. Napa Valley – The authority for lodging... www.napavalley.com 2. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley 3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=... 4. Napa Valley Wineries – Plan your wine... www.napavalley.com/wineries 5. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com 6. Napa Valley College www.napavalley.edu/homex.asp 7 NapaValley.org www.napavalley.org B wins! Repeat Over Many Different Queries!
  • 180. Offline Evaluation  Controlled laboratory experiments  The user’s interaction with the engine is only simulated  Ask experts to judge each query result  Predict how users behave when they search  Aggregate judgments to evaluate 180
  • 181. Offline Evaluation  Until recently: metrics assumed that the user’s information need was not affected by the documents read, e.g. Average Precision, NDCG, …  Users are more likely to stop searching when they see a highly relevant document  Lately: metrics that incorporate the effect of the relevance of seen documents on user behavior  Based on devising more realistic user models  EBU, ERR [Yilmaz et al. CIKM10, Chapelle et al. CIKM09]
  • 182. Modeling User Behavior — Cascade-based models  (Example: ranked results for the query ‘black powder ammunition’)  • The user views search results from top to bottom  • At each rank i, the user has a certain probability of being satisfied  • The probability of satisfaction is proportional to the relevance grade of the document at rank i  • Once the user is satisfied with a document, he terminates the search
  • 183. Rank Biased Precision  (Browsing model: after issuing the query, the user views an item and then either views the next item or stops.)
  • 184. Rank Biased Precision  Total utility = Σ_{i=1}^∞ rel_i · p^{i−1}, where p is the persistence (probability of viewing the next item)  Num. docs examined = Σ_{i=1}^∞ i (1 − p) p^{i−1} = 1/(1 − p)  RBP = Total utility / Num. docs examined = (1 − p) Σ_{i=1}^∞ rel_i · p^{i−1}
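A one-line RBP sketch of the formula above; the persistence value used in the example call is arbitrary, not a recommendation from the slides.

    # RBP sketch: RBP = (1 - p) * sum_i rel_i * p^(i-1), where p is the user persistence.
    def rbp(relevances, p):
        return (1 - p) * sum(rel * p ** i for i, rel in enumerate(relevances))

    # e.g. rbp([1, 0, 1, 1, 0], p=0.8)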
  • 185. Expected Reciprocal Rank [Chapelle et al CIKM09]  (Browsing model: after viewing an item the user asks “Relevant?” — no / somewhat / highly — and either stops, with probability increasing in the relevance grade, or views the next item.)
  • 186. Expected Reciprocal Rank [Chapelle et al CIKM09]  φ(r): utility of finding “the perfect document” at rank r, with φ(r) = 1/r  ERR = Σ_{r=1}^n φ(r) P(user stops at position r) = Σ_{r=1}^n (1/r) R_r Π_{i=1}^{r−1} (1 − R_i)  R_i: probability of stopping at (being satisfied by) the i-th document, R_i = (2^{g_i} − 1) / 2^{g_max}, where g_i is the relevance grade of the i-th document
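A short ERR sketch following the cascade formula above; the graded relevances in the example call are made up.

    # ERR sketch: graded relevances g_i in {0, ..., g_max}.
    def err(grades, g_max):
        p_not_stopped = 1.0
        total = 0.0
        for r, g in enumerate(grades, start=1):
            R = (2 ** g - 1) / 2 ** g_max      # prob. the user is satisfied at rank r
            total += p_not_stopped * R / r     # utility 1/r of stopping here
            p_not_stopped *= (1 - R)           # prob. of reaching the next rank
        return total

    # e.g. err([3, 1, 0, 2], g_max=3)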
  • 187. Session Evaluation  (Example queries in a session: ‘Paris Luxurious Hotels’, ‘Paris Hilton’, ‘J Lo’)
  • 188. What is a good system?
  • 189. Measuring “goodness” The user steps down a ranked list of documents and observes each one of them until a decision point and either a) abandons the search, or b) reformulates While stepping down or sideways, the user accumulates utility
  • 190. Evaluation over a single ranked list  (Figure: a ranked list of results for the session queries ‘kenya cooking traditional swahili’, ‘kenya cooking traditional’, ‘kenya swahili traditional food recipes’)
  • 191.
  • 192. Session DCG [Järvelin et al ECIR 2008]  For a session with reformulations (e.g. ‘kenya cooking traditional swahili’ → ‘kenya cooking traditional’), each ranked list RL_i is scored with DCG(RL_i) = Σ_{r=1}^k (2^{rel(r)} − 1) / log_b(r + b − 1), and the lists are discounted by reformulation position: sDCG = DCG(RL1) · 1/log_c(1 + c − 1) + DCG(RL2) · 1/log_c(2 + c − 1) + …
  • 193. Model-based measures Probabilistic space of users following different paths  Ω is the space of all paths  P(ω) is the prob of a user following a path ω in Ω  Mω is a measure over a path ω [Yang and Lad ICTIR 2009, Kanoulas et al. SIGIR 2011]
  • 194. Probability of a path  Probability of a path = (probability of abandoning at reformulation 2) × (probability of reformulating at rank 3)  (Figure: a grid of relevant (R) / non-relevant (N) results for Q1, Q2, Q3, with one browsing path highlighted.)
  • 195. Expected Global Utility [Yang and Lad ICTIR 2009] 1. User steps down ranked results one-by-one 2. Stops browsing documents based on a stochastic process that defines a stopping probability distribution over ranks and reformulates 3. Gains something from relevant documents, accumulating utility
  • 196. Probability of abandoning the session at reformulation i: geometric with parameter p_reform  (Figure: grid of R/N results for Q1, Q2, Q3 illustrating the abandonment point.)
  • 197. Probability of reformulating at rank j: geometric with parameter p_down (abandonment is geometric with parameter p_reform, as before)  (Figure: grid of R/N results for Q1, Q2, Q3 illustrating the reformulation rank.)
  • 198. Expected Global Utility [Yang and Lad ICTIR 2009]  The probability of a user following a path ω: P(ω) = P(r1, r2, ..., rK), where r_i is the stopping/reformulation point in list i  Assumption: stopping positions in each list are independent, P(r1, r2, ..., rK) = P(r1)P(r2)...P(rK)  Use a geometric distribution (as in RBP) to model the stopping and reformulation behaviour: P(r_i = r) = (1 − p) p^{r−1}
  • 199. Conclusions  Recent focus on evaluating the dynamic nature of the search process  Interleaving  New offline evaluation metrics: ERR, RBP  Session evaluation metrics
  • 200. Outline Dynamic Information Retrieval ModelingTutorial 2014200  Introduction  Theory and Models  Session Search  Reranking  GuestTalk: Evaluation  Conclusion
  • 201. Conclusions Dynamic Information Retrieval ModelingTutorial 2014201  Dynamic IR describes a new class of interactive model  Incorporates rich feedback, temporal dependency and is goal oriented.  Family of Markov models and Multi Armed Bandit theory useful in building DIR models  Applicable to a range of IR problems  Useful in applications such as session search and evaluation
  • 202. Dynamic IR Book Dynamic Information Retrieval ModelingTutorial 2014202  Published by Morgan & Claypool  ‘Synthesis Lectures on Information Concepts, Retrieval, and Services’  Due March/April 2015 (in time for SIGIR 2015)
  • 203. Acknowledgment Dynamic Information Retrieval ModelingTutorial 2014203  We thank Dr. EmineYilmaz for giving us the guest speech.  We sincerely thank Dr. Xuchu Dong for his help in preparation of the tutorial  We also thank comments and suggestions from the following colleagues:  Dr. Jamie Callan  Dr. Ophir Frieder  Dr. Fernando Diaz  Dr Filip Radlinski
  • 204. Dynamic Information Retrieval ModelingTutorial 2014204
  • 205. Thank You Dynamic Information Retrieval ModelingTutorial 2014205
  • 206. References Dynamic Information Retrieval ModelingTutorial 2014206 Static IR  Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro- Neto.Addison-Wesley, 1999.  The PageRank Citation Ranking: Bringing Order to theWeb. Lawrence Page , Sergey Brin , Rajeev Motwani ,TerryWinograd. 1999  Implicit User Modeling for Personalized Search, Xuehua Shen et. al, CIKM, 2005  A Short Introduction to Learning to Rank. Hang Li, IEICE Transactions 94-D(10): 1854-1862, 2011.
  • 207. References Dynamic Information Retrieval ModelingTutorial 2014207 Interactive IR  Relevance Feedback in Information Retrieval, Rocchio, J. J.,The SMART Retrieval System (pp. 313-23), 1971  A study in interface support mechanisms for interactive information retrieval, RyenW.White et. al, JASIST, 2006  Visualizing stages during an exploratory search session, Bill Kules et. al, HCIR, 2011  Dynamic Ranked Retrieval, Cristina Brandt et. al,WSDM, 2011  Structured Learning of Two-level Dynamic Rankings, Karthik Raman et. al, CIKM, 2011
  • 208. References Dynamic Information Retrieval ModelingTutorial 2014208 Dynamic IR  A hidden Markov model information retrieval system. D. R. H. Miller,T. Leek, and R. M. Schwartz. In SIGIR’99, pages 214-221.  Threshold setting and performance optimization in adaptive filtering, Stephen Robertson, JIR 2002  A large-scale study of the evolution of web pages, Dennis Fetterly et. al.,WWW 2003  Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg,Thorsten Joachims. ICML, 2008.  Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem,YisongYue et. al., ICML 2009  Meme-tracking and the dynamics of the news cycle, Jure Leskovec et. al., KDD 2009
  • 209. References Dynamic Information Retrieval ModelingTutorial 2014209 Dynamic IR  Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS 2009  A Novel Click Model and Its Applications to Online Advertising , Zeyuan Allen Zhu et. al.,WSDM 2010  A contextual-bandit approach to personalized news article recommendation. Lihong Li,Wei Chu, John Langford, Robert E. Schapire.WWW, 2010  Inferring search behaviors using partially observable markov model with duration (POMD),Yin he et. al.,WSDM, 2011  No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search, Jeff Huang et. al., CHI 2011  Balancing Exploration and Exploitation in Learning to Rank Online, Katja Hofmann et. al., ECIR, 2011  Large-ScaleValidation and Analysis of Interleaved Search Evaluation, Olivier Chapelle et. al.,TOIS 2012
  • 210. References Dynamic Information Retrieval ModelingTutorial 2014210 Dynamic IR  Using ControlTheory for Stable and Efficient Recommender Systems.T. Jambor, J.Wang, N. Lathia. In:WWW '12, pages 11-20.  Sequential selection of correlated ads by POMDPs, ShuaiYuan et. al., CIKM 2012  Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. In SIGIR ’13, pages 453–462.  Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H.Yang. In SIGIR 2013.  Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J.Wang. InWWW ’13.  Interactive Collaborative Filtering. X. Zhao,W. Zhang, J.Wang. In: CIKM'2013, pages 1411-1420.  Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H.Yang. In SIGIR ’14.
  • 211. References Dynamic Information Retrieval ModelingTutorial 2014211 Markov Processes  A markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679–684, 1957.  Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957.  Dynamic Programming and Markov Processes. R.A. Howard. MIT Press. 1960  Linear Programming and Sequential Decisions.Alan S. Manne. Management Science, 1960  Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Baum, Leonard E.; Petrie,Ted.The Annals of Mathematical Statistics 37, 1966
  • 212. References Dynamic Information Retrieval ModelingTutorial 2014212 Markov Processes  Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3. 1988  Computationally feasible bounds for partially observed Markov decision processes.W. Lovejoy. Operations Research 39: 162–175, 1991.  Q-Learning. Christopher J.C.H.Watkins, Peter Dayan. Machine Learning. 1992  Reinforcement learning with replacing eligibility traces. Singh, S. P. & Sutton, R. S. Machine Learning, 22, pages 123-158, 1996.  Reinforcement Learning:An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998.  Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra.Artificial Intelligence, 101(1- 2):99–134, 1998.
  • 213. References Dynamic Information Retrieval ModelingTutorial 2014213 Markov Processes  Finding approximate POMDP solutions through belief compression. N. Roy. PhDThesis Carnegie Mellon. 2003  VDCBPI: an approximate scalable algorithm for large scale POMDPs, P. Poupart and C. Boutilier. In NIPS-2004, pages 1081–1088.  Finding Approximate POMDP solutionsThrough Belief Compression. N. Roy, G. Gordon and S.Thrun. Journal of Artificial Intelligence Research, 23:1-40,2005.  Probabilistic robotics. S.Thrun,W. Burgard, D. Fox. Cambridge. MIT Press. 2005  Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S.Thrun.Volume 27, pages 335-380, 2006  Probabilistic Robotics. S.Thrun,W. Burgard, D. Fox.The MIT Press, 2006.
  • 214. References Dynamic Information Retrieval ModelingTutorial 2014214 Markov Processes  The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood, E.J. Sondik. Operations Research. 1973  Modified Policy IterationAlgorithms for Discounted Markov Decision Problems. M. L. Puterman and Shin M. C. Management Science 24, 1978.  An example of statistical investigation of the text eugene onegin the connection of samples in chains.A.A. Markov. Science in Context, 19:591–600, 12 2006.  Learning to Rank for Information Retrieval.Tie-Yan Liu. Springer Science & Business Media. 2011  Finite-Time Regret Bounds for the Multiarmed Bandit Problem, Nicolò Cesa- Bianchi, Paul Fischer. ICML 100-108, 1998  Multi-armed bandit allocation indices,Wiley, J. C. Gittins. 1989  Finite-time Analysis of the Multiarmed Bandit Problem, PeterAuer et. al., Machine Learning 47, Issue 2-3. 2002.