Introduction to Recommender Systems
Prof. Shou-De Lin
CSIE, NTU
sdlin@csie.ntu.edu.tw
1
About the Speaker
• PI: Shou-de Lin (machine discovery and social
network mining lab)
• B.S. in NTUEE
• M.S. in EECS, UM
• M.S. in Computational Linguistics, USC
• Ph.D. in CS, USC
• Postdoc in Los Alamos National Lab
• Courses:
• Machine Learning and Data Mining- Theory
and Practice
• Machine Discovery
• Social network Analysis
• Technical Writing and Research Method
• Probabilistic Graphical Model
• Awards:
• All-time ACM KDD Cup Champion (2008,
2010, 2011, 2012, 2013)
• Google Research Award 2008
• Microsoft Research Award 2009
• Best Paper Award WI2003, TAAI 2010,
ASONAM 2011, TAAI 2014
• US Aerospace AOARD Research Grant
Award 2011, 2013, 2014, 2015, 2016
Lab research areas: Machine Learning with Big Data; Machine Discovery; Learning from IoT Data; Applications in NLP, SNA & Recommenders; Practical Issues in Learning
2
Agenda
• Recommender System: an overview
• Introducing various recommendation models
• Evaluating a recommender system
• Advanced issues in recommendation
• Tools or Library for building a recommender system
3
• A recommendation system suggests potentially
favored items or contents to users
• Users may be recommended new items previously
unknown to them.
• The system can also find the opposite (e.g., items the users do not like)
What is a Recommender System (RS)?
4
"I want to return this product."
"Why?"
"An 'inner demon' made me buy it."
The recommender system is that 'inner demon'.
Values of Recommendation Systems
• For service providers
• Improve trust and customer loyalty
• Increase sales, click-through rates, etc.
• Opportunities for promotion
• For customers
• New interesting items being identified
• Narrow down the possible choices
2/3 of the movies
watched are
recommended
recommendations
generate 38% more
click-throughs
35% sales from
recommendations
28% of the people would
buy more music if they
found what they liked
5
Recommending products, services,
and news
From Dietmar Jannach, Markus Zanker and Gerhard Friedrich
6
Recommending friends, jobs, and photos
From Dietmar Jannach, Markus Zanker and Gerhard Friedrich
7
The Netflix Challenge (2006~2009)
• 1M prize to improve the prediction accuracy by 10%
8
KDD Cup 2011: Yahoo Music
Recommendation
KDD Cup 2012: Tencent
Advertisement Recommendation
9
What is a good recommender system?
• Requirement 1: finding items that interest the specific user → personalization
• Requirement 2: recommending diverse items that satisfy all the possible needs of a specific user → diversity
• Requirement 3: making non-trivial recommendations (Harry Potter 1, 2, 3, 4 → 5) → novelty
10
Inputs/outputs of a Recommendation
System
• Given (1 or 2&3):
1. Ratings of users to items: rating can be either explicit or
implicit, for example
1. Explicit ratings: how much a user likes an item
2. Implicit ratings: the browsing behavior of users on items
2. User features (e.g. preferences, demographics)
3. Item features (e.g. item category, keywords)
4. User/user relationships (optional)
• Predict:
• Ratings of any user to any item, or
• Ranking of items to each user
• What to recommend?
• Highly rated or ranked items given each user
11
Types of Recommender Systems
• Content-based Recommendation (CBR)
• Collaborative Filtering
• Other solutions
• Advanced Issues
12
Key Concept of CBR
1. Recommend items that are similar to the
ones liked by this user in the past
2. Recommend items whose attributes are
similar to the users’ profile
13
CBR: Requirement
• Required info:
• information about items (e.g. genre and actors of a
movie, or textual description of a book)
• Information about users (e.g. user profile info,
demographic features, etc)
• Optional Info
• Ratings from users to items
14
Content-based Recommendation vs.
Information Retrieval (i.e. Search Engine)
Information
Retrieval tries to
match
Keywords
Documents
CBR tries to
match
Profiles of
liked items
of the user
Profiles of
unseen items
• If we know how keywords can be used to match
documents in a search engine, then we will use the
same model to match user and item profiles
15
Vector Space Model
• Represent the keywords of items using a term vector
• Term: basic concept, e.g., keywords to describe an item
• Each term represents one dimension in a vector
• N total terms define an N-dimensional vector
• The value of each term in a vector corresponds to the
importance of that term
• E.g., d=(x1,…,xN), xi is the "importance" of term i
• Measure similarity by the vector distances
16
An Example
• Items or users are represented using a set of keywords.
• each dimension represents a binary variable associated
with the existence of an indexed term (1: exists, 0: does not exist in the
description)
(figure: the description "Best-selling, romantic, drama" becomes a 0/1 vector such as (0, 1, 1, 0, 1, 0, …) over indexed terms like best-selling, exam, pass, good, romantic, drama)
17
Term Frequency and Inverse Document
Frequency (TFIDF)
• Since not all terms in the vector space are equally
important, we can weight each term using its
occurrence probability in the item description
• Term frequency: TF(d,t)
• number of times t occurs in the item description d
• Inverse document frequency: IDF(t)
• to scale down the terms that occur in many descriptions
18
Normalizing Term Frequency
• nij represents the number of times a term ti occurs in a
description dj . Tfij can be normalized using the total
number of terms in the document.
• Another normalization:
$tf_{ij} = \frac{n_{ij}}{\sum_k n_{kj}}$
$tf_{ij} = \frac{n_{ij}}{\max_k n_{kj}}$
19
Inverse Document Frequency
• IDF seeks to scale down the coordinates of terms that
occur in many item descriptions.
• For example, some stop words (the, a, of, to, and…) may
occur many times in a description. However, they should
not be considered as important representation feature.
• IDF of a term $t_i$: $idf_i = \log\!\left(\frac{N}{df_i} + 1\right)$,
where $df_i$ (document frequency of term $t_i$) is the
number of descriptions in which $t_i$ occurs.
20
Term Frequency and Inverse Document
Frequency
• TF-IDF weighting : weight(t,d)=TF(t,d)*IDF(t)
• Common in doc → high tf → high weight
• Rare in collection → high idf → high weight
• weight $w_{ij}$ of a term $t_i$ in a document $d_j$:
$w_{ij} = tf_{ij} \times \log\!\left(\frac{N}{df_i} + 1\right)$
21
22
Computing TF-IDF -- An Example
Given an item description contains terms with given
frequencies:
A: best-selling (3), B: Romantic (2), C: Drama (1)
Assume collection contains 10,000 items and
document frequencies of these terms are:
best-selling(50), Romantic (1300), Drama (250)
Then:
A: tf = 3/3; idf = log2(10000/50+1) = 7.6; tf-idf = 7.6
B: tf = 2/3; idf = log2 (10000/1300+1) = 2.9; tf-idf = 2.0
C: tf = 1/3; idf = log2 (10000/250+1) = 5.3; tf-idf = 1.8
The vector-space presentation of this item using A, B, and
C is (7.6,2.0,1.8)
(term frequencies are normalized by the most frequent term: $tf_{ij} = n_{ij} / \max_k n_{kj}$)
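As a quick check of the arithmetic above, a minimal Python sketch of the TF-IDF weighting (the term counts, document frequencies, and collection size come from the example; the function names are ours):

```python
import math

def tf(count, max_count):
    # Term frequency normalized by the most frequent term in the description
    return count / max_count

def idf(num_items, doc_freq):
    # Inverse document frequency with the +1 smoothing written above
    return math.log2(num_items / doc_freq + 1)

# (count in the description, document frequency in a 10,000-item collection)
terms = {"best-selling": (3, 50), "romantic": (2, 1300), "drama": (1, 250)}
max_count = max(c for c, _ in terms.values())

vector = {t: tf(c, df=None) if False else tf(c, max_count) * idf(10_000, df)
          for t, (c, df) in terms.items()}
print(vector)  # values close to the (7.6, 2.0, 1.8) vector above
```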
Graphic Representation
Example:
I1 = (2,3,5)=2T1 + 3T2 + 5T3
I2 = (3,7,1)=3T1 + 7T2 + T3
U = (0,0,2)=0T1 + 0T2 + 2T3
• Is I1 or I2 more similar to U?
• How to measure the degree of
similarity? Distance? Angle?
Projection?
23
Measure Similarity Between two item
descriptions
• D={d1,d2,…dn}
• Q={q1,q2,…qn} (the qx value is 0 if a term is absent)
• Dot product similarity: Sim(D,Q)=d1*q1+d2*q2+…+dn*qn
• Cosine Similarity (or normalized dot product):
$sim(D,Q) = \dfrac{d_1 q_1 + \dots + d_n q_n}{\sqrt{\sum_{j=1}^{N} d_j^2}\,\sqrt{\sum_{j=1}^{N} q_j^2}}$
24
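A small sketch of both similarity measures, applied to the I1/I2/U vectors from the graphic example above (function names are ours):

```python
import math

def dot(d, q):
    return sum(di * qi for di, qi in zip(d, q))

def cosine_sim(d, q):
    # Normalized dot product between two term-weight vectors
    return dot(d, q) / (math.sqrt(dot(d, d)) * math.sqrt(dot(q, q)))

i1, i2, u = (2, 3, 5), (3, 7, 1), (0, 0, 2)
print(cosine_sim(i1, u), cosine_sim(i2, u))  # ~0.81 vs ~0.13: I1 is closer to U
```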
Content-based Recommendation:
an example from Alexandros Karatzoglou
• A customer buys the book: “Building data mining
applications for CBM”
• 7 Books are possible candidates for a recommendation:
1. Accelerating Customer Relationships: Using CRM and
Relationship Technologies
2. Mastering Data Mining: The Art and Science of Customer
Relationship Management
3. Data Mining Your Website
4. Introduction to marketing
5. Consumer behaviour
6. Marketing research, a handbook
7. Customer knowledge management
25
Using binary existence of words
26
Using TFIDF
27
Pros and Cons for CBR
Pros:
• user independence – does not need data from other users
• transparency – easily explainable
• new items can be easily incorporated (assuming content
available)
Cons:
• Difficult to apply to domains where feature extraction is an
inherent problem (e.g. multimedia data)
• Keywords might not be sufficient: Items represented by the
same set of features are indistinguishable
• Cannot deal with new users
• Redundancy: should we show items that are too similar?
28
Types of Recommender Systems
• Content-based Recommendation (CBR)
• Collaborative Filtering
• Other solutions
• Advanced Issues
29
Collaborative Filtering (CF)
• CF is the most successful and common approach to
generate recommendations
• used in Amazon, Netflix, and most of the e-commerce
sites
• many algorithms exist
• General and can be applied to many domains (book,
movies, DVDs, ..)
• Key Idea
• Recommend the favored items of people who are
'similar' to you
• Need to collect the tastes (implicit or explicit) of other
people → that is why it is called 'collaborative'
30
Mathematic Form of CF
Explicit
ratings
Item1 Item2 Item3 Item4
User1 ? 3 ? ?
User2 1 ? ? 5
User3 ? 4 ? 2
User4 3 ? 3 ?
• Given: some ratings from users to items
• Predict: unknown ratings of users to items
Implicit
ratings
Item1 Item2 Item3 Item4
User1 ? 1 ? ?
User2 1 ? ? 1
User3 ? 1 ? 1
User4 1 ? 1 ?
31
CF models
• Memory-based CF
• User-based CF
• Item-based CF
• Model-based CF
32
User-Based Collaborative Filtering
• Finding users 𝑁(𝑢) most similar to user 𝑢
(neighborhood of 𝑢)
• Aggregate the ratings of 𝑁(𝑢) to predict 𝑢's rating
• Prediction
$r_{u,i} = \bar r_u + \dfrac{\sum_{v\in N(u)} sim(u,v)\,(r_{v,i} - \bar r_v)}{\sum_{v\in N(u)} sim(u,v)}$
• $\bar r_u$: average of the ratings of user $u$
• Problem: Usually users do not have many ratings;
therefore, the similarities between users may be
unreliable
33
Example of User-Based CF
• To predict 𝑟𝐽𝑎𝑛𝑒,𝐴𝑙𝑎𝑑𝑑𝑖𝑛
using cosine similarity
• Neighborhood size is 2
Rating Lion King Aladdin Mulan Anastasia
John 3 0 3 3
Joe 5 4 0 2
Jill 1 2 4 2
Jane 3 ? 1 0
Jorge 2 2 0 1
$\bar r_{John} = (3+0+3+3)/4 = 2.25$
$\bar r_{Joe} = (5+4+0+2)/4 = \mathbf{2.75}$
$\bar r_{Jill} = (1+2+4+2)/4 = 2.25$
$\bar r_{Jane} = (3+1+0)/3 = 1.33$
$\bar r_{Jorge} = (2+2+0+1)/4 = \mathbf{1.25}$
$sim(Jane, John) = \frac{3\cdot3 + 1\cdot3 + 0\cdot3}{\sqrt{3^2+1^2+0^2}\sqrt{3^2+3^2+3^2}} = 0.73$
$sim(Jane, Joe) = \frac{3\cdot5 + 1\cdot0 + 0\cdot2}{\sqrt{3^2+1^2+0^2}\sqrt{5^2+0^2+2^2}} = \mathbf{0.88}$
$sim(Jane, Jill) = \frac{3\cdot1 + 1\cdot4 + 0\cdot2}{\sqrt{3^2+1^2+0^2}\sqrt{1^2+4^2+2^2}} = 0.48$
$sim(Jane, Jorge) = \frac{3\cdot2 + 1\cdot0 + 0\cdot1}{\sqrt{3^2+1^2+0^2}\sqrt{2^2+0^2+1^2}} = \mathbf{0.84}$
With neighbors Joe and Jorge:
$r_{Jane,Aladdin} = 1.33 + \frac{0.88\,(4-2.75) + 0.84\,(2-1.25)}{0.88+0.84} = 2.33$
34
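A minimal NumPy sketch of the user-based prediction above (the rating matrix and neighborhood size follow the example; function names are ours):

```python
import numpy as np

users = ["John", "Joe", "Jill", "Jane", "Jorge"]
# np.nan marks the unknown Jane -> Aladdin rating
R = np.array([[3, 0, 3, 3],
              [5, 4, 0, 2],
              [1, 2, 4, 2],
              [3, np.nan, 1, 0],
              [2, 2, 0, 1]], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def predict_user_based(R, u, i, k=2):
    r_u = np.nanmean(R[u])           # mean of the target user's observed ratings
    known = ~np.isnan(R[u])          # items the target user has rated
    scores = []
    for v in range(R.shape[0]):
        if v != u:
            scores.append((cosine(R[u, known], R[v, known]), v))
    top = sorted(scores, reverse=True)[:k]   # k most similar neighbors
    num = sum(sim * (R[v, i] - np.nanmean(R[v])) for sim, v in top)
    den = sum(sim for sim, _ in top)
    return r_u + num / den

print(predict_user_based(R, users.index("Jane"), 1))  # about 2.33
```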
Item-Based Collaborative Filtering
• Similarity is defined between two items
• Finding items 𝑁(𝑖) most similar to item 𝑖
(neighborhood of 𝑖)
• Aggregate the ratings of 𝑁(𝑖) to predict the rating of 𝑖
• Prediction
$r_{u,i} = \bar r_i + \dfrac{\sum_{j\in N(i)} sim(i,j)\,(r_{u,j} - \bar r_j)}{\sum_{j\in N(i)} sim(i,j)}$
• $\bar r_i$: average of the ratings of item $i$
• Items usually have more ratings from many users
and the similarities between items are more stable
35
Example of Item-Based CF
• To predict 𝑟𝐽𝑎𝑛𝑒,𝐴𝑙𝑎𝑑𝑑𝑖𝑛
using cosine similarity
• Neighborhood size is 2
Rating Lion King Aladdin Mulan Anastasia
John 3 0 3 3
Joe 5 4 0 2
Jill 1 2 4 2
Jane 3 ? 1 0
Jorge 2 2 0 1
$\bar r_{Lion\,King} = (3+5+1+3+2)/5 = \mathbf{2.8}$
$\bar r_{Aladdin} = (0+4+2+2)/4 = 2$
$\bar r_{Mulan} = (3+0+4+1+0)/5 = 1.6$
$\bar r_{Anastasia} = (3+2+2+0+1)/5 = \mathbf{1.6}$
$sim(Aladdin, Lion\,King) = \frac{0\cdot3 + 4\cdot5 + 2\cdot1 + 2\cdot2}{\sqrt{0^2+4^2+2^2+2^2}\sqrt{3^2+5^2+1^2+2^2}} = \mathbf{0.84}$
$sim(Aladdin, Mulan) = \frac{0\cdot3 + 4\cdot0 + 2\cdot4 + 2\cdot0}{\sqrt{0^2+4^2+2^2+2^2}\sqrt{3^2+0^2+4^2+0^2}} = 0.32$
$sim(Aladdin, Anastasia) = \frac{0\cdot3 + 4\cdot2 + 2\cdot2 + 2\cdot1}{\sqrt{0^2+4^2+2^2+2^2}\sqrt{3^2+2^2+2^2+1^2}} = \mathbf{0.67}$
With neighbors Lion King and Anastasia:
$r_{Jane,Aladdin} = 2 + \frac{0.84\,(3-2.8) + 0.67\,(0-1.6)}{0.84+0.67} \approx 1.40$
36
Quiz: The genie of the magic lamp as a recommender system, part 1
• Xiao-Pang found a magic lamp; the genie granted him one wish. Xiao-Pang said: "I want Tsai 10 to have dinner with me."
• The genie said no problem, and with a bang, both Tsai 10 and Tsai 13 appeared.
• Xiao-Pang said: "I never asked for Tsai 13!"
• The genie replied: "My recommender system thinks you may also be interested in Tsai 13."
• Question: which kind of recommender system is the genie running?
37
CF models
• Memory-based CF
• User-based CF
• Item-based CF
• Model-based CF
• Matrix Factorization
38
Matrix Factorization (MF)
• Given a matrix $R \in \mathbb{R}^{N\times M}$, we would like to find
two matrices $U \in \mathbb{R}^{K\times N}$ and $V \in \mathbb{R}^{K\times M}$ such that $U^\top V \approx R$
– $K \ll \min(N, M)$ → we assume $R$ has small rank $K$
– A low-rank approximation method
– Earlier works (before 2007–2009) call it singular value decomposition (SVD)
(figure: the $N\times M$ matrix $R$ is approximated by $U^\top V$ with inner dimension $K$)
39
Matrix factorization
Vk
T
Dim1 -0.4 -0.7 0.6 0.4 0.5
Dim2 0.8 -0.6 0.2 0.2 -0.3
Uk Dim1 Dim2
Alice 0.4 0.3
Bob -0.4 0.3
Mary 0.7 -0.6
Sue 0.3 0.9
40
$R = U^\top V$; e.g. Rating(Alice → Harry Potter) = 0.4×0.4 + 0.3×0.2 = 0.22
MF as an Effective CF Realization
• We find two matrices 𝑈, 𝑉 to approximate 𝑅
– Missing entry 𝑅𝑖𝑗 can be predicted by 𝑈𝑖
⊤
𝑉𝑗
• Entries in column 𝑈𝑖 (or 𝑉𝑗)represent the latent factor (i.e.
rating patterns learned from data) of user 𝑖 (or item j)
• If two users have similar latent factors, then they will give
similar ratings to all items
• If two items have similar latent factors, then the
corresponding ratings from all users are similar
(figure: a small numeric rating matrix $R$ approximated by the product $U^\top V$ of two low-rank factor matrices)
41
Relationship between MF and SVD
• Singular value decomposition (SVD)
– Matrix 𝑅 of rank 𝐾 should be non-missing
• Matrix factorization (MF)
– Missing entries can be omitted in learning
– It is more scalable for large datasets
(figure: SVD factorizes $R$ as $W^\top \Sigma H$, where $\Sigma$ is a diagonal matrix of $K$
positive singular values $\sigma_1 \ge \dots \ge \sigma_K > 0$ for a rank-$K$ matrix;
MF factorizes $R$ directly into $U^\top V$ with inner dimension $K$)
42
Latent Factor Examples (Leskovec et al.)
(figure: items and users plotted on two latent factors, e.g. Factor 1: geared towards females ↔ geared towards males; Factor 2: serious ↔ funny)
43
How to Train an MF Model (1/2)
• MF as a minimization problem
$\arg\min_{U,V}\ \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{M}\delta_{ij}\left(U_i^\top V_j - R_{ij}\right)^2 + \frac{\lambda_U}{2}\|U\|_F^2 + \frac{\lambda_V}{2}\|V\|_F^2$
– the first term is the squared error on observed ratings ($R_{ij} \approx U_i^\top V_j$); the remaining terms are regularization
– $\|U\|_F^2 = \sum_{i=1}^{N}\|U_i\|_2^2 = \sum_{i=1}^{N}\sum_{k=1}^{K}U_{ki}^2$: squared Frobenius norm
– $\delta_{ij} \in \{0,1\}$: whether rating $R_{ij}$ is observed in $R$
44
How to Train an MF Model (2/2)
• MF with bias terms
$\arg\min_{U,V,b,c,\mu}\ \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{M}\delta_{ij}\left(U_i^\top V_j + b_i + c_j + \mu - R_{ij}\right)^2 + \frac{\lambda_U}{2}\|U\|_F^2 + \frac{\lambda_V}{2}\|V\|_F^2 + \frac{\lambda_b}{2}\|b\|_2^2 + \frac{\lambda_c}{2}\|c\|_2^2 + \frac{\lambda_\mu}{2}\mu^2$
– $b$: rating mean vector for each user
– $c$: rating mean vector for each item
– $\mu$: overall mean of all ratings
• Some MF extension works omit the bias terms
Koren, Yehuda, Robert Bell, and Chris Volinsky.
"Matrix factorization techniques for recommender systems."
Computer 42.8 (2009): 30-37.
45
MF as Neural Network (NN)
• Shallow NN with identity activation function
– 𝑁 input neurons: user 𝑖 as one-hot encoding
– 𝑀 output neurons: row 𝑖 in rating matrix 𝑅
(figure: a one-hot encoding of user 4 feeds through weight matrix $U$ into $K$ hidden units and through $V$ into $M$ outputs that reproduce row $R_4$ of the rating matrix)
46
MF as Probabilistic Graphical Model
(PGM)
• Bayesian network with normal distributions
• Maximum a posteriori (MAP)
$\arg\max_{U,V}\ \prod_{i=1}^{N}\prod_{j=1}^{M}\mathcal{N}\!\left(R_{ij}\mid U_i^\top V_j, \sigma_R^2\right)^{\delta_{ij}} \prod_{i=1}^{N}\mathcal{N}(U_i\mid 0,\sigma_U^2 I)\prod_{j=1}^{M}\mathcal{N}(V_j\mid 0,\sigma_V^2 I)$
– the first product is the likelihood (normal distribution); the remaining products are zero-mean spherical Gaussian priors (multivariate normal distributions)
47
Probabilistic Matrix Factorization (PMF)
• PGM viewpoint of MF
(plate diagram: $U_i$ for $i = 1\ldots N$ and $V_j$ for $j = 1\ldots M$, with priors governed by $\sigma_U$ and $\sigma_V$, generate the observed ratings $R_{ij}$ with noise $\sigma_R$)
A. Mnih and R. R. Salakhutdinov, “Probabilistic matrix factorization,”
in Proc. of NIPS, 2008, pp. 1257–1264.
48
PMF vs. MF
• PMF is equivalent to MF
– with $\lambda_U = \sigma_R^2/\sigma_U^2$ and $\lambda_V = \sigma_R^2/\sigma_V^2$
– $C$: terms irrelevant to $U, V$
– $\ell_2$-norm regularization = zero-mean spherical Gaussian prior
– PMF can be easily incorporated into other PGM models
Taking $-\sigma_R^2 \log(\cdot)$ of the MAP objective
$\arg\max_{U,V}\ \prod_{i=1}^{N}\prod_{j=1}^{M}\mathcal{N}(R_{ij}\mid U_i^\top V_j,\sigma_R^2)^{\delta_{ij}} \prod_{i=1}^{N}\mathcal{N}(U_i\mid 0,\sigma_U^2 I)\prod_{j=1}^{M}\mathcal{N}(V_j\mid 0,\sigma_V^2 I)$
yields (up to $C$) the MF objective
$\arg\min_{U,V}\ \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{M}\delta_{ij}(U_i^\top V_j - R_{ij})^2 + \frac{\lambda_U}{2}\|U\|_F^2 + \frac{\lambda_V}{2}\|V\|_F^2 + C$,
and taking $e^{-x/\sigma_R^2}$ maps back.
49
Learning in MF
• Stochastic gradient descent (SGD)
• Alternating least squares (ALS)
• Variational expectation maximization (VEM)
50
Stochastic Gradient Descent (SGD) (1/2)
• Gradient descent: updating variables based on the
direction of negative gradients
– SGD updates variables instance-wise
• Let $L$ be the objective function
$L = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{M}\delta_{ij}(U_i^\top V_j - R_{ij})^2 + \frac{\lambda_U}{2}\|U\|_F^2 + \frac{\lambda_V}{2}\|V\|_F^2$
– Objective function for each training rating:
$L_{ij} = \frac{1}{2}(U_i^\top V_j - R_{ij})^2 + \frac{\lambda_U}{2}\|U_i\|_2^2 + \frac{\lambda_V}{2}\|V_j\|_2^2$
51
Stochastic Gradient Descent (SGD) (2/2)
• Gradient
– $\frac{\partial L_{ij}}{\partial U_i} = (U_i^\top V_j - R_{ij})V_j + \lambda_U U_i$
– $\frac{\partial L_{ij}}{\partial V_j} = (U_i^\top V_j - R_{ij})U_i + \lambda_V V_j$
• Update rule
– $U_i \leftarrow U_i - \eta\,\frac{\partial L_{ij}}{\partial U_i}$
– $V_j \leftarrow V_j - \eta\,\frac{\partial L_{ij}}{\partial V_j}$
– $\eta$: learning rate or step size
52
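A minimal sketch of SGD training for MF following the update rules above (the hyperparameter values and function names are ours, not from the slides):

```python
import numpy as np

def sgd_mf(ratings, n_users, n_items, k=10, lr=0.01, lam=0.1, epochs=20, seed=0):
    """ratings: list of (user, item, value) triples for the observed entries only."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):   # visit ratings in random order
            u, i, r = ratings[idx]
            err = U[u] @ V[i] - r                    # prediction error for this rating
            u_old = U[u].copy()
            U[u] -= lr * (err * V[i] + lam * U[u])   # step on the user factor
            V[i] -= lr * (err * u_old + lam * V[i])  # step on the item factor
    return U, V

# Tiny usage example
U, V = sgd_mf([(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)], n_users=2, n_items=2, k=2)
print(U @ V.T)  # approximate reconstruction of the observed ratings
```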
Matrix factorization: Stopping criteria
• When do we stop updating?
– Improvement drops (e.g. <0)
– Reached small error
– Achieved predefined # of iterations
– No time to train anymore
53
Alternating Least Squares (ALS)
• Stationary point: zero gradient
– We can find the closed-form solution of $U$ with $V$ fixed, and vice versa
• Zero gradient
– $\frac{\partial L}{\partial U_i} = \sum_{j=1}^{M}\delta_{ij}V_j(U_i^\top V_j - R_{ij}) + \lambda_U U_i = 0$
– $\frac{\partial L}{\partial V_j} = \sum_{i=1}^{N}\delta_{ij}U_i(U_i^\top V_j - R_{ij}) + \lambda_V V_j = 0$
• Closed-form solution, i.e. update rule
– $U_i = \left(\sum_{j=1}^{M}\delta_{ij}V_j V_j^\top + \lambda_U I\right)^{-1}\sum_{j=1}^{M}\delta_{ij}R_{ij}V_j$
– $V_j = \left(\sum_{i=1}^{N}\delta_{ij}U_i U_i^\top + \lambda_V I\right)^{-1}\sum_{i=1}^{N}\delta_{ij}U_i R_{ij}$
54
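A minimal NumPy sketch of one ALS solver built from the closed-form updates above (the `mask` convention and function names are ours):

```python
import numpy as np

def als_step(R, mask, V, lam):
    """Solve the per-row least-squares problems with the other factor V held fixed."""
    k = V.shape[1]
    U = np.zeros((R.shape[0], k))
    for u in range(R.shape[0]):
        idx = mask[u]                                   # observed entries in row u
        A = V[idx].T @ V[idx] + lam * np.eye(k)
        b = V[idx].T @ R[u, idx]
        U[u] = np.linalg.solve(A, b)                    # closed-form row update
    return U

def als(R, mask, k=10, lam=0.1, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((R.shape[0], k))
    V = 0.1 * rng.standard_normal((R.shape[1], k))
    for _ in range(iters):
        U = als_step(R, mask, V, lam)                   # update users, items fixed
        V = als_step(R.T, mask.T, U, lam)               # update items, users fixed
    return U, V
```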
SGD vs. ALS
• SGD
– It is easier to develop MF extensions since we do not
require the closed-form solutions
• ALS
– We are free from determining the learning rate 𝜂
– It allows parallel computing
• Drawback for both
– Regularization parameters 𝜆 need careful tuning using
validation → we have to run MF multiple times
• Variational-EM (VEM) learns regularization
parameters
$\arg\min_{U,V}\ \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{M}\delta_{ij}(U_i^\top V_j - R_{ij})^2 + \frac{\lambda_U}{2}\|U\|_F^2 + \frac{\lambda_V}{2}\|V\|_F^2$
55
Extensions of MF
• Matrix $R$ can be factorized into $U^\top U$, $U^\top V U$, $UVW$, …
• SVD++
– MF with implicit interactions among items
• Non-negative MF (NMF)
– non-negative entries for the factorized matrices
• Tensor factorization (TF), Factorization machines (FM)
– Additional features involved in learning latent factors
• Bayesian PMF (BPMF)
– Further modeling distributions of PMF parameters 𝜃
• Poisson factorization (PF)
– PMF normal likelihood replaced with Poisson likelihood to
form a probabilistic nonnegative MF 56
Applications of MF
• Recommender systems
• Filling missing features
• Clustering
• Link prediction
– Predict future new edges in a graph
• Community detection
– Cluster nodes based on edge density in a graph
• Word embedding
– Word2Vec is actually an MF
57
Quiz: The genie of the magic lamp as a recommender system, part 2
• Xiao-Pang rubbed the lamp again and a different genie came out. Seeing Xiao-Pang's puzzled look, the genie said: "The previous genie was fired for making bad recommendations; I am the new one."
• Xiao-Pang figured there would be no tag-alongs this time, so he asked: "I want Tsai 10 to have dinner with me."
• The genie said no problem, and with a bang, Tsai 10, Hou Pei-Cheng, and Kun 0 all appeared.
• Question: which kind of recommender system is this genie running?
58
Types of Recommender Systems
• Content-based Recommendation (CBR)
• Collaborative Filtering
• Other solutions
• Restricted Boltzmann Machines
• Clustering-based recommendation
• Association rule based recommendation
• Random-walk based recommendation
• Advanced Issues
59
Restricted Boltzmann Machines
For Collaborative Filtering
• An RBM is an MRF with 2 layers
• One RBM per user, with shared link weights
• Learned by contrastive divergence
• DBM → Deep Boltzmann Machine
Ratings from
1~5
Restricted Boltzmann Machines for Collaborative Filtering, ICML 07 60
Clustering-based Recommendation
• Step 1: Clustering users into groups
• Step 2: finding the highest rated items in each
group as the recommended items for this group
• Notes: it is more efficient but less personalized
61
I1 I2 I3 I4 I5
U1 4 2
U2 4 3
U3 3 2 2
U4 4 3
U5 3 2
Association-rule based
recommendation
• Finding association rules from global data
• E.g. (item1, item2) appear together frequently
• User 1 has purchased item1 → recommend item2
• Pros: easy to implement, quick prediction
• Cons: it is not very personalized
• Association rules work well for retail-store
recommendation
62
Graph-Based Model
• Build network from rating database
• Extra knowledge (e.g. item features) can be added as additional nodes
63
(figure: the rating table — with entries r11, r12, r14, r22, r23, r34, r35, r43, r45, r46 — becomes a bipartite user–item graph whose edges carry the ratings; feature nodes f1 and f2 are linked to the items they describe)
Random Walk on Graph
• Random walk with restart
• Random surf in the network from a source user.
• Recommend items that have higher chance to be
reached
64
Limitations on Collaborative filtering
1. Cannot consider features of items and users
• Solution: factorization machine
2. Cannot consider cold-start situation
• Solution: transfer recommendation
65
Demo: A joke recommendation
system using CF models
• http://eigentaste.berkeley.edu/index.html
66
Evaluating Recommendation
Systems
• Accuracy of predictions
• How close predicted ratings are to the true ratings
• Relevancy of recommendations
• Whether users find the recommended items relevant to
their interests
• Ranking of recommendations
• Ranking products based on their levels of
interestingness to the user
67
Accuracy of Predictions
• Mean absolute error (MAE)
– $MAE = \frac{1}{n}\sum_{ij}\left|\hat r_{ij} - r_{ij}\right|$
– $\hat r_{ij}$: predicted rating of user $i$ for item $j$; $r_{ij}$: true rating
• Normalized mean absolute error (NMAE)
– $NMAE = \frac{MAE}{r_{max} - r_{min}}$
• Root mean squared error (RMSE)
– $RMSE = \sqrt{\frac{1}{n}\sum_{ij}\left(\hat r_{ij} - r_{ij}\right)^2}$
– Large errors contribute more to the RMSE value
68
Example of Accuracy
• $MAE = \frac{|1-3| + |2-5| + |3-3| + |4-2| + |4-1|}{5} = 2$
• $NMAE = \frac{MAE}{5-1} = 0.5$
• $RMSE = \sqrt{\frac{(1-3)^2 + (2-5)^2 + (3-3)^2 + (4-2)^2 + (4-1)^2}{5}} = 2.28$
Item Predicted Rating True Rating
1 1 3
2 2 5
3 3 3
4 4 2
5 4 1
69
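A small sketch that reproduces these three numbers from the table above (variable names are ours):

```python
import numpy as np

pred = np.array([1, 2, 3, 4, 4], dtype=float)
true = np.array([3, 5, 3, 2, 1], dtype=float)

mae = np.mean(np.abs(pred - true))
nmae = mae / (5 - 1)                       # ratings range from 1 to 5
rmse = np.sqrt(np.mean((pred - true) ** 2))
print(mae, nmae, rmse)                     # 2.0, 0.5, ~2.28
```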
Relevancy of Recommendations
• Precision: $P = \frac{N_{rs}}{N_s}$
• Recall: $R = \frac{N_{rs}}{N_r}$
• F-measure (F1 score): $F = \frac{2PR}{P+R}$
Recommended Items
Selected Not Selected Total
Relevancy
Relevant 𝑁𝑟𝑠 𝑁𝑟𝑛 𝑁𝑟
Irrelevant 𝑁𝑖𝑠 𝑁𝑖𝑛 𝑁𝑖
Total 𝑁𝑠 𝑁 𝑛 𝑁
70
Example of Relevancy
• $P = \frac{9}{12} = 0.75$
• $R = \frac{9}{24} = 0.375$
• $F = \frac{2\times 0.75\times 0.375}{0.75+0.375} = 0.5$
Selected Not Selected Total
Relevant 9 15 24
Irrelevant 3 13 16
Total 12 28 40
71
Ranking of Recommendations
• Spearman's rank correlation: for $n$ items,
$\rho = 1 - \frac{6\sum_{i=1}^{n}(x_i - y_i)^2}{n^3 - n}$
• $1 \le x_i \le n$: predicted rank of item $i$
• $1 \le y_i \le n$: true rank of item $i$
• Kendall's tau: for all $\binom{n}{2}$ pairs of items $(i,j)$,
$\tau = \frac{c - d}{\binom{n}{2}}$, in range $[-1, 1]$
• There are $c$ concordant pairs: $x_i > x_j, y_i > y_j$ or $x_i < x_j, y_i < y_j$
• There are $d$ discordant pairs: $x_i > x_j, y_i < y_j$ or $x_i < x_j, y_i > y_j$
72
Example of Ranking
Item
Predicted
Rank
True Rank
𝑖1 1 1
𝑖2 2 4
𝑖3 3 2
𝑖4 4 3
• Kendall's tau: $\tau = \frac{4-2}{6} = 0.33$
• concordant pairs: $(i_1,i_2)$, $(i_1,i_3)$, $(i_1,i_4)$, $(i_3,i_4)$; discordant pairs: $(i_2,i_3)$, $(i_2,i_4)$
• Spearman's rank correlation:
$\rho = 1 - \frac{6\left[(1-1)^2 + (2-4)^2 + (3-2)^2 + (4-3)^2\right]}{4^3 - 4} = 0.4$
73
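A small sketch of both rank-correlation measures applied to the ranks in the table above (function names are ours):

```python
from itertools import combinations

def spearman(x, y):
    n = len(x)
    return 1 - 6 * sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / (n ** 3 - n)

def kendall_tau(x, y):
    pairs = list(combinations(range(len(x)), 2))
    c = sum((x[i] - x[j]) * (y[i] - y[j]) > 0 for i, j in pairs)  # concordant pairs
    d = sum((x[i] - x[j]) * (y[i] - y[j]) < 0 for i, j in pairs)  # discordant pairs
    return (c - d) / len(pairs)

predicted, true = [1, 2, 3, 4], [1, 4, 2, 3]
print(spearman(predicted, true), kendall_tau(predicted, true))  # 0.4, ~0.33
```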
Types of Recommender Systems
• Content-based Recommendation (CBR)
• Collaborative Filtering
• Other solutions
• Restricted Boltzmann Machines
• Clustering-based recommendation
• Association rule based recommendation
• Random-walk based recommendation
• Advanced Issues
74
Advanced issue 1: Recommendation
with ratings and features
(figure: a rating matrix together with user features and item features)
75
Factorization Machine (FM)
• Matrix Factorization (MF)
• Factorization Machine
Model:
• Matrix Factorization (MF): $Y \approx U^\top V$
• Factorization Machine: $Y = f(x)$, where $x$ is highly sparse, and
$\hat y(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i, v_j\rangle\, x_i x_j$
76
Steffen Rendle “factorization machines” 2010
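A minimal sketch of the FM prediction above, using the standard O(kn) identity for the pairwise term (variable names are ours; training is omitted):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """x: feature vector (dense here for clarity); V: n_features x k latent factors."""
    linear = w0 + w @ x
    # sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2]
    s = V.T @ x
    s2 = (V ** 2).T @ (x ** 2)
    return linear + 0.5 * np.sum(s ** 2 - s2)

x = np.array([1.0, 0.0, 1.0])                 # e.g. one-hot user + one-hot item
print(fm_predict(x, 0.1, np.array([0.2, 0.0, 0.3]), 0.1 * np.ones((3, 4))))
```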
Advance Issue 2: Handling Cold
Start
• What happens if we don’t have rating/review
information for some items or users
• Solution: transfer information from other domains
• Use book review data to improve a movie recommender
system
• Use ratings from one source (e.g. Yahoo) to improve
the recommender of another (e.g. PChome).
77
Why Transfer Recommendation?
• Nowadays, users can
• provide feedback for items of different types, e.g., in
Amazon we can rate books, DVDs, …
• express their opinions on different social media and
different providers (Facebook, Twitter, Amazon)
• Transfer recommendation tries to utilize
information from source domain to Improve the
quality of an RS
• Solving cold-start problem (very few ratings for some
users or some items)
• Improving accuracy
78
Transfer between domains
• Source domain: contains knowledge to transfer to
target, usually more data.
• Target domain: needs help from the source domain,
usually contains cold start users or items.
• Domains differ because of
• different types of items
• Movies vs. Books
• different types of users
• American vs. Taiwanese
Source Target
Movies Music
Pop music Jazz music
Taiwan movie US movies 79
Case Study1: Matching Users and Items Across Domains
to Improve the Recommendation Quality (Li & Lin,
KDD2014)
Given: Two homogeneous rating matrices
 They model user’s preference to same type of items.
 Certain portion of overlap in users and in items.
𝐑1
Target Rating Matrix
♫ ♫ ♫
𝐑2
Source Rating Matrix
♫ ♫ ♫
Challenge:
The mappings of users cross
matrices are unknown, and so are
the mappings of items.
Goals:
1. Identify the user mappings and
item mappings.
2. Use the identified mappings to
boost the recommendation
performance.
80
Case Study2: Why Transfer
Recommendation is important
for New Business
81
You are the CEO of an online
textbook store Ambzon.com which
used to sell only English books in USA.
82
You want to expand the business
across the world to sell some Chinese
textbooks in Taiwan.
• You need a recommender system for such purpose.
• Unfortunately, you only have few user-item rating
records for the Chinese textbooks rated by
Taiwanese.
83
However, you have much more
ratings on the English textbooks rated
by the American people
84
User
Book
B1 B2 B3 B4 B5
U1 1 4 3 3
U2 2 1
U3 3 5 2
U4 4 3 6
U5 2 2 4
U6 4 6
Challenge: you will have to leverage
the ratings on the English
textbooks to enhance the
recommendations on Chinese
textbooks.
85
How can Transfer be Done?
• Feature sharing
• user demographic features in both domains
• item attributes in both domains
• Linking users and items
• Finding overlapping users and items across domains
• Connecting users (e.g. friendship) and items (e.g.
content) across domains
86
Sharing Latent Features of users/items
87
Cons: need overlap of
users and/or items
Transferring rating patterns
88
No need to assume
users or items have
overlap
Advanced Issues 3: Using Social
information in Recommendation
(Friend recommendation, link prediction)
(Memory-based CF using social networks)
(Model-based CF using social networks)
89
Social Context Alone
• Friend recommendation in social networks
• Methods
• Link prediction
• Network structure e.g. triad
• Two friends of an individual are often friends
• Friend recommendation: Open triad (missing one edge)
Triad Open triad 90
Common Friends
• If 𝑝 and 𝑞 have many common
friends, then 𝑝 tends to be a better
recommendation for 𝑞
• $score = |F(p) \cap F(q)|$
• $F(p)$: friend set of person $p$
• If a common friend $f$ of $p$ and $q$ has
many friends (large degree), it is less
likely that the friendship of $p$ and $q$
is due to $f$
• $score = \sum_{f\in F(p)\cap F(q)} \frac{1}{D(f)}$
• $D(f)$: degree (number of friends) of person $f$
(figure: $p$ and $q$ share a friend $f$; a friend with degree 106 counts for much less than one with degree 2)
91
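A small sketch of both common-friend scores above (the toy friendship data and function name are ours):

```python
def common_friend_scores(friends, p, q):
    """friends: dict mapping each person to the set of that person's friends."""
    common = friends[p] & friends[q]
    raw = len(common)
    # Down-weight common friends that already have many friends (large degree)
    weighted = sum(1.0 / len(friends[f]) for f in common)
    return raw, weighted

friends = {"p": {"a", "b", "c"}, "q": {"b", "c", "d"},
           "a": {"p"}, "b": {"p", "q"}, "c": {"p", "q", "x", "y"},
           "d": {"q"}, "x": {"c"}, "y": {"c"}}
print(common_friend_scores(friends, "p", "q"))  # (2, 0.75)
```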
Recommendation Using Social
Context
92
Social Rec V.S. Collaborative Filtering
Rating matrix
𝑅
Social matrix 𝑇
Items
Users
Rating matrix
𝑅
Traditional recommendation systems
Social recommendation systems
93
Memory-Based Social Recommendation
• Especially for user-based CF
• Constraining the set of correlated users
– Traditional: 𝑁(u) is obtained from the rating matrix 𝑅
– Social: 𝑁⁺(u) is obtained from both the rating matrix 𝑅
and the social matrix 𝑇
• The prediction model (aggregation of the ratings
from the correlated users) remains the same:
$r_{u,i} = \bar r_u + \dfrac{\sum_{v\in N^+(u)} sim(u,v)\,(r_{v,i} - \bar r_v)}{\sum_{v\in N^+(u)} sim(u,v)}$
94
Representative Approaches (1/3)
• Social based weight mean
– Simply consider 𝑢𝑖's friends 𝐹(𝑢𝑖) as 𝑁⁺(𝑢𝑖)
– Approach 1: $N^+(u_i) = N(u_i)\cap F(u_i)$
• Could be problematic when $N(u_i)\cap F(u_i) = \emptyset$
– Approach 2: $N^+(u_i) = S(u_i)$
• $S(u_i)$: the 𝑘 most similar friends of 𝑢𝑖 from 𝐹(𝑢𝑖)
– The predicted ratings from friends can be far
different from the ratings predicted using most
similar users
95
Example (1 / 2)
• To predict 𝑟𝐽𝑖𝑙𝑙,𝑀𝑢𝑙𝑎𝑛
given cosine similarity
– User-based CF
– Neighborhood size is 2
– 𝐹 𝐽𝑖𝑙𝑙 =
𝐽𝑜𝑒, 𝐽𝑎𝑛𝑒, 𝐽𝑜𝑟𝑔𝑒
Rating Lion King Aladdin Mulan Anastasia
John 4 3 2 2
Joe 5 2 1 5
Jill 2 5 ? 0
Jane 1 3 4 3
Jorge 3 1 1 2
Friend John Joe Jill Jane Jorge
John 1 1
Joe 1 1
Jill 1 1 1
Jane 1
Jorge 1 1
$\bar r_{John} = (4+3+2+2)/4 = 2.75$
$\bar r_{Joe} = (5+2+1+5)/4 = 3.25$
$\bar r_{Jill} = (2+5+0)/3 = 2.33$
$\bar r_{Jane} = (1+3+4+3)/4 = 2.75$
$\bar r_{Jorge} = (3+1+1+2)/4 = 1.75$
96
Example (2 / 2)
• We have
– Rating similarity ∶
𝑁 𝐽𝑖𝑙𝑙 = {𝐽𝑜ℎ𝑛, 𝐽𝑎𝑛𝑒}
– Friendship:
𝐹 𝐽𝑖𝑙𝑙 = 𝐽𝑜𝑒, 𝐽𝑎𝑛𝑒, 𝐽𝑜𝑟𝑔𝑒
– 𝐹 𝐽𝑖𝑙𝑙 ∩ 𝑁 𝐽𝑖𝑙𝑙 = 𝐽𝑎𝑛𝑒
– 𝑆 𝐽𝑖𝑙𝑙 = {𝐽𝑎𝑛𝑒, 𝐽𝑜𝑟𝑔𝑒},
when k=2
$sim(Jill, John) = \frac{2\cdot4 + 5\cdot3 + 0\cdot2}{\sqrt{2^2+5^2+0^2}\sqrt{4^2+3^2+2^2}} = \mathbf{0.79}$
$sim(Jill, Joe) = \frac{2\cdot5 + 5\cdot2 + 0\cdot5}{\sqrt{2^2+5^2+0^2}\sqrt{5^2+2^2+5^2}} = 0.50$
$sim(Jill, Jane) = \frac{2\cdot1 + 5\cdot3 + 0\cdot3}{\sqrt{2^2+5^2+0^2}\sqrt{1^2+3^2+3^2}} = \mathbf{0.72}$
$sim(Jill, Jorge) = \frac{2\cdot3 + 5\cdot1 + 0\cdot2}{\sqrt{2^2+5^2+0^2}\sqrt{3^2+1^2+2^2}} = 0.54$
Approach 1: $r_{Jill,Mulan} = 2.33 + \frac{0.72\,(4-2.75)}{0.72} = 3.58$
Approach 2: $r_{Jill,Mulan} = 2.33 + \frac{0.72\,(4-2.75) + 0.54\,(1-1.75)}{0.72+0.54} = 2.72$
97
Recommendation Using Social Context
98
MF Framework with Social Info
• Framework
$\min_{U,V,\Omega}\ \|W\odot(R - U^T V)\|_F^2 + \alpha\, Social(T, S, \Omega) + \lambda\left(\|U\|_F^2 + \|V\|_F^2 + \|\Omega\|_F^2\right)$
– $Social(T,S,\Omega)$: social information
– $T$: adjacency matrix of a social network
– $S$: similarity matrix
• E.g. cosine similarity $S_{i,j}$ between the rating vectors $R_i, R_j$ of two users $u_i$ and $u_j$ if they are connected in the social network; $S_{i,j}=0$ otherwise
– $\Omega$: the set of parameters learned from social information
– $W$: the weights of known ratings; $W_{i,j}=1$ if $R_{i,j}$ is known
• Categories w.r.t. different $Social(T,S,\Omega)$
– Co-factorization methods
– Ensemble methods
– Regularization methods
99
Co-Factorization Methods (1/2)
• The user–item rating matrix $R$ and the user–user social
matrix $T$ share the same user latent preferences
• SoRec
– $Social(T,S,\Omega) = \min \sum_{i=1}^{n}\sum_{u_k\in N(i)}\left(S_{i,k} - u_i^T z_k\right)^2$
– $N(i)$: the set of neighbors of $u_i$ in the social network
– $Z = [z_1, z_2, \dots, z_n] \in \mathbb{R}^{K\times n}$: latent feature matrix of $K$ latent factors
– The user preference matrix $U = [u_1, u_2, \dots, u_n]^T \in \mathbb{R}^{n\times K}$ is learned from both rating information and social information
– Optimization problem:
$\min_{U,V,Z}\ \|W\odot(R - U^T V)\|_F^2 + \alpha\sum_{i=1}^{n}\sum_{u_k\in N(i)}\left(S_{i,k} - u_i^T z_k\right)^2 + \lambda\left(\|U\|_F^2 + \|V\|_F^2 + \|Z\|_F^2\right)$
100
Ma, H., Yang, H., Lyu, M., King, I.: SoRec: social recommendation using probabilistic matrix factorization. In:
Proceedings of the 17th ACM conference on Information and knowledge management, pp. 931–940. ACM (2008)
Co-Factorization Methods (2/2)
• LOCABAL
– $Social(T,S,\Omega) = \min \sum_{i=1}^{n}\sum_{u_k\in N(i)}\left(S_{i,k} - u_i^T H u_k\right)^2$
– The user preferences of two connected users are correlated
via a correlation matrix $H \in \mathbb{R}^{K\times K}$
– A large value of $S_{i,k}$ (strong connection) indicates that user
preferences $u_i$ and $u_k$ should be highly correlated via $H$
– Optimization problem:
$\min_{U,V,H}\ \|W\odot(R - U^T V)\|_F^2 + \alpha\sum_{i=1}^{n}\sum_{u_k\in N(i)}\left(S_{i,k} - u_i^T H u_k\right)^2 + \lambda\left(\|U\|_F^2 + \|V\|_F^2 + \|H\|_F^2\right)$
101
Tang, J., Hu, X., Gao, H., Liu, H.: Exploiting local and global social context for recommendation. In: IJCAI (2013)
Additional Benefit of Co-Factorization
• Since we learn the user latent matrix $U$ such that
$\min \sum_{i=1}^{n}\sum_{u_k\in N(i)}\left(S_{i,k} - u_i^T H u_k\right)^2$,
it is possible to reconstruct $S$ as $S \approx U^T H U$
• By doing so, we are performing link prediction
• That means co-factorization allows us to
perform recommendation and link prediction
jointly
102
Ensemble Methods (1/2)
• CF and the social neighbors jointly contribute to the
rating prediction
– A missing rating for a given user is predicted as a linear
combination of
• ratings from the similar users/items (CF), and
• ratings from the social neighbors (social network)
• STE
– The estimated rating from user $u_i$ to item $v_j$ is defined as
$\hat R_{i,j} = u_i^T v_j + \beta\sum_{u_k\in N(i)} S_{i,k}\, u_k^T v_j$
– Optimization problem:
$\min_{U,V}\ \|W\odot(R - U^T V - \beta S U^T V)\|_F^2$
103
H. Ma, I. King, and M. R. Lyu. Learning to recommend with social trust ensemble. In SIGIR 2009, pages 203–210.
Ensemble Methods (2/2)
• mTrust
– The estimated rating from user $u_i$ to item $v_j$ is defined as
$\hat R_{i,j} = u_i^T v_j + \beta\,\dfrac{\sum_{u_k\in N(i)} S_{i,k} R_{k,j}}{\sum_{u_k\in N(i)} S_{i,k}}$
(the second term is the weighted mean of the ratings of $v_j$ from $u_i$'s social neighbors)
– Optimization problem:
$\min_{U,V,S}\ \sum_i\sum_j W_{i,j}\left(R_{i,j} - u_i^T v_j - \beta\dfrac{\sum_{u_k\in N(i)} S_{i,k} R_{k,j}}{\sum_{u_k\in N(i)} S_{i,k}}\right)^2$
– The similarity matrix $S$ in mTrust needs to be learned,
while it is predefined in STE
104
Tang, J., Gao, H., Liu, H.: mTrust: Discerning multi-faceted trust in a connected world. In: Proceedings of the
fifth ACM international conference on Web search and data mining, pp. 93–102. ACM (2012)
Regularization Methods (1/2)
• A user’s preference should be similar to that of the
user’s social neighbors
• SocialMF
– $Social(T,S,\Omega) = \min\sum_{i=1}^{n}\left\|u_i - \sum_{u_k\in N(i)} S_{i,k}\,u_k\right\|^2$
– $\sum_{u_k\in N(i)} S_{i,k}\,u_k$: weighted average preference of the user's neighbors
– Optimization problem:
$\min_{U,V}\ \|W\odot(R - U^T V)\|_F^2 + \alpha\sum_{i=1}^{n}\left\|u_i - \sum_{u_k\in N(i)} S_{i,k}\,u_k\right\|^2 + \lambda\left(\|U\|_F^2 + \|V\|_F^2\right)$
105
Jamali, M., Ester, M.: A matrix factorization technique with trust propagation for recommendation in social
networks. In: Proceedings of the fourth ACM conference on Recommender systems, pp. 135–142. ACM (2010)
Regularization Methods (2/2)
• Social Regularization
– $Social(T,S,\Omega) = \min\sum_{i=1}^{n}\sum_{u_k\in N(i)} S_{i,k}\,\|u_i - u_k\|^2$
– A large value of $S_{i,k}$ (strong connection) implies
that users $u_i$ and $u_k$ have more similar preferences
(the distance between their latent factors is smaller)
– Optimization problem:
$\min_{U,V}\ \|W\odot(R - U^T V)\|_F^2 + \alpha\sum_{i=1}^{n}\sum_{u_k\in N(i)} S_{i,k}\,\|u_i - u_k\|^2 + \lambda\left(\|U\|_F^2 + \|V\|_F^2\right)$
106
Ma, H., Zhou, D., Liu, C., Lyu, M., King, I.: Recommender systems with social regularization. In: Proceedings of
the fourth ACM international conference on Web search and data mining, pp. 287–296. ACM (2011)
Datasets
• There are no agreed benchmark datasets for
social recommendation systems
• Used datasets
– Epinions03
• Product reviews and ratings, directed trust network
– Flixster
• Movie ratings, undirected social network
– Ciao
• Product reviews and ratings with time, categories, etc.
– Epinions11
• More information included e.g. temporal information,
categories of products, distrust, etc.
107
Advanced Issue 4: Implicit
Recommendation
• Sometimes it is hard to obtain user ratings
toward items.
• However, it is easier to obtain some implicit
information from users to items, such as
– Browsing
– Purchasing
– Listen/watch
Implicit
ratings
Item1 Item2 Item3 Item4
User1 ? 1 ? ?
User2 1 ? ? 1
User3 ? 1 ? 1
User4 1 ? 1 ?
• The rating matrix contains only ‘1’ and ‘?’
• Regular MF would predict every ‘?’ to 1 to
minimize the squared error
108
One-Class Collaborative Filtering-
Matrix Factorization (OCCF-MF) Model
• MF model approximates a rating by the inner
product of a user feature vector and an item
feature vector.
• However, directly applying MF models to the
original rating matrix yields poor performance
– due to the absence of negative instances
• Solution 1 (OCCF-MF) : for each user, sample
some items with negative ratings to create a
more condensed matrix before applying MF.
• Solution 2 (BPR-MF): Using pairwise ranking to
update the model.
109
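A minimal sketch of the negative-sampling step described in Solution 1 (OCCF-MF): the sampled '0' entries are ours, added per user before running a regular MF solver.

```python
import random

def sample_negatives(positive_items, all_items, n_neg, rng=random):
    """For one user: keep the observed '1' items and sample n_neg unobserved
    items as pseudo-negative '0' ratings for a denser training matrix."""
    candidates = list(all_items - positive_items)
    negatives = rng.sample(candidates, min(n_neg, len(candidates)))
    return [(i, 1.0) for i in positive_items] + [(i, 0.0) for i in negatives]

print(sample_negatives({1, 3}, set(range(6)), n_neg=2))
```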
Advanced Issue 5: Location-based
Recommendation
• Learning Travel Recommendations from User-
generated GPS Traces. ACM TIST 2011.
• Measuring and Recommending Time-Sensitive Routes
from Location-based Data. ACM TIST 2014.
• New Venue Recommendation in Location-Based Social
Networks.
IEEE SocialCom 2012.
• Exploiting Geographical Influence for Collaborative
Point-of-Interest Recommendation. ACM SIGIR 2011.
• Time-aware Point-of-Interest Recommendation. ACM
SIGIR 2013.
110
(figure: an example one-day route through New York City landmarks — Empire State Building, Madison Square Garden, Times Square, MoMA, Metropolitan Museum of Art, Central Park, Yankee Stadium, the Statue of Liberty, SOHO, World Trade Center, Wall Street — with time stamps from 10:00 AM to 21:00 PM between the current location S and a destination D)
111
• Places are sensitive to the visiting time, for example,
– People usually visit the Empire State Building from about
12:00 to midnight (the night view is popular)
– People tend to visit Madison Square Garden in the early
evening for a basketball game
– The proper time to visit Central Park is during daytime
– Times Square is preferred from afternoon to midnight.
The proper visiting time of a place
2016/12/17 112
Overview of our approaches
• Incorporate four important factors into a goodness
function
– popularity of a place
– visiting order of places
– proper visiting time of a place
– proper transit time between places
• Construct one-day route with high goodness function
efficiently
– Propose the A* like algorithm to construct route toward
destination
2016/12/17 113
Scenario
2016/12/17 114
Central Park(8:00AM)
↓
Grand Army plaza(11:00AM)
↓
5th ave(12:00PM)
↓
Rockefeller Center(14:00PM)
↓
Time Square(16:00PM)
Popular?
Proper visiting
time?
Proper transit
time?
Smooth order?
• Definition: Temporal Visiting Distribution
(TVD), as the probability distribution of check-
in for a location
Proper Visiting Time (1)
(figure: an example TVD — the check-in probability of a location over the 24 hours of a day)
115
• The goodness of visiting a place at 8:00AM
– A thin Gaussian distribution (with mean 8 and small
variance).
– Measure the difference between $G(t; \mu, \sigma^2)$ and the
TVD.
• Symmetric Kullback-Leibler (KL) divergence.
Proper Visiting Time (2)
(figure: the TVD compared against a narrow normal distribution centered at the intended visiting hour)
116
• Definition: Duration Distribution (DD),
The probability distribution over time between
locations li and lj
Proper Transit Time Duration(1)
(figure: an example DD — the probability distribution of the transit time between two locations)
117
• Similar to what we do to TVD, given a pair of
locations li and lj, we can model a thin
Gaussian distribution(∆) and compare it with
DD using symmetric KL divergence.
Proper Transit Time Duration(2)
2016/12/17 118
• Exploit n-gram language model to measure the
quality of the order
• Uni-gram(Popularity), bi-gram, and tri-gram
Proper Visiting Order (1)
2016/12/17 119
• The goodness of visiting order of a route :
Proper Visiting Order (2)
2016/12/17 120
• Use a parameter α ϵ [0,1] to devise a linear
combination of such two parts
– Time-sensitive properties
– Popularity and Order
Final Goodness Function
2016/12/17 121
Route Construction – Guidance Search
Step 1. For each candidate $l_c$, calculate $g(l_c) = f(l_s \to l_c)$
Step 2. Choose the best $f(l_c \to l_d)$
Step 3. Calculate the distance $d(l_c \to l_d)$
Step 4. $h(l) = \max_{(l\to l_d)\in S(l\to l_d)}\{\, f(l_c \to l_d)\times(1 - d(l_c \to l_d))\,\}$
Step 5. $f^*(l) = (1-\beta)\times g(l) + \beta\times h(l)$
The heuristic satisfaction function $f^*(l)$ directs the search algorithm to move toward the destination.
(figure: candidate locations between the source $l_s$ at 10 AM and the destination $l_d$, with intermediate slots at 13, 15, 18, and 20 PM)
• Utilize the Gowalla dataset crawled by Dr. Jure Leskovec
– contains 6,442,890 check-in records from Feb. 2009 to Oct.
2010. The total number of check-in locations is 1,280,969.
• Extract three subsets of the check-in data, which
corresponds to cities of New York, San Francisco, and
Paris.
Gowalla dataset
Total Number
of Check-ins
Avg. Route
Length
Variance of
Route Length
Distinct Check-in
Locations
RouteDB 6,442,890 4.09 48.04 1,280,969
New York 103,174 4.46 71.24 14,941
San Francisco 187,568 4.09 58.36 15,406
Paris 14,224 4.45 75.73 3,472
2016/12/17 123
Three experiments
2016/12/17 124
• Pair-wise Time-sensitive Route Detection
exploiting replacing strategy
– Is the designed goodness function reasonable?
• Time-sensitive Cloze Test of Locations in Routes
– Can the heuristic function and search method capture the
human mobility and toward destination?
• Approximation ratio of Guidance Search
– How close is the goodness measure of our method
toward the global optimal value
• Evaluation measure
– Accuracy: is calculated as the number of successfully
detected routes divided by the number of pair
instances.
Evaluation Plans(1)
2016/12/17 125
(figure: a route $l_s \to l_2 \to l_3 \to l_4 \to l_d$ with time stamps from 10 AM to 20 PM; some locations are replaced with 'plausible' ones to form the pair instances to be detected)
• Baseline
– Distance-based Approach:
• Choose the closest location to the current spot as the next spot to
move to.
– Popular-based Approach:
• Choose the most popular spot of a given time in that city as the next
spot to move to.
– Forward Heuristic Approach:
• Choose a location li that possesses the largest bi-gram probability
with the previous location P(li|li-1) as the next location to move to.
– Backward Heuristic Approach:
• Choose a location li that possesses the largest bi-gram probability
with the next location P(li+1|li) as the next location to move to.
Compared methods
2016/12/17 126
Experiment 1: Pairwise Time-Sensitive
Route Detection
Accuracy for New York Accuracy for San Francisco
Accuracy for Paris
2016/12/17 127
• Evaluation measure
– Hit rate: means the ratio of guessing correctly
instances
Evaluation Plans(2)
2016/12/17 128
(figure: some middle locations of a route between $l_s$ at 10 AM and $l_d$ at 20 PM are removed, and the methods must guess them — a cloze test over routes)
Experiment 2: Time-Sensitive Cloze
Test in Routes
Hit rate for New York Hit rate for San Francisco
Hit rate for Paris
2016/12/17 129
(figures: hit rate vs. the number of guessed instances per route for New York, San Francisco, and Paris; our search method is compared with Greedy Search [8] and the distance-based, popular-based, forward-heuristic, and backward-heuristic baselines)
Experiment 2: Time-Sensitive Cloze
Test in Routes
Hit rate for New York Hit rate for San Francisco
Hit rate for Paris
2016/12/17 130
(figures: hit rate vs. the weight β for New York, San Francisco, and Paris, for the same set of methods)
The approximation ratio of all methods
compared with the Exhaustive Search.
2016/12/17 131
• Average route construction time
<=0.0043(65814 times faster than Exhaustive
Search)
(figure: approximation ratio vs. β for Guidance Search, Greedy Search [8], and the distance-based, popular-based, forward-heuristic, and backward-heuristic baselines)
• Propose a novel time-sensitive route planning problem using
the check-in data in location-based services.
• The first work considering four elements
– popularity of a place
– visiting order of places
– proper visiting time of a place
– proper transit time between places
• Model the four factors in our design of the goodness
function by exploiting some statistical methods.
• Given time-sensitive location query, we propose a Guidance
Search to search the route by optimizing the goodness
function
Conclusions
2016/12/17 132
Advanced Issue 6: Explaining the
Recommendation Outputs
• With reasonable explanation,
recommendation can be more powerful
– Recommendation  Persuasion
• Goal: how the system can automatically
generate human-understandable
explanations for the recommendations.
133
Explanation in recommender systems, AIR 2005
Explaining collaborative filtering recommendations CSCW 2000
Advanced Issue 7: Recommendation
for Groups of Users
• Recommendation for an individual 𝑢 can be
extended to a group 𝐺 of individuals
– e.g. advertisement on a social media site
• Aggregation strategy
– Considering the ratings for each individual
– Aggregating ratings for the individuals in a group
134
Aggregation Strategies
• Maximizing average satisfaction
– For a group $G$ of $n$ users, compute the average rating $R_i = \frac{1}{n}\sum_{u\in G} r_{u,i}$ for each item $i$
– Recommend the items that have the highest $R_i$'s
• Least misery
– No user is recommended an item that he or she strongly dislikes
– $R_i = \min_{u\in G} r_{u,i}$
• Most pleasure
– At least one user enjoys the recommendation the most
– $R_i = \max_{u\in G} r_{u,i}$
135
Example of Aggregation
• Group 𝐺 = 𝐽𝑜ℎ𝑛, 𝐽𝑖𝑙𝑙, 𝐽𝑢𝑎𝑛
• Best recommendation
– Average satisfaction: Tea
– Least misery: Water
– Most pleasure: Coffee
Soda Water Tea Coffee
John 1 3 1 1
Joe 4 3 1 2
Jill 2 2 4 2
Jorge 1 1 3 5
Juan 3 3 4 5
Average satisfaction: $R_{Soda} = \frac{1+2+3}{3} = 2$; $R_{Water} = \frac{3+2+3}{3} = 2.66$; $R_{Tea} = \frac{1+4+4}{3} = 3$; $R_{Coffee} = \frac{1+2+5}{3} = 2.66$
Least misery: $R_{Soda} = \min\{1,2,3\} = 1$; $R_{Water} = \min\{3,2,3\} = 2$; $R_{Tea} = \min\{1,4,4\} = 1$; $R_{Coffee} = \min\{1,2,5\} = 1$
Most pleasure: $R_{Soda} = \max\{1,2,3\} = 3$; $R_{Water} = \max\{3,2,3\} = 3$; $R_{Tea} = \max\{1,4,4\} = 4$; $R_{Coffee} = \max\{1,2,5\} = 5$
136
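A small sketch of the three aggregation strategies applied to the group ratings from the table above (function name and data layout are ours):

```python
def aggregate(group_ratings, strategy):
    """group_ratings: per-item list of the ratings given by the group members."""
    if strategy == "average":
        return {i: sum(r) / len(r) for i, r in group_ratings.items()}
    if strategy == "least_misery":
        return {i: min(r) for i, r in group_ratings.items()}
    if strategy == "most_pleasure":
        return {i: max(r) for i, r in group_ratings.items()}
    raise ValueError(strategy)

# Ratings of the group {John, Jill, Juan}
group_ratings = {"Soda": [1, 2, 3], "Water": [3, 2, 3],
                 "Tea": [1, 4, 4], "Coffee": [1, 2, 5]}
for s in ("average", "least_misery", "most_pleasure"):
    scores = aggregate(group_ratings, s)
    print(s, max(scores, key=scores.get))  # Tea, Water, Coffee
```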
Quiz: The genie of the magic lamp as a recommender system, part 3
• After that dinner, Xiao-Pang and Tsai 10 grew fond of each other, and their Facebook status changed to "in a relationship".
• One day Tsai 10 borrowed the lamp to make a wish. She said: "I am usually too busy at work to have close friends. Could you recommend a friend to me?"
• The genie said no problem, and with a bang a girl appeared.
• Xiao-Pang's face fell: the girl was his ex-girlfriend. The genie said: "I recommended her to Tsai 10 because the two of you have a common friend!!!!!"
• Question: which kind of recommender system is the genie running?
137
Library for
Recommender Systems
138
LIBMF: A Matrix-factorization Library for
Recommender Systems
URL: https://www.csie.ntu.edu.tw/~cjlin/libmf/
Language: C++
Focus: solvers for Matrix-factorization models
R interface: package “recosystem”
https://cran.r-project.org/web/packages/recosystem/index.html
Features:
• solvers for real-valued MF, binary MF, and one-class MF
• parallel computation in a multi-core machine
• less than 20 minutes to converge on a data set of 1.7B ratings
• supporting disk-level training, which largely reduces the memory usage
139
MyMediaLite - a recommender system algorithm
library
URL: http://mymedialite.net/
Language: C#
Focus: rating prediction and item prediction from positive-only feedback
Algorithms: kNN, BiasedMF, SVD++...
Features:
Measure: MAE, RMSE, CBD, AUC, prec@N, MAP, and NDCG
140
LibRec: A Java Library for Recommender Systems
URL: http://www.librec.net/index.html
Language: Java
Focus: algorithms for rating prediction and item ranking
Algorithms: kNN, PMF, NMF, BiasedMF, SVD++, BPR,
LDA…more at: http://www.librec.net/tutorial.html
Features:
Faster than MyMediaLite
Collection of Datasets: MovieLens 1M, Epinions, Flixster...
http://www.librec.net/datasets.html
141
mrec: recommender systems library
URL: http://mendeley.github.io/mrec/
Language: Python
Focus: item similarity and other methods for implicit
feedback
Algorithms: item similarity methods, MF, weighted MF for
implicit feedback
Features:
• train models and make recommendations in parallel using IPython
• utilities to prepare datasets and compute quality metrics
142
SUGGEST: Top-N recommendation engine
Python interface: “pysuggest”
https://pypi.python.org/pypi/pysuggest
Focus: collaborative filtering-based top-N recommendation
algorithms (user-based and item-based)
Algorithms: user-based or item-based collaborative filtering
based on various similarity measures
Features:
low latency: computes top-10 recommendations in less than 5 µs
143
Conclusion
• Recommendation is arguably the most successful
AI/ML solution to date.
• It is not just for customer–product matching
• Matching users and users
• Matching users and services
• Matching users and locations
• …
• The basic skills and tools are mature, but the
advanced issues are not fully solved.
• The lack of data for cold-start users is still the main
challenge to be solved.
144
References (1/3)
• SVD++
– Koren, Yehuda. "Factorization meets the neighborhood: a multifaceted collaborative filtering model." Proceedings of
the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2008.
• TF
– Karatzoglou, Alexandros, et al. "Multiverse recommendation: n-dimensional tensor
factorization for context-aware collaborative filtering." Proceedings of the fourth ACM
conference on Recommender systems. ACM, 2010.
• FM
– Rendle, Steffen, et al. "Fast context-aware recommendations with factorization machines." Proceedings of the 34th
international ACM SIGIR conference on Research and development in Information Retrieval. ACM, 2011.
• BPR-MF
– Rendle, Steffen, et al. "BPR: Bayesian personalized ranking from implicit feedback." Proceedings of the twenty-fifth
conference on uncertainty in artificial intelligence. AUAI Press, 2009.
• BPMF
– Salakhutdinov, Ruslan, and Andriy Mnih. "Bayesian probabilistic matrix factorization using Markov
chain Monte Carlo." Proceedings of the 25th international conference on Machine learning. ACM,
2008.
• PF
– Gopalan, Prem, Jake M. Hofman, and David M. Blei. "Scalable recommendation with poisson
factorization." arXiv preprint arXiv:1311.1704 (2013). 145
References (2/3)
• Link prediction
– Menon, Aditya Krishna, and Charles Elkan. "Link prediction via matrix factorization." Joint
European Conference on Machine Learning and Knowledge Discovery in Databases. Springer
Berlin Heidelberg, 2011.
• Community detection
– Yang, Jaewon, and Jure Leskovec. "Overlapping community detection at scale: a nonnegative matrix factorization
approach." Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 2013.
– Psorakis, Ioannis, et al. "Overlapping community detection using bayesian non-negative matrix factorization." Physical
Review E 83.6 (2011): 066114.
• Clustering
– Ding, Chris HQ, Xiaofeng He, and Horst D. Simon. "On the Equivalence of Nonnegative Matrix Factorization
and Spectral Clustering." SDM. Vol. 5. 2005.
– Kuang, Da, Sangwoon Yun, and Haesun Park. "SymNMF: nonnegative low-rank approximation of a
similarity matrix for graph clustering." Journal of Global Optimization 62.3 (2015): 545-574.
• Word embedding
– Levy, Omer, and Yoav Goldberg. "Neural word embedding as implicit matrix factorization."
Advances in neural information processing systems. 2014.
– Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. "Glove: Global Vectors for Word
Representation." EMNLP. Vol. 14. 2014.
146
References (3/3)
• The Matrix Cookbook
– http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3274/pdf/imm3274.pdf
– Collection of matrix facts, including derivatives
and expected values
147
Artificial Intelligence and Machine Learning
Shou-De Lin (林守德)
2016/12/17 1
CSIE/GINM, NTU
sdlin@csie.ntu.edu.tw
Artificial Intelligence
Machine Learning
Recommender Systems
Defining Artificial Intelligence from a Practical Perspective
• Able to observe, understand, and react to people and things
(interact with the world)
– Reinforcement Learning, Markov Decision Process
• Able to find the best solution
– Optimization
• Able to infer and plan
– Inference and Planning
• Able to learn and adapt
– Machine Learning and adaptation
Strong AI vs. Weak AI
• Expecting the computer to produce intelligence the same way
humans do (Strong AI)
• Expecting the computer to exhibit intelligent external behavior
(Weak AI)
2016/12/17 3
What is Machine Learning
• ML tries to optimize a performance criterion using
example data or past experience.
• Mathematically speaking: given some data X, we want
to learn a function mapping f(X) for certain purpose
– f(x)= a label y  classification
– f(x)= a set Y in X  clustering
– f(x)=p(x)  probabilistic graphical model
– f(x)= a set of y  multi-label classification
– …
• ML techniques tell us how to produce high quality f(x),
given certain objective and evaluation metrics
2016/12/17 4
AI Research Is Mostly Driven by
Solving Concrete Challenges
Deep Blue (1996)
• The first program to beat the world chess champion
– It relied on the computer's parallel computing power
for exhaustive search
– There was no particular algorithmic breakthrough
Deep Blue
(photo taken by James at
Computer History Museum)
DARPA Grand Challenge (2004)
• DARPA offered one million US dollars to the first car that
crossed the Nevada desert
• All teams failed in 2004, but in 2005 five cars completed the course.
• In 2007 it evolved into the Urban Challenge
Loebner Prize
• $25,000 for the first chatbot judged to be human
– Systems with a similar purpose include Siri and XiaoIce (小冰)
• A rather controversial AI competition
• No one has won the grand prize yet
RoboCup (1997-now)
• Robot soccer competition
• "By the middle of the 21st century, a team of
fully autonomous humanoid robot soccer playe
rs shall win a soccer game, complying with the
official rules of FIFA, against the winner of the
most recent World Cup.“
ACM KDD Cup (1997~now)
• It is an annual competition in the area of
knowledge discovery and data mining
• Organized by ACM special interest group on
KDD, started from 1997, now considered as
the most prestigious data mining competition
• Competition lasts roughly 2-4 months
Team NTU’s Performance on ACM KDD Cup
11
KDD Cups 2008 2009 2010 2011 2012 2013
Organizer Siemens Orange
PSLC
Datashop
Yahoo! Tencent Microsoft
Topic
Breast Cancer
Prediction
User Behavior
Prediction
Learner
Performance
Prediction
Recommendation
Internet
advertising
(track 2)
Author-paper &
Author name
Identification
Data Type Medical Telcom Education Music
Search Engine
Log
Academic Search
Data
Challenge
Imbalance
Data
Heterogeneous
Data
Time-
dependent
instances
Large Scale
Temporal +
Taxonomy Info
Click through
rate prediction
Alias in names
# of
records
0.2M 0.1M 30M 300M 155M
250K Authors,
2.5M papers
# of teams >200 >400 >100 >1000 >170 >700
Our Record Champion 3rd
place Champion Champion Champion Champion
2016/12/17 Prof. Shou-De Lin in NAU
IBM Watson (2011)
• Competed on the quiz show Jeopardy! and defeated past champions
• Watson contains hundreds of intelligent modules (from language
analysis to search engines).
• It showed that computers can already surpass humans at
knowledge question answering
AlphaGo (2016)
• Google’s AlphaGo 贏得號稱最困難的棋類:
圍棋
• 與IBM的深藍不同,AlphaGo是結合硬體與
複雜機器學習演算法
Key Techniques for Realizing AI
• Knowledge Representation
• (Machine) Learning
• Planning
• Search and Optimization
Knowledge Representation
• There are many ways to represent knowledge; the choice
depends on how the knowledge will later be used.
– Logical representations, e.g. ∀x, King(x) ∧ Greedy(x) ⟹ Evil(x)
– Rules (can be written in logic or other forms)
– Semantic networks/graphs
– Frames
From Wikipedia
Machine Learning (ML)
• Goal: build a system from existing data that can make
the best decisions
• Mathematically, machine learning means: given input X,
learn a function f(X) that produces the desired output Y
– X: chest X-ray → Y: whether it shows cancer (classification)
– X: financial news → Y: stock index (regression)
– X: photos of many people → Y: group similar photos
together (clustering)
Planning
• To achieve a goal, an AI agent takes actions in
response to its environment
• Markov Decision Process
From Wikipedia
Search and Optimization
• Among many possibilities, find the best choice
– Chess: among all possible next moves, search for the one most likely
to win
– Robot soccer: among all possible actions, choose the one most likely
to score
– Quiz answering: among all possible answers, choose the one most
likely to be correct
• Any problem that can be modeled as optimization can, in principle,
be solved by AI
– The key is the speed of search and optimization
How AI Connects with Other Fields
• Philosophy (logic, cognition, self-awareness)
• Linguistics (syntax, semantics, phonology)
• Statistics and mathematics (machine learning)
• Control (optimization)
• Economics (game theory)
• Neuroscience (neural networks)
• Computer engineering (AI-driven CPUs, cloud computing)
• Data science (data mining)
Application 1: Recommender Systems
• Recommending products to customers
– E-commerce sites
– Service providers
• Recommending users to users
– Matchmaking
– Investment/donation
• In general, making intelligent recommendations between any two sides
– Recommending resumes to jobs
– Recommending courses to students
Application 2: Internet of Things (IoT)
• IoT involves three elements: data, connectivity, and devices
(sensors)
• Related applications include
– Smart cities (parking, traffic, waste management)
– Disaster prevention (abnormal PM2.5 levels, electromagnetic
field levels, forest fires, earthquakes, etc.)
– Health (monitoring, fall detection, etc.)
– Security (access control, law enforcement, terrorist
identification)
– Smart agriculture
Application 3: SOLOMO
• SOLOMO = Social, Local, Mobile
• Many related applications
– Smart tour guides
– Nike+ GPS: route tracking, marathon broadcasting
– Smart advertising billboards
Application 4: Retail & E-commerce
• Intelligent product planning
• Smart warehousing
• Optimization of goods transportation
• Product tracking
Overview of Machine Learning
2016/12/17 24
A variety of ML Scenarios
Supervised Learning
(Classification &
Regression)
Classification
& Regression
Multi-label
learning
Multi-
instance
learning
Cost-sensitive
learning
Semi-supervised
learning
Active
learning
Unsupervised
Learning
Clustering
Learning Data
Distribution
Reinforcement
learning
Variations
Transfer
learning
Online
Learning
Distributed
Learning
2016/12/17 25
Supervised Learning
• Given: a set of <input, output> pairs
• Goal: given an unseen input, predict the corresponding
output
• For example:
1. Input: X-ray photo of chests, output: whether it is cancerous
2. Input: a sentence, output: whether a sentence is grammatical
3. Input: some indicators of a company, output: whether it will
make profit next year
• There are two kinds of outputs an ML system generates
– Categorical: classification problem (E1 and E2)
• Ordinal outputs: small, medium, large
• Non-ordinal outputs: blue, green, orange
– Real values: regression problem (E3)
2016/12/17 26
Classification (1/2)
• It’s a supervised learning task that, given a real-valued feature
vector x, predicts which class in C may be associated with x.
• |C|=2 → Binary Classification
|C|>2 → Multi-class Classification
• Training and predicting of a binary classification problem:
Feature Vector (xi ∈ Rd) Class
x1 +1
x2 -1
… …
xn-1 -1
xn +1
Training set (Binary Classification)
Classifier f(x)
Feature Vector (xnew ∈ Rd) Class
xnew ?
A new instance
(1) Training (2) Predicting
2016/12/17 27
Classification (2/2)
• A classifier can be either linear or non-linear
• The geometric view of a linear classifier
• Famous classification models:
• k-nearest neighbor (kNN)
• Decision Tree (DT)
• Support Vector Machine (SVM)
• Neural Network (NN)
• …
+1
-1
w
wTx + b
2016/12/17 28
We will talk more
about classification
later !!
Regression (1/2)
• A supervised learning task that, given a real-valued
feature vector x, predicts the target value y ∈ R.
• Training and predicting of a regression problem:
Feature Vector (xi ∈ Rd) yi ∈ R
x1 +0.26
x2 -3.94
… …
xn-1 -1.78
xn +5.31
Training set
Regression
Model
Feature Vector (xnew ∈ Rd) ynew ∈ R
xnew ?
A new instance
(1) Training (2) Predicting
2016/12/17 29
Regression (2/2)
• The geometric view of a linear regression
function
• Some types of regression: linear regression,
support vector regression, …
2016/12/17 30
Multi-label Learning
• A classification task in that an instance is associated with
a set of labels, instead of a single label.
Feature Vector (xi ∈ Rd) 𝓁1 𝓁2 𝓁3
x1 +1 -1 +1
x2 -1 +1 -1
… … … …
xn-1 +1 -1 -1
xn -1 +1 +1
Training set
Classifier
A new instance
(1) Training (2) Predicting
Feature Vector (xnew ∈ Rd) 𝓁1 𝓁2 𝓁3
xnew ? ? ?
• Existing models: Binary Relevance, Label Powerset,
ML-KNN, IBLR, … 31
Multimedia tagging
• Many websites allow the user community to add tags, thus enabling easier
retrieval.
Example of a tag cloud: the beach boys, from Last.FM (Ma et
al., 2010)
32
Cost-sensitive Learning
• A classification task with non-uniform cost for
different types of classification error.
• Goal: To predict the class C* that minimizes the
expected cost rather than the misclassification rate
• Methods: cost-sensitive SVM, cost-sensitive sampling
C* = argmin_j Σ_k P(Y = k | x) · L_jk
• An example cost matrix L : medical diagnosis
Ljk
Actual Cancer Actual Normal
Predict Cancer 0 1
Predict Normal 10000 0
2016/12/17 33
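A short sketch of the expected-cost decision rule C* = argmin_j Σ_k P(Y=k|x) L_jk, using the medical-diagnosis cost matrix from the slide; the class probabilities below are hypothetical.

```python
import numpy as np

# Cost matrix from the slide: L[j, k] = cost of predicting class j when the true class is k
# (index 0 = cancer, index 1 = normal)
L = np.array([[0.0, 1.0],
              [10000.0, 0.0]])

def cost_sensitive_decision(class_probs, L):
    """Return C* = argmin_j sum_k P(Y=k|x) * L[j, k], i.e. the minimum-expected-cost class."""
    expected_cost = L @ class_probs   # expected cost of each possible prediction
    return int(np.argmin(expected_cost))

# Hypothetical posterior: only a 1% chance of cancer, yet the huge miss cost
# still makes "predict cancer" the cheaper decision on average
print(cost_sensitive_decision(np.array([0.01, 0.99]), L))   # -> 0 (cancer)
```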
Examples for Cost-sensitive Learning
• Highly non-uniform misclassification costs are
very common in a variety of challenging real-
world machine learning problems
– Fraud detection
– Medical diagnosis
– Various problems in business decision-making.
Credit cards are one of the most famous targets of
fraud. The cost of missing a target (fraud) is much
higher than that of a false-positive.
Hung-Yi Lo, Shou-De Lin, and Hsin-Min Wang, “Generalized k-Labelsets Ensemble for Multi-
Label and Cost-Sensitive Classification,” IEEE Transactions on Knowledge and Data Engineering.
2016/12/17 34
Multi-instance Learning (1/2)
• A supervised learning task in which the training set
consists of bags of instances, and instead of
associating labels on instances, labels are only
assigned to bags.
• The goal is to learn a model and predict the label of a
new instance or a new bag of instances.
• In the binary case,
Positive bag → at least one instance in the bag is positive
Negative bag → all instances in the bag are negative
2016/12/17 35
(Figure: a positive bag (Bag+) containing one positive and several negative instances, and a negative bag (Bag-) containing only negative instances.)
Multi-instance Learning (2/2)
• Training and Prediction
Feature Vector (xi ∈ Rd) Bag
x1 1
x2 1
x3 2
x4 2
x5 2
… …
xn-1 m
xn m
Bag Class
1 +1
2 -1
… …
m -1
Feature Vector or Bag Class
BAGnew / xnew ?
A new instance or bag
• Some methods: Learning Axis-Parallel Concepts, Citation
kNN, mi-SVM, MI-SVM, Multiple-decision tree, …
Training Set
Classifier
(1) Training
(2) Predicting
BAGnew = {xnew-1, …, xnew-k}
2016/12/17 36
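A tiny sketch of the standard multi-instance assumption above (a bag is positive if at least one instance is positive); the per-instance scores are assumed to come from some instance-level classifier and are made up here.

```python
import numpy as np

def predict_bag(instance_scores):
    """Standard multi-instance rule: a bag is positive (+1) if at least one
    of its instances is scored positive, otherwise negative (-1)."""
    return +1 if np.max(instance_scores) > 0 else -1

# Hypothetical per-instance scores produced by some instance-level classifier
print(predict_bag(np.array([-0.7, -0.2, 0.4])))   # -> 1  (one positive instance)
print(predict_bag(np.array([-0.9, -0.3])))        # -> -1 (all instances negative)
```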
A variety of ML Scenarios
Supervised Learning
(Classification &
Regression)
Classification
& Regression
Multi-label
learning
Multi-
instance
learning
Cost-sensitive
learning
Semi-supervised
learning
Active
learning
Unsupervised
Learning
Clustering
Learning Data
Distribution
Reinforcement
learning
Variations
Transfer
learning
Online
Learning
Distributed
Learning
2016/12/17 37
Semi-supervised Learning
• Only a small
portion of data
are annotated
(usually due to
high annotation
cost)
• Leverage the
unlabeled data
for better
performance
38
[Zhu2008]
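One common way to leverage unlabeled data is self-training; the sketch below is a generic illustration (not the specific method surveyed in [Zhu2008]) and assumes scikit-learn is available. The confidence threshold and number of rounds are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_training(X_lab, y_lab, X_unlab, threshold=0.9, rounds=5):
    """Fit on the labeled pool, pseudo-label the unlabeled points the model is
    confident about, add them to the pool, and refit; repeat a few rounds."""
    clf = LogisticRegression().fit(X_lab, y_lab)
    for _ in range(rounds):
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= threshold              # confidently predicted points
        if not confident.any():
            break
        pseudo = clf.classes_[proba[confident].argmax(axis=1)]  # their pseudo-labels
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, pseudo])
        X_unlab = X_unlab[~confident]
        clf = LogisticRegression().fit(X_lab, y_lab)            # refit on the enlarged pool
    return clf
```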
Active Learning
(Figure: the learning algorithm repeatedly requests the label of a selected example from the unlabeled data; the expert/oracle returns a label for that example; after several such rounds the algorithm outputs a classifier.)
• Achieves better learning with fewer labeled training data via
actively selecting a subset of unlabeled data to be annotated
39
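A minimal sketch of uncertainty sampling, one common way to "actively select" which unlabeled points to annotate; it assumes a classifier exposing predict_proba (as scikit-learn models do) and is only illustrative.

```python
import numpy as np

def query_most_uncertain(clf, X_unlab, n_queries=1):
    """Uncertainty sampling: return the indices of the unlabeled points whose
    top predicted class probability is lowest; these are sent to the oracle."""
    confidence = clf.predict_proba(X_unlab).max(axis=1)
    return np.argsort(confidence)[:n_queries]
```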
A variety of ML Scenarios
Supervised Learning
(Classification &
Regression)
Classification
& Regression
Multi-label
learning
Multi-
instance
learning
Cost-sensitive
learning
Semi-supervised
learning
Active
learning
Unsupervised
Learning
Clustering
Learning Data
Distribution
Reinforcement
learning
Variations
Transfer
learning
Online
Learning
Distributed
Learning
2016/12/17 40
Unsupervised Learning
• Learning without teachers (presumably harder than
supervised learning)
– Learning “what normally happens”
– Think of how babies learn their first language (unsupervised)
compared with how people learn their second language
(supervised).
• Given: a bunch of input X (there is no output Y)
• Goal: depending on the tasks, for example
– Estimate P(X) → then we can find argmax P(X) → PGM
– Finding P(X2|X1) → we can know the dependency between
inputs → PGM
– Finding Sim(X1,X2) → then we can group similar X’s → clustering
2016/12/17 41
Clustering
• An unsupervised learning task
• Given a finite set of real-
valued feature vectors S ⊂ Rd,
discover clusters in S
Feature Vector (xi ∈ Rd)
x1
x2
…
xn-1
xn
S
Clustering Algorithm
• K-Means, EM, hierarchical
clustering, etc.
2016/12/17 42
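A bare-bones K-Means sketch matching the clustering setting above; the toy data, k=2, and the fixed number of iterations are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k=2, iters=20, seed=0):
    """Alternate between assigning each point to its nearest centroid and
    moving each centroid to the mean of the points assigned to it."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)              # cluster index of every point
        for j in range(k):
            if np.any(assign == j):                # keep the old centroid if a cluster is empty
                centroids[j] = X[assign == j].mean(axis=0)
    return assign, centroids

# Two well-separated groups of 2-D points (illustrative data)
X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
print(kmeans(X, k=2)[0])   # e.g. [0 0 1 1] (cluster ids may be swapped)
```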
A variety of ML Scenarios
Supervised Learning
(Classification &
Regression)
Classification
& Regression
Multi-label
learning
Multi-
instance
learning
Cost-sensitive
learning
Semi-supervised
learning
Active
learning
Unsupervised
Learning
Clustering
Learning Data
Distribution
Reinforcement
learning
Variations
Transfer
learning
Online
Learning
Distributed
Learning
2016/12/17 43
Reinforcement Learning (RL)
• RL is a “decision
making” process.
– How an agent should
make decisions to maximize the long-term reward
• RL is associated with a
sequence of states X
and actions Y (i.e.
think about Markov
Decision Process) with
certain “rewards”.
• Its goal is to find an
optimal policy to
guide decisions.
44
Figure from Mark Chang
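A minimal tabular Q-learning update, shown only to make the "long-term reward" idea concrete; the state/action sizes, learning rate, and discount factor are arbitrary, and this is not meant to describe AlphaGo's actual algorithm.

```python
import numpy as np

# Tabular Q-learning for a tiny, made-up MDP
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9   # learning rate and discount factor (arbitrary)

def q_update(s, a, reward, s_next):
    """Move Q(s, a) toward the observed reward plus the discounted best future value."""
    Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])

q_update(s=0, a=1, reward=1.0, s_next=2)   # one simulated decision step
print(Q[0, 1])                             # -> 0.1
```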
AlphaGo: SL+RL
• 1st Stage: learn from every human Go player (Supervised Learning)
– Data: historical game records
– Learning: f(X)=Y, X: board position, Y: next move
– Results: AI can play Go now, but not an expert
• 2nd Stage: surpass its former self (Reinforcement
Learning)
– Data: generated from playing against the 1st-stage AI
– Learning: observation → board position, reward → whether it wins,
action → next move
2016/12/17 45
A variety of ML Scenarios
Supervised Learning
(Classification &
Regression)
Classification
& Regression
Multi-label
learning
Multi-
instance
learning
Cost-sensitive
learning
Semi-supervised
learning
Active
learning
Unsupervised
Learning
Clustering
Learning Data
Distribution
Reinforcement
learning
Variations
Transfer
learning
Online
Learning
Distributed
Learning
2016/12/17 46
Online Learning
• Data arrives incrementally (one-at-a-time)
• Once a data point has been observed, it might never be seen again.
• Learner makes a prediction on each observation.
• Time and memory usage cannot scale with data.
• Algorithms may not store previously seen data and perform batch learning.
• Models resource-constrained learning, e.g. on small devices.
(Figure: the online protocol: Q1? → Ans1, Q2? → Ans2, Q3? → Ans3, …)
47
Scenario: Human activity recognition using data from wearable
devices (one of the datasets we have experimented with)
Daily activities → wearable device → sensory data in real time → Bluetooth → laptop → classifier
• There are a variety of models to be learned (individual × activity)
• Data arrive incrementally, and we don’t want to transmit everything to the
server
48
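A minimal online-learning sketch (a mistake-driven perceptron): each example is predicted on, used once for an update, and then discarded, so memory does not grow with the stream. The data stream and learning rate below are made up.

```python
import numpy as np

class OnlinePerceptron:
    """One-pass online learner: each example is predicted on, used once to
    update the weights, and then discarded (memory does not grow with the stream)."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def predict(self, x):
        return 1 if self.w @ x >= 0 else -1

    def update(self, x, y):               # y in {+1, -1}
        if self.predict(x) != y:          # mistake-driven update
            self.w += self.lr * y * x

# Simulated stream (made-up data): predict first, then learn from the revealed label
model = OnlinePerceptron(dim=2)
for x, y in [([1.0, 0.2], +1), ([-0.5, 1.0], -1), ([0.8, -0.1], +1)]:
    x = np.asarray(x)
    print(model.predict(x), y)
    model.update(x, y)
```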
Transfer Learning
• Example: the knowledge for
recognizing an airplane may
be helpful for recognizing a
bird.
• Improving a learning task by incorporating
knowledge from learning tasks in other domains with
different feature spaces and data distributions.
• Reduces expensive data-labeling efforts
[Pan & Yang 2010]
2016/12/17 49
• Approach categories:
(1) Instance-transfer
(2) Feature-representation-transfer
(3) Parameter-transfer
(4) Relational-knowledge-transfer
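A small sketch of the instance-transfer idea, category (1) above: source-domain examples are reused but down-weighted relative to target-domain examples. The random data, the 0.2 weight, and the use of scikit-learn's sample_weight are illustrative assumptions only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical source-domain data (plenty) and target-domain data (scarce)
X_src, y_src = rng.standard_normal((100, 5)), rng.integers(0, 2, 100)
X_tgt, y_tgt = rng.standard_normal((20, 5)), rng.integers(0, 2, 20)

# Instance-transfer sketch: pool both domains, but down-weight source instances
X = np.vstack([X_src, X_tgt])
y = np.concatenate([y_src, y_tgt])
weights = np.concatenate([np.full(len(X_src), 0.2),   # source examples count less
                          np.full(len(X_tgt), 1.0)])  # target examples count fully

clf = LogisticRegression().fit(X, y, sample_weight=weights)
```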
Distributed Learning
• Perform machine learning on multiple
machines
2016/12/17 50
Comparison: Traditional Parallel Computing vs. Distributed Learning
• # of machines: few (10~100), communication cost can be ignored vs. many (>1000), communication cost can be the bottleneck
• Computational power: powerful (cluster), dedicated vs. ordinary (mobile devices), cannot be dedicated
• Memory: large vs. small
• Management: strong control vs. weak control
Deep Learning (learning representations of data)
Supervised Learning
(Classification &
Regression)
Classification
& Regression
Multi-label
learning
Multi-
instance
learning
Cost-sensitive
learning
Semi-supervised
learning
Active
learning
Unsupervised
Learning
Clustering
Learning Data
Distribution
Reinforcement
learning
Variations
Transfer
learning
Online
Learning
Distributed
Learning
2016/12/17 51
Probabilistic Graphical Model
• Multiple processing stages in the visual cortex (LeCun & Ranzato):
• Inspired by human brains, deep learning aims to learn multiple
layers of representation from data, with increasing level of
abstraction
Deep Learning
• Human brains perform much better than
computers at recognizing natural patterns
• The brains learn to extract many layers of features
• Features in one layer are combined to form
higher-level features in the layer above (Hinton)
Retina LGN V1 V2 V4 PIT AIT …
(Picture from LeCun & Ranzato)
Convolutional Network
2016/12/17 52
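To make "multiple layers of representation" concrete, here is a tiny NumPy-only forward pass through a 3-layer network with random (untrained) weights; it is only a sketch of the layered-features idea, not a trained deep model.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Random (untrained) weights for a 3-layer network; each layer turns the previous
# representation into a more abstract one, mirroring the "layers of features" idea
rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((4, 3))
W3 = rng.standard_normal((3, 1))

def forward(x):
    h1 = relu(x @ W1)   # low-level features
    h2 = relu(h1 @ W2)  # higher-level features built on top of h1
    return h2 @ W3      # final output computed from the most abstract layer

print(forward(rng.standard_normal((2, 8))))   # outputs for two example inputs
```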
The Future of AI: From Machine Learning
to Machine Discovery
Moravec‘s Paradox
• Things that are easy for humans can be hard for computers:
• limb and eye coordination (e.g., climbing stairs, playing soccer)
• understanding humor and sarcasm
• face recognition
• deep understanding of semantics
• Things that are hard for humans can be easy for computers:
• searching and analyzing huge amounts of data
• playing chess
• complex logical inference
AI that is still stuck at the search stage
• On an AI matchmaking website, a woman lists two requirements: 1. handsome (帥) 2. owns a car (車). She enters them and lets the computer search; the result:
• Chinese chess!! (帥 and 車 happen to be Chinese chess pieces)
• Not satisfied, she changes the query to: 1. tall 2. owns a house. The computer runs and returns:
• Taipei 101
• She tries again: 1. manly 2. gives a sense of security. The computer replies:
• Superman!!
• Undeterred, she finally enters all of the conditions above, and the computer replies:
• Superman playing Chinese chess at Taipei 101!!
Two major research directions for the future of AI
• Achieving more feats that surpass humans
• more accurate prediction
• more correct decision-making
• Accomplishing tasks that are already easy for humans
• Computer vision
• Robotics
• Natural Language Understanding
From Learning to Discovery
• Learning:”The act, process, or experience of gaining
knowledge”.
• Learning is an experience every ordinary person has in daily life
• Discovery: “To obtain knowledge or awareness of
something not known before.”
• Discovery or invention is the product of exceptional people in exceptional circumstances, e.g. Edison, Newton, Sherlock Holmes.
12/17/2016 57
Why Machine Discovery?
• Def: Exploit machines to perform or assist human
beings to perform discovery
• Benefits:
• High reward
• Tolerates low precision and recall
• Challenge (why it is more difficult than ML?):
• No standard solution available
• No labeled data available
2016/12/17 58 Shou-De Lin
Two Types of Discovery
• Machine as a scientist
• Machine as a detective
Machine as Scientists
As mathematicians: Graph Theory (GRAFFITI, 1990–, Fajtlowicz)
• Made conjectures in graph theory such as
chromatic_number(G) + radius(G) <= max_degree(G) + frequency_of_max_degree(G)
• Discovered > 800 conjectures, which have led to 60 math papers and one thesis
written to prove or disprove them
As physicists: ASTRA (1998 - , Kocabas and Langley)
• suggested novel and interesting fusion reactions, and generated reaction pathways for
helium, carbon, and oxygen that do not appear in the scientific literature
As astronomers: SKICAT (1995 - , Kennefick)
• aiming at detecting quasars in digital sky surveys, and has found 5 new quasars.
60 2016/12/17 Shou-De Lin
61
As linguists: the linear reading order of the ancient 魯汶 script
The 魯汶 script records a language used in northern Syria
around 1300–700 BC
It is a two-dimensional writing system that has not yet been
fully deciphered
12/17/2016 Machine Discovery, Prof. Shou-de Lin
As poets: computer-composed classical Chinese poems
故人 (An Old Friend)   思鄉 (Homesickness)   回文詩 (A Palindrome Poem)
千里行路難 何處不自由 行人路道路人行
平生不可攀 故土木蘭舟 別有心人心有別
相逢無人見 落日一回首 寒風幾度幾風寒
與君三十年 孤城不可留
Two Types of Discovery
• Machine as scientists
• Machine as detectives
64
Abnormal Instance Discovery From Social
Networks
Global abnormal node discovery:
finding the most abnormal node(s) in the network
Local abnormal node discovery:
finding the most abnormal node(s) connected to a given node
Interesting feature discovery:
finding the most interesting or unique features (e.g. paths) between two nodes or w.r.t. a node
Novel Topic Diffusion Discovery (ACL 2012)
• Predicting diffusions for novel topics
• Predict diffusions of “iPhone 6” before any such diffusion occurs using
• Links of existing topics with contents
• Keywords of novel topics
• Intuition of our approach
• We do have the diffusion data of other topics
• Finding the connection between existing topics and novel topics using a
LDA-based approach
• Exploiting such connection for link prediction
65 12/17/2016
(Figure: diffusion links of existing topics such as HTC, iPad, and 101 among users u1–u4 on PTT, used together with the keywords of the novel topic iP6 to predict its yet-unseen diffusion links.)
Unseen Links Discovery (KDD 2013)
• Individual opinion (ex. customer’s preference) is valuable
• sometimes concealed due to privacy (ex. Foursquare “like”)
• Fortunately, aggregative statistics (total count) is usually available
• Predict unlabeled relationship (unseen-type link) using
• Heterogeneous social network
• Attributes of nodes
• Aggregative statistics
66 12/17/2016
(Figure: a heterogeneous network of users, items, and categories with be-friend-of, own, and belong-to links, node attributes, and aggregative like counts, used to predict the unseen "like" links.)
𝑹1
Target Rating Matrix
♫ ♫ ♫
Matching Users and Items for Transfer
Learning in Collaborative Filtering (KDD2014)
2013/7/30 67
 Assumptions:
 Two rating matrices
 Modeling the same kind of preference
 Certain portion of overlap in users and items
 E.g., using Blockbuster’s records to enhance the performance of a
Netflix system
 Obstacle: the anonymity of users and items
𝑹2
An Available Rating Dataset
♫♫♫
Final Remark: the era of weak AI has already arrived
Big Data + Knowledge Representation + Machine Learning → Weak AI
(科學人 Scientific American Taiwan, 2015)
68
I have three bold predictions about the future that AI will bring
Halle Tecco
The AI Revolution
Machines will take over some non-trivial jobs that
used to be done by human beings
• Some trivial jobs will be safe :)
70
The reversal of decision-making roles
• The decision makers will gradually shift from humans
to machines
71 科學人 Scientific American Taiwan, 2015
From Machine Learning to Machine Discovery
• The evolution of human intelligence:
memorizing → learning → discovering
• Machines are following the same path: machine memorizing → machine learning → machine discovering
• In the near future, much new scientific knowledge will be discovered by AI!
72
  • 14. CBR: Requirement • Required info: • information about items (e.g. genre and actors of a movie, or textual description of a book) • Information about users (e.g. user profile info, demographic features, etc) • Optional Info • Ratings from users to items 14
  • 15. Content-based Recommendation vs. Information Retrieval (i.e. Search Engine) Information Retrieval tries to match Keywords Documents CBR tries to match Profiles of liked items of the user Profiles of unseen items • If we know how keywords can be used to match documents in a search engine, then we will use the same model to match user and item profiles 15
  • 16. Vector Space Model • Represent the keywords of items using a term vector • Term: basic concept, e.g., keywords to describe an item • Each term represents one dimension in a vector • N total terms define an n-element terms • Values of each term in a vector corresponds to the importance of that term • E.g., d=(x1,…,xN), xi is “importance” of term i • Measure similarity by the vector distances 16
  • 17. An Example • Items or users are represented using a set of keywords. • each dimension represents a binary random variable associated to the existence of an indexed term (1:exist, 0: not exist in the document) Best-selling, romantic, drama 0 1 1 0 1 0 …… Best- selling exampassgood romanticdrama 17
  • 18. Term Frequency and Inverse Document Frequency (TFIDF) • Since not all terms in the vector space are equally important, we can weight each term using its occurrence probability in the item description • Term frequency: TF(d,t) • number of times t occurs in the item description d • Inverse document frequency: IDF(t) • to scale down the terms that occur in many descriptions 18
  • 19. Normalizing Term Frequency • nij represents the number of times a term ti occurs in a description dj . Tfij can be normalized using the total number of terms in the document. • Another normalization: ij ij kj k n tf n   max ij ij kj n tf n  19
  • 20. Inverse Document Frequency • IDF seeks to scale down the coordinates of terms that occur in many item descriptions. • For example, some stop words (the, a, of, to, and…) may occur many times in a description. However, they should not be considered as important representation feature. • IDF of a term ti : ,where dfi (document frequency of term ti) is the number of descriptions in which ti occurs. 20 )1log(  i i df N idf
  • 21. Term Frequency and Inverse Document Frequency • TF-IDF weighting : weight(t,d)=TF(t,d)*IDF(t) • Common in doc high tf  high weight • Rare in collection  high idf high weight • weight wij of a term ti in a document dj wij )1(log*  j ij df N tf 21
  • 22. 22 Computing TF-IDF -- An Example Given an item description contains terms with given frequencies: A: best-selling (3), B: Romantic (2), C: Drama (1) Assume collection contains 10,000 items and document frequencies of these terms are: best-selling(50), Romantic (1300), Drama (250) Then: A: tf = 3/3; idf = log2(10000/50+1) = 7.6; tf-idf = 7.6 B: tf = 2/3; idf = log2 (10000/1300+1) = 2.9; tf-idf = 2.0 C: tf = 1/3; idf = log2 (10000/250+1) = 5.3; tf-idf = 1.8 The vector-space presentation of this item using A, B, and C is (7.6,2.0,1.8) max ij ij kj n tf n 
  • 23. Graphic Representation T3 T1 T2 I1 = 2T1+ 3T2 + 5T3 I2 = 3T1 + 7T2 + T3 U = 0T1 + 0T2 + 2T3 7 32 5 Example: I1 = (2,3,5)=2T1 + 3T2 + 5T3 I2 = (3,7,1)=3T1 + 7T2 + T3 U = (0,0,2)=0T1 + 0T2 + 2T3 • Is I1 or I2 more similar to U? • How to measure the degree of similarity? Distance? Angle? Projection? 23
  • 24. Measure Similarity Between two item descriptions • D={d1,d2,…dn} • Q={q1,q2,…qn} (the qx value is 0 if a term is absent) • Dot product similarity: Sim(D,Q)=d1*q1+d2*q2+…+dn*qn • Cosine Similarity (or normalized dot product): sim(D,Q)=     N 1j 2 j N 1j 2 j n11 q*d *d...*d ),( nqq DQsim 24
  • 25. Content-based Recommendation: an example from Alexandros Karatzoglou • A customer buys the book: “Building data mining applications for CBM” • 7 Books are possible candidates for a recommendation: 1. Accelerating Customer Relationships: Using CRM and Relationship Technologies 2. Mastering Data Mining: The Art and Science of Customer Relationship Management 3. Data Mining Your Website 4. Introduction to marketing 5. Consumer behaviour 6. Marketing research, a handbook 7. Customer knowledge management 25
  • 26. Using binary existence of words 26
  • 28. Pros and Cons for CBR Pros: • user independence – does not need data from other users • transparency – easily explainable • new items can be easily incorporated (assuming content available) Cons: • Difficult to apply to domain where feature extraction is an inherent problem (e.g. multimedia data) • Keywords might not be sufficient: Items represented by the same set of features are indistinguishable • Cannot deal with new users • Redundancy: should we show items that are too similar? 28
  • 29. Types of Recommender Systems • Content-based Recommendation (CBR) • Collaborative Filtering • Other solutions • Advanced Issues 29
  • 30. Collaborative Filtering (CF) • CF is the most successful and common approach to generate recommendations • used in Amazon, Netflix, and most of the e-commerce sites • many algorithms exist • General and can be applied to many domains (book, movies, DVDs, ..) • Key Idea • Recommended the favored items of people who are ‘similar’ to you • Need to collect the taste (implicit or explicit) of other people  that’s why we call it ‘collaborative’ 30
  • 31. Mathematic Form of CF Explicit ratings Item1 Item2 Item3 Item4 User1 ? 3 ? ? User2 1 ? ? 5 User3 ? 4 ? 2 User4 3 ? 3 ? • Given: some ratings from users to items • Predict: unknown ratings of users to items Implicit ratings Item1 Item2 Item3 Item4 User1 ? 1 ? ? User2 1 ? ? 1 User3 ? 1 ? 1 User4 1 ? 1 ? 31
  • 32. CF models • Memory-based CF • User-based CF • Item-based CF • Model-based CF 32
  • 33. User-Based Collaborative Filtering • Finding users 𝑁(𝑢) most similar to user 𝑢 (neighborhood of 𝑢) • Assign 𝑁(𝑢)’s rating as u’s rating • Prediction • 𝑟𝑢,𝑖 = 𝑟 𝑢 + 𝑣∈𝑁(𝑢) 𝑠𝑖𝑚(𝑢,𝑣)(𝑟 𝑣,𝑖−𝑟 𝑣) 𝑣∈𝑁(𝑢) 𝑠𝑖𝑚(𝑢,𝑣) • 𝑟 𝑢: Average of ratings of user 𝑢 • Problem: Usually users do not have many ratings; therefore, the similarities between users may be unreliable 33
  • 34. Example of User-Based CF • To predict 𝑟𝐽𝑎𝑛𝑒,𝐴𝑙𝑎𝑑𝑑𝑖𝑛 using cosine similarity • Neighborhood size is 2 Rating Lion King Aladdin Mulan Anastasia John 3 0 3 3 Joe 5 4 0 2 Jill 1 2 4 2 Jane 3 ? 1 0 Jorge 2 2 0 1 𝑟𝐽𝑜ℎ𝑛 = 3 + 0 + 3 + 3 4 = 2.25 𝒓 𝑱𝒐𝒆 = 5 + 4 + 0 + 2 4 = 𝟐. 𝟕𝟓 𝑟𝐽𝑖𝑙𝑙 = 1 + 2 + 4 + 2 4 = 2.25 𝑟𝐽𝑎𝑛𝑒 = 3 + 1 + 0 3 = 1.33 𝒓 𝑱𝒐𝒓𝒈𝒆 = 2 + 2 + 0 + 1 4 = 𝟏. 𝟐𝟓 𝑠𝑖𝑚 𝐽𝑎𝑛𝑒, 𝐽𝑜ℎ𝑛 = 3 × 3 + 1 × 3 + 0 × 3 32 + 12 + 02 32 + 32 + 32 = 0.73 𝑠𝑖𝑚 𝐽𝑎𝑛𝑒, 𝑱𝒐𝒆 = 3 × 5 + 1 × 0 + 0 × 2 32 + 12 + 02 52 + 02 + 22 = 𝟎. 𝟖𝟖 𝑠𝑖𝑚 𝐽𝑎𝑛𝑒, 𝐽𝑖𝑙𝑙 = 3 × 1 + 1 × 4 + 0 × 2 32 + 12 + 02 12 + 42 + 22 = 0.48 𝑠𝑖𝑚 𝐽𝑎𝑛𝑒, 𝑱𝒐𝒓𝒈𝒆 = 3 × 2 + 1 × 0 + 0 × 1 32 + 12 + 02 22 + 02 + 12 = 𝟎. 𝟖𝟒 𝑟𝐽𝑎𝑛𝑒,𝐴𝑙𝑎𝑑𝑑𝑖𝑛 = 1.33 + 0.88 4 − 2.75 + 0.84(2 − 1.25) 0.88 + 0.84 = 2.33 34
  • 35. Item-Based Collaborative Filtering • Similarity is defined between two items • Finding items 𝑁(𝑖) most similar to item 𝑖 (neighborhood of 𝑖) • Assign 𝑁(𝑖)’s rating as i’s rating • Prediction • 𝑟𝑢,𝑖 = 𝑟𝑖 + 𝑗∈𝑁(𝑖) 𝑠𝑖𝑚(𝑖,𝑗)(𝑟 𝑢,𝑗−𝑟 𝑗) 𝑗∈𝑁(𝑖) 𝑠𝑖𝑚(𝑖,𝑗) • 𝑟𝑖: Average of ratings of item 𝑖 • Items usually have more ratings from many users and the similarities between items are more stable 35
  • 36. Example of Item-Based CF • To predict 𝑟𝐽𝑎𝑛𝑒,𝐴𝑙𝑎𝑑𝑑𝑖𝑛 using cosine similarity • Neighborhood size is 2 Rating Lion King Aladdin Mulan Anastasia John 3 0 3 3 Joe 5 4 0 2 Jill 1 2 4 2 Jane 3 ? 1 0 Jorge 2 2 0 1 𝒓 𝑳𝒊𝒐𝒏 𝑲𝒊𝒏𝒈 = 3 + 5 + 1 + 3 + 2 5 = 𝟐. 𝟖 𝑟 𝐴𝑙𝑎𝑑𝑑𝑖𝑛 = 0 + 4 + 2 + 2 4 = 2 𝑟 𝑀𝑢𝑙𝑎𝑛 = 3 + 0 + 4 + 1 + 0 5 = 1.6 𝒓 𝑨𝒏𝒂𝒔𝒕𝒂𝒔𝒊𝒂 = 3 + 2 + 2 + 0 + 1 5 = 𝟏. 𝟔 𝑠𝑖𝑚 𝐴𝑙𝑎𝑑𝑑𝑖𝑛, 𝑳𝒊𝒐𝒏 𝑲𝒊𝒏𝒈 = 0 × 3 + 4 × 5 + 2 × 1 + 2 × 2 02 + 42 + 22 + 22 32 + 52 + 12 + 22 = 𝟎. 𝟖𝟒 𝑠𝑖𝑚 𝐴𝑙𝑎𝑑𝑑𝑖𝑛, 𝑀𝑢𝑙𝑎𝑛 = 0 × 3 + 4 × 0 + 2 × 4 + 2 × 0 02 + 42 + 22 + 22 32 + 02 + 42 + 02 = 0.32 𝑠𝑖𝑚 𝐴𝑙𝑎𝑑𝑑𝑖𝑛, 𝑨𝒏𝒂𝒔𝒕𝒂𝒔𝒊𝒂 = 0 × 3 + 4 × 2 + 2 × 2 + 2 × 1 02 + 42 + 22 + 22 32 + 22 + 22 + 12 = 𝟎. 𝟔𝟕 𝑟𝐽𝑎𝑛𝑒,𝐴𝑙𝑎𝑑𝑑𝑖𝑛 = 2 + 0.84 3 − 2.8 + 0.67(0 − 1.6) 0.84 + 0.67 = 2.33 36
  • 37. Quiz: The Genie Recommender, Part 1 • Hsiao-Pang finds a magic lamp, and the genie grants him one wish. He asks: "I want Tsai 10 to have dinner with me." • The genie says no problem, and with a bang both Tsai 10 and Tsai 13 appear. • Hsiao-Pang protests: "I never asked for Tsai 13!" • The genie replies: "My recommender system thinks you may also be interested in Tsai 13." • Question: which kind of recommender system is the genie running internally? 37
  • 38. CF models • Memory-based CF • User-based CF • Item-based CF • Model-based CF • Matrix Factorization 38
  • 39. Matrix Factorization (MF) • Given a matrix R ∈ ℝ^{N×M}, we would like to find two matrices U ∈ ℝ^{K×N} and V ∈ ℝ^{K×M} such that U^T V ≈ R – K ≪ min(N, M) → we assume R has small rank K – A low-rank approximation method – Earlier works (before 2007–2009) called it singular value decomposition (SVD) 39
  • 40. Matrix factorization Vk T Dim1 -0.4 -0.7 0.6 0.4 0.5 Dim2 0.8 -0.6 0.2 0.2 -0.3 Uk Dim1 Dim2 Alice 0.4 0.3 Bob -0.4 0.3 Mary 0.7 -0.6 Sue 0.3 0.9 40 𝑅 = 𝑈⊤ 𝑉 Rating (Alice to Harry Potter)=0.4*0.4+0.3*0.2
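As a quick illustration of the table above, the sketch below (a toy example, not part of the slides) reconstructs a predicted rating as the inner product of a user row of U and an item column of V:

```python
import numpy as np

# K = 2 latent dimensions, following the Alice/Bob example above
U = np.array([[ 0.4,  0.3],   # Alice
              [-0.4,  0.3],   # Bob
              [ 0.7, -0.6],   # Mary
              [ 0.3,  0.9]])  # Sue
V = np.array([[-0.4, -0.7, 0.6, 0.4,  0.5],   # Dim1 of the 5 items
              [ 0.8, -0.6, 0.2, 0.2, -0.3]])  # Dim2 of the 5 items

R_hat = U @ V                  # full predicted rating matrix (4 x 5)
print(R_hat[0, 3])             # Alice x item 4 = 0.4*0.4 + 0.3*0.2 = 0.22
```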
  • 41. MF as an Effective CF Realization • We find two matrices 𝑈, 𝑉 to approximate 𝑅 – Missing entry 𝑅𝑖𝑗 can be predicted by 𝑈𝑖 ⊤ 𝑉𝑗 • Entries in column 𝑈𝑖 (or 𝑉𝑗)represent the latent factor (i.e. rating patterns learned from data) of user 𝑖 (or item j) • If two users have similar latent factors, then they will give similar ratings to all items • If two items have similar latent factor, then the corresponding rating for all users are similar 1 2 5 4 2 5 2 4 3 5 1 4 3 3 ≈ 1 0 2 1 0 1 6 3 -1 𝑈⊤ 𝑉 𝑅 41
  • 42. Relationship between MF and SVD • Singular value decomposition (SVD): R is decomposed as W^T Σ H, where Σ is the diagonal matrix of the K positive singular values σ_1 ≥ … ≥ σ_K > 0 of a rank-K matrix – SVD requires the matrix R to be fully observed (no missing entries) • Matrix factorization (MF): R ≈ U^T V – Missing entries can simply be omitted in learning – It is more scalable for large datasets 42
  • 44. How to Train an MF Model (1/2) • MF as a minimization problem: \arg\min_{U,V} \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} \delta_{ij} (U_i^T V_j - R_{ij})^2 + \frac{\lambda_U}{2} \|U\|_F^2 + \frac{\lambda_V}{2} \|V\|_F^2 – the first term is the squared error on observed ratings, the last two terms are regularization – \|U\|_F^2 = \sum_{i=1}^{N} \|U_i\|_2^2 = \sum_{i=1}^{N} \sum_{k=1}^{K} U_{ki}^2: squared Frobenius norm – \delta_{ij} ∈ {0,1}: whether rating R_{ij} is observed in R 44
  • 45. How to Train an MF Model (2/2) • MF with bias terms: \arg\min_{U,V,b,c,\mu} \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} \delta_{ij} (U_i^T V_j + b_i + c_j + \mu - R_{ij})^2 + \frac{\lambda_U}{2}\|U\|_F^2 + \frac{\lambda_V}{2}\|V\|_F^2 + \frac{\lambda_b}{2}\|b\|_2^2 + \frac{\lambda_c}{2}\|c\|_2^2 + \frac{\lambda_\mu}{2}\mu^2 – b: per-user bias (rating mean) vector – c: per-item bias (rating mean) vector – \mu: overall mean of all ratings • Some MF extension works omit the bias terms Koren, Yehuda, Robert Bell, and Chris Volinsky. "Matrix factorization techniques for recommender systems." Computer 42.8 (2009): 30-37. 45
  • 46. MF as Neural Network (NN) • Shallow NN with identity activation function – N input neurons: user i as a one-hot encoding – M output neurons: row i of the rating matrix R – the input-to-hidden weights form U and the hidden-to-output weights form V (figure: user 4 encoded as a one-hot input producing row R_4 as output, through a hidden layer of size K) 46
  • 47. MF as Probabilistic Graphical Model (PGM) • Bayesian network with normal distributions • Maximum a posteriori (MAP): \arg\max_{U,V} \prod_{i=1}^{N} \prod_{j=1}^{M} \mathcal{N}(R_{ij} \mid U_i^T V_j, \sigma_R^2)^{\delta_{ij}} \prod_{i=1}^{N} \mathcal{N}(U_i \mid 0, \sigma_U^2 I) \prod_{j=1}^{M} \mathcal{N}(V_j \mid 0, \sigma_V^2 I) – the first factor is the likelihood (normal distribution); the other factors are zero-mean spherical Gaussian priors (multivariate normal distributions) 47
  • 48. Probabilistic Matrix Factorization (PMF) • PGM viewpoint of MF (plate diagram: R_{ij} depends on U_i and V_j with noise σ_R; U_i and V_j have priors governed by σ_U and σ_V; plates over users i = 1..N and items j = 1..M) A. Mnih and R. R. Salakhutdinov, “Probabilistic matrix factorization,” in Proc. of NIPS, 2008, pp. 1257–1264. 48
  • 49. PMF vs. MF • PMF is equivalent to MF with \lambda_U = \sigma_R^2 / \sigma_U^2 and \lambda_V = \sigma_R^2 / \sigma_V^2 – taking -\sigma_R^2 \log(\cdot) of the MAP objective \arg\max_{U,V} \prod_{i,j} \mathcal{N}(R_{ij} \mid U_i^T V_j, \sigma_R^2)^{\delta_{ij}} \prod_{i} \mathcal{N}(U_i \mid 0, \sigma_U^2 I) \prod_{j} \mathcal{N}(V_j \mid 0, \sigma_V^2 I) yields \arg\min_{U,V} \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} \delta_{ij} (U_i^T V_j - R_{ij})^2 + \frac{\lambda_U}{2}\|U\|_F^2 + \frac{\lambda_V}{2}\|V\|_F^2 + C, and taking e^{-x/\sigma_R^2} maps back – C: terms irrelevant to U, V – \ell_2-norm regularization = zero-mean spherical Gaussian prior – PMF can be easily incorporated into other PGM models 49
  • 50. Learning in MF • Stochastic gradient descent (SGD) • Alternating least squares (ALS) • Variational expectation maximization (VEM) 50
  • 51. Stochastic Gradient Descent (SGD) (1/2) • Gradient descent: updating variables along the direction of the negative gradient – SGD updates variables instance-wise • Let L be the objective function: L = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} \delta_{ij} (U_i^T V_j - R_{ij})^2 + \frac{\lambda_U}{2}\|U\|_F^2 + \frac{\lambda_V}{2}\|V\|_F^2 – Objective for a single training rating: L_{ij} = \frac{1}{2} (U_i^T V_j - R_{ij})^2 + \frac{\lambda_U}{2}\|U_i\|_2^2 + \frac{\lambda_V}{2}\|V_j\|_2^2 51
  • 52. Stochastic Gradient Descent (SGD) (2/2) • Gradient – \partial L_{ij} / \partial U_i = (U_i^T V_j - R_{ij}) V_j + \lambda_U U_i – \partial L_{ij} / \partial V_j = (U_i^T V_j - R_{ij}) U_i + \lambda_V V_j • Update rule – U_i ← U_i − η (\partial L_{ij} / \partial U_i) – V_j ← V_j − η (\partial L_{ij} / \partial V_j) – η: learning rate or step size 52
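The update rules above translate directly into a short training loop. The following sketch (illustrative only; hyper-parameters such as lr, lam, K, and epochs are assumptions, not values from the slides) trains plain MF with SGD on the observed entries of a rating matrix:

```python
import numpy as np

def train_mf_sgd(R, K=8, lr=0.01, lam=0.05, epochs=200, seed=0):
    """R: N x M array with np.nan for missing ratings."""
    rng = np.random.default_rng(seed)
    N, M = R.shape
    U = 0.1 * rng.standard_normal((K, N))
    V = 0.1 * rng.standard_normal((K, M))
    obs = np.argwhere(~np.isnan(R))                 # indices (i, j) of observed ratings
    for _ in range(epochs):
        rng.shuffle(obs)
        for i, j in obs:
            err = U[:, i] @ V[:, j] - R[i, j]       # prediction error
            gU = err * V[:, j] + lam * U[:, i]      # dL_ij / dU_i
            gV = err * U[:, i] + lam * V[:, j]      # dL_ij / dV_j
            U[:, i] -= lr * gU
            V[:, j] -= lr * gV
    return U, V

# toy usage: fill in the missing entries of a small matrix
R = np.array([[5, 3, np.nan], [4, np.nan, 1], [1, 1, 5]], dtype=float)
U, V = train_mf_sgd(R)
print((U.T @ V).round(2))       # U^T V approximates R on the observed entries
```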
  • 53. Matrix factorization: Stopping criteria • When do we stop updating? – Improvement drops (e.g. <0) – Reached small error – Achieved predefined # of iterations – No time to train anymore 53
  • 54. Alternating Least Squares (ALS) • Stationary point: zero gradient – We can find the closed-form solution of U with V fixed, and vice versa • Zero gradient – \partial L / \partial U_i = \sum_{j=1}^{M} \delta_{ij} V_j (U_i^T V_j - R_{ij}) + \lambda_U U_i = 0 – \partial L / \partial V_j = \sum_{i=1}^{N} \delta_{ij} U_i (U_i^T V_j - R_{ij}) + \lambda_V V_j = 0 • Closed-form solution, i.e. update rule – U_i = (\sum_{j=1}^{M} \delta_{ij} V_j V_j^T + \lambda_U I)^{-1} \sum_{j=1}^{M} \delta_{ij} R_{ij} V_j – V_j = (\sum_{i=1}^{N} \delta_{ij} U_i U_i^T + \lambda_V I)^{-1} \sum_{i=1}^{N} \delta_{ij} R_{ij} U_i 54
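A minimal sketch of the ALS updates above (again an illustration with assumed hyper-parameters, not a reference implementation from the slides):

```python
import numpy as np

def train_mf_als(R, K=8, lam=0.05, iters=20, seed=0):
    """Alternate the closed-form updates for U and V over the observed entries of R."""
    rng = np.random.default_rng(seed)
    N, M = R.shape
    mask = ~np.isnan(R)
    U = 0.1 * rng.standard_normal((K, N))
    V = 0.1 * rng.standard_normal((K, M))
    regI = lam * np.eye(K)
    for _ in range(iters):
        for i in range(N):                          # solve each U_i with V fixed
            js = np.where(mask[i])[0]
            A = V[:, js] @ V[:, js].T + regI
            b = V[:, js] @ R[i, js]
            U[:, i] = np.linalg.solve(A, b)
        for j in range(M):                          # solve each V_j with U fixed
            iis = np.where(mask[:, j])[0]
            A = U[:, iis] @ U[:, iis].T + regI
            b = U[:, iis] @ R[iis, j]
            V[:, j] = np.linalg.solve(A, b)
    return U, V
```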
  • 55. SGD vs. ALS • SGD – It is easier to develop MF extensions, since we do not require closed-form solutions • ALS – We are free from tuning the learning rate η – It allows parallel computing • Drawback of both – The regularization parameters λ need careful tuning using validation data → we have to run MF multiple times • Variational EM (VEM) learns the regularization parameters – objective: \arg\min_{U,V} \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} \delta_{ij} (U_i^T V_j - R_{ij})^2 + \frac{\lambda_U}{2}\|U\|_F^2 + \frac{\lambda_V}{2}\|V\|_F^2 55
  • 56. Extensions of MF • Matrix 𝑅 can be factorized into 𝑈⊤ 𝑈, 𝑈⊤ 𝑉𝑈, 𝑈𝑉𝑊, … • SVD++ – MF with implicit interactions among items • Non-negative MF (NMF) – non-negative entries for the factorized matrices • Tensor factorization (TF), Factorization machines (FM) – Additional features involved in learning latent factors • Bayesian PMF (BPMF) – Further modeling distributions of PMF parameters 𝜃 • Poisson factorization (PF) – PMF normal likelihood replaced with Poisson likelihood to form a probabilistic nonnegative MF 56
  • 57. Applications of MF • Recommender systems • Filling missing features • Clustering • Link prediction – Predict future new edges in a graph • Community detection – Cluster nodes based on edge density in a graph • Word embedding – Word2Vec is actually an MF 57
  • 58. Quiz: The Genie Recommender, Part 2 • Hsiao-Pang rubs the lamp again, and a different genie appears. Seeing his puzzled face, the genie explains: "The previous genie was fired for making bad recommendations; I'm the new one." • Hsiao-Pang figures there will be no tag-alongs this time, so he asks again: "I want Tsai 10 to have dinner with me." • The genie says no problem, and with a bang Tsai 10, Hou Pei-Cheng, and Kun 0 all appear. • Question: which kind of recommender system is this genie running internally? 58
  • 59. Types of Recommender Systems • Content-based Recommendation (CBR) • Collaborative Filtering • Other solutions • Restricted Boltzmann Machines • Clustering-based recommendation • Association rule based recommendation • Random-walk based recommendation • Advanced Issues 59
  • 60. Restricted Boltzmann Machines for Collaborative Filtering • An RBM is an MRF with 2 layers • One RBM per user, but the link weights are shared across users • Learned by contrastive divergence • DBM → Deep Boltzmann Machine (figure: visible units model ratings from 1 to 5) Restricted Boltzmann Machines for Collaborative Filtering, ICML 07 60
  • 61. Clustering-based Recommendation • Step 1: Clustering users into groups • Step 2: finding the highest rated items in each group as the recommended items for this group • Notes: it is more efficient but less personalized 61 I1 I2 I3 I4 I5 U1 4 2 U2 4 3 U3 3 2 2 U4 4 3 U5 3 2
  • 62. Association-rule based recommendation • Finding association rules from global data • E.g. (item1, item2) appear together frequently • User 1 has purchased item1 → recommend item2 • Pros: easy to implement, quick prediction • Cons: it is not very personalized • Association rules work well for retail-store recommendation 62
  • 63. Graph-Based Model • Build a network from the rating database: users and items become nodes, and each observed rating r_{ui} becomes an edge between user u and item i • Extra knowledge (e.g. item features f1, f2) can be added as additional nodes linked to the items they describe 63
  • 64. Random Walk on Graph • Random walk with restart • Randomly surf the network starting from a source user • Recommend the items that have a higher chance of being reached 64
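A compact sketch of random walk with restart on a user–item graph (the adjacency matrix, restart probability, and number of iterations below are illustrative assumptions):

```python
import numpy as np

def random_walk_with_restart(A, source, restart=0.15, iters=100):
    """A: symmetric adjacency matrix of the user-item graph; returns visiting probabilities."""
    P = A / A.sum(axis=0, keepdims=True)          # column-stochastic transition matrix
    e = np.zeros(A.shape[0]); e[source] = 1.0     # restart vector at the source user
    p = e.copy()
    for _ in range(iters):
        p = (1 - restart) * P @ p + restart * e   # power iteration
    return p

# toy bipartite graph: nodes 0-1 are users, nodes 2-4 are items
A = np.array([[0, 0, 1, 1, 0],
              [0, 0, 0, 1, 1],
              [1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 1, 0, 0, 0]], dtype=float)
scores = random_walk_with_restart(A, source=0)
print(scores[2:])      # rank items for user 0 by visiting probability
```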
  • 65. Limitations on Collaborative filtering 1. Cannot consider features of items and users • Solution: factorization machine 2. Cannot consider cold-start situation • Solution: transfer recommendation 65
  • 66. Demo: A joke recommendation system using CF models • http://eigentaste.berkeley.edu/index.html 66
  • 67. Evaluating Recommendation Systems • Accuracy of predictions • How close predicted ratings are to the true ratings • Relevancy of recommendations • Whether users find the recommended items relevant to their interests • Ranking of recommendations • Ranking products based on their levels of interestingness to the user 67
  • 68. Accuracy of Predictions • Mean absolute error (MAE): MAE = \frac{1}{n} \sum_{ij} |\hat{r}_{ij} - r_{ij}| – \hat{r}_{ij}: predicted rating of user i for item j – r_{ij}: true rating • Normalized mean absolute error (NMAE): NMAE = \frac{MAE}{r_{max} - r_{min}} • Root mean squared error (RMSE): RMSE = \sqrt{\frac{1}{n} \sum_{ij} (\hat{r}_{ij} - r_{ij})^2} • Large errors contribute more to RMSE than to MAE 68
  • 69. Example of Accuracy • Predicted vs. true ratings for 5 items: (1, 3), (2, 5), (3, 3), (4, 2), (4, 1) • MAE = \frac{|1-3| + |2-5| + |3-3| + |4-2| + |4-1|}{5} = 2 • NMAE = \frac{MAE}{5-1} = 0.5 • RMSE = \sqrt{\frac{(1-3)^2 + (2-5)^2 + (3-3)^2 + (4-2)^2 + (4-1)^2}{5}} = 2.28 69
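These error metrics are only a few lines of code; the sketch below reproduces the numbers above (variable names are illustrative):

```python
import numpy as np

pred = np.array([1, 2, 3, 4, 4], dtype=float)
true = np.array([3, 5, 3, 2, 1], dtype=float)

mae  = np.abs(pred - true).mean()                 # 2.0
nmae = mae / (5 - 1)                              # 0.5 on a 1..5 rating scale
rmse = np.sqrt(((pred - true) ** 2).mean())       # about 2.28
print(mae, nmae, rmse)
```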
  • 70. Relevancy of Recommendations • Contingency table: of N items, N_s are selected (recommended) and N_r are relevant; N_rs are both relevant and selected, N_rn are relevant but not selected, N_is are selected but irrelevant, N_in are neither • Precision: P = N_rs / N_s • Recall: R = N_rs / N_r • F-measure (F1 score): F = \frac{2PR}{P + R} 70
  • 71. Example of Relevancy • Of 40 items, 12 are selected and 24 are relevant; 9 items are both relevant and selected • P = 9/12 = 0.75 • R = 9/24 = 0.375 • F = \frac{2 \times 0.75 \times 0.375}{0.75 + 0.375} = 0.5 71
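A small sketch computing the same three numbers from recommended and relevant item sets (the sets themselves are made up so that the overlap matches the example):

```python
recommended = set(range(12))           # 12 selected items
relevant    = set(range(3, 27))        # 24 relevant items; 9 overlap with the selection

hits = len(recommended & relevant)     # N_rs = 9
precision = hits / len(recommended)    # 0.75
recall    = hits / len(relevant)       # 0.375
f1 = 2 * precision * recall / (precision + recall)   # 0.5
print(precision, recall, f1)
```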
  • 72. Ranking of Recommendations • Spearman's rank correlation, for n items: ρ = 1 - \frac{6 \sum_{i=1}^{n} (x_i - y_i)^2}{n^3 - n} – 1 ≤ x_i ≤ n: predicted rank of item i – 1 ≤ y_i ≤ n: true rank of item i • Kendall's tau, over all \binom{n}{2} pairs of items (i, j): τ = \frac{c - d}{\binom{n}{2}}, in range [-1, 1] – c concordant pairs: x_i > x_j, y_i > y_j or x_i < x_j, y_i < y_j – d discordant pairs: x_i > x_j, y_i < y_j or x_i < x_j, y_i > y_j 72
  • 73. Example of Ranking • Predicted ranks (1, 2, 3, 4) vs. true ranks (1, 4, 2, 3) for items i1–i4 • Kendall's tau: (i1,i2), (i1,i3), (i1,i4), (i3,i4) are concordant; (i2,i3), (i2,i4) are discordant → τ = \frac{4 - 2}{6} = 0.33 • Spearman's rank correlation: ρ = 1 - \frac{6[(1-1)^2 + (2-4)^2 + (3-2)^2 + (4-3)^2]}{4^3 - 4} = 0.4 73
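The two rank-correlation measures can be computed directly, as in this minimal sketch (it re-implements the formulas above rather than calling a statistics library):

```python
from itertools import combinations

pred = [1, 2, 3, 4]   # predicted ranks of items i1..i4
true = [1, 4, 2, 3]   # true ranks
n = len(pred)

rho = 1 - 6 * sum((x - y) ** 2 for x, y in zip(pred, true)) / (n**3 - n)   # 0.4

c = d = 0
for i, j in combinations(range(n), 2):
    s = (pred[i] - pred[j]) * (true[i] - true[j])
    c += s > 0          # concordant pair
    d += s < 0          # discordant pair
tau = (c - d) / (n * (n - 1) / 2)                                          # 0.33
print(rho, round(tau, 2))
```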
  • 74. Types of Recommender Systems • Content-based Recommendation (CBR) • Collaborative Filtering • Other solutions • Restricted Boltzmann Machines • Clustering-based recommendation • Association rule based recommendation • Random-walk based recommendation • Advanced Issues 74
  • 75. Advanced issue 1: Recommendation with ratings and features Ratings User Features ItemFeature 75
  • 76. Factorization Machine (FM) • Matrix Factorization (MF): Y = U^T V • Factorization Machine model: \hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j, where the feature vector x is highly sparse 76 Steffen Rendle, “Factorization machines”, 2010
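A minimal sketch of the FM prediction function above, using the standard O(Kn) reformulation of the pairwise term; the random parameters are placeholders, not a trained model:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """x: feature vector (dense here for clarity); V: n x K latent factors."""
    linear = w0 + w @ x
    # pairwise term: 1/2 * sum_k [ (sum_i v_ik x_i)^2 - sum_i v_ik^2 x_i^2 ]
    s  = V.T @ x
    s2 = (V ** 2).T @ (x ** 2)
    return linear + 0.5 * np.sum(s ** 2 - s2)

rng = np.random.default_rng(0)
n, K = 6, 3
x  = np.array([1, 0, 0, 1, 0, 1], dtype=float)    # e.g. one-hot user + one-hot item + context
w0, w, V = 0.1, rng.normal(size=n), rng.normal(scale=0.1, size=(n, K))
print(fm_predict(x, w0, w, V))
```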
  • 77. Advanced Issue 2: Handling Cold Start • What happens if we don't have rating/review information for some items or users? • Solution: transfer information from other domains • Use book review data to improve a movie recommender system • Use ratings from one source (e.g. Yahoo) to improve the recommender of another (e.g. PChome). 77
  • 78. Why Transfer Recommendation? • Nowadays, users can • provide feedback for items of different types • e.g., in Amazon we can rate books, DVDs, … • express their opinions on different social media and different providers (Facebook, Twitter, Amazon) • Transfer recommendation tries to utilize information from a source domain to improve the quality of an RS • Solving the cold-start problem (very few ratings for some users or some items) • Improving accuracy 78
  • 79. Transfer between domains • Source domain: contains knowledge to transfer to target, usually more data. • Target domain: needs help from the source domain, usually contains cold start users or items. • Domains differ because of • different types of items • Movies vs. Books • different types of users • American vs. Taiwanese Source Target Movies Music Pop music Jazz music Taiwan movie US movies 79
  • 80. Case Study 1: Matching Users and Items Across Domains to Improve the Recommendation Quality (Li & Lin, KDD 2014) Given: two homogeneous rating matrices, a target rating matrix R1 and a source rating matrix R2 • They model users' preferences for the same type of items • There is a certain portion of overlap in users and in items. Challenge: the mappings of users across the matrices are unknown, and so are the mappings of items. Goals: 1. Identify the user mappings and item mappings. 2. Use the identified mappings to boost the recommendation performance. Shou-De Lin 80 2016/12/17
  • 81. Case Study2: Why Transfer Recommendation is important for New Business 81
  • 82. You are the CEO of an online textbook store Ambzon.com which used to sell only English books in USA. 82
  • 83. You want to expand the business across the world to sell some Chinese textbooks in Taiwan. • You need a recommender system for such purpose. • Unfortunately, you only have few user-item rating records for the Chinese textbooks rated by Taiwanese. 83
  • 84. However, you have much more ratings on the English textbooks rated by the American people 84 User Book B1 B2 B3 B4 B5 U1 1 4 3 3 U2 2 1 U3 3 5 2 U4 4 3 6 U5 2 2 4 U6 4 6
  • 85. Challenge: how can you leverage the ratings on the English textbooks to enhance the recommendation of Chinese textbooks? 85
  • 86. How can Transfer be Done? • Feature sharing • user demographic features in both domains • item attributes in both domains • Linking users and items • Finding overlapping users and items across domains • Connecting users (e.g. friendship) and items (e.g. content) across domains 86
  • 87. Sharing Latent Features of users/items 87 Cons: need overlap of users and/or items
  • 88. Transferring rating patterns 88 No need to assume users or items have overlap
  • 89. Advanced Issues 3: Using Social information in Recommendation (Friend recommendation, link prediction) (Memory-based CF using social networks) (Model-based CF using social networks) 89
  • 90. Social Context Alone • Friend recommendation in social networks • Methods • Link prediction • Network structure e.g. triad • Two friends of an individual are often friends • Friend recommendation: Open triad (missing one edge) Triad Open triad 90
  • 91. Common Friends • If p and q have many common friends, then p tends to be a better recommendation for q: score(p, q) = |F(p) ∩ F(q)| – F(p): friend set of person p • If a common friend f of p and q has many friends (a large degree), f is weaker evidence for a friendship between p and q, so f should count less: score(p, q) = \sum_{f \in F(p) \cap F(q)} \frac{1}{D(f)} – D(f): degree (number of friends) of person f (figure: a common friend with degree 106 vs. one with degree 2) 91
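A short sketch of both scoring functions on a toy friendship graph (the graph and names are made up for illustration):

```python
from collections import defaultdict

edges = [("p", "a"), ("p", "b"), ("q", "a"), ("q", "b"), ("q", "c"), ("a", "c")]
F = defaultdict(set)
for u, v in edges:
    F[u].add(v); F[v].add(u)

def common_friends(p, q):
    return len(F[p] & F[q])

def weighted_common_friends(p, q):          # down-weight high-degree common friends
    return sum(1.0 / len(F[f]) for f in F[p] & F[q])

print(common_friends("p", "q"), weighted_common_friends("p", "q"))
```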
  • 93. Social Rec V.S. Collaborative Filtering Rating matrix 𝑅 Social matrix 𝑇 Items Users Rating matrix 𝑅 Traditional recommendation systems Social recommendation systems 93
  • 94. Memory-Based Social Recommendation • Especially for user-based CF • Constraining the set of correlated users – Traditional: N(u) obtained from the rating matrix R – Social: N^+(u) obtained from both the rating matrix R and the social matrix T • The prediction model (aggregation of the ratings from the correlated users) remains the same: r_{u,i} = \bar{r}_u + \frac{\sum_{v \in N^+(u)} sim(u,v)\,(r_{v,i} - \bar{r}_v)}{\sum_{v \in N^+(u)} sim(u,v)} 94
  • 95. Representative Approaches (1/3) • Social based weight mean – Simply consider 𝑢𝑖’s friends 𝐹(𝑢𝑖) as 𝑁+ (𝑢𝑖) – Approach 1: 𝑁+ 𝑢𝑖 = 𝑁 𝑢𝑖 ∩ 𝐹(𝑢𝑖) • Could be problematic when 𝑁 𝑢𝑖 ∩ 𝐹 𝑢𝑖 = ∅ – Approach 2: 𝑁+ 𝑢𝑖 = 𝑆(𝑢𝑖) • 𝑆(𝑢𝑖): The 𝑘 most similar friends of 𝑢𝑖 from 𝐹(𝑢𝑖) – The predicted ratings from friends can be far different from the ratings predicted using most similar users 95
  • 96. Example (1/2) • Predict r_{Jill,Mulan} with cosine similarity, user-based CF, neighborhood size 2, and friends F(Jill) = {Joe, Jane, Jorge} • Ratings (Lion King, Aladdin, Mulan, Anastasia): John (4, 3, 2, 2), Joe (5, 2, 1, 5), Jill (2, 5, ?, 0), Jane (1, 3, 4, 3), Jorge (3, 1, 1, 2) • Mean ratings: \bar{r}_{John} = (4+3+2+2)/4 = 2.75, \bar{r}_{Joe} = (5+2+1+5)/4 = 3.25, \bar{r}_{Jill} = (2+5+0)/3 = 2.33, \bar{r}_{Jane} = (1+3+4+3)/4 = 2.75, \bar{r}_{Jorge} = (3+1+1+2)/4 = 1.75 96
  • 97. Example (2/2) • Similarities over Jill's rated items: sim(Jill, John) = \frac{2\cdot4 + 5\cdot3 + 0\cdot2}{\sqrt{2^2+5^2+0^2}\sqrt{4^2+3^2+2^2}} = 0.79, sim(Jill, Joe) = 0.50, sim(Jill, Jane) = 0.72, sim(Jill, Jorge) = 0.54 • We have – rating similarity: N(Jill) = {John, Jane} – friendship: F(Jill) = {Joe, Jane, Jorge} – F(Jill) ∩ N(Jill) = {Jane} – S(Jill) = {Jane, Jorge}, when k = 2 • Approach 1: r_{Jill,Mulan} = 2.33 + \frac{0.72(4 - 2.75)}{0.72} = 3.58 • Approach 2: r_{Jill,Mulan} = 2.33 + \frac{0.72(4 - 2.75) + 0.54(1 - 1.75)}{0.72 + 0.54} = 2.72 97
  • 99. MF Framework with Social Info • Framework: \min_{U,V,\Omega} \|W \odot (R - U^T V)\|_F^2 + \alpha \cdot Social(T, S, \Omega) + \lambda(\|U\|_F^2 + \|V\|_F^2 + \|\Omega\|_F^2) – Social(T, S, Ω): the social-information term – T: adjacency matrix of the social network – S: similarity matrix, e.g. cosine similarity S_{i,k} between the rating vectors R_i, R_k of users u_i and u_k if they are connected in the social network; S_{i,k} = 0 otherwise – Ω: the set of parameters learned from social information – W: weights of known ratings; W_{i,j} = 1 if R_{i,j} is known • Categories w.r.t. different Social(T, S, Ω) – Co-factorization methods – Ensemble methods – Regularization methods 99
  • 100. Co-Factorization Methods (1/2) • The user–item rating matrix R and the user–user social matrix T share the same user latent preferences • SoRec – Social(T, S, Ω) = \sum_{i=1}^{n} \sum_{u_k \in N(i)} (S_{i,k} - u_i^T z_k)^2 – N(i): the set of neighbors of u_i in the social network – Z = [z_1, z_2, …, z_n] ∈ ℝ^{K×n}: latent feature matrix with K latent factors – The user preference matrix U = [u_1, u_2, …, u_n]^T ∈ ℝ^{n×K} is learned from both rating information and social information – Optimization problem: \min_{U,V,Z} \|W \odot (R - U^T V)\|_F^2 + \alpha \sum_{i=1}^{n} \sum_{u_k \in N(i)} (S_{i,k} - u_i^T z_k)^2 + \lambda(\|U\|_F^2 + \|V\|_F^2 + \|Z\|_F^2) 100 Ma, H., Yang, H., Lyu, M., King, I.: SoRec: social recommendation using probabilistic matrix factorization. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management, pp. 931–940. ACM (2008)
  • 101. Co-Factorization Methods (2/2) • LOCABAL – Social(T, S, Ω) = \sum_{i=1}^{n} \sum_{u_k \in N(i)} (S_{i,k} - u_i^T H u_k)^2 – The user preferences of two connected users are correlated via a correlation matrix H ∈ ℝ^{K×K} – A large value of S_{i,k} (a strong connection) indicates that the user preferences u_i and u_k should be highly correlated via H – Optimization problem: \min_{U,V,H} \|W \odot (R - U^T V)\|_F^2 + \alpha \sum_{i=1}^{n} \sum_{u_k \in N(i)} (S_{i,k} - u_i^T H u_k)^2 + \lambda(\|U\|_F^2 + \|V\|_F^2 + \|H\|_F^2) 101 Tang, J., Hu, X., Gao, H., Liu, H.: Exploiting local and global social context for recommendation. In: IJCAI (2013)
  • 102. Additional Benefit of Co-Factorization • Since we learn the user latent factors U such that \sum_{i=1}^{n} \sum_{u_k \in N(i)} (S_{i,k} - u_i^T H u_k)^2 is minimized, it is possible to reconstruct the social matrix as S ≈ U^T H U • By doing so, we are performing link prediction • That means co-factorization allows us to perform recommendation and link prediction jointly 102
  • 103. Ensemble Methods (1/2) • CF and the social neighbors jointly contribute to the rating prediction – A missing rating for a given user is predicted as a linear combination of • ratings from the similar users/items (CF), and • ratings from the social neighbors (social network) • STE – The estimated rating from user u_i to item v_j is R_{i,j} = u_i^T v_j + \beta \sum_{u_k \in N(i)} S_{i,k}\, u_k^T v_j – Optimization problem: \min_{U,V} \|W \odot (R - U^T V - \beta S U^T V)\|_F^2 103 H. Ma, I. King, and M. R. Lyu. Learning to recommend with social trust ensemble. In SIGIR 2009, pages 203–210.
  • 104. Ensemble Methods (2/2) • mTrust – The estimated rating from user u_i to item v_j is R_{i,j} = u_i^T v_j + \beta \frac{\sum_{u_k \in N(i)} S_{i,k} R_{k,j}}{\sum_{u_k \in N(i)} S_{i,k}}, where the second term is the weighted mean of the ratings of v_j given by u_i's social neighbors – Optimization problem: \min_{U,V,S} \sum_i \sum_j W_{i,j} \Big(R_{i,j} - u_i^T v_j - \beta \frac{\sum_{u_k \in N(i)} S_{i,k} R_{k,j}}{\sum_{u_k \in N(i)} S_{i,k}}\Big)^2 – The similarity matrix S in mTrust needs to be learned, while it is predefined in STE 104 Tang, J., Gao, H., Liu, H.: mTrust: Discerning multi-faceted trust in a connected world. In: Proceedings of the fifth ACM International Conference on Web Search and Data Mining, pp. 93–102. ACM (2012)
  • 105. Regularization Methods (1/2) • A user's preference should be similar to that of the user's social neighbors • SocialMF – Social(T, S, Ω) = \sum_{i=1}^{n} \|u_i - \sum_{u_k \in N(i)} S_{i,k} u_k\|^2 – \sum_{u_k \in N(i)} S_{i,k} u_k: weighted average preference of the user's neighbors – Optimization problem: \min_{U,V} \|W \odot (R - U^T V)\|_F^2 + \alpha \sum_{i=1}^{n} \|u_i - \sum_{u_k \in N(i)} S_{i,k} u_k\|^2 + \lambda(\|U\|_F^2 + \|V\|_F^2) 105 Jamali, M., Ester, M.: A matrix factorization technique with trust propagation for recommendation in social networks. In: Proceedings of the fourth ACM Conference on Recommender Systems, pp. 135–142. ACM (2010)
  • 106. Regularization Methods (2/2) • Social Regularization – Social(T, S, Ω) = \sum_{i=1}^{n} \sum_{u_k \in N(i)} S_{i,k} \|u_i - u_k\|^2 – A large value of S_{i,k} (a strong connection) implies that users u_i and u_k have more similar preferences (the distance between their latent factors is smaller) – Optimization problem: \min_{U,V} \|W \odot (R - U^T V)\|_F^2 + \alpha \sum_{i=1}^{n} \sum_{u_k \in N(i)} S_{i,k} \|u_i - u_k\|^2 + \lambda(\|U\|_F^2 + \|V\|_F^2) 106 Ma, H., Zhou, D., Liu, C., Lyu, M., King, I.: Recommender systems with social regularization. In: Proceedings of the fourth ACM International Conference on Web Search and Data Mining, pp. 287–296. ACM (2011)
  • 107. Datasets • There are no agreed benchmark datasets for social recommendation systems • Commonly used datasets – Epinions03 • Product reviews and ratings, directed trust network – Flixster • Movie ratings, undirected social network – Ciao • Product reviews and ratings with time, categories, etc. – Epinions11 • More information included, e.g. temporal information, categories of products, distrust, etc. 107
  • 108. Advanced Issue 4: Implicit Recommendation • Sometimes it is hard to obtain user ratings toward items. • However, it is easier to obtain some implicit information from users to items, such as – browsing – purchasing – listening/watching • Implicit ratings Item1 Item2 Item3 Item4 User1 ? 1 ? ? User2 1 ? ? 1 User3 ? 1 ? 1 User4 1 ? 1 ? • The rating matrix contains only '1' and '?' • Regular MF would predict every '?' as 1 to minimize the squared error 108
  • 109. One-Class Collaborative Filtering – Matrix Factorization (OCCF-MF) Model • The MF model approximates a rating by the inner product of a user feature vector and an item feature vector. • However, directly applying MF models to the original rating matrix yields poor performance – because the negative instances are ignored • Solution 1 (OCCF-MF): for each user, sample some items as negative ratings to create a more condensed matrix before applying MF. • Solution 2 (BPR-MF): use pairwise ranking to update the model. 109
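As a rough illustration of Solution 2, the sketch below performs one BPR-style pairwise SGD step: for a sampled user u, an observed (positive) item i, and an unobserved item j, it pushes the predicted score of i above that of j. The hyper-parameters and the single-step view are simplifying assumptions, not the full BPR-MF algorithm; in practice this step is repeated over many sampled (u, i, j) triples.

```python
import numpy as np

def bpr_step(U, V, u, i, j, lr=0.05, lam=0.01):
    """One pairwise update: item i observed by user u, item j not observed.
    U: K x num_users, V: K x num_items latent factor matrices (updated in place)."""
    u_f, i_f, j_f = U[:, u].copy(), V[:, i].copy(), V[:, j].copy()
    x_uij = u_f @ (i_f - j_f)                     # score difference s(u,i) - s(u,j)
    g = 1.0 / (1.0 + np.exp(x_uij))               # sigmoid(-x_uij)
    U[:, u] += lr * (g * (i_f - j_f) - lam * u_f)
    V[:, i] += lr * (g * u_f - lam * i_f)
    V[:, j] += lr * (-g * u_f - lam * j_f)
```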
  • 110. Advanced Issue 5: Location-based Recommendation • Learning Travel Recommendations from User- generated GPS Traces. ACM TIST 2011. • Measuring and Recommending Time-Sensitive Routes from Location-based Data. ACM TIST 2014. • New Venue Recommendation in Location-Based Social Networks. IEEE SocialCom 2012. • Exploiting Geographical Influence for Collaborative Point-of-Interest Recommendation. ACM SIGIR 2011. • Time-aware Point-of-Interest Recommendation. ACM SIGIR 2013. 110
  • 111. (Illustration: a one-day New York trip starting from the current location S at 10:00 AM and ending at a destination D, with candidate stops such as the Empire State Building, Madison Square Garden, Times Square, MOMA, the Metropolitan Museum of Art, Central Park, Yankee Stadium, the Statue of Liberty, SOHO, the World Trade Center, and Wall Street, visited at 11:00, 13:00, 16:00, 19:00, and 21:00.) 2016/12/17 111
  • 112. The proper visiting time of a place • Places are sensitive to the visiting time, for example, – people usually visit the Empire State Building from about 12:00 until midnight (the night view is popular) – people tend to visit Madison Square Garden in the early evening for a basketball game – the proper time to visit Central Park is during daytime – Times Square is preferred from afternoon to midnight. 2016/12/17 112
  • 113. Overview of our approaches • Incorporate four important factors into a goodness function – popularity of a place – visiting order of places – proper visiting time of a place – proper transit time between places • Construct one-day route with high goodness function efficiently – Propose the A* like algorithm to construct route toward destination 2016/12/17 113
  • 114. Scenario 2016/12/17 114 Central Park(8:00AM) ↓ Grand Army plaza(11:00AM) ↓ 5th ave(12:00PM) ↓ Rockefeller Center(14:00PM) ↓ Time Square(16:00PM) Popular? Proper visiting time? Proper transit time? Smooth order?
  • 115. Proper Visiting Time (1) • Definition: Temporal Visiting Distribution (TVD), the probability distribution of check-in times for a location (figure: a TVD over the 24 hours of a day) 2016/12/17 115
  • 116. Proper Visiting Time (2) • The goodness of visiting a place at 8:00 AM – model the query time as a thin Gaussian distribution G(t; μ, σ²) with mean 8 and a small variance – measure the difference between G(t; μ, σ²) and the TVD • using the symmetric Kullback–Leibler (KL) divergence (figure: the TVD compared with the normal distribution) 2016/12/17 116
  • 117. Proper Transit Time Duration (1) • Definition: Duration Distribution (DD), the probability distribution over the transit time between locations l_i and l_j (figure: a DD over the 24 hours of a day) 2016/12/17 117
  • 118. • Similar to what we do to TVD, given a pair of locations li and lj, we can model a thin Gaussian distribution(∆) and compare it with DD using symmetric KL divergence. Proper Transit Time Duration(2) 2016/12/17 118
  • 119. • Exploit n-gram language model to measure the quality of the order • Uni-gram(Popularity), bi-gram, and tri-gram Proper Visiting Order (1) 2016/12/17 119
  • 120. • The goodness of visiting order of a route : Proper Visiting Order (2) 2016/12/17 120
  • 121. • Use a parameter α ϵ [0,1] to devise a linear combination of such two parts – Time-sensitive properties – Popularity and Order Final Goodness Function 2016/12/17 121
  • 122. Route Construction – Guidance Search (figure: partially built route from l_s at 10 AM toward destination l_d, with candidate locations to fill) • Step 1. For each candidate l_c, calculate g(l_c) = f(l_s → l_c) • Step 2. Choose the best f(l_c → l_d) • Step 3. Calculate the distance d(l_c → l_d) • Step 4. h(l) = \max_{(l → l_d) \in S(l → l_d)} \{ f(l_c → l_d) \times (1 - d(l_c → l_d)) \} • Step 5. f^*(l) = (1 - β) \times g(l) + β \times h(l) • The heuristic satisfaction function f*(l) directs the search algorithm to move toward the destination
  • 123. • Utilize the Gowalla dataset crawled by Dr. Jure Leskovec – contains 6,442,890 check-in records from Feb. 2009 to Oct. 2010. The total number of check-in locations is 1,280,969. • Extract three subsets of the check-in data, which corresponds to cities of New York, San Francisco, and Paris. Gowalla dataset Total Number of Check-ins Avg. Route Length Variance of Route Length Distinct Check-in Locations RouteDB 6,442,890 4.09 48.04 1,280,969 New York 103,174 4.46 71.24 14,941 San Francisco 187,568 4.09 58.36 15,406 Paris 14,224 4.45 75.73 3,472 2016/12/17 123
  • 124. Three experiments 2016/12/17 124 • Pair-wise Time-sensitive Route Detection exploiting replacing strategy – Is the designed goodness function reasonable? • Time-sensitive Cloze Test of Locations in Routes – Can the heuristic function and search method capture the human mobility and toward destination? • Approximation ratio of Guidance Search – How close is the goodness measure of our method toward the global optimal value
  • 125. Evaluation Plans (1) • Evaluation measure – Accuracy: the number of successfully detected routes divided by the number of pair instances (figure: some locations of a route are replaced with 'plausible' ones) 2016/12/17 125
  • 126. • Baseline – Distance-based Approach: • Choose the closest location to the current spot as the next spot to move to. – Popular-based Approach: • Choose the most popular spot of a given time in that city as the next spot to move to. – Forward Heuristic Approach: • Choose a location li that possesses the largest bi-gram probability with the previous location P(li|li-1) as the next location to move to. – Backward Heuristic Approach: • Choose a location li that possesses the largest bi-gram probability with the next location P(li+1|li) as the next location to move to. Compared methods 2016/12/17 126
  • 127. Experiment 1: Pairwise Time-Sensitive Route Detection Accuracy for New York Accuracy for San Francisco Accuracy for Paris 2016/12/17 127
  • 128. Evaluation Plans (2) • Evaluation measure – Hit rate: the ratio of correctly guessed instances (figure: some middle locations of a route are removed and must be guessed) 2016/12/17 128
  • 129. Experiment 2: Time-Sensitive Cloze Test in Routes (figures: hit rate vs. the number of guessed instances per route for New York, San Francisco, and Paris; methods compared: our search method, Greedy Search [8], distance-based method, popular-based method, forward heuristic, backward heuristic) 2016/12/17 129
  • 130. Experiment 2: Time-Sensitive Cloze Test in Routes (figures: hit rate vs. β for New York, San Francisco, and Paris; methods compared: our search method, Greedy Search [8], distance-based method, popular-based method, forward heuristic, backward heuristic) 2016/12/17 130
  • 131. The approximation ratio of all methods compared with Exhaustive Search • Average route construction time ≤ 0.0043 (65,814 times faster than Exhaustive Search) (figure: approximation ratio vs. β for Guidance Search, Greedy Search [8], distance-based method, popular-based method, forward heuristic, backward heuristic) 2016/12/17 131
  • 132. Conclusions • Propose a novel time-sensitive route planning problem using the check-in data in location-based services. • The first work considering four elements – popularity of a place – visiting order of places – proper visiting time of a place – proper transit time between places • Model the four factors in our design of the goodness function by exploiting statistical methods. • Given a time-sensitive location query, we propose Guidance Search to construct the route by optimizing the goodness function 2016/12/17 132
  • 133. Advanced Issue 6: Explaining the Recommendation Outputs • With reasonable explanation, recommendation can be more powerful – Recommendation  Persuasion • Goal: how can the system automatically generates human-understandable explanations for the recommendation. 133 Explanation in recommender systems, AIR 2005 Explaining collaborative filtering recommendations CSCW 2000
  • 134. Advanced Issue 7: Recommendation for Groups of Users • Recommendation for an individual 𝑢 can be extended to a group 𝐺 of individuals – e.g. advertisement on a social media site • Aggregation strategy – Considering the ratings for each individual – Aggregating ratings for the individuals in a group 134
  • 135. Aggregation Strategies • Maximizing average satisfaction – For a group G of n users, compute the average rating R_i = \frac{1}{n} \sum_{u \in G} r_{u,i} for every item i – Recommend the items with the highest R_i's • Least misery – No user is recommended an item that he or she strongly dislikes – R_i = \min_{u \in G} r_{u,i} • Most pleasure – At least one user enjoys the recommendation most – R_i = \max_{u \in G} r_{u,i} 135
  • 136. Example of Aggregation • Group G = {John, Jill, Juan} • Ratings (Soda, Water, Tea, Coffee): John (1, 3, 1, 1), Joe (4, 3, 1, 2), Jill (2, 2, 4, 2), Jorge (1, 1, 3, 5), Juan (3, 3, 4, 5) • Average satisfaction: R_{Soda} = (1+2+3)/3 = 2, R_{Water} = (3+2+3)/3 = 2.66, R_{Tea} = (1+4+4)/3 = 3, R_{Coffee} = (1+2+5)/3 = 2.66 • Least misery: R_{Soda} = min{1,2,3} = 1, R_{Water} = min{3,2,3} = 2, R_{Tea} = min{1,4,4} = 1, R_{Coffee} = min{1,2,5} = 1 • Most pleasure: R_{Soda} = max{1,2,3} = 3, R_{Water} = max{3,2,3} = 3, R_{Tea} = max{1,4,4} = 4, R_{Coffee} = max{1,2,5} = 5 • Best recommendation – Average satisfaction: Tea – Least misery: Water – Most pleasure: Coffee 136
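The three strategies reduce to one-line aggregations over the group members' ratings, as in this small sketch reproducing the example above (names and data layout are illustrative):

```python
import numpy as np

items   = ["Soda", "Water", "Tea", "Coffee"]
ratings = {"John": [1, 3, 1, 1], "Joe": [4, 3, 1, 2], "Jill": [2, 2, 4, 2],
           "Jorge": [1, 1, 3, 5], "Juan": [3, 3, 4, 5]}
group   = ["John", "Jill", "Juan"]
G = np.array([ratings[u] for u in group])

for name, agg in [("average satisfaction", G.mean(axis=0)),
                  ("least misery",         G.min(axis=0)),
                  ("most pleasure",        G.max(axis=0))]:
    print(name, "->", items[int(np.argmax(agg))])   # Tea, Water, Coffee
```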
  • 137. Quiz: The Genie Recommender, Part 3 • After that dinner, Hsiao-Pang and Tsai 10 hit it off, and their Facebook status changed to "in a relationship." • One day, Tsai 10 borrowed the lamp to make a wish: "I'm too busy at work to have close friends — could you recommend a friend for me?" • The genie said no problem, and with a bang a girl appeared. • Hsiao-Pang's face turned pale: it was his ex-girlfriend. The genie explained: "I recommended her to Tsai 10 because the two of them have a common friend!!!!!" • Question: which kind of recommender system is the genie running? 137
  • 139. LIBMF: A Matrix-factorization Library for Recommender Systems URL: https://www.csie.ntu.edu.tw/~cjlin/libmf/ Language: C++ Focus: solvers for Matrix-factorization models R interface: package “recosystem” https://cran.r-project.org/web/packages/recosystem/index.html Features: • solvers for real-valued MF, binary MF, and one-class MF • parallel computation in a multi-core machine • less than 20 minutes to converge on a data set of 1.7B ratings • supporting disk-level training, which largely reduces the memory usage 13 9
  • 140. MyMediaLite - a recommender system algorithm library URL: http://mymedialite.net/ Language: C# Focus: rating prediction and item prediction from positive-only feedback Algorithms: kNN, BiasedMF, SVD++... Features: evaluation measures include MAE, RMSE, CBD, AUC, prec@N, MAP, and NDCG 140
  • 141. LibRec: A Java Library for Recommender Systems URL: http://www.librec.net/index.html Language: Java Focus: algorithms for rating prediction and item ranking Algorithms: kNN, PMF, NMF, BiasedMF, SVD++, BPR, LDA…more at: http://www.librec.net/tutorial.html Features: Faster than MyMediaLite Collection of Datasets: MovieLens 1M, Epinions, Flixster... http://www.librec.net/datasets.html 14 1
  • 142. mrec: recommender systems library URL: http://mendeley.github.io/mrec/ Language: Python Focus: item similarity and other methods for implicit feedback Algorithms: item similarity methods, MF, weighted MF for implicit feedback Features: • train models and make recommendations in parallel using IPython • utilities to prepare datasets and compute quality metrics 14 2
  • 143. SUGGEST: Top-N recommendation engine Python interface: “pysuggest” https://pypi.python.org/pypi/pysuggest Focus: collaborative filtering-based top-N recommendation algorithms (user-based and item-based) Algorithms: user-based or item-based collaborative filtering based on various similarity measures Features: low latency: computes top-10 recommendations in less than 5 µs 143
  • 144. Conclusion • Recommendation is arguably the most successful AI/ML application to date. • It is not just for customer–product matching • Matching users and users • Matching users and services • Matching users and locations • … • The basic skills and tools are mature, but the advanced issues are not fully solved. • Lack of data for cold-start users is still the main challenge to be solved. 144
  • 145. References (1/3) • SVD++ – Koren, Yehuda. "Factorization meets the neighborhood: a multifaceted collaborative filtering model." Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2008. • TF – Karatzoglou, Alexandros, et al. "Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering." Proceedings of the fourth ACM conference on Recommender systems. ACM, 2010. • FM – Rendle, Steffen, et al. "Fast context-aware recommendations with factorization machines." Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. ACM, 2011. • BPR-MF – Rendle, Steffen, et al. "BPR: Bayesian personalized ranking from implicit feedback." Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. AUAI Press, 2009. • BPMF – Salakhutdinov, Ruslan, and Andriy Mnih. "Bayesian probabilistic matrix factorization using Markov chain Monte Carlo." Proceedings of the 25th international conference on Machine learning. ACM, 2008. • PF – Gopalan, Prem, Jake M. Hofman, and David M. Blei. "Scalable recommendation with poisson factorization." arXiv preprint arXiv:1311.1704 (2013). 145
  • 146. References (2/3) • Link prediction – Menon, Aditya Krishna, and Charles Elkan. "Link prediction via matrix factorization." Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer Berlin Heidelberg, 2011. • Community detection – Yang, Jaewon, and Jure Leskovec. "Overlapping community detection at scale: a nonnegative matrix factorization approach." Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 2013. – Psorakis, Ioannis, et al. "Overlapping community detection using bayesian non-negative matrix factorization." Physical Review E 83.6 (2011): 066114. • Clustering – Ding, Chris HQ, Xiaofeng He, and Horst D. Simon. "On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering." SDM. Vol. 5. 2005. – Kuang, Da, Sangwoon Yun, and Haesun Park. "SymNMF: nonnegative low-rank approximation of a similarity matrix for graph clustering." Journal of Global Optimization 62.3 (2015): 545-574. • Word embedding – Levy, Omer, and Yoav Goldberg. "Neural word embedding as implicit matrix factorization." Advances in neural information processing systems. 2014. – Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. "Glove: Global Vectors for Word Representation." EMNLP. Vol. 14. 2014. 146
  • 147. References (3/3) • The Matrix Cookbook – http://www2.imm.dtu.dk/pubdb/views/edoc_do wnload.php/3274/pdf/imm3274.pdf – Collection of matrix facts, including derivatives and expected values 147
  • 148. Artificial Intelligence and Machine Learning Shou-De Lin (林守德) 2016/12/17 1 CSIE/GINM, NTU sdlin@csie.ntu.edu.tw Artificial Intelligence · Machine Learning · Recommender Systems
  • 149. Defining AI from a Practical Perspective • Able to observe, understand, and react to people and things (interacting with the world) – Reinforcement Learning, Markov Decision Process • Able to find the best solution – Optimization • Able to reason and plan – Inference and Planning • Able to learn and adapt – Machine Learning and adaptation
  • 150. Strong AI vs. Weak AI • Wanting the computer to produce intelligence in the same way humans do (Strong AI) • Wanting the computer to exhibit intelligent external behavior (Weak AI) 2016/12/17 3
  • 151. What is Machine Learning • ML tries to optimize a performance criterion using example data or past experience. • Mathematically speaking: given some data X, we want to learn a function mapping f(X) for a certain purpose – f(x) = a label y → classification – f(x) = a set Y in X → clustering – f(x) = p(x) → probabilistic graphical model – f(x) = a set of y → multi-label classification – … • ML techniques tell us how to produce a high-quality f(x), given certain objectives and evaluation metrics 2016/12/17 4
  • 152. AI Research Are Mostly Driven By Solving Concrete Challenges
  • 153. Deep Blue (1996) • The first program to defeat a world chess champion – it exploited the computer's parallel computing power to perform exhaustive search – there was no particular algorithmic breakthrough Deep Blue (photo taken by James at Computer History Museum)
  • 154. DARPA Grand Challenge (2004) • DARPA offered one million US dollars to the first car to cross the Nevada desert • All teams failed in 2004, but in 2005 five cars completed the course. • In 2007 it evolved into the Urban Challenge
  • 155. Loebner Prize • $25,000 for the first chatbot judged to be human – systems with similar goals include Siri and XiaoIce • A rather controversial AI competition • To date, no one has won the top prize
  • 156. RoboCup (1997-now) • Robot soccer competition • "By the middle of the 21st century, a team of fully autonomous humanoid robot soccer players shall win a soccer game, complying with the official rules of FIFA, against the winner of the most recent World Cup."
  • 157. ACM KDD Cup (1997~now) • It is an annual competition in the area of knowledge discovery and data mining • Organized by ACM special interest group on KDD, started from 1997, now considered as the most prestigious data mining competition • Competition lasts roughly 2-4 months
  • 158. Team NTU’s Performance on ACM KDD Cup 11 KDD Cups 2008 2009 2010 2011 2012 2013 Organizer Siemens Orange PSLC Datashop Yahoo! Tencent Microsoft Topic Breast Cancer Prediction User Behavior Prediction Learner Performance Prediction Recommendation Internet advertising (track 2) Author-paper & Author name Identification Data Type Medical Telcom Education Music Search Engine Log Academic Search Data Challenge Imbalance Data Heterogeneous Data Time- dependent instances Large Scale Temporal + Taxonomy Info Click through rate prediction Alias in names # of records 0.2M 0.1M 30M 300M 155M 250K Authors, 2.5M papers # of teams >200 >400 >100 >1000 >170 >700 Our Record Champion 3rd place Champion Champion Champion Champion 2016/12/17 Prof. Shou-De Lin in NAU
  • 159. IBM Watson (2011) • Competed on the quiz show Jeopardy! and defeated the past champions • Watson contains hundreds of intelligent modules (from language analysis to search engines). • It shows that computers can already surpass humans on the task of knowledge question answering
  • 160. AlphaGo (2016) • Google's AlphaGo won at Go, reputedly the most difficult board game • Unlike IBM's Deep Blue, AlphaGo combines hardware with sophisticated machine learning algorithms
  • 161. Key Technologies for Realizing AI • Knowledge Representation • (Machine) Learning • (Machine) Planning • Search and Optimization
  • 162. Knowledge Representation • There are many ways to represent knowledge, depending on how the knowledge will later be "used". – Logical representations, e.g. ∀x, King(x) ∧ Greedy(x) ⇒ Evil(x) – Rules (can be written in logic or other forms) – Semantic networks/graphs – Frames From Wikipedia
  • 163. Machine Learning (ML) • Goal: build a system from existing data that can make the best decisions • Mathematically, machine learning means: given input X, learn a function f(X) that produces the desired result Y – X: chest X-ray images → Y: whether the patient has cancer, classification – X: financial news → Y: stock index, regression – X: many people's photos → Y: group similar photos together, clustering
  • 165. Search and Optimization • Among many possibilities, find the best choice – Chess: among all possible next moves, search for the one most likely to win – Robot soccer: among all possible actions, choose the one most likely to score – Quiz answering: among all possible answers, choose the one most likely to be correct • Problems that can be modeled as optimization can, in principle, be solved by AI – the key is the speed of search and optimization
  • 166. AI Combined with Other Fields • Philosophy (logic, cognition, self-awareness) • Linguistics (syntax, semantics, phonology) • Statistics and mathematics (machine learning) • Control (optimization) • Economics (game theory) • Neuroscience (neural networks) • Computer engineering (AI-driven CPUs, cloud computing) • Data science (data mining)
  • 167. Application 1: Recommender Systems • Recommending products to customers – E-commerce sites – Service providers • Recommending users to users – Matchmaking – Investment/donation • In general, making intelligent recommendations between two sides – Recommending resumes to jobs – Recommending courses to students
  • 168. Application 2: Internet of Things (IoT) • IoT involves three elements: data, connectivity, and devices (sensors) • Related applications include – smart cities (parking, traffic, waste management) – disaster prevention (abnormal PM2.5 levels, electromagnetic field levels, forest fires, earthquakes, etc.) – health (monitoring, fall detection, etc.) – security (access control, law enforcement, terrorist identification) – smart agriculture
  • 169. Application 3: SOLOMO • SOLOMO → Social, Local, Mobile • Many related applications – smart tour guides – Nike+ GPS: route tracking, marathon broadcasting – smart billboards
  • 170. Application 4: Retail & E-commerce • Intelligent product planning • Smart warehousing • Optimizing goods transportation • Product tracking
  • 172. A variety of ML Scenarios Supervised Learning (Classification & Regression) Classification & Regression Multi-label learning Multi-instance learning Cost-sensitive learning Semi-supervised learning Active learning Unsupervised Learning Clustering Learning Data Distribution Reinforcement learning Variations Transfer learning Online Learning Distributed Learning 2016/12/17 25
  • 173. Supervised Learning • Given: a set of <input, output> pairs • Goal: given an unseen input, predict the corresponding output • For example: 1. Input: X-ray photo of chests, output: whether it is cancerous 2. Input: a sentence, output: whether a sentence is grammatical 3. Input: some indicators of a company, output: whether it will make profit next year • There are two kinds of outputs an ML system generates – Categorical: classification problem (E1 and E2) • Ordinal outputs: small, medium, large • Non-ordinal outputs: blue, green, orange – Real values: regression problem (E3) 2016/12/17 26
  • 174. Classification (1/2) • It’s a supervised learning task that, given a real-valued feature vector x, predicts which class in C may be associated with x. • |C|=2  Binary Classification |C|>2  Multi-class Classification • Training and predicting of a binary classification problem: Feature Vector (xi ∈ Rd) Class x1 +1 x2 -1 … … xn-1 -1 xn +1 Training set (Binary Classification) Classifier f(x) Feature Vector (xnew ∈ Rd) Class xnew ? A new instance (1) Training (2) Predicting 2016/12/17 27
  • 175. Classification (2/2) • A classifier can be either linear or non-linear • The geometric view of a linear classifier • Famous classification models: • k-nearest neighbor (kNN) • Decision Tree (DT) • Support Vector Machine (SVM) • Neural Network (NN) • … +1 -1 w wTx + b 2016/12/17 28 We will talk more about classification later !!
  • 176. Regression (1/2) • A supervised learning task that, given a real-valued feature vector x, predicts the target value y ∈ R. • Training and predicting of a regression problem: Feature Vector (xi ∈ Rd) yi ∈ R x1 +0.26 x2 -3.94 … … xn-1 -1.78 xn +5.31 Training set Regression Model Feature Vector (xnew ∈ Rd) ynew ∈ R xnew ? A new instance (1) Training (2) Predicting 2016/12/17 29
  • 177. Regression (2/2) • The geometric view of a linear regression function • Some types of regression: linear regression, support vector regression, … 2016/12/17 30
  • 178. Multi-label Learning • A classification task in that an instance is associated with a set of labels, instead of a single label. Feature Vector (xi ∈ Rd) 𝓁1 𝓁2 𝓁3 x1 +1 -1 +1 x2 -1 +1 -1 … … … … xn-1 +1 -1 -1 xn -1 +1 +1 Training set Classifier A new instance (1) Training (2) Predicting Feature Vector (xnew ∈ Rd) 𝓁1 𝓁2 𝓁3 xnew ? ? ? • Existing models: Binary Relevance, Label Powerset, ML-KNN, IBLR, … 31
  • 179. Multimedia tagging • Many websites allow the user community to add tags, thus enabling easier retrieval. Example of a tag cloud: the beach boys, from Last.FM (Ma et al., 2010) 32
  • 180. Cost-sensitive Learning • A classification task with non-uniform costs for different types of classification error. • Goal: to predict the class C* that minimizes the expected cost rather than the misclassification rate: C^* = \arg\min_j \sum_k P(Y = C_k \mid x) L_{jk} • Methods: cost-sensitive SVM, cost-sensitive sampling • An example cost matrix L: medical diagnosis – L_{jk}: predict cancer / actual cancer = 0, predict cancer / actual normal = 1, predict normal / actual cancer = 10000, predict normal / actual normal = 0 2016/12/17 33
  • 181. Examples for Cost-sensitive Learning • Highly non-uniform misclassification costs are very common in a variety of challenging real- world machine learning problems – Fraud detection – Medical diagnosis – Various problems in business decision-making. Credit cards are one of the most famous targets of fraud. The cost of missing a target (fraud) is much higher than that of a false-positive. Hung-Yi Lo, Shou-De Lin, and Hsin-Min Wang, “Generalized k-Labelsets Ensemble for Multi- Label and Cost-Sensitive Classification,” IEEE Transactions on Knowledge and Data Engineering. 2016/12/17 34
  • 182. Multi-instance Learning (1/2) • A supervised learning task in that the training set consists of bags of instances, and instead of associating labels on instances, labels are only assigned to bags. • The goal is to learn a model and predict the label of a new instance or a new bag of instances. • In the binary case, Positive bag  at least one instance in the bag is positive Negative bag  all instances in the bag are negative 2016/12/17 35 Bag+ + - - - - Bag- - - - - -
  • 183. Multi-instance Learning (2/2) • Training and Prediction Feature Vector (xi ∈ Rd) Bag x1 1 x2 1 x3 2 x4 2 x5 2 … … xn-1 m xn m Bag Class 1 +1 2 -1 … … m -1 Feature Vector or Bag Class BAGnew / xnew ? A new instance or bag • Some methods: Learning Axis-Parallel Concepts, Citation kNN, mi-SVM, MI-SVM, Multiple-decision tree, … Training Set Classifier (1) Training (2) Predicting BAGnew = {xnew-1, …, xnew-k} 2016/12/17 36
  • 184. A variety of ML Scenarios Supervised Learning (Classification & Regression) Classification & Regression Multi-label learning Multi-instance learning Cost-sensitive learning Semi-supervised learning Active learning Unsupervised Learning Clustering Learning Data Distribution Reinforcement learning Variations Transfer learning Online Learning Distributed Learning 2016/12/17 37
  • 185. Semi-supervised Learning • Only a small portion of data are annotated (usually due to high annotation cost) • Leverage the unlabeled data for better performance 38 [Zhu2008]
  • 186. Active Learning • Achieves better learning with fewer labeled training data via actively selecting a subset of unlabeled data to be annotated (figure: the learning algorithm repeatedly requests the label of an example from an expert/oracle over the unlabeled data, receives the label, and finally outputs a classifier) 39
  • 187. A variety of ML Scenarios Supervised Learning (Classification & Regression) Classification & Regression Multi-label learning Multi-instance learning Cost-sensitive learning Semi-supervised learning Active learning Unsupervised Learning Clustering Learning Data Distribution Reinforcement learning Variations Transfer learning Online Learning Distributed Learning 2016/12/17 40
  • 188. Unsupervised Learning • Learning without teachers (presumably harder than supervised learning) – Learning “what normally happens” – Think of how babies learn their first language (unsupervised) compared with how people learn their 2nd language (supervised). • Given: a bunch of inputs X (there is no output Y) • Goal: depends on the task, for example – Estimate P(X) → then we can find argmax P(X) → PGM – Find P(X2|X1) → we can know the dependency between inputs → PGM – Find Sim(X1,X2) → then we can group similar X's → clustering 2016/12/17 41
  • 189. Clustering • An unsupervised learning task • Given a finite set of real- valued feature vector S ⊂ Rd, discover clusters in S Feature Vector (xi ∈ Rd) x1 x2 … xn-1 xn S Clustering Algorithm • K-Means, EM, Hierarchical classification, etc 2016/12/17 42
  • 190. A variety of ML Scenarios Supervised Learning (Classification & Regression) Classification & Regression Multi-label learning Multi-instance learning Cost-sensitive learning Semi-supervised learning Active learning Unsupervised Learning Clustering Learning Data Distribution Reinforcement learning Variations Transfer learning Online Learning Distributed Learning 2016/12/17 43
  • 191. Reinforcement Learning (RL) • RL is a “decision making” process. – How an agent should make decision to maximize the long- term rewards • RL is associated with a sequence of states X and actions Y (i.e. think about Markov Decision Process) with certain “rewards”. • It’s goal is to find an optimal policy to guide the decision. 44 Figure from Mark Chang
  • 192. AlphaGo: SL+RL • 1st stage: learn from all past players (Supervised Learning) – Data: past game records – Learning: f(X)=Y, X: board position, Y: next move – Results: the AI can now play Go, but not at expert level • 2nd stage: surpass its past self (Reinforcement Learning) – Data: generated by playing against the 1st-stage AI – Learning: observation → board position, reward → whether it wins, action → next move 2016/12/17 45
  • 193. A variety of ML Scenarios Supervised Learning (Classification & Regression) Classification & Regression Multi-label learning Multi-instance learning Cost-sensitive learning Semi-supervised learning Active learning Unsupervised Learning Clustering Learning Data Distribution Reinforcement learning Variations Transfer learning Online Learning Distributed Learning 2016/12/17 46
  • 194. Online Learning • Data arrives incrementally (one-at-a-time) • Once a data point has been observed, it might never be seen again. • Learner makes a prediction on each observation. • Time and memory usage cannot scale with data. • Algorithms may not store previously seen data and perform batch learning. • Models resource-constrained learning, e.g. on small devices. Q1? Ans1 Q2? Ans2 Q3? Ans347
  • 195. Scenario: Human activity recognition using data from wearable devices (one of the datasets we have experimented with) • Daily activities → wearable device → real-time sensory data → Bluetooth → laptop → classifier • There is a variety of models to be learned (individual × activity) • Data arrive incrementally, while we don't want to transmit everything to a server 48
  • 196. Transfer Learning • Example: the knowledge for recognizing an airplane may be helpful for recognizing a bird. • Improving a learning task via incorporating knowledge from learning tasks in other domains with different feature space and data distribution. • Reduces expensive data-labeling efforts [Pan & Yang 2010] 2016/12/17 49 • Approach categories: (1) Instance-transfer (2) Feature-representation-transfer (3) Parameter-transfer (4) Relational-knowledge-transfer
  • 197. Distributed Learning • Perform machine learning on multiple machines 2016/12/17 50 Computation Traditional Parallel Computing Distributed Learning # of machines Few (10~100), communication cost can be ignored Many (>1000), communication cost can be the bottleneck Computational power Powerful (cluster), dedicated Ordinal (mobile), cannot be dedicated Memory Large Small Management Strong control Weak control
  • 198. Deep Learning (learning representation of data) Supervised Learning (Classification & Regression) Classification & Regression Multi-label learning Multi-instance learning Cost-sensitive learning Semi-supervised learning Active learning Unsupervised Learning Clustering Learning Data Distribution Reinforcement learning Variations Transfer learning Online Learning Distributed Learning 2016/12/17 51 Probabilistic Graphical Model
  • 199. • Multiple processing stages in the visual cortex (LeCun & Ranzato): • Inspired by human brains, deep learning aims to learn multiple layers of representation from data, with increasing level of abstraction Deep Learning • Human brains perform much better than computers at recognizing natural patterns • The brains learn to extract many layers of features • Features in one layer are combined to form higher-level features in the layer above (Hinton) Retina LGN V1 V2 V4 PIT AIT … (Picture from LeCun & Ranzato) Convolutional Network 2016/12/17 52
  • 201. Moravec's Paradox • Things that are easy for humans may be hard for computers: • body–eye coordination (e.g. climbing stairs, playing soccer) • understanding humor and sarcasm • face recognition • deep understanding of semantics • Things that are hard for humans may be easy for computers: • searching and analyzing massive data • playing chess • complex logical inference
  • 202. AI Stuck at the Search Stage • On an AI matchmaking website, a woman listed two requirements for a partner: 1. handsome 2. owns a car. The computer searched and returned: • Chinese chess!! (in Chinese, "handsome" and "car" are also the names of chess pieces) • Unsatisfied, she changed the query to: 1. tall 2. owns a house. The computer returned: • Taipei 101 • She tried again: 1. manly 2. gives a sense of security. The computer replied: • Superman!! • Finally she entered all the conditions together, and the computer returned: • Superman playing chess at Taipei 101!!
  • 203. Two Major Research Directions for the Future of AI • Achieving more feats beyond human ability • more accurate prediction • more correct decisions • Achieving tasks that are already easy for humans • Computer vision • Robotics • Natural Language Understanding
  • 204. From Learning to Discovery • Learning: "The act, process, or experience of gaining knowledge." • Learning is an experience every ordinary person has in daily life • Discovery: "To obtain knowledge or awareness of something not known before." • Discovery or invention is the product of special people in special circumstances, e.g. Edison, Newton, Sherlock Holmes. 12/17/2016 57
  • 205. Why Machine Discovery? • Def: Exploit machines to perform or assist human beings to perform discovery • Benefits: • High reward • Tolerates low precision and recall • Challenge (why it is more difficult than ML?): • No standard solution available • No labeled data available 2016/12/17 58Shou-De Lin
  • 206. Two Types of Discovery • Machine as a scientist • Machine as a detective
  • 206. Machine as Scientists As mathematician: Graph Theory (GRAFFITI, 1990-, Fajtlowicz) • Made conjectures in graph theory such as chromatic_number(G) + radius(G) =< max_degree(G) + frequency_of_max_degree(G) • Discovered > 800 conjectures, which have led to 60 math papers and 1 thesis publication to prove or disprove them As physicists: ASTRA (1998-, Kocabas and Langley) • suggested novel and interesting fusion reactions, and generated reaction pathways for helium, carbon, and oxygen that do not appear in the scientific literature As astronomers: SKICAT (1995-, Kennefick) • aiming at detecting quasars in digital sky surveys; has found 5 new quasars. 60 2016/12/17 Shou-De Lin
  • 209. As Poet: computer-generated classical Chinese poems – 故人 (Old Friend): 千里行路難 平生不可攀 相逢無人見 與君三十年 – 思鄉 (Homesickness): 何處不自由 故土木蘭舟 落日一回首 孤城不可留 – 回文詩 (Palindrome Poem): 行人路道路人行 別有心人心有別 寒風幾度幾風寒
  • 210. Two Types of Discovery • Machine as scientists • Machine as detectives
  • 211. 64 Abnormal Instance Discovery From Social Networks 2 Global global node discovery: Finding the most abnormal node(s) in the network Local abnormal node discovery: Finding the most abnormal node(s) connected to a given node 1 1 1 2 Interesting feature discovery: Finding the most interesting or unique features (e.g. paths) between two nodes or w.r.t a node
  • 212. Novel Topic Diffusion Discovery (ACL 2012) • Predicting diffusions for novel topic • Predict diffusions of “iPhone 6” before any such diffusion occurs using • Links of existing topics with contents • Keywords of novel topics • Intuition of our approach • We do have the diffusion data of other topics • Finding the connection between existing topics and novel topics using a LDA-based approach • Exploiting such connection for link prediction 6512/17/2016 u1 u2 u3 HTC HTC u4 HTC 101 HTC iPad iPad u1 u2 u3 iP6 iP6 u4 iP6 ? iP6 ? iP6 ? ? ? PTT iP6 Keywords for iP6
  • 213. Unseen Links Discovery (KDD 2013) • Individual opinion (ex. customer’s preference) is valuable • sometimes concealed due to privacy (ex. Foursquare “like”) • Fortunately, aggregative statistics (total count) is usually available • Predict unlabeled relationship (unseen-type link) using • Heterogeneous social network • Attributes of nodes • Aggregative statistics 6612/17/2016 u1 u2 r2 c2 be-friend-of own belong-to like ? User Item Category 1r1 r3 c1 02 au1 au2 ar3ar2 ar1 ac1 ac2
  • 214. 𝑹1 Target Rating Matrix ♫ ♫ ♫ Matching Users and Items for Transfer Learning in Collaborative Filtering (KDD2014) 2013/7/30 67  Assumptions:  Two rating matrices  Modeling the same kind of preference  Certain portion of overlap in users and items  E.g. Using Blockbuster’s record to enhance the performance of a NetFlix System  Obstacle: the anonymity of users and items 𝑹2 An Available Rating Dataset ♫♫♫
  • 217. The AI Revolution • Machines will take over some non-trivial jobs that used to be done by human beings • Some trivial jobs will be safe 70
  • 218. The Reversal of Decision-Making Roles • The decision makers will gradually shift from humans to machines 71 Scientific American (Taiwan edition), 2015
  • 219. From Machine Learning to Machine Discovery • The evolution of human intelligence: memorizing → learning → discovering (by machines: machine memorizing → machine learning → machine discovering) • In the near future, much new scientific knowledge will be discovered by AI! 72