Linked Open Data-enabled Strategies for Top-N Recommendations - Cataldo Musto, Pierpaolo Basile, Pasquale Lops, Marco De Gemmis and Giovanni Semeraro - 1st Workshop on New Trends in Content-based Recommender Systems, co-located with ACM Recommender Systems 2014
1. CBRecSys 2014: Workshop on New Trends in Content-based Recommender Systems
Foster City (CA, United States), October 6, 2014
Linked Open Data-enabled Strategies for Top-N Recommendations
Cataldo Musto, Pierpaolo Basile, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis
(Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)
2. Outline
• Background
• Content-based RecSys (CBRS)
• Limitations
• Linked Open Data
• What?
• Introducing LOD in CBRS
• Experiments
• Conclusions
Cataldo Musto, Pierpaolo Basile, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis.
Linked Open Data-enabled Strategies for Top-N Recommendation. CBRecSys 2014 Workshop, Silicon Valley (US), 6.10.2014
3. Content-based Recommender Systems
Suggest items similar to those the user liked in the past
(I bought Converse shoes, I’ll continue buying similar sport shoes)
4. Content-based Recommender Systems
Limitations: limited content (in several domains)
5. Content-based Recommender Systems
Limitations: poor semantics
6. Problem
How can we boost Content-based Recommender Systems with Semantics? (and with more content)
7. Semantics in CBRS: state of the art
• Ontologies
• Folksonomies
• Distributional Semantics
• Encyclopedic Knowledge
• Linked Open Data
8. Top-down approaches
What is the difference?
Approach                  | Formal Semantics | Large-scale
Folksonomies              | no               | no
Ontologies                | yes              | no
Encyclopedic Knowledge    | no               | yes
Distributional Semantics  | no               | yes
Linked Open Data          | yes              | yes
9. Top-down approaches
Linked Open Data merges the vastness of encyclopedic knowledge with the formal semantics typical of ontologies
10. Top-down approaches
We focus on the introduction of Linked Open Data in Content-based Recommender Systems
11. Linked Open Data
What are we talking about?
12. Linked Open Data
Definition: a methodology to publish, share and link structured data on the Web
13. Linked Open Data (cloud)
What is it? A (large) set of interconnected semantic datasets
14. Linked Open Data (cloud)
What kind of datasets?
15. Linked Open Data (cloud)
DBpedia (http://dbpedia.org)
16. Linked Open Data (cloud)
DBpedia (http://dbpedia.org) is the structured mapping of Wikipedia.
It is the core of the LOD cloud.
17. Linked Open Data (cloud)
Example: unstructured content from Wikipedia
“Foster City is a town in United States located in California” (from Wikipedia page)
18. Linked Open Data (cloud)
How are these data represented?
Semantic Web cake: information from the LOD cloud is represented in RDF
19. Linked Open Data (cloud)
How are these data represented?
Example: “Foster City is a town in United States located in California” (from Wikipedia page)
Nodes:
• Foster City: http://dbpedia.org/resource/Foster_City,_California
• California: http://dbpedia.org/resource/California
• United States: http://dbpedia.org/resource/United_States
Properties: dbpedia-owl:country, dbpedia-owl:isPartOf
20. Linked Open Data (cloud)
How are these data represented?
Data coming from the LOD cloud have formal semantics, represented in RDF
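The Foster City example can be written down as RDF triples. The following is a minimal sketch in plain Python, not the authors' code. It assumes that `dbpedia-owl:` expands to `http://dbpedia.org/ontology/` and that the two properties on the slide connect Foster City to California (isPartOf) and to the United States (country), as the example sentence suggests; the helper `to_ntriples` is a hypothetical name.

```python
# Assumed namespace expansions for the prefixes shown on the slide.
DBPEDIA_OWL = "http://dbpedia.org/ontology/"
RESOURCE = "http://dbpedia.org/resource/"

# Each triple is (subject, predicate, object); the pairing of properties
# and objects is an assumption read off the example sentence.
triples = [
    (RESOURCE + "Foster_City,_California", DBPEDIA_OWL + "isPartOf", RESOURCE + "California"),
    (RESOURCE + "Foster_City,_California", DBPEDIA_OWL + "country", RESOURCE + "United_States"),
]


def to_ntriples(triples):
    """Serialize URI-only triples in an N-Triples-like form, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(triples))
```

Every element of a triple is a URI, which is what gives the statement a formal, machine-readable meaning instead of a plain string.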
21. Our checklist
Can Linked Open Data boost content-based recommender systems?
• More Semantics: yes
• More Content: ?
22. Linked Open Data (cloud)
How much data?
23. Linked Open Data (cloud)
How much data? 1048 datasets and 58 billion triples (source: http://stats.lod2.eu)
24. Our checklist
Can Linked Open Data boost content-based recommender systems?
• More Semantics: yes
• More Content: yes
25. Our checklist
Can Linked Open Data boost content-based recommender systems?
• More Semantics: yes
• More Content: yes
…but
26. Research Question
27. Approach
We propose two methodologies to introduce LOD-based features into CBRS:
• Direct Access to DBpedia
• Entity Linking algorithms
28. Introducing LOD-based features in CBRS
Methodology :: Direct Access to DBpedia
The simplest way to introduce LOD-based features
(We assume that each item to be recommended is already in the LOD cloud)
1. Domain-dependent features are manually defined (e.g. book recommendation: genre, author, publisher, subject, etc.)
2. SPARQL queries extract the features’ values
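The two steps above can be sketched as a small query builder. This is a hedged illustration rather than the authors' implementation: the property names in `FEATURE_PROPERTIES`, the item URI and the function name `build_query` are assumptions chosen for the book domain, and the snippet only builds the SPARQL text without contacting any endpoint.

```python
# Step 1: manually defined, domain-dependent features.
# The mapping to DBpedia properties is an assumption for illustration.
FEATURE_PROPERTIES = {
    "author": "dbpedia-owl:author",
    "genre": "dbpedia-owl:literaryGenre",
    "publisher": "dbpedia-owl:publisher",
    "subject": "dcterms:subject",
}

PREFIXES = (
    "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
    "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
)


def build_query(item_uri, feature):
    """Step 2: build the SPARQL query that extracts the values of one feature."""
    prop = FEATURE_PROPERTIES[feature]
    return PREFIXES + f"SELECT ?value WHERE {{ <{item_uri}> {prop} ?value }}"


# Hypothetical item URI for the book used as the running example.
book = "http://dbpedia.org/resource/The_Great_and_Secret_Show"
for feature in FEATURE_PROPERTIES:
    print(build_query(book, feature))
    print()
```

Changing domain (e.g. from books to movies) means rewriting `FEATURE_PROPERTIES` by hand, which is the domain dependence this methodology implies.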
29. Introducing LOD-based features in CBRS
Methodology :: Direct Access to DBpedia
Example: The Great and Secret Show (Clive Barker’s book)
30. Introducing LOD-based features in CBRS
Methodology :: Direct Access to DBpedia
e.g. Book Recommendation: author, genre, publisher, subject
31. Introducing LOD-based features in CBRS
Methodology :: Direct Access to DBpedia
Each item is represented through the set of (manually defined) features extracted from the LOD cloud.
32. Introducing LOD-based features in CBRS
Methodology :: Direct Access to DBpedia
9 LOD-based features: author (Clive Barker), genre (Fantasy Literature), publisher (William Collins), series (Books of the Art), subject (1980s fantasy novels, William Collins books, Novels by Clive Barker, British Fantasy Novels)
33. Direct Access to DBpedia
Analysis
Pros:
- Very straightforward approach
- SPARQL queries can be easily built
Cons:
- Properties are manually defined
- Approach is strongly domain-dependent
- Does not exploit unstructured information
34. Introducing LOD-based features in CBRS
Methodology :: Entity Linking algorithms
• Input: free text (item descriptions, in our setting)
• Output: identification of the most relevant entities mentioned in the text
• State of the art: Tagme (1), DBpedia Spotlight (2), Wikipedia Miner (3)
(1) http://tagme.di.unipi.it
(2) http://spotlight.dbpedia.org
(3) http://wikipedia-miner.cms.waikato.ac.nz
36. Introducing LOD-based features in CBRS
Methodology :: Entity Linking algorithms
• Input: free text (in this setting: textual description of the items, e.g. Wikipedia abstract)
• Output: identification of the most relevant entities mentioned in the text
(example from Tagme)
37. Introducing LOD-based features in CBRS
Methodology :: Entity Linking algorithms
Entity Linking - output
Very human-readable representation
N-gram and entity recognition come for free, and so does sense disambiguation
38. Introducing LOD-based features in CBRS
Methodology :: Entity Linking algorithms
Entity Linking - output
Not a simple textual feature: each entity is a reference to a DBpedia node, e.g. http://dbpedia.org/resource/Harry_D'Amour
39. Introducing LOD-based features in CBRS
Methodology :: Entity Linking algorithms
The LOD-based representation can be enriched through broader categories (encoded in the dcterms:subject property) by exploiting SPARQL queries
40. Introducing LOD-based features in CBRS
Methodology :: Entity Linking algorithms
The final representation of each item is obtained by merging the DBpedia nodes identified in the text with those the dcterms:subject property refers to (broader categories)
Features = DBpedia nodes + broader categories
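The merge described above amounts to a set union. The sketch below is illustrative only: `item_features` is a hypothetical helper, the category URI is a made-up placeholder, and the `broader_categories` dictionary stands in for the result of the dcterms:subject SPARQL lookups.

```python
def item_features(linked_entities, broader_categories):
    """Merge the DBpedia nodes found in the text with their broader categories.

    linked_entities: iterable of DBpedia URIs returned by the entity linker.
    broader_categories: dict mapping an entity URI to the URIs that its
        dcterms:subject property points to (assumed to be fetched beforehand).
    """
    features = set(linked_entities)
    for entity in linked_entities:
        features.update(broader_categories.get(entity, ()))
    return features


# Entity taken from the slides; the category URI is a hypothetical placeholder.
entities = ["http://dbpedia.org/resource/Harry_D'Amour"]
categories = {
    "http://dbpedia.org/resource/Harry_D'Amour": [
        "http://dbpedia.org/resource/Category:Example_category",
    ],
}

print(sorted(item_features(entities, categories)))
```

Using a set means that a category shared by several entities of the same item is stored once; keeping counts instead would be a different (also plausible) design.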
41. Entity Linking Algorithms
Analysis
Pros:
- Exploit unstructured information
- Very general approach
- May introduce unexpected (but relevant) features
Cons:
- Strong feature engineering (which ones are the best?)
- The threshold score of Entity Linking algorithms is difficult to set
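The threshold issue can be made concrete with a tiny filter. This is a sketch under stated assumptions: the `(uri, score)` pairs, the score values, the third URI and the helper name `filter_entities` are all hypothetical, and no real entity linker output format is implied.

```python
def filter_entities(annotations, threshold):
    """Keep only the entities whose linking score reaches the threshold."""
    return [uri for uri, score in annotations if score >= threshold]


# Hypothetical annotations: (DBpedia URI, confidence score of the linker).
annotations = [
    ("http://dbpedia.org/resource/Clive_Barker", 0.61),
    ("http://dbpedia.org/resource/Harry_D'Amour", 0.34),
    ("http://dbpedia.org/resource/Example_noisy_entity", 0.05),
]

# A low threshold keeps noisy entities; a high one drops relevant ones.
for threshold in (0.0, 0.1, 0.5):
    print(threshold, len(filter_entities(annotations, threshold)))
```

The feature set of every item changes with the threshold, which is why the value has to be tuned rather than guessed.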
42. LOD-based features in CBRS
43. Experimental Evaluation
Research Hypothesis
1. What is the contribution of Linked Open Data features to the accuracy of recommendation algorithms?
2. Does the representation based on Linked Open Data outperform existing state-of-the-art recommendation algorithms?
44. Experimental Evaluation
Description of the dataset
• Book recommendation
• ESWC 2014 Challenge Dataset (*)
• 6,733 books
• 6,181 users
• 72,372 binary ratings
• 11.71 ratings/user
• Very sparse dataset!
• Only 5.37 positive ratings/user!
(*) http://challenges.2014.eswc-conferences.org/index.php/RecSys
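The sparsity claim can be checked with a few lines of arithmetic on the figures above. The variable names are illustrative; only the three counts come from the slide.

```python
# Figures reported for the ESWC 2014 Challenge dataset.
n_books = 6733
n_users = 6181
n_ratings = 72372

# Average number of ratings per user (the slide reports 11.71).
ratings_per_user = n_ratings / n_users

# Fraction of the user-item matrix that is actually filled.
density = n_ratings / (n_books * n_users)

print(f"ratings/user: {ratings_per_user:.2f}")
print(f"density: {density:.4%}")
print(f"sparsity: {1 - density:.4%}")
```

Fewer than two cells in a thousand of the user-item matrix carry a rating, which is consistent with calling the dataset very sparse.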
45. Experimental Evaluation
Feature combinations (7 combinations for each run)
• Content (crawled from Wikipedia + NLP processing)
• LOD (direct access to DBpedia)
• Entity Linking (Tagme)
• Content + LOD
• Content + Entity Linking
• LOD + Entity Linking
• All
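The seven configurations are exactly the non-empty subsets of the three basic feature groups. A short sketch using the standard library (the names `BASE_GROUPS` and `feature_sets` are illustrative):

```python
from itertools import combinations

# The three basic feature groups listed above.
BASE_GROUPS = ["Content", "LOD", "Entity Linking"]

# All non-empty subsets: 3 single groups + 3 pairs + 1 triple = 7.
feature_sets = [
    combo
    for size in range(1, len(BASE_GROUPS) + 1)
    for combo in combinations(BASE_GROUPS, size)
]

for combo in feature_sets:
    print(" + ".join(combo))
```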
46. Experimental Evaluation
Setup
• Evaluation of the effectiveness of LOD-based features across six different recommendation algorithms
• Vector Space Models: VSM, BM25, eVSM (*)
• Classifiers: Random Forests, Linear Regression
• Graph-based Approaches: PageRank with Priors
(*) C. Musto: Enhanced vector space models for content-based recommender systems. RecSys 2010: 361-364
47. Experimental Evaluation
Design of the Experiment :: Vector Space Models
The user profile (built upon the features describing the items the user liked) is used as a query
Cosine Similarity is used to get the most similar items
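A minimal sketch of this scheme follows. It assumes raw feature counts as weights (the slides do not state the weighting scheme), and the feature strings, item identifiers and function names are hypothetical.

```python
import math
from collections import Counter


def build_profile(liked_items):
    """Sum the feature counts of the liked items (assumed weighting: raw counts)."""
    profile = Counter()
    for features in liked_items:
        profile.update(features)
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(weight * b.get(feature, 0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(profile, candidates, n):
    """Return the n candidate items most similar to the user profile."""
    ranked = sorted(candidates, key=lambda item: cosine(profile, candidates[item]), reverse=True)
    return ranked[:n]


# Hypothetical data: one liked book and three candidate books.
liked = [["author:Clive_Barker", "genre:Fantasy"]]
candidates = {
    "a": Counter(["author:Clive_Barker", "genre:Fantasy"]),
    "b": Counter(["genre:Fantasy"]),
    "c": Counter(["genre:Romance"]),
}

print(recommend(build_profile(liked), candidates, 2))
```

The same code works for any of the feature sets (content, LOD, entities or their combinations), since each one simply changes which strings end up in the vectors.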
48. Experimental Evaluation
Design of the Experiment :: Classifiers
Random Forests learn a classification model which is used to predict the class (positive/negative) of unlabeled items. The model is based on the features coming from labeled items.
Linear Regression also uses “basic” features (e.g. positive and negative ratings, average rating of the user, ratio between positive and negative ratings, etc.) to learn the model.
49. Experimental Evaluation
Design of the Experiment :: PageRank with Priors (PRP)
Graph-based representation: users and items = nodes, positive feedback = edges
PageRank calculates the ‘importance’ of a node according to the quality and the number of its connections
By default, equal probability is assigned to all the nodes
50. Experimental Evaluation
Design of the Experiment :: PageRank with Priors (PRP)
PageRank with Priors introduces a bias towards some nodes (in our setting, the items the user liked)
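The bias towards liked items can be illustrated with a small power-iteration sketch. It is not the authors' implementation: the damping factor 0.85, the undirected edges, the fixed number of iterations and the choice of putting all the prior mass on the liked items are assumptions, and the toy graph is made up.

```python
def pagerank_with_priors(edges, priors, damping=0.85, iterations=50):
    """Personalized PageRank on an undirected graph via power iteration.

    edges: iterable of (node, node) pairs, e.g. (user, liked item).
    priors: dict node -> non-negative weight; must contain at least one
        positive weight for a node of the graph.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = list(neighbors)
    total = sum(priors.get(n, 0.0) for n in nodes)
    prior = {n: priors.get(n, 0.0) / total for n in nodes}

    rank = dict(prior)
    for _ in range(iterations):
        # The random jump goes back to the prior nodes instead of to all nodes.
        new_rank = {n: (1 - damping) * prior[n] for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                new_rank[m] += share
        rank = new_rank
    return rank


# Toy graph: u1 liked A; u2 liked A and B; u3 liked C.
edges = [("u1", "A"), ("u2", "A"), ("u2", "B"), ("u3", "C")]
# Target user u1: all the prior mass goes to the item u1 liked.
rank = pagerank_with_priors(edges, {"A": 1.0})
print(sorted(rank.items(), key=lambda kv: kv[1], reverse=True))
```

In this toy graph item B, reachable from the liked item A through user u2, receives a positive score, while item C, which lies in a disconnected component, receives none.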
51. Experimental Evaluation
Design of the Experiment :: PageRank with Priors (PRP)
Several strategies to build the graph are compared:
1. no-LOD. The graph only models users and items.
2. small-LOD. The graph is expanded with new nodes by adding basic properties of the items (subject, genre, publisher, author, etc.), as well as their relationships.
3. big-LOD. The graph is further expanded by introducing more nodes (e.g. other resources of the same genre, other resources written by the authors, etc.), as well as their relationships.
Rationale: the introduction of new nodes and connections coming from the LOD cloud can improve the effectiveness of PageRank.
52. Experimental Evaluation
Design of the Experiment :: PageRank with Priors (PRP)
PRP is run and items in the test set are ranked according to their PageRank
53. Experimental Evaluation
Recap
6 algorithms:
• VSM
• BM25
• eVSM
• Linear Regression
• Random Forests
• PageRank with Priors
7 sets of features:
• Content
• LOD
• Entity Linking
• Content + LOD
• Content + Entity Linking
• LOD + Entity Linking
• All
54. Experiment 1
Impact of LOD-based features
55. Experiment 1
Impact of LOD-based features :: VECTOR SPACE MODEL
[Bar chart of F1-measure (axis 53 to 55) for the feature sets CONTENT, LOD, ENTITY, CONTENT+LOD, CONTENT+ENTITY, LOD+ENTITY, ALL. Bar values in extraction order, not matched to the labels: 54.62, 54.42, 54.59, 54.47, 54.36, 54.69, 53.79. Highlighted gains: +0.17, +0.05.]
LOD-based features improve F1-measure
56. Experiment 1
Impact of LOD-based features :: VECTOR SPACE MODEL
Statistically significant improvement (paired t-test, p<0.01)
57. Experiment 1
Impact of LOD-based features :: VECTOR SPACE MODEL
Best: LOD+Entity Linking, with no content (+0.27; paired t-test, p<0.01)
58. Experiment 1
Impact of LOD-based features :: BM25
[Bar chart (axis 53 to 55) for the feature sets CONTENT, LOD, ENTITY, CONTENT+LOD, CONTENT+ENTITY, LOD+ENTITY, ALL. Bar values in extraction order, not matched to the labels: 54.43, 54.56, 54.51, 54.6, 53.9, 53.91, 53.43. Highlighted difference: -1.00%.]
Worst (again): LOD alone
59. Experiment 1
Impact of LOD-based features :: BM25
Best (again): LOD+Entity Linking, this time with content (+0.17; paired t-test, p<0.01)
60. Experiment 1
Impact of LOD-based features :: EVSM
[Bar chart (axis 51 to 54) for the feature sets CONTENT, LOD, ENTITY, CONTENT+LOD, CONTENT+ENTITY, LOD+ENTITY, ALL. Bar values in extraction order, not matched to the labels: 52.9, 53.07, 52.8, 53.04, 53.02, 53.37, 52.06. Highlighted gains: +0.47, +0.17, +0.14, +0.12; paired t-test, p<0.01.]
The introduction of LOD-based features leads to an improvement again
61. Experiment 1
Impact of LOD-based features :: LESSONS LEARNED FOR VSMS (VSM, BM25, eVSM)
1. LOD features alone are always the worst configuration.
2. (At least) a LOD-based representation based on Entity Linking always improves on the content alone.
62. Experiment 1
Impact of LOD-based features :: RANDOM FORESTS
[Bar chart (axis 53 to 54) for the feature sets CONTENT, LOD, ENTITY, CONTENT+LOD, CONTENT+ENTITY, LOD+ENTITY, ALL. Bar values in extraction order, not matched to the labels: 53.86, 53.68, 53.75, 53.76, 53.77, 53.34, 53.52. Highlighted gain: +0.36.]
Similar outcomes: all but LOD alone lead to improvement
63. Experiment 1
Impact of LOD-based features :: RANDOM FORESTS
Content does matter: LOD+entity+content is the best (+0.36)
64. Experiment 1
Impact of LOD-based features :: LINEAR REGRESSION
[Bar chart (axis 55 to 56) for the feature sets CONTENT, LOD, ENTITY, CONTENT+LOD, CONTENT+ENTITY, LOD+ENTITY, ALL. Bar values in extraction order, not matched to the labels: 55.59, 55.59, 55.67, 55.64, 55.61, 55.5, 55.57. Highlighted gain: +0.08; paired t-test, p<0.01.]
Entity-based representation is the best one
65. Experiment 1
Impact of LOD-based features :: LINEAR REGRESSION
Smaller improvements, though (due to the basic features?)
66. Experiment 1
Impact of LOD-based features :: LESSONS LEARNED FOR CLASSIFIERS (RF, LR)
1. LOD features alone never outperform the content.
2. (At least) a LOD-based representation based on Entity Linking always improves on the content alone.
67. Experiment 1
Impact of LOD-based features :: LESSONS LEARNED FOR CLASSIFIERS
Same outcomes for RF and LR (algorithm-independent behaviour)
69. Experiment 1
Impact of LOD-based features :: PAGERANK WITH PRIORS
Graph      | Value | Gain over NO-LOD
NO-LOD     | 54.28 |
SMALL-LOD  | 54.73 | +0.45
BIG-LOD    | 55.44 | +1.16
(paired t-test, p<0.001)
The more LOD-based data, the better the accuracy
70. Experiment 1
Impact of LOD-based features :: PAGERANK WITH PRIORS
Drawback: more nodes produce an exponential growth of computational costs (from 3 hours to 120 hours to run the experiment!)
71. Experiment 2
Comparison to state of the art
• SPRANK (Semantic Path Ranking) [*]
• BPRMF (Bayesian Personalized Ranking) [+]
• U2U_CF (User to User CF)
• I2I_CF (Item to Item CF)
[*] V. Ostuni, T. Di Noia, E. Di Sciascio, R. Mirizzi: Top-N recommendations from implicit feedback leveraging Linked Open Data. RecSys 2013.
[+] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme: BPR: Bayesian Personalized Ranking from Implicit Feedback. UAI 2009.
72. Experiment 2
Comparison to state of the art
[Bar chart (axis 51 to 56) for VSM, LR, PRP, SPRANK, BPRMF, U2U_CF, I2I_CF, with the state-of-the-art algorithms marked as baselines. Bar values in extraction order, not matched to the labels: 52.27, 52.28, 52.24, 54.12, 55.67, 55.44, 54.69.]
Our best-performing configurations are considered as baseline
73. Experiment 2
Comparison to state of the art
Classical CF techniques perform poorly (sparsity?)
74. Experiment 2
Comparison to state of the art
+3.4% over the LOD-based state-of-the-art algorithm
75. Experiment 2
Comparison to state of the art
Our approaches outperform Matrix Factorization (highlighted gains: +0.57, +1.55, +0.32)
76. Conclusions
77. Lessons Learned
Investigation about the effectiveness of Linked Open Data in content-based recommendation tasks
Two solutions have been proposed: Direct Access to DBpedia and Entity Linking algorithms.
Evaluation. Research question: what is the impact of LOD-based features on VSM, classifiers and graph-based algorithms?
All recommendation approaches significantly benefit from the introduction of LOD-based features.
Our best-performing configurations outperform both collaborative and LOD-based state-of-the-art algorithms.
78. Future Research
• Evaluation against different datasets and stronger baselines
• Better (automatic) tuning of parameters and integration of more LOD-based data sources
• Evaluation of Novelty, Diversity and Serendipity of LOD-based recommendations