dbrec
Music recommendations using DBpedia
         Alexandre Passant - DERI, NUI Galway
                  In-Use Track @ ISWC2010
             11th November 2010, Shanghai, China
Good news, it doesn’t fit
in a slide anymore!

Many producers, only a
few consumers (besides
search engines): BBC,
Drupal, ...
Agenda

• Semantic Distance over Linked Data
• dbrec - architecture, dataset and UI
• Evaluation
• Lessons learnt
• Next steps and conclusion
Semantic Distance
Semantic Distance over
    Linked Data
• Relying only on links
• Relying only on instance data
• Using dereferenceable URIs
 • And using resources following the LD
    principles
Linked Data
Linked Data
[Figure: example Linked Data graph with resources e:r1, e:r2, e:r3 and e:r4 connected by links e:l1, e:l2 and e:l3]
G = (R, L, I)

• R = {r_1, r_2, ..., r_n}

• L = {l_1, l_2, ..., l_n}

• I = {i_1, i_2, ..., i_n}
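A minimal sketch of how such a graph might be held in memory, reading R as resources, L as link types and I as link instances (an assumed reading, since the slide only gives the symbols). The edges of the toy graph and the `direct_distance` function are illustrative: the function is a simple link-counting distance, 1 / (1 + number of direct links in either direction), and is not necessarily the measure used in dbrec.

```python
from typing import NamedTuple


class LinkInstance(NamedTuple):
    link: str    # an element of L, e.g. "e:l1"
    source: str  # an element of R
    target: str  # an element of R


# Toy instance set; the edges are assumed for illustration only.
INSTANCES = {
    LinkInstance("e:l1", "e:r1", "e:r2"),
    LinkInstance("e:l2", "e:r1", "e:r3"),
    LinkInstance("e:l3", "e:r2", "e:r4"),
    LinkInstance("e:l2", "e:r3", "e:r4"),
}

# R and L can be derived from the link instances.
RESOURCES = {i.source for i in INSTANCES} | {i.target for i in INSTANCES}
LINKS = {i.link for i in INSTANCES}


def direct_links(a: str, b: str, instances) -> int:
    """Count link instances that connect a and b, in either direction."""
    if a == b:
        return 0
    return sum(1 for i in instances if {i.source, i.target} == {a, b})


def direct_distance(a: str, b: str, instances) -> float:
    """Illustrative distance: 1.0 when unlinked, smaller with more links."""
    return 1.0 / (1.0 + direct_links(a, b, instances))
```

With this toy data, `direct_distance("e:r1", "e:r2", INSTANCES)` uses one direct link, while `e:r1` and `e:r4` share none and stay at the maximum distance.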
LDSD
The LDSD ontology




                Our own ontology, but
                could map with MuSim
                in the future
dbrec
At a glance
• A system providing recommendations for all
  DBpedia bands and artists (±40K) using LDSD
    • And explaining its recommendations
    • Both using Linked Data and Semantic
      Web standards (RDF, SPARQL)
• Integrating related Web data for an improved
  user-experience
Architecture
(1) Dataset identification
(2) Dataset reducing
      -> RDF Data
(3) LDSD computation
      -> RDF Data
(4) User interface
Dataset
•   Retrieving all artists and bands in DBpedia (±40K)
    •   Including incoming / outgoing links
    •   Approximately 3M triples
•   Removing datatype properties
    •   2.2M triples left (75%)
•   Merging /ontology and /property
    •   1.7M triples left (55%)
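The two reduction steps above can be sketched as a single pass over the triples. This is a sketch under stated assumptions: triples are plain `(subject, property, object)` string tuples, any object that is not an absolute HTTP URI is treated as a literal, and `/property` is folded into `/ontology` (the slides do not say which direction the merge used). The function name `reduce_dataset` is hypothetical.

```python
PROPERTY_NS = "http://dbpedia.org/property/"
ONTOLOGY_NS = "http://dbpedia.org/ontology/"


def is_literal(obj: str) -> bool:
    # Assumption for this sketch: resources are absolute http(s) URIs,
    # anything else is a literal value (the object of a datatype property).
    return not obj.startswith(("http://", "https://"))


def reduce_dataset(triples):
    """Drop literal-valued triples and merge the two DBpedia namespaces."""
    reduced = set()
    for s, p, o in triples:
        if is_literal(o):
            continue  # step 1: remove datatype properties
        if p.startswith(PROPERTY_NS):
            # step 2: rewrite /property/x to /ontology/x so duplicates collapse
            p = ONTOLOGY_NS + p[len(PROPERTY_NS):]
        reduced.add((s, p, o))
    return reduced


# Illustrative triples, not taken from the actual dataset.
sample = [
    ("http://dbpedia.org/resource/Ramones",
     "http://dbpedia.org/property/genre",
     "http://dbpedia.org/resource/Punk_rock"),
    ("http://dbpedia.org/resource/Ramones",
     "http://dbpedia.org/ontology/genre",
     "http://dbpedia.org/resource/Punk_rock"),
    ("http://dbpedia.org/resource/Ramones",
     "http://dbpedia.org/property/name",
     "Ramones"),
]
reduced_sample = reduce_dataset(sample)
```

Using a set for the output is what makes the merged `/property` and `/ontology` duplicates disappear without an extra step.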
Distribution




               20K+ artists (50%) are
               not linked to any other
               artist
Curation
• 118 properties linking artists together
  • 18 misused, 35 wrongly defined (e.g.
    dbprop:klfsgProperty)
• 578 properties linking artist to resources
  • 183 used only once, 36 wrongly defined
• 767 properties linking resources to artists
  • 336 used only once, 115 wrongly defined
• Dataset reduced to 1M triples
Computing distance
• 9,797 minutes
  • 2 x AMD Opteron 250, 4GB, Ubuntu 8.10
• 50M triples
  • Modelled using the LDSD ontology

Done for all artists in DBpedia

Artist            Time (sec.)
Ramones           25.20
Johnny Cash       61.16
U2                50.06
The Clash         43.34
Bad Religion      34.98
The Aggrolites     7.35
Janis Joplin      23.12
Artist                Distance
Elvis Presley         0.0978
June Carter Cash      0.1056
Willie Nelson         0.1322
Kris Kristofferson    0.1407
Bob Dylan             0.1466
Marty Robbins         0.1673
Rosanne Cash          0.1782
Charlie McCoy         0.1836
Gene Autry            0.1910
Carl Smith            0.1980
User interface
Sorry, SlideShare people,
that’s a movie, so you
won’t be able to see it!
Evaluation
Evaluation settings
• Off-line and on-line user evaluation
 • Using common RecSys metrics
• 10 subjects
 • 2 women, 8 men
 • 24 to 34 years old
 • 35 to 55 minutes per interview, face-to-face
Metrics
•   Off-line evaluation - comparison with last.fm
    •   5 artists / bands
    •   2 blind lists, 10 ranked recommendations per list
    •   Marks from 1 to 5
•   On-line recommendation - dbrec only
    •   5 artists / bands
    •   Browsing recommendations using dbrec
    •   Marks from 1 to 5, plus observations and interviews
dbrec vs last.fm

• Average mark of recommendations
 • 3.37 (±1.19) for dbrec
 • 3.44 (±1.25) for dbrec w/ on-line
 • 3.69 (±1.01) for last.fm
Precision
Results for the precision (t=X means an item is relevant if marked X or higher).
Cannot compute recall (it would imply that users know all bands in the system).

Precision (%)   dbrec (off-line)   dbrec (off+on-line)   last.fm
t=2             92.05              90.59                 98.32
t=3             76.63              77.72                 87.91
t=4             49.06              51.23                 58.05
t=5             20.09              25                    25.165
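A minimal sketch of the thresholded precision described above, assuming each recommendation received a mark from 1 to 5 and counts as relevant when its mark is at least t. The function name and the sample marks are made up for illustration and are not the study's data.

```python
def precision_at_threshold(marks, t):
    """Percentage of recommendations whose mark is at least t."""
    if not marks:
        raise ValueError("at least one mark is required")
    relevant = sum(1 for m in marks if m >= t)
    return 100.0 * relevant / len(marks)


# Made-up marks for one list of 10 recommendations.
sample_marks = [5, 4, 2, 3, 1, 4, 5, 2, 3, 4]
precision_by_t = {t: precision_at_threshold(sample_marks, t) for t in range(2, 6)}
```

Precision can only fall as t rises, which matches the shape of the table: a stricter threshold leaves fewer recommendations counted as relevant.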
Novel recommendations
• Lots of unknown recommendations
 • 62% for dbrec (59.6% w/ on-line)
 • 40.4% for last.fm
 • But that’s good news!
• Evaluated 274 of them on dbrec
 • 3.05(± 1.09)
Observations
• Explanations for unknown bands
 • Checked for 198 / 310
• But also for known ones
 • 24 / 190
• Helped to understand the recommendation
 • Even if they already knew the band
Interviews
               User-interface   Explanations
Enjoyable      9                7
Useful         9                9
Enriching      8                10
Easy to use    10               9
Confusing      0                2
Complicated    0                2
Too geeky      1                6
Lessons learnt
Data quality
• Issues with DBpedia properties
  • Misused : dbprop:notableInstruments
  • Wrongly defined : dbprop:klfsgProperty
  • Duplicates : /ontology versus /property
• Requires data curation !
  • Automated and manual
Use, but replicate
• More and more public SPARQL endpoints
 • Often limited to X max results
 • 5,000 on DBpedia
• Difficult to use in production
 • Requires local replica
 • But implies synchronisation!

But that’s fair enough. Hosting a SPARQL
endpoint is costly, and opening it up fully to
anyone would require lots of maintenance, etc.
Use, but replicate
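# Assumes dbpedia: is bound to the DBpedia ontology namespace.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia: <http://dbpedia.org/ontology/>

# Labels of every resource typed as a solo artist or a band.
# Note: the DBpedia ontology names the artist class MusicalArtist.
SELECT ?label
WHERE {
    ?x rdfs:label ?label .
    { ?x a dbpedia:MusicalArtist }
    UNION
    { ?x a dbpedia:Band }
}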
Use, but replicate

• Names of all DBpedia artists
 • Get number of results w/ COUNT
 • Run n/5000 queries (LIMIT + OFFSET)
 • Recompose results
• Network errors, etc.

The query had more than 40K results, since
most artists have names in several languages.
So many more than 8 queries.
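The COUNT, then LIMIT + OFFSET, then recompose routine can be sketched as below. This is a sketch, not the code used for dbrec: `run_query` is a hypothetical callable that sends one query string to an endpoint and returns a list of rows, and the retry on `OSError` is an assumed stand-in for the "network errors" bullet.

```python
import math

PAGE_SIZE = 5000  # result cap mentioned for the DBpedia endpoint


def paged_queries(base_query, total, page_size=PAGE_SIZE):
    """Build one LIMIT/OFFSET query per page needed to cover `total` rows."""
    pages = math.ceil(total / page_size)
    return [
        f"{base_query}\nLIMIT {page_size} OFFSET {n * page_size}"
        for n in range(pages)
    ]


def fetch_all(base_query, total, run_query, page_size=PAGE_SIZE, retries=3):
    """Run every page through `run_query` and recompose the rows."""
    rows = []
    for query in paged_queries(base_query, total, page_size):
        for attempt in range(retries):
            try:
                rows.extend(run_query(query))
                break
            except OSError:
                # Give up only after the last allowed attempt.
                if attempt == retries - 1:
                    raise
    return rows
```

For 40,000 rows this produces the 8 queries mentioned on the slide. In practice the base query would probably need an `ORDER BY`, since SPARQL does not guarantee a stable order across separate LIMIT/OFFSET requests.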
SPARQL, Be quick or be neat
   • “List all artists / bands sharing common
     property-values with the current one”
     • Fits in a single SPARQL query
     • But does not scale
   • “Optimisation” has to be done manually by
     splitting the query and recomposing results
     using an external script
SPARQL, Be quick or be neat
Tests done in the local RDF store
1: full-query
2: split by property
3: split by property-object
Up to 75% faster

                  Direct SPARQL       Property-slicing    Complete-slicing
Artist            Queries   Time      Queries   Time      Queries   Time
Ramones           1         139.97    20        109.51    66        37.84
Johnny Cash       1         257.81    30        152.60    135       75.35
U2                1         155.53    22        122.91    70        44.03
The Clash         1         146.43    20        110.84    79        42.61
Bad Religion      1         104.08    23         86.49    97        47.35
The Aggrolites    1         145.92    13        114.52    28        28.33
Janis Joplin      1         230.88    27        151.00    98        62.81
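One way to read the three strategies is as three ways of generating queries from the seed artist's (property, object) pairs. The sketch below is an interpretation rather than the original script: the query shapes, function names and sample pairs are all assumptions, and the results are recomposed by a plain set union in the external script.

```python
def direct_query(seed):
    """Strategy 1: a single query joining on every property and value."""
    return (
        "SELECT DISTINCT ?other WHERE { "
        f"<{seed}> ?p ?o . ?other ?p ?o . FILTER(?other != <{seed}>) }}"
    )


def property_sliced(seed, pairs):
    """Strategy 2: one query per distinct property of the seed."""
    props = sorted({p for p, _ in pairs})
    return [
        f"SELECT DISTINCT ?other WHERE {{ <{seed}> <{p}> ?o . ?other <{p}> ?o . }}"
        for p in props
    ]


def complete_sliced(pairs):
    """Strategy 3: one query per distinct (property, object) pair."""
    return [
        f"SELECT DISTINCT ?other WHERE {{ ?other <{p}> <{o}> . }}"
        for p, o in sorted(set(pairs))
    ]


def recompose(result_sets, seed):
    """Union the per-query results and drop the seed itself."""
    merged = set()
    for result in result_sets:
        merged.update(result)
    merged.discard(seed)
    return merged


# Illustrative seed and pairs, not the real DBpedia data.
SEED = "http://dbpedia.org/resource/Ramones"
PAIRS = [
    ("http://dbpedia.org/ontology/genre", "http://dbpedia.org/resource/Punk_rock"),
    ("http://dbpedia.org/ontology/genre", "http://dbpedia.org/resource/Pop_punk"),
    ("http://dbpedia.org/ontology/hometown", "http://dbpedia.org/resource/New_York_City"),
]
```

Here the sample pairs yield 2 property-sliced queries and 3 fully sliced ones, the same pattern as the growing query counts in the table: more, simpler queries in exchange for a lower total time.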
Next steps
Next steps
•   Other data sources
    •   Freebase, MusicBrainz, etc.
•   Distance improvement
    •   Propagation, feature selection, etc.
•   User Interface
    •   User-friendly explanations
•   LOD-compliance
    •   Mapping with other ontologies, SPARQL endpoint
Conclusion
• Defined and applied a Semantic Distance
  measure to Linked Data
• Used it to build an end-user music
  recommender system, with ±40K artists
• Evaluated it using RecSys metrics
• Learnt several domain-independent lessons
  regarding LOD consumption
Questions ?
Contact:
alexandre.passant@deri.org - http://apassant.net - @terraces

                   Acknowledgements:
   Science Foundation Ireland - SFI/08/CE/I1380 (Lion 2)

                       References:
    AAAI Spring Symposium 2010 - LinkedAI Symposium
                 ESWC2010 - Demo Track
                  ISWC2010 - In-Use Track
Pictures credits
•   http://flickr.com/photos/yumlog2/20896759/ by yuki*

•   http://richard.cyganiak.de/2007/10/lod/ by Richard Cyganiak and Anja Jentzsch

•   http://flickr.com/photos/loungerie/2196866243/ by loungerie

•   http://flickr.com/photos/iskanderstruck/248786430/ by iskanderbenamor

•   http://flickr.com/photos/homer4k/461407380/ by homer4k

•   http://flickr.com/photos/jpellgen/2390204986/ by jpellgen

•   http://flickr.com/photos/onegoodbumblebee/839927986/ by One Good Bumblebee

•   http://flickr.com/photos/28509009@N03/2668650475/ by marcreis

•   http://flickr.com/photos/8049973@N03/2656140464/ by wolf.tone

 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Dbrec - Music recommendations using DBpedia

  • 1. dbrec: Music recommendations using DBpedia. Alexandre Passant, DERI, NUI Galway. In-Use Track @ ISWC2010, 11th November 2010, Shanghai, China
  • 2. Good news, it doesn’t fit in a slide anymore! Many producers, only a few consumers (besides search engines): BBC, Drupal, ...
  • 4. Agenda • Semantic Distance over Linked Data • dbrec - architecture, dataset and UI • Evaluation • Lessons learnt • Next steps and conclusion
  • 6. Semantic Distance over Linked Data • Relying only on links • Relying only on instance data • Using dereferenceable URIs • And using resources following the LD principles
  • 8. Linked Data [figure: example graph of resources e:r1, e:r2, e:r3, e:r4 connected by links e:l1, e:l2, e:l3]
  • 9. G = (R, L, I) • R = {r_1, r_2, ..., r_n} • L = {l_1, l_2, ..., l_n} • I = {i_1, i_2, ..., i_n} [figure: same example graph]
  • 10-14. [figures: further views of the same example graph]
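The graph model G = (R, L, I) can be sketched in a few lines of Python. This is only an illustration: it reads R as resources, L as link types and I as link instances (an interpretation, since the transcript does not spell this out), the edges are hypothetical because the figures are not recoverable, and the distance shown is a simple unweighted direct-link distance assumed for illustration. The transcript does not reproduce the LDSD formulas themselves.

```python
from typing import NamedTuple


class Link(NamedTuple):
    label: str   # link type, an element of L (e.g. "e:l1")
    source: str  # resource, an element of R
    target: str  # resource, an element of R


# Hypothetical link instances (I); not the edges drawn on the slides.
instances = {
    Link("e:l1", "e:r1", "e:r2"),
    Link("e:l2", "e:r2", "e:r1"),
    Link("e:l2", "e:r1", "e:r3"),
    Link("e:l3", "e:r2", "e:r4"),
}
resources = {l.source for l in instances} | {l.target for l in instances}  # R
link_types = {l.label for l in instances}                                  # L


def direct_links(links, a, b):
    """Number of distinct direct links going from a to b."""
    return sum(1 for l in links if l.source == a and l.target == b)


def direct_distance(links, a, b):
    """Assumed distance: 1 / (1 + links a->b + links b->a).

    Returns 1.0 when the two resources share no direct link and gets
    closer to 0 as the number of direct links grows.
    """
    return 1.0 / (1 + direct_links(links, a, b) + direct_links(links, b, a))
```

With this toy data, e:r1 and e:r4 share no direct link and sit at distance 1.0, while e:r1 and e:r3 (one link) sit at 0.5. It uses only links between instances, in line with the "relying only on links" constraint stated earlier.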
  • 15. LDSD
  • 16. The LDSD ontology Our own ontology, but could map with MuSim in the future
  • 17. dbrec
  • 18. At a glance • A system providing recommendations for all DBpedia bands and artists (±40K) using LDSD • And explaining its recommendations • Both using Linked Data and Semantic Web standards (RDF, SPARQL) • Integrating related Web data for an improved user-experience
  • 19. Architecture • (1) Dataset identification • (2) Dataset reducing • (3) LDSD computation • (4) User interface • RDF data is passed between the stages
  • 20. Dataset • Retrieving all artists and bands in DBpedia (±40K) • Including incoming / outgoing links • Approximately 3M triples • Removing datatype properties • 2.2M (75%) • Merging /ontology and /property • 1.7M (55%)
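The two reduction steps on this slide (dropping datatype properties, then merging the /ontology and /property namespaces) could look roughly like the sketch below. It is a minimal illustration under stated assumptions: triples are plain tuples carrying an `is_literal` flag, and a /property predicate is assumed to be equivalent to the /ontology predicate with the same local name. The slides do not say how the merge was actually performed, and `reduce_dataset` is a hypothetical helper.

```python
DBP_PROPERTY = "http://dbpedia.org/property/"
DBP_ONTOLOGY = "http://dbpedia.org/ontology/"


def reduce_dataset(triples):
    """Drop literal-valued triples and fold /property into /ontology.

    `triples` is an iterable of (subject, predicate, object, is_literal).
    Returns a set of (subject, predicate, object), so duplicates created
    by the namespace merge collapse automatically.
    """
    reduced = set()
    for s, p, o, is_literal in triples:
        if is_literal:
            continue  # datatype property: no link to another resource
        if p.startswith(DBP_PROPERTY):
            # Assumption: same local name means same property.
            p = DBP_ONTOLOGY + p[len(DBP_PROPERTY):]
        reduced.add((s, p, o))
    return reduced


# Illustrative input only.
sample = [
    ("dbr:Ramones", DBP_ONTOLOGY + "genre", "dbr:Punk_rock", False),
    ("dbr:Ramones", DBP_PROPERTY + "genre", "dbr:Punk_rock", False),
    ("dbr:Ramones", DBP_PROPERTY + "yearsActive", "1974-1996", True),
    ("dbr:Ramones", DBP_ONTOLOGY + "hometown", "dbr:New_York_City", False),
]
reduced = reduce_dataset(sample)
```

Here the four input triples shrink to two: the literal is dropped and the two `genre` statements become one.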
  • 21. Distribution 20K+ artists (50%) are not linked to any other artist
  • 22. Curation • 118 properties linking artists together • 18 mis-used, 35 wrongly defined (e.g. dbprop:klfsgProperty) • 578 properties linking artist to resources • 183 used only once, 36 wrongly defined • 767 properties linking resources to artists • 336 used only once, 115 wrongly defined • Dataset reduced to 1M triples
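Finding the "used only once" properties is a simple counting job, and a sketch of it is given below. This only covers the automatic part: the slides do not say exactly which properties were removed, and spotting mis-used or wrongly defined properties presumably needed manual review. `properties_used_once` is a hypothetical helper and the sample data is illustrative (apart from dbprop:klfsgProperty, which the slide names).

```python
from collections import Counter


def properties_used_once(triples):
    """Return the predicates that occur in exactly one triple."""
    counts = Counter(p for _, p, _ in triples)
    return {p for p, n in counts.items() if n == 1}


# Illustrative data only.
triples = [
    ("dbr:Ramones", "dbo:genre", "dbr:Punk_rock"),
    ("dbr:The_Clash", "dbo:genre", "dbr:Punk_rock"),
    ("dbr:Ramones", "dbprop:klfsgProperty", "dbr:Something"),
]
rare = properties_used_once(triples)
```

The resulting set can then be reviewed by hand or used as a filter before computing distances.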
  • 23. Computing distance • 9,797 minutes, done for all artists in DBpedia • 2 x AMD Opteron 250, 4GB, Ubuntu 8.10 • 50M triples • Modelled using the LDSD ontology
      Artist            Time (sec.)
      Ramones           25.20
      Johnny Cash       61.16
      U2                50.06
      The Clash         43.34
      Bad Religion      34.98
      The Aggrolites     7.35
      Janis Joplin      23.12
  • 24. Recommended artists and their distance
      Artist               Distance
      Elvis Presley        0.0978
      June Carter Cash     0.1056
      Willie Nelson        0.1322
      Kris Kristofferson   0.1407
      Bob Dylan            0.1466
      Marty Robbins        0.1673
      Rosanne Cash         0.1782
      Charlie McCoy        0.1836
      Gene Autry           0.1910
      Carl Smith           0.1980
  • 26. Sorry, slideshare people, that’s a movie so you won’t be able to see it !
  • 28. Evaluation settings • Off-line and on-line user evaluation • Using common RecSys metrics • 10 subjects • 2 women, 8 men • 24 to 34 years old • 35 to 55 minutes per interview, F2F
  • 29. Metrics • Off-line evaluation - comparison with last.fm • 5 artists / bands • 2 blind lists, 10 ranked recommendations per list • Marks from 1 to 5 • On-line recommendation - dbrec only • 5 artists / bands • Browsing recommendations using dbrec • Marks from 1 to 5, plus observations and interviews
  • 30. dbrec vs last.fm • Average mark of recommendations • 3.37(±1.19) • 3.44(±1.25) w/ on-line • 3.69(±1.01) for last.fm
  • 31. Precision • Results for the precision (t=X means items are relevant if marked X or more) • Cannot compute recall (implies users know all bands in the system)
      Threshold   dbrec (off-line)   dbrec (off+on-line)   last.fm
      t=2         92.05              90.59                 98.32
      t=3         76.63              77.72                 87.91
      t=4         49.06              51.23                 58.05
      t=5         20.09              25                    25.165
  • 32. Novel recommendations • Lots of unknown recommendations • 62% for dbrec (59.6% w/ on-line) • 40.4% for last.fm • But that’s good news! • Evaluated 274 of them on dbrec • 3.05(± 1.09)
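The thresholded precision used here can be sketched as follows, assuming it is the percentage of recommendations whose mark (1 to 5) is at least t. The marks below are made up for illustration and are not the study's data; `precision_at_threshold` is a hypothetical helper name.

```python
def precision_at_threshold(marks, t):
    """Percentage of recommendations with a mark of at least t."""
    if not marks:
        raise ValueError("no marks to evaluate")
    relevant = sum(1 for m in marks if m >= t)
    return 100.0 * relevant / len(marks)


# Hypothetical marks given by one subject to ten recommendations.
marks = [5, 4, 3, 2, 1, 4, 3, 5, 2, 4]
by_threshold = {t: precision_at_threshold(marks, t) for t in (2, 3, 4, 5)}
```

Raising t can only lower the precision, which matches the monotone decrease in every column of the results. Recall would additionally need the set of all relevant artists, which is why it cannot be computed in this setting.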
  • 33. Observations • Explanations for unknown bands • Checked for 198 / 310 • But also for known ones • 24 / 190 • Helped to understand the recommendation • Even if they already knew the band
  • 34. Interviews
      Criterion     User-interface   Explanations
      Enjoyable      9                7
      Useful         9                9
      Enriching      8               10
      Easy to use   10                9
      Confusing      0                2
      Complicated    0                2
      Too geeky      1                6
  • 36. Data quality • Issues with DBpedia properties • Misused : dbprop:notableInstruments • Wrongly defined : dbprop:klfsgProperty • Duplicates : /ontology versus /property • Requires data curation ! • Automated and manual
  • 37. Use, but replicate • More and more public SPARQL endpoints • Often limited to X max results • 5,000 on DBpedia • Difficult to use in production • Requires local replica • But implies synchronisation ! • Note: But, that’s fair enough. Hosting a SPARQL endpoint is costly and opening it up fully to anyone would require lots of maintenance, etc.
  • 38. Use, but replicate
      SELECT ?label
      WHERE {
        ?x rdfs:label ?label .
        { ?x a dbpedia:MusicArtist }
        UNION
        { ?x a dbpedia:Band }
      }
  • 39. Use, but replicate • Names of all DBpedia artists • Get number of results w/ COUNT • Run n/5000 queries (LIMIT + OFFSET) • Recompose results • Network errors, etc. • Note: The query had more than 40K results, since most artists got their names using different languages. So much more than 8 queries
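The COUNT, then LIMIT + OFFSET, then recompose procedure can be sketched by generating the paged queries up front. This is a minimal sketch, not the script used for dbrec: `paged_queries` is a hypothetical helper, the `rdfs:` and `dbpedia:` prefixes are assumed to be predefined by the endpoint, and an ORDER BY clause is added on the assumption that pages should not overlap (without a fixed order, SPARQL does not guarantee stable LIMIT/OFFSET slices).

```python
import math

# Query from the slide, plus an ORDER BY (added assumption, see above).
BASE_QUERY = """SELECT ?label WHERE {
  ?x rdfs:label ?label .
  { ?x a dbpedia:MusicArtist } UNION { ?x a dbpedia:Band }
}
ORDER BY ?label"""


def paged_queries(base_query, total_results, page_size=5000):
    """Build one query per page so each stays under the endpoint's cap."""
    pages = math.ceil(total_results / page_size)
    return [
        f"{base_query}\nLIMIT {page_size} OFFSET {i * page_size}"
        for i in range(pages)
    ]


# 40,000 results would need 8 queries; one more result needs a 9th.
queries = paged_queries(BASE_QUERY, 40000)
```

Each query would then be sent to the endpoint, retried on network errors, and the partial results concatenated. `total_results` is meant to come from a prior COUNT query.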
  • 40. SPARQL, Be quick or be neat • “List all artists / bands sharing common property-values with the current one” • Fits in a single SPARQL query • But does not scale • “Optimisation” has to be done manually by splitting the query and recomposing results using an external script
  • 41. SPARQL, Be quick or be neat • Tests done in the local RDF store • 1: full-query • 2: split by property • 3: split by property-object • Up to 75% faster
      Artist           Direct SPARQL      Property-slicing   Complete-slicing
                       Queries  Time      Queries  Time      Queries  Time
      Ramones          1        139.97    20       109.51     66      37.84
      Johnny Cash      1        257.81    30       152.60    135      75.35
      U2               1        155.53    22       122.91     70      44.03
      The Clash        1        146.43    20       110.84     79      42.61
      Bad Religion     1        104.08    23        86.49     97      47.35
      The Aggrolites   1        145.92    13       114.52     28      28.33
      Janis Joplin     1        230.88    27       151.00     98      62.81
  • 43. Next steps • Other data sources • FreeBase, MusicBrainz, etc. • Distance improvement • Propagation, feature selection, etc. • User Interface • User-friendly explanations • LOD-compliance • Mapping with other ontologies, SPARQL endpoint
  • 44. Conclusion • Defined and applied a Semantic Distance measure to Linked Data • Used it to build an end-user music recommender system, with ±40K artists • Evaluated it using RecSys metrics • Learnt several domain-independent lessons regarding LOD consumption
  • 46. Contact: alexandre.passant@deri.org - http://apassant.net - @terraces • Acknowledgements: Science Foundation Ireland - SFI/08/CE/I1380 (Lion 2) • References: AAAI Spring Symposium 2010 - LinkedAI Symposium • ESWC2010 - Demo Track • ISWC2010 - In-Use Track
  • 47. Pictures credits • http://flickr.com/photos/yumlog2/20896759/ by yuki* • http://richard.cyganiak.de/2007/10/lod/ by Richard Cyganiak and Anja Jentzsch • http://flickr.com/photos/loungerie/2196866243/ by loungerie • http://flickr.com/photos/iskanderstruck/248786430/ by iskanderbenamor • http://flickr.com/photos/homer4k/461407380/ by homer4k • http://flickr.com/photos/jpellgen/2390204986/ by jpellgen • http://flickr.com/photos/onegoodbumblebee/839927986/ by One Good Bumblebee • http://flickr.com/photos/28509009@N03/2668650475/ by marcreis • http://flickr.com/photos/8049973@N03/2656140464/ by wolf.tone