10. Our approach: SLADE
• Semantic LAyer for Data Exploration
• A framework to build data-driven apps
• ETL from existing sources / APIs
• Search, discovery, recommendations
• Data access / API
• Generic, config-based, domain-agnostic
11. The pipeline
• Web data sources (artists, genres, labels, locations...)
• Data extraction and interlinking
• Entity-centric semantic knowledge base
• Storage
• REST-ful interface
• Search, discovery and recommendation
• seevl products engine, on top of our graph database
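To make the pipeline stages concrete, here is a minimal Python sketch covering extraction, interlinking and storage, assuming two hypothetical sources (`SOURCE_A`, `SOURCE_B`), a naive "same normalised name = same entity" interlinking rule and an `example.org` URI scheme. None of these come from SLADE itself.

```python
from collections import defaultdict

# Hypothetical raw records from two web sources describing the same band.
SOURCE_A = [{"name": "The Clash", "genre": "Punk rock"}]
SOURCE_B = [{"title": "the clash ", "origin": "London"}]


def extract():
    """Data extraction: turn each source's records into (source, name, field, value) rows."""
    for rec in SOURCE_A:
        yield ("source-a", rec["name"], "genre", rec["genre"])
    for rec in SOURCE_B:
        yield ("source-b", rec["title"], "origin", rec["origin"])


def entity_uri(name):
    """Interlinking (naive heuristic): records with the same normalised name share one URI."""
    slug = "_".join(name.strip().lower().split())
    return "http://example.org/entity/" + slug


def load(rows):
    """Storage: an entity-centric view, i.e. a set of (property, value) pairs per URI."""
    kb = defaultdict(set)
    for _source, name, field, value in rows:  # _source would feed provenance tracking
        kb[entity_uri(name)].add((field, value))
    return kb


kb = load(extract())
```

Real interlinking would rely on identifiers or richer matching than names; the point here is only that facts from several sources end up attached to one entity URI.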
12. Challenges
• Some technical challenges faced when building
SLADE and seevl.net
• Data models: Choosing the right schemas
• Data access: SPARQL or API or ...?
• Scalability: Caching and optimisation strategies
• User Experience: User-centric design
14. RDF since day one
• RDF?
• Agile model (ideal when iterating)
• Intuitive aspect of graph modelling
• Standard toolkits (SPARQL / HTTP)
• OWL? RDFS?
• Minor use of inference (type, hierarchies)
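"Minor use of inference" for types and hierarchies typically means RDFS-style subclass reasoning (rule rdfs9 applied over the transitive subclass closure). The sketch below shows that idea in plain Python; the class hierarchy is illustrative, using Music Ontology / FOAF class names, and is not seevl's actual schema.

```python
# Illustrative class hierarchy (subclass -> direct superclasses), using
# Music Ontology / FOAF class names; not seevl's actual schema.
SUBCLASS_OF = {
    "mo:SoloMusicArtist": {"mo:MusicArtist"},
    "mo:MusicGroup": {"mo:MusicArtist"},
    "mo:MusicArtist": {"foaf:Agent"},
}


def superclasses(cls, hierarchy):
    """All (transitive) superclasses of `cls`; the `seen` set makes it cycle-safe."""
    seen, stack = set(), [cls]
    while stack:
        for parent in hierarchy.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_types(triples, hierarchy):
    """Add (x rdf:type D) for every (x rdf:type C) where C is a subclass of D."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == "rdf:type":
            for sup in superclasses(o, hierarchy):
                inferred.add((s, "rdf:type", sup))
    return inferred
```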
15. Artist data
• Music Ontology
• Labels, genres, influences, origins...
• Collaborations between artists
• Activity period (add-on)
• Additional models/mappings
• e.g. Bio Vocabulary (birth/death), FOAF...
17. Social activities
• SIOC & SIOC-actions
• Social graph / sub-graph
• Action-centric activities (like, listen)
• Inferring the user’s taste profile
• Top artists, genres, labels
• Using latest actions
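A taste profile built from the latest actions can be sketched as a weighted count over a recent window. Everything specific here is an assumption for illustration: the `(timestamp, action_type, artist)` tuple shape, the per-action weights in `ACTION_WEIGHTS`, and the window size.

```python
from collections import Counter

# Hypothetical weights: an explicit "like" counts more than a "listen".
ACTION_WEIGHTS = {"like": 2.0, "listen": 1.0}


def taste_profile(actions, artist_facets, window=50, top=3):
    """Top artists, genres and labels over the `window` most recent actions.

    actions: list of (timestamp, action_type, artist) tuples.
    artist_facets: artist -> {"genre": [...], "label": [...]}.
    """
    recent = sorted(actions, key=lambda a: a[0], reverse=True)[:window]
    scores = {"artist": Counter(), "genre": Counter(), "label": Counter()}
    for _ts, kind, artist in recent:
        weight = ACTION_WEIGHTS.get(kind, 0.0)
        scores["artist"][artist] += weight
        facets = artist_facets.get(artist, {})
        for facet in ("genre", "label"):
            for value in facets.get(facet, []):
                scores[facet][value] += weight
    return {facet: [v for v, _ in c.most_common(top)] for facet, c in scores.items()}
```

Restricting the count to a recent window lets the profile follow changes in taste rather than being dominated by old activity.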
21. Provenance
• Keep track of every statement in the ETL
• Origin, type and time of extraction
• With a low number of additional triples
• Introducing “data-slices”
• Multiple slices (=subgraphs) per resource
• Quick updates (DELETE / INSERT)
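One way to read "data-slices" is one named graph per resource and source, with provenance recorded as a few triples about the graph rather than about each statement. The sketch below builds a single SPARQL 1.1 Update request that drops and re-inserts one slice. The slice naming scheme and the `example.org/prov#` vocabulary are hypothetical.

```python
from datetime import datetime, timezone

PROV = "http://example.org/prov#"  # hypothetical provenance vocabulary
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def slice_graph(resource, source):
    """Hypothetical naming scheme: one named graph (data slice) per resource and source."""
    return f"{resource}/slice/{source}"


def replace_slice(resource, source, kind, triples, when):
    """Build one SPARQL 1.1 Update request that swaps a whole data slice.

    `triples` are (subject IRI, predicate IRI, object) with the object already
    written as a SPARQL term, e.g. '<http://...>' or '"text"'. `source` and
    `kind` are assumed to be simple identifiers (no quotes or spaces).
    """
    g = slice_graph(resource, source)
    lines = [f"<{s}> <{p}> {o} ." for s, p, o in triples]
    # Provenance costs three triples per slice, not per statement.
    lines += [
        f'<{g}> <{PROV}origin> "{source}" .',
        f'<{g}> <{PROV}extractionType> "{kind}" .',
        f'<{g}> <{PROV}extractedAt> "{when.isoformat()}"^^<{XSD_DATETIME}> .',
    ]
    body = "\n    ".join(lines)
    return (
        f"DROP SILENT GRAPH <{g}> ;\n"
        f"INSERT DATA {{\n  GRAPH <{g}> {{\n    {body}\n  }}\n}}"
    )
```

Because provenance lives inside the slice, dropping the graph removes stale data and its provenance together.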
24. SPARQL
• Pros
• W3C standard, powerful
• HTTP-based w/ SPARQL Protocol
• SPARQL Update in 1.1
• Cons
• Learning curve for non-RDF people
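"HTTP-based w/ SPARQL Protocol" means any HTTP client can query the store. A minimal stdlib sketch of a query-via-GET request: the query goes URL-encoded in the `query` parameter, and the result format is negotiated with the `Accept` header. The endpoint URL is a hypothetical local one.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical local endpoint (8890 is Virtuoso's usual default HTTP port).
ENDPOINT = "http://localhost:8890/sparql"


def sparql_get_request(query, endpoint=ENDPOINT):
    """SPARQL 1.1 Protocol, query via GET: URL-encoded `query` parameter,
    result format negotiated through the Accept header."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request("ASK { ?s ?p ?o }")
# urllib.request.urlopen(req) would send it; not executed here.
```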
25. URI patterns + JSON-LD
• Pre-defined URIs mapped to SPARQL
query patterns, returning JSON-LD data
• Search queries or resource descriptions
• Content-negotiation or ?_format=json
• GET and POST
• POST => SPARQL UPDATE
• GET => SPARQL SELECT / ASK
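The URI-pattern layer can be sketched as a small routing table: each (HTTP method, URI pattern) pair maps to a SPARQL template, GET routes produce SELECT/ASK, POST routes produce SPARQL Update, and JSON is selected by `?_format=json` or the Accept header. The routes, the `example.org` namespaces and the `likes` property are hypothetical.

```python
import re
from urllib.parse import parse_qs

BASE = "http://example.org/entity/"  # hypothetical resource namespace

# Hypothetical config: (HTTP method, URI pattern, SPARQL template).
ROUTES = [
    ("GET", r"/entity/(?P<id>[\w-]+)",
     "SELECT ?p ?o WHERE {{ <{base}{id}> ?p ?o }}"),
    ("GET", r"/entity/(?P<id>[\w-]+)/exists",
     "ASK {{ <{base}{id}> ?p ?o }}"),
    ("POST", r"/entity/(?P<id>[\w-]+)/like",
     "INSERT DATA {{ <{user}> <http://example.org/vocab#likes> <{base}{id}> }}"),
]


def dispatch(method, url, accept="", user=None):
    """Map an HTTP request to (SPARQL string, response format).

    IDs are restricted to [\\w-]+ by the patterns; `user` is assumed to be a
    trusted, already-validated IRI.
    """
    path, _, query_string = url.partition("?")
    params = parse_qs(query_string)
    for route_method, pattern, template in ROUTES:
        match = re.fullmatch(pattern, path)
        if route_method == method and match:
            sparql = template.format(base=BASE, user=user, **match.groupdict())
            wants_json = params.get("_format") == ["json"] or "json" in accept
            return sparql, "json" if wants_json else "html"
    raise LookupError(f"no route for {method} {path}")
```

Keeping the patterns in configuration rather than code is what makes the layer domain-agnostic. Developers only ever see URIs and JSON.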
26. JSON-LD
• JSON for Linking Data
• The best of both worlds
• JSON serialization, works with any parser
• Additional semantics (URIs, typed links,
etc.) with JSON-LD parsers
• Use of contexts/mappings to avoid exposing full URIs as keys
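The "best of both worlds" point can be shown with a small document: a plain JSON parser reads the short keys directly, while the `@context` tells a JSON-LD-aware consumer which URIs those keys stand for. The property choices and `example.org` URIs are illustrative, and `toy_expand` is a deliberately simplified term lookup, not the JSON-LD expansion algorithm (a real JSON-LD processor should be used for that).

```python
import json

DOC = """{
  "@context": {
    "name": "http://www.w3.org/2004/02/skos/core#prefLabel",
    "genre": {"@id": "http://purl.org/ontology/mo/genre", "@type": "@id"}
  },
  "@id": "http://example.org/entity/ramones",
  "name": "Ramones",
  "genre": "http://example.org/genre/punk"
}"""

# Any JSON parser can read it; plain-JSON consumers just use the short keys.
doc = json.loads(DOC)


def toy_expand(doc):
    """Very simplified term-to-IRI lookup via @context (NOT full JSON-LD expansion)."""
    ctx = doc.get("@context", {})
    out = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        term = ctx.get(key, key)
        iri = term["@id"] if isinstance(term, dict) else term
        out[iri] = value
    return out
```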
27. Search
• /entity/?property=value
• JSON-LD mappings used in URI templates
• Works with literals, dates, resources
• Ranking algorithm / alpha-ranking
• Patterns defined in a single config file
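A minimal sketch of how `/entity/?property=value` parameters could be turned into SPARQL graph patterns from a single mapping table, with one case each for classes, text, resources and dates. The `MAPPINGS` table, the `ex:activityStart` property and the resource URI scheme are hypothetical. Ranking and `_`-prefixed control parameters such as `_sort` are deliberately left out, since the slide does not describe them.

```python
from urllib.parse import parse_qs

PREFIXES = (
    "PREFIX mo: <http://purl.org/ontology/mo/>\n"
    "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
    "PREFIX ex: <http://example.org/vocab#>\n"
)

# Hypothetical single config: URI parameter -> (property, kind of value).
MAPPINGS = {
    "type": ("a", "class"),
    "prefLabel": ("skos:prefLabel", "text"),
    "genre": ("mo:genre", "resource"),
    "activityStart": ("ex:activityStart", "date"),
}
CLASSES = {"artist": "mo:MusicArtist"}


def sparql_string(value):
    """Quote a value as a SPARQL string literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_search(query_string):
    """Translate property=value parameters into one SELECT query.

    Resource values are assumed to be pre-validated identifiers.
    """
    patterns = []
    for i, (name, values) in enumerate(sorted(parse_qs(query_string).items())):
        if name.startswith("_"):
            continue  # control parameters (e.g. _sort) are left to the ranking step
        prop, kind = MAPPINGS[name]
        value = values[0]
        if kind == "class":
            patterns.append(f"?x a {CLASSES[value]} .")
        elif kind == "text":  # Virtuoso-specific full-text predicate, as on the slide
            patterns.append(f"?x {prop} ?v{i} . ?v{i} bif:contains {sparql_string(value)} .")
        elif kind == "resource":  # hypothetical URI scheme for linked resources
            patterns.append(f"?x {prop} <http://example.org/{name}/{value}> .")
        elif kind == "date":
            patterns.append(f"?x {prop} {sparql_string(value)}^^xsd:date .")
    return PREFIXES + "SELECT DISTINCT ?x WHERE {\n  " + "\n  ".join(patterns) + "\n}"
```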
28. Search (text)
• /entity/?prefLabel=clash&type=artist&_sort=count_desc
• Translated into
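PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?x WHERE {
  ?x a mo:MusicArtist ; skos:prefLabel ?label .
  ?label bif:contains "clash" .
}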
33. Resource description
• Patterns mapped to a resource URI to
retrieve a subset of its description
• /entity/seevl_id/infos
• /entity/seevl_id/facts
• /entity/seevl_id/links
• /entity/seevl_id/related(/related_id)
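The resource-description URIs above can be recognised with one regular expression. This is a sketch that assumes IDs are made of word characters and hyphens, and that only the `related` view accepts a second ID. The returned dictionary shape is hypothetical.

```python
import re

# Hypothetical route for /entity/<id>/{infos|facts|links} and /entity/<id>/related(/<related_id>).
ROUTE = re.compile(
    r"/entity/(?P<id>[\w-]+)/"
    r"(?:(?P<view>infos|facts|links)|(?P<related>related)(?:/(?P<related_id>[\w-]+))?)"
)


def parse_description_uri(path):
    """Return {"id", "view", "related_id"} for a resource-description URI, or None."""
    m = ROUTE.fullmatch(path)
    if m is None:
        return None
    view = m.group("view") or m.group("related")
    return {"id": m.group("id"), "view": view, "related_id": m.group("related_id")}
```

Each view would then select its own subset of properties for the resource, so clients fetch only what a given page needs.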
37. Is SPARQL fast enough?
• SPARQL is very powerful, but can be slow
• Some simple queries may lead to deep
graph patterns or traversal queries,
depending on the modelling
• FILTERs (e.g. text- and date-based queries)
are expensive
• Not all triple stores perform equally
38. Splitting queries
• “List all resources sharing common
property values with the current one,
whatever that property is”
• Fits in a single SPARQL query
• Doesn’t scale well
• Faster when splitting the query
and recomposing the results via internal scripts
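One possible way to split the "shared property values" query: fetch the current resource's (property, value) pairs with one small query, then run one small query per pair and recompose the counts in the application. `describe` and `subjects_with` are hypothetical callables standing in for those small SPARQL queries. This is not necessarily the exact slicing strategy measured in the next table.

```python
from collections import Counter

# Single-query equivalent, reported on the slide as not scaling well:
# SELECT ?other (COUNT(*) AS ?n) WHERE {
#   <r> ?p ?v . ?other ?p ?v . FILTER(?other != <r>)
# } GROUP BY ?other ORDER BY DESC(?n)


def related_by_shared_values(resource, describe, subjects_with):
    """Resources sharing property values with `resource`, most shared first.

    describe(resource) -> [(p, v), ...]   one small query
    subjects_with(p, v) -> [s, ...]       one small query per (p, v) pair
    """
    shared = Counter()
    for p, v in describe(resource):
        for other in subjects_with(p, v):
            if other != resource:
                shared[other] += 1
    return shared.most_common()
```

Each sub-query is a simple, well-indexed lookup, so many cheap queries can beat one query with two unbound variables joined across the whole graph.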
39. SPARQL: splitting queries
                 Direct SPARQL      Property-slicing   Complete-slicing
Artist           Queries  Time      Queries  Time      Queries  Time
Ramones          1        139.97    20       109.51    66       37.84
Johnny Cash      1        257.81    30       152.60    135      75.35
U2               1        155.53    22       122.91    70       44.03
The Clash        1        146.43    20       110.84    79       42.61
Bad Religion     1        104.08    23       86.49     97       47.35
The Aggrolites   1        145.92    13       114.52    28       28.33
Janis Joplin     1        230.88    27       151.00    98       62.81
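To read the table as speedups, divide the direct-SPARQL time by each slicing strategy's time (same unit as the table). Complete slicing issues far more queries, yet it comes out roughly 2.2 times (Bad Religion) to 5.2 times (The Aggrolites) faster than the single query. The snippet below only re-derives those ratios from the table values.

```python
# Times copied from the table: (direct SPARQL, property-slicing, complete-slicing).
TIMES = {
    "Ramones": (139.97, 109.51, 37.84),
    "Johnny Cash": (257.81, 152.60, 75.35),
    "U2": (155.53, 122.91, 44.03),
    "The Clash": (146.43, 110.84, 42.61),
    "Bad Religion": (104.08, 86.49, 47.35),
    "The Aggrolites": (145.92, 114.52, 28.33),
    "Janis Joplin": (230.88, 151.00, 62.81),
}


def speedups(times):
    """Direct-SPARQL time divided by each slicing strategy's time."""
    return {
        artist: (direct / prop, direct / complete)
        for artist, (direct, prop, complete) in times.items()
    }


s = speedups(TIMES)
```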
40. SPARQL + Redis
• Started by using Memcache to store query
results (e.g. “?x genre $y”)
• Good, but the first user to run a query pays its full cost
• Then, materialising results in-memory using
Redis as a key-value cache system
• Low indexing time (a few minutes on a laptop)
• Better query performance, real-time results
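The difference between the two approaches can be sketched with a plain dictionary: a cache-aside layer (as with Memcache) only fills a key after the first, slow request, whereas materialising precomputes every result at indexing time. `run_query` is a hypothetical stand-in for a SPARQL call, and the dict stands in for the cache server.

```python
class LazyCache:
    """Memcache-style cache-aside: the first request for a key pays the query cost."""

    def __init__(self, run_query):
        self.run_query = run_query
        self.store = {}
        self.misses = 0  # how many requests hit the slow path

    def get(self, key):
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.run_query(key)
        return self.store[key]


def materialise(run_query, keys):
    """Precompute every result up front (at indexing time), so no request is slow."""
    return {key: run_query(key) for key in keys}
```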
41. SPARQL + Redis
• Redis
• HSET to define entities (minimal data)
• ZADD to store sorted sets of members,
scored with our own ranking scheme
• ZRANGE to retrieve them in the correct order
• Everything in memory, instant query results
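A sketch of the HSET / ZADD / ZRANGE pattern, written against the redis-py call signatures (`hset(name, mapping=...)`, `zadd(name, {member: score})`, `zrange(name, start, end, desc=True)`). The key names (`entity:<id>`, `genre:<g>:artists`) and the numeric scores are hypothetical, since the ranking scheme itself is not described. `InMemoryRedis` is a tiny stand-in implementing only this subset, so the sketch runs without a server.

```python
class InMemoryRedis:
    """Minimal stand-in for the three commands used below (no tie-breaking, end=-1 only)."""

    def __init__(self):
        self.hashes, self.zsets = {}, {}

    def hset(self, name, mapping):
        self.hashes.setdefault(name, {}).update(mapping)

    def zadd(self, name, mapping):
        self.zsets.setdefault(name, {}).update(mapping)

    def zrange(self, name, start, end, desc=False):
        items = sorted(self.zsets.get(name, {}).items(), key=lambda kv: kv[1], reverse=desc)
        stop = len(items) if end == -1 else end + 1  # Redis ranges are inclusive
        return [member for member, _ in items[start:stop]]


def index_artist(r, artist_id, label, genres, score):
    """HSET minimal entity data; ZADD the artist into each genre's ranked set."""
    r.hset(f"entity:{artist_id}", mapping={"label": label})
    for genre in genres:
        r.zadd(f"genre:{genre}:artists", {artist_id: score})


def top_artists(r, genre, n=10):
    """Highest-scored artists first, answered from memory with no SPARQL at query time."""
    return r.zrange(f"genre:{genre}:artists", 0, n - 1, desc=True)
```

With a running server, `redis.Redis(decode_responses=True)` could be passed as `r` instead of the stand-in.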
44. User-experience
• Interfaces for graph-based/semantic data
• Don’t need to be ugly!
• As long as they’re built for users first
• Focus on vertical (domain-specific) UX rather than SemWeb UX
• Check best practices in the domain
• Involve HCI / non-SemWeb people
46. Lessons learnt
• Don’t reinvent the wheel, check existing
stacks and use what fits for the job
• Make it simple for your developers, using
REST-ful interfaces and design patterns
• Accept compromises, be pragmatic
• Think of users / create personas who are not
SemWeb geeks when designing the UX