Mapping, Interlinking and Exposing MusicBrainz as Linked Data
1. Mapping, Interlinking and Exposing MusicBrainz as Linked Data
1st International Workshop on Semantic Music and Media (SMAM2013)
Sydney, Oct 21, 2013
Peter Haase
2. What this talk is about
A Linked Data Perspective
[Diagram: entities connected by the relations worksOn, publishedTo, affiliation, affiliation (previous), isAbout, builtWith, participatesIn]
3. EUCLID: EdUcational Curriculum for the usage of LinkedData
http://www.euclid-project.eu
Course · eBook
Other channels: @euclid_project, euclidproject
5. MusicBrainz
• MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public.
• MusicBrainz aims to be:
  • The ultimate source of music information by allowing anyone to contribute and releasing the data under open licenses.
  • The universal lingua franca for music by providing a reliable and unambiguous form of music identification, enabling both people and machines to have meaningful conversations about music.
• Like Wikipedia, MusicBrainz is maintained by a global community of users and we want everyone, including you, to participate and contribute.
• MusicBrainz is operated by the MetaBrainz Foundation, which is dedicated to keeping MusicBrainz free and open source.
6. LD Dataset Access
Publishing Relational Databases as RDF: W3C RDB2RDF
[Diagram components: Relational DBMS, R2RML Engine, SPARQL Endpoint, Data acquisition, Cleansing, Vocabulary Mapping, Interlinking, Publishing Integrated Data in Triplestore]
Task: Publish data from relational DBMS as Linked Data
Approach: Map from relational schema to semantic vocabulary with R2RML
Publishing: two alternatives
• Translate SPARQL into SQL on the fly
• Batch transform data into RDF, infer, index, integrate and provide SPARQL access in a triplestore
8. MusicBrainz Next Gen Schema
artist: As pre-NGS, but further attributes
artist_credit: Allows joint credit
release_group: Cf. 'album', versus:
• work
• release
• track
• medium
• tracklist
• recording
https://wiki.musicbrainz.org/Next_Generation_Schema
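The slide only states that artist_credit allows joint credit. A minimal sketch of what that means for a consumer, assuming (as we understand the NGS tables) that a credit is an ordered list of names, each with a join phrase such as " & ". The class CreditName, its fields and the placeholder gids are hypothetical illustrations, not the actual schema columns.

```python
from dataclasses import dataclass

@dataclass
class CreditName:
    position: int     # order of this name within the joint credit
    artist_gid: str   # hypothetical placeholder, not a real MusicBrainz ID
    name: str         # the name as printed on the release
    join_phrase: str  # text that follows this name, e.g. " & "

def render_credit(names):
    """Render a joint artist credit as one display string."""
    ordered = sorted(names, key=lambda n: n.position)
    return "".join(n.name + n.join_phrase for n in ordered)

def credited_artist_iris(names):
    """Each credited name still points at its own artist resource."""
    ordered = sorted(names, key=lambda n: n.position)
    return [f"http://musicbrainz.org/artist/{n.artist_gid}#_" for n in ordered]

credit = [
    CreditName(1, "gid-bowie", "David Bowie", ""),
    CreditName(0, "gid-queen", "Queen", " & "),
]
print(render_credit(credit))  # Queen & David Bowie
print(credited_artist_iris(credit))
```

The design point is that the displayed credit and the linked artists are kept separate: the string is for people, the per-artist IRIs are what a Linked Data mapping would expose.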
9. Music Ontology
OWL ontology with the following core concepts (classes) and relationships (properties):
[Figure: core classes and properties of the Music Ontology]
Source: http://musicontology.com
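10. R2RML Class Mapping
Mapping tables to classes is 'easy':

@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix mo:   <http://purl.org/ontology/mo/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix lb:   <http://example.org/linkedbrainz/mapping#> .  # placeholder: the lb: namespace is not given on the slide

lb:Artist a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "artist" ] ;
    rr:subjectMap [ rr:class mo:MusicArtist ;
                    rr:template "http://musicbrainz.org/artist/{gid}#_" ] ;
    rr:predicateObjectMap [ rr:predicate mo:musicbrainz_guid ;
                            rr:objectMap [ rr:column "gid" ;
                                           rr:datatype xsd:string ] ] .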
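To make the class mapping concrete, here is a minimal sketch of what an R2RML engine does with it, using an in-memory SQLite table in place of the MusicBrainz PostgreSQL database. The function names, the table layout and the all-zero gid are hypothetical; template expansion percent-encodes column values, which is a simplification of the R2RML "IRI-safe" rule.

```python
import re
import sqlite3
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MO = "http://purl.org/ontology/mo/"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

def expand_template(template, row):
    # Replace each {column} with the percent-encoded column value.
    return re.sub(r"\{(\w+)\}", lambda m: quote(str(row[m.group(1)]), safe=""), template)

def class_mapping(conn):
    # Rows must be addressable by column name (note: this mutates conn).
    conn.row_factory = sqlite3.Row
    triples = []
    for row in conn.execute("SELECT * FROM artist"):  # rr:tableName "artist"
        s = expand_template("http://musicbrainz.org/artist/{gid}#_", row)  # rr:template
        triples.append(f"<{s}> <{RDF_TYPE}> <{MO}MusicArtist> .")  # rr:class
        triples.append(f'<{s}> <{MO}musicbrainz_guid> "{row["gid"]}"^^<{XSD_STRING}> .')
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT, name INTEGER)")
conn.execute("INSERT INTO artist VALUES (1, '00000000-0000-0000-0000-000000000001', 1)")
for t in class_mapping(conn):
    print(t)
```

Each row yields one rdf:type triple from rr:class plus one typed literal from the predicate-object map, emitted here as N-Triples lines.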
11. R2RML Property Mapping
Mapping columns to properties can be easy:

lb:artist_name a rr:TriplesMap ;
    rr:logicalTable [ rr:sqlQuery """SELECT artist.gid, artist_name.name
                                      FROM artist
                                      INNER JOIN artist_name ON artist.name = artist_name.id""" ] ;
    rr:subjectMap [ rr:template "http://musicbrainz.org/artist/{gid}#_" ] ;
    rr:predicateObjectMap [ rr:predicate foaf:name ;
                            rr:objectMap [ rr:column "name" ] ] .
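The property mapping differs from the class mapping in that its logical table is an SQL query. A minimal sketch, assuming an in-memory SQLite stand-in for the two tables: the slide's query runs unchanged and each result row becomes one foaf:name literal. Table contents and function names are hypothetical, and the gid is inserted into the IRI directly because UUIDs need no escaping.

```python
import sqlite3

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# The rr:sqlQuery from the mapping, executed as-is.
ARTIST_NAME_QUERY = """SELECT artist.gid, artist_name.name
FROM artist INNER JOIN artist_name ON artist.name = artist_name.id"""

def nt_literal(text):
    # Minimal N-Triples escaping for backslashes, double quotes and newlines.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def property_mapping(conn):
    triples = []
    for gid, name in conn.execute(ARTIST_NAME_QUERY):
        subject = f"<http://musicbrainz.org/artist/{gid}#_>"  # rr:template
        triples.append(f"{subject} <{FOAF_NAME}> {nt_literal(name)} .")  # rr:column "name"
    return triples

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist_name (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT, name INTEGER);
INSERT INTO artist_name VALUES (10, 'Example Artist');
INSERT INTO artist VALUES (1, '00000000-0000-0000-0000-000000000001', 10);
""")
print(property_mapping(conn))
```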
12. NGS Advanced Relations
Major entities (Artist, Release Group, Track, etc.) plus URL are paired (l_artist_artist)
Each pairing of instances refers to a Link
Links have types (cf. RDF properties) and attributes
http://wiki.musicbrainz.org/Advanced_Relationship
13. R2RML Mapping Editor
R2RML: Expose data from relational DBMS as RDF / via SPARQL Endpoint
Problem: R2RML Mappings are hard to create
R2RML Editing Made Easy!
• Hides vocabulary intricacies from end-user
• Access to metadata about relational databases
• Preview of generated triples and SQL queries
• Very expressive (supports most of R2RML)
[Diagram components: Relational Database, R2RML Mappings, R2RML Engine, SPARQL Endpoint]
See our R2RML Mapping Editor in the ISWC Demo Session on Wednesday!
14. Scale
MusicBrainz RDF derived via R2RML: 150M triples

lb:artist_member a rr:TriplesMap ;
    rr:logicalTable [ rr:sqlQuery """SELECT a1.gid, a2.gid AS band
        FROM artist a1
        INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
        INNER JOIN link ON l_artist_artist.link = link.id
        INNER JOIN link_type ON link.link_type = link_type.id
        INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
        WHERE link_type.gid = '5be4c609-9afa-4ea0-910b-12ffb71e3821'""" ] ;
    rr:subjectMap [ rr:template "http://musicbrainz.org/artist/{gid}#_" ] ;
    rr:predicateObjectMap [ rr:predicate mo:member_of ;
                            rr:objectMap [ rr:template "http://musicbrainz.org/artist/{band}#_" ;
                                           rr:termType rr:IRI ] ] .
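This mapping shows how an NGS advanced relation becomes an RDF property: the link-type gid selects one kind of pairing, and each matching l_artist_artist row yields one mo:member_of triple. A minimal sketch against a hypothetical in-memory SQLite copy of the four tables; only the link-type gid comes from the slide, all other rows and gids are made up. The WHERE literal is passed as a bound parameter instead of being inlined.

```python
import sqlite3

MEMBER_OF = "http://purl.org/ontology/mo/member_of"
MEMBER_LINK_GID = "5be4c609-9afa-4ea0-910b-12ffb71e3821"  # link type gid from the slide

QUERY = """SELECT a1.gid, a2.gid AS band
FROM artist a1
INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
INNER JOIN link ON l_artist_artist.link = link.id
INNER JOIN link_type ON link.link_type = link_type.id
INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
WHERE link_type.gid = ?"""

def artist_iri(gid):
    return f"<http://musicbrainz.org/artist/{gid}#_>"

def member_of_triples(conn):
    # One triple per pairing whose link has the "member of" type.
    return [f"{artist_iri(person)} <{MEMBER_OF}> {artist_iri(band)} ."
            for person, band in conn.execute(QUERY, (MEMBER_LINK_GID,))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT);
CREATE TABLE link_type (id INTEGER PRIMARY KEY, gid TEXT);
CREATE TABLE link (id INTEGER PRIMARY KEY, link_type INTEGER);
CREATE TABLE l_artist_artist (id INTEGER PRIMARY KEY, link INTEGER, entity0 INTEGER, entity1 INTEGER);
INSERT INTO artist VALUES (1, 'person-gid'), (2, 'band-gid'), (3, 'other-gid');
INSERT INTO link_type VALUES (10, '5be4c609-9afa-4ea0-910b-12ffb71e3821'), (11, 'other-link-type');
INSERT INTO link VALUES (100, 10), (101, 11);
INSERT INTO l_artist_artist VALUES (1000, 100, 1, 2), (1001, 101, 1, 3);
""")
print(member_of_triples(conn))
```

The second pairing (link type 11) is filtered out, so only the person-to-band relation is emitted.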
15. Some Statistics – RDF Dump
(Lead) Table     Triples      Time (s)
area             59798        2
artist           36868228     423
dbpedia          172017       13
label            201832       3
medium           18069143     163
recording        11400354     209
release_group    3050818      31
release          9764887      151
track            75506495     794
work             1728955      20
Total            156822527    1809
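As a quick arithmetic check on the table, the per-table figures can be summed and turned into an overall throughput. The sketch only uses the numbers above; the throughput figure is derived, not reported on the slide.

```python
# Per-table figures from the table above: (triples, seconds)
DUMP_STATS = {
    "area": (59798, 2),
    "artist": (36868228, 423),
    "dbpedia": (172017, 13),
    "label": (201832, 3),
    "medium": (18069143, 163),
    "recording": (11400354, 209),
    "release_group": (3050818, 31),
    "release": (9764887, 151),
    "track": (75506495, 794),
    "work": (1728955, 20),
}

total_triples = sum(triples for triples, _ in DUMP_STATS.values())
total_seconds = sum(seconds for _, seconds in DUMP_STATS.values())
rate = total_triples / total_seconds  # triples per second over the whole dump

print(total_triples, total_seconds, round(rate))  # 156822527 1809 86690
```

The totals match the last row, and the full dump of roughly 157M triples took about 30 minutes, or roughly 87,000 triples per second.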
16. Information Workbench
Platform for Linked Data Applications
§ Semantics- & Linked Data-based integration of private and public data sources based on data providers
  • Generic and specific providers for various data formats and sources
  • Supports established mapping frameworks (e.g. R2RML, SILK, …)
  • Named graphs for managing contexts and provenance
§ Intelligent Data Access and Analytics
  • Flexible self-service UI
  • Visualization, exploration, dashboarding and reporting
  • Semantic search
§ Collaboration and knowledge management
  • Curation & authoring
  • Collaborative workflows
§ Open standards and technologies
  • Semantic Wiki based frontend (using SMW syntax)
  • Supporting W3C standards (OWL, RDF, SPARQL, …)
  • Community Edition (Open Source) + Enterprise Edition (Commercial)
17. Realization within the Information Workbench
Architecture
• Customized application solutions
• Reusable UI and data integration components
• Data storage and management platform
• External resources to reuse data and create mashups
18. The “MusicBrainz Explorer” Application
[Diagram components: Music Ontology (Ontology), Data (R2RML, Data Providers), Templates, Widgets]
19. Ontology as a “Structural Backbone”
[Diagram: the ontology (RDFS/OWL) defines both the data structure and the UI structure. In the RDF data graph, Yesterday has rdf:type mo:Track and The_Beatles has rdf:type mo:Artist; each resource page is rendered with the UI template of its class (Template:mo:Track, Template:mo:Artist, Template: …).]
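The idea of the ontology as a structural backbone is that a resource page is not designed per resource: the page picks its UI templates from the classes the resource has as rdf:type. A minimal sketch of that lookup, with hypothetical in-memory dictionaries standing in for the RDF data graph and the wiki's template pages; it is an illustration, not the Information Workbench's actual resolution logic.

```python
# Hypothetical RDF data graph: resource -> classes it has as rdf:type.
RDF_TYPES = {
    "The_Beatles": ["mo:Artist"],
    "Yesterday": ["mo:Track"],
}

# Hypothetical template registry: one UI template per ontology class,
# following the Template:<class> naming shown on the slide.
UI_TEMPLATES = {
    "mo:Artist": "Template:mo:Artist",
    "mo:Track": "Template:mo:Track",
}

def templates_for(resource):
    """Pick the UI templates for a resource page from its rdf:type classes."""
    return [UI_TEMPLATES[cls] for cls in RDF_TYPES.get(resource, []) if cls in UI_TEMPLATES]

print(templates_for("Yesterday"))    # ['Template:mo:Track']
print(templates_for("The_Beatles"))  # ['Template:mo:Artist']
```

Adding a new class to the ontology then only requires one new template, after which every resource of that class gets a page layout automatically.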
22. Navigation Through the Data
Source: http://musicbrainz.fluidops.net/resource/Analytical5
23. SPARQL visualization
Top ten releases by The Beatles according to the sum of track durations in minutes
SPARQL Query

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo:   <http://purl.org/ontology/mo/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

SELECT ?release (SUM(xsd:double(?duration / 60000)) AS ?minutes)
WHERE {
  <http://dbpedia.org/resource/The_Beatles> foaf:made ?release .
  ?release mo:record ?record .
  ?record mo:track ?track .
  ?track mo:duration ?duration .
}
GROUP BY ?release
ORDER BY DESC(?minutes)
LIMIT 10

Result set
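What the query computes can be mirrored in a few lines: the WHERE clause yields one (release, duration) binding per track, GROUP BY and SUM add up the durations per release, and ORDER BY DESC with LIMIT 10 keeps the longest releases. A minimal sketch on made-up bindings, relying on mo:duration being given in milliseconds (hence the division by 60000).

```python
from collections import defaultdict

def top_releases_by_minutes(rows, limit=10):
    """rows: (release, duration_ms) pairs, one per track, as bound by the WHERE clause."""
    minutes = defaultdict(float)
    for release, duration_ms in rows:
        minutes[release] += duration_ms / 60000  # GROUP BY ?release + SUM, in minutes
    # ORDER BY DESC(?minutes) LIMIT 10
    return sorted(minutes.items(), key=lambda kv: kv[1], reverse=True)[:limit]

# Hypothetical bindings: release-A has two tracks (2 and 3 minutes), release-B one (4 minutes).
rows = [("release-A", 120000), ("release-A", 180000), ("release-B", 240000)]
print(top_releases_by_minutes(rows))  # [('release-A', 5.0), ('release-B', 4.0)]
```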
24. SPARQL visualization
Top ten releases by The Beatles according to the sum of track durations in minutes
Widget

{{#widget: BarChart
 | query = 'SELECT (COUNT(?Release) AS ?COUNT) ?label
            WHERE {
              <http://musicbrainz.org/artist/8538e728-ca0b-4321-b7e5-cff6565dd4c0#_> foaf:made ?Release .
              ?Release rdf:type mo:Release .
              ?Release dc:title ?label .
            }
            GROUP BY ?label
            ORDER BY DESC(?COUNT)
            LIMIT 20'
 | settings = 'Settings:barvertical_mb'
 | asynch = 'true'
 | input = 'label'
 | output = 'COUNT'
 | height = '300'
}}

Visualization: Bar chart
25. Information Workbench: SPARQL visualization
Top ten releases by The Beatles according to the sum of track durations in minutes
Other visualizations of the same result set …
[Figures: line chart and pie chart of the same result set]
26. Automated Widget Suggestion
(1) Table · Pivot view · Bar chart · Line chart · Pie chart
(2) Select a suggested visualization
(3) Visualization automatically built
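The slide does not say how suggestions are chosen. One plausible heuristic, offered purely as a hypothetical illustration and not as the Information Workbench's logic, inspects the columns of the result set: any result can be shown as a table or pivot view, a label column plus a numeric column enables bar and line charts, and a pie chart additionally needs non-negative values.

```python
def is_numeric(values):
    # bool is excluded explicitly because it is a subclass of int in Python
    return bool(values) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )

def suggest_widgets(columns):
    """columns: mapping from result-set variable name to its list of values."""
    numeric = [name for name, values in columns.items() if is_numeric(values)]
    labels = [name for name in columns if name not in numeric]
    suggestions = ["Table", "Pivot view"]  # any tabular result can be shown this way
    if labels and numeric:
        suggestions += ["Bar chart", "Line chart"]  # label axis plus numeric series
        if all(v >= 0 for name in numeric for v in columns[name]):
            suggestions.append("Pie chart")  # slices only make sense for non-negative values
    return suggestions

# Shaped like the widget's result set: input column 'label', output column 'COUNT'.
result = {"label": ["Release X", "Release Y"], "COUNT": [3, 1]}
print(suggest_widgets(result))
```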
27. Try it out!
R2RML Mappings
• https://github.com/LinkedBrainz/MusicBrainz-R2RML
MusicBrainz RDF Dump
• http://mbsandbox.org/~barry/
MusicBrainz Linked Data Demo system
• http://musicbrainz.fluidops.net/
Information Workbench
• http://www.fluidops.com/information-workbench/
Euclid Project
• http://euclid-project.eu/
28. Acknowledgements
The Euclid Project, Barry Norton, Michael Meier, Andriy Nikolov, Yves Raimond, Kurt Jacobson, Thomas Gaengler, Juan Sequeda, Simon Dixon
(in no particular order)
29. Thank you!
Contact
Peter Haase
fluid Operations AG
Altrottstr. 31
Walldorf, Germany
+49 (0) 6227 358087-0
www.fluidops.com
peter.haase@fluidOps.com