3. why things
instead of documents
Today's WEB
[figure: HTML pages]
4. why things
instead of documents
Today's WEB
at least 1.85 billion indexed documents
some say 1 trillion online documents
[figure: HTML pages]
5. why things
instead of documents
Today's WEB
at least 1.85 billion indexed documents
some say 1 trillion online documents
actually the best HTML parser is still the HUMAN BRAIN
[figure: HTML pages]
6. why things
instead of documents
Today's WEB
is not the WEB that
Tim proposed in 1998
9. what about URIs and RDF
a new way to publish data on the web
ids are ambiguous and suck!
Use URIs
as names for things
Use HTTP URIs
so that people can look up those names
Use the standards (RDF, SPARQL)
to provide useful information
Include links to other URIs
so that they can discover more things
linked data principles
Tim Berners-Lee
July 27, 2006
11. what about URIs and RDF
turning web pages into “real” data
ids are ambiguous and suck!
12. […] the little animal was referred to as:
“il tasso del tasso del Tasso”
(“the badger of the yew of Tasso”)
Achille Campanile
It’s time for machines
(to parse pages)
13. […] the little animal was referred to as:
“il tasso del tasso del Tasso”
(“the badger of the yew of Tasso”)
Achille Campanile
It’s time for machines
(to parse pages)
http://it.dbpedia.org/resource/Meles_meles
http://it.dbpedia.org/resource/Taxus
http://it.dbpedia.org/resource/Torquato_Tasso
http://it.dbpedia.org/resource/Achille_Campanile
(author of the sentence)
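A minimal Python sketch of the point above: as plain strings the three words collide, while each URI names exactly one thing. The URIs are the ones on the slide; the variable names and the English glosses are my own assumptions.

```python
# Three senses of the same Italian word, each pinned to its own DBpedia URI.
# URIs come from the slide; the English glosses are illustrative only.
SENSES = [
    ("tasso", "badger (the animal)", "http://it.dbpedia.org/resource/Meles_meles"),
    ("tasso", "yew (the tree)", "http://it.dbpedia.org/resource/Taxus"),
    ("Tasso", "Torquato Tasso (the poet)", "http://it.dbpedia.org/resource/Torquato_Tasso"),
]

# As plain strings the words collide (letter case aside)...
distinct_words = {word.lower() for word, _, _ in SENSES}
# ...while the URIs stay distinct.
distinct_uris = {uri for _, _, uri in SENSES}

for word, gloss, uri in SENSES:
    print(f"{word:6} {gloss:28} <{uri}>")
```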
14. A new way to design
databases
RDF
(aka ’define knowledge’)
15. Go Triples, go!
the standard (old) approach
ID_P  COGNOME  NOME   REF_ID_SOCIETA  GENERE
1     Camarda  Diego  1               maschio
2     …        …      …               …

ID_SOCIETA  DENOMINAZIONE    SITO
1           Regesta.exe srl  www.regesta.com
16. Go Triples, go!
the new (cool) approach
<http://www.regesta.com/diego>
Subject
17. Go Triples, go!
the new (cool) approach
<http://www.regesta.com/diego>
<http://xmlns.com/foaf/0.1/familyName>
Subject
Predicate
18. Go Triples, go!
the new (cool) approach
<http://www.regesta.com/diego>
<http://xmlns.com/foaf/0.1/familyName>
'Camarda' .
Subject
Predicate
Object
19. Go Triples, go!
the new (cool) approach
<http://www.regesta.com/diego>
<http://xmlns.com/foaf/0.1/familyName> 'Camarda' .
<http://www.regesta.com/diego>
<http://xmlns.com/foaf/0.1/firstName> 'Diego' .
<http://www.regesta.com/diego>
<http://xmlns.com/foaf/0.1/gender> 'male' .
20. Go Triples, go!
the new (cool) approach
<http://www.regesta.com/diego>
<http://xmlns.com/foaf/0.1/familyName> 'Camarda' ;
<http://xmlns.com/foaf/0.1/firstName> 'Diego' ;
<http://xmlns.com/foaf/0.1/gender> 'male' .
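The step from the old table to the new triples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the column-to-predicate mapping and the naive literal quoting (no escaping) are mine, and the gender value is already translated to 'male' as in the triples above.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

def row_to_triples(subject_uri, row, column_to_predicate):
    """Turn one table row (a dict) into (subject, predicate, object) triples."""
    triples = []
    for column, predicate in column_to_predicate.items():
        value = row.get(column)
        if value is not None:
            triples.append((subject_uri, predicate, value))
    return triples

def to_turtle(triples):
    """Serialize triples sharing one subject with the ';' shorthand.

    Assumes a non-empty list with a single subject and literals that
    need no escaping (a simplification for this sketch).
    """
    subject = triples[0][0]
    lines = [f"<{subject}>"]
    for i, (_, predicate, obj) in enumerate(triples):
        end = " ." if i == len(triples) - 1 else " ;"
        lines.append(f'    <{predicate}> "{obj}"{end}')
    return "\n".join(lines)

# Hypothetical mapping from the table's columns to FOAF predicates.
mapping = {
    "COGNOME": FOAF + "familyName",
    "NOME": FOAF + "firstName",
    "GENERE": FOAF + "gender",
}
row = {"COGNOME": "Camarda", "NOME": "Diego", "GENERE": "male"}

triples = row_to_triples("http://www.regesta.com/diego", row, mapping)
turtle = to_turtle(triples)
print(turtle)
```

One row becomes one triple per non-empty column, which is why triple counts grow much faster than record counts.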
40. The Resource Description Framework
is a general-purpose language for representing
information in the Web.
It's time for a new standard
RDF
41. The SPARQL Protocol and RDF Query Language
is a query language and protocol for RDF.
It's time for a new standard
SPARQL
42. On the Semantic Web, vocabularies define
the concepts and relationships
(also referred to as “terms”)
used to describe and represent
an area of concern.
It's time for a new standard
Ontologies
43. PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
foaf:firstName
dc:title
rdfs:label
Pre:fixes (ontologies)
just a few words
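A prefix is just shorthand for the first part of a URI. A small Python sketch of the expansion, using the three prefixes from this slide (the `expand` helper is a name I made up for illustration):

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def expand(prefixed_name, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full URI; raises KeyError on an unknown prefix."""
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local

for name in ("foaf:firstName", "dc:title", "rdfs:label"):
    print(name, "->", expand(name))
```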
46. Resource Description
Framework
› SPARQL endpoint
› dereferenceable URIs
› content negotiation
› standard port, like 80 (HTTP)
› JSONP support
› up-to-date
› the endpoint URL is easy to deduce from resources
› the resources are described by dc:title or rdfs:label
› the endpoint hosts a page for humans
› the resources and the endpoint are on the same domain
SHOULD!
(please do it, for me)
59. DISTINCT, COUNT
GRAPH, PREFIX
isBlank, isIRI, isLiteral, isNumeric
FILTER, REGEX, STR
FILTER NOT EXISTS, MINUS
ORDER BY, OFFSET, LIMIT
for other stuff
http://www.w3.org/TR/sparql11-query/
SPARQL
minimum requirements
60. Please start negotiating content
right now!
Hi dude, I accept:
text/html,application/xhtml+xml
Html page
Great! I’ll serve you a web page
Hi dude, I accept:
application/rdf+xml
RDF data
Great… 303, redirect!
Hi dude, I accept:
pizza/margherita
406 error
mmm… sorry
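The three conversations above boil down to a decision on the Accept header. A minimal Python sketch, assuming a hypothetical `negotiate` helper that ignores q-values and wildcards and prefers HTML when both kinds are accepted (a real server would do more):

```python
HTML_TYPES = {"text/html", "application/xhtml+xml"}
RDF_TYPES = {"application/rdf+xml"}

def negotiate(accept_header):
    """Return (status, note) for an Accept header, mirroring the three cases.

    Simplification: q-values and wildcards such as */* are ignored.
    """
    accepted = {part.split(";")[0].strip() for part in accept_header.split(",")}
    if accepted & HTML_TYPES:
        return 200, "serve the HTML page"
    if accepted & RDF_TYPES:
        return 303, "redirect to the RDF data"
    return 406, "Not Acceptable"

for header in ("text/html,application/xhtml+xml",
               "application/rdf+xml",
               "pizza/margherita"):
    print(header, "->", negotiate(header))
```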
63. Java : Sesame / Jena
Python : RDFLib
Ruby : RDF.rb
Node.js : sparql-client
or, as I do,
simple HTTP GET +
parsing the result as JSON or XML
Please start negotiating content
…or a framework!
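The "simple HTTP GET + parsing" route needs nothing beyond the Python standard library. The sketch below only builds the request and parses a hand-written sample response, so it runs offline; the endpoint URL is a placeholder and the helper names are my own.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

def build_sparql_request(endpoint, query):
    """Build (but do not send) a GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def bindings_to_rows(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]

# Placeholder endpoint: replace it with a real one before sending anything.
request = build_sparql_request(
    "http://yourdomain/sparql",
    "select distinct ?o where {?s a ?o} limit 10",
)

# Hand-written sample in the SPARQL JSON results shape (not real data).
SAMPLE = """{
  "head": {"vars": ["o"]},
  "results": {"bindings": [
    {"o": {"type": "uri", "value": "http://xmlns.com/foaf/0.1/Person"}}
  ]}
}"""
rows = bindings_to_rows(SAMPLE)
print(request.full_url)
print(rows)
```

To actually run the query, pass `request` to `urllib.request.urlopen` and feed the decoded response body to `bindings_to_rows`.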
65. It’s slow
so keep calm
usually: 1 record → 15 triples
e.g. Chamber of Deputies: 2,949,771 votes → 64,948,856 triples
RDF will probably transform
data into big data
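A quick check of the figures on this slide, as plain Python arithmetic. The numbers are the ones quoted above; nothing else is assumed.

```python
votes = 2_949_771
triples = 64_948_856

# Actual ratio for the Chamber of Deputies example.
triples_per_vote = triples / votes
print(f"{triples_per_vote:.2f} triples per vote")  # 22.02 triples per vote

# Rule of thumb from the slide: about 15 triples per record.
estimated = votes * 15
print(f"rule-of-thumb estimate: {estimated:,} triples")  # 44,246,565 triples
```

So this dataset is heavier than the usual 15 triples per record: roughly 22 triples for every vote.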
66. Virtuoso
Sesame
Fuseki (Jena)
Owlim / Bigdata (Sesame)
AllegroGraph
D2R server
ARC2
…
Triplestores
I just need a SPARQL endpoint
I just really need http://yourdomain/sparql
68. select distinct ?o where {?s a ?o}
select ?o (count(distinct ?s) as ?n) where {?s a ?o} group by ?o
select (count(?s) as ?n) where {?s ?p ?o}
select ?class (count(?s) as ?n) where {?s ?p ?o; a ?class} group by ?class
select distinct ?p where {?s a <http://classe>; ?p ?o}
select ?p (count(?p) as ?n) where {?s a <http://classe>; ?p ?o} group by ?p
select ?s where {?s a <http://classe>}
select ?p ?o where {<http://URI> ?p ?o}
select ?p ?o ?p1 ?o2
where {<http://URI> ?p ?o. OPTIONAL{?o ?p1 ?o2. FILTER(isBlank(?o))}}
PREFIX dc: <http://purl.org/dc/elements/1.1/>
select distinct ?s ?title where {?s a <http://classe>;
dc:title ?title. FILTER(REGEX(?title, 'parola', 'i'))} LIMIT 100
SPARQL magic
a query for all seasons
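To see what the first exploratory queries compute, here is a rough Python equivalent over a tiny in-memory list of triples. The sample data is made up (only the FOAF and RDF URIs are real vocabulary terms), and a real triplestore does this with indexes rather than list scans.

```python
from collections import Counter

# 'a' in SPARQL is shorthand for the rdf:type predicate.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Made-up sample data for illustration.
TRIPLES = [
    ("http://example.org/diego", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    ("http://example.org/anna", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    ("http://example.org/regesta", RDF_TYPE, "http://xmlns.com/foaf/0.1/Organization"),
    ("http://example.org/diego", "http://xmlns.com/foaf/0.1/firstName", "Diego"),
]

# Which classes exist? (distinct objects of rdf:type)
classes = {o for s, p, o in TRIPLES if p == RDF_TYPE}

# How many distinct subjects per class?
typed_pairs = {(s, o) for s, p, o in TRIPLES if p == RDF_TYPE}
instances_per_class = Counter(o for s, o in typed_pairs)

# How many triples in total?
total_triples = len(TRIPLES)

print(sorted(classes))
print(instances_per_class.most_common())
print(total_triples)
```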
74. W3C standards
http://www.w3.org/standards/semanticweb/
OKFN endpoints status (and list)
http://sparqles.okfn.org
LodLive (a SPARQL navigator)
http://en.lodlive.it
a very good intro to RDF
https://github.com/JoshData/rdfabout/blob/gh-pages/intro-to-rdf.md
Tim Berners-Lee’s “Linked Data – 5 stars ranking”
http://www.w3.org/DesignIssues/LinkedData.html
My github page
http://github.com/dvcama
My email
mailto:diego.camarda@regesta.com