SlideShare a Scribd company logo
1 of 50
Shaping the Big Ball of Data Mud
W3C's Shapes Constraint Language (SHACL)
Richard Cyganiak
Lotico Berlin Semantic Web Meetup, 17 November 2016
Semantic Web
RDF
SPARQL
OWL
RDFS
RDF
SPARQL
OWL
RDFS
Strengths Weaknesses
• Flexible can-say-anything data model
• Merging data is trivial
• Shared, explicit meaning thanks to URIs
• Mixing and matching of schemas;
partial understanding
• Painstakingly developed vocabularies
• “Neutral ground” for modelling
• SPARQL
• Overgeneralisation: works for
anything, but great at nothing
• “RDF tax”
• Logic foundations and web
foundations can be baggage
• Maps poorly to common
programming language data
structures
• Schemaless nature makes
optimisation difficult
• Not good at semi-structured
Application Areas
• Knowledge graphs
• Publishing
• Life sciences
• Fraud detection & identity management
• Data integration & analysis
The V’s of Big Data: Volume, Velocity, Variety
https://www.w3.org/blog/2010/05/linked-data-its-is-not-like-th/
RDF
SPARQL
OWL
RDFS
Validation?
Constraint checking?
RDF is supposedly self-describing.
RDF
Schema.org
Simple Knowledge Organization Scheme
(SKOS)
Dublin Core
Data Cube Vocabulary
R2RML
Linked Data Platform (LDP)
Why is RDFS not enough?
RDF
SPARQL
OWL
RDFS
Why is RDFS not enough?
• RDF “Schema” — and schemas are for validation, right?
• It’s a misnomer; should be “RDF Vocabulary Definition Language”
• Very limited expressivity
• Not the right semantics for validation
• ex:capital range ex:City. ex:Berlin ex:capital ex:Germany => …?
• Invalid data -> infer more invalid data
=> ex:Germany a ex:City
RDFS
Why is OWL not enough?
RDF
SPARQL
OWL
RDFS
Why is OWL not enough?
• De facto a constraint language: logical contradiction => invalid
• Very expressive
• But targeted at logic modelling, not validity constraints
• Not the right semantics for validation
• ex:Dublin ex:inCountry ex:Ireland, ex:USA => …?
• Open world assumption
• No unique name assumption
=> ex:Ireland owl:sameAs ex:USA
OWL
ICV: OWL closed-world semantics in Stardog
Why is SPARQL not enough?
RDF
SPARQL
OWL
RDFS
Why is SPARQL not enough? SPARQL
http://spinrdf.org/
Why is SPARQL not enough?
• SPARQL ASK seems ideal for constraint validation
• Very expressive
• Efficient implementations
• But writing even simple constraints can be tedious
SPARQL
Other proposals
ShEx — Shape Expressions
http://shex.io/
So, something new?
RDF
SPARQL
OWL
RDFS
Validation?
Constraint checking?
SHACL
Shapes Constraint Language
SHACL Overview
• A language for “checking RDF graphs against conditions”
• Produced by W3C Data Shapes Working Group
• Work in progress, some features at risk
• 4th Working Draft: August 2016
• Should be done by June 2017
• Like RDFS and OWL, SHACL constraints are themselves written in RDF
• SPARQL underneath (for evaluation semantics and extensibility)
ex:PersonShape
a sh:Shape ;
sh:targetClass ex:Person ;
sh:property [
sh:predicate ex:ssn ;
sh:maxCount 1 ;
sh:datatype xsd:string ;
sh:pattern "^d{3}-d{2}-d{4}$" ;
] ;
sh:property [
sh:predicate ex:child ;
sh:class ex:Person ;
sh:nodeKind sh:IRI ;
] ;
sh:property [
sh:path [ sh:inversePath ex:child ] ;
sh:name "parent" ;
sh:maxCount 2 ;
] .
How a Shape works
Diagram: Dimitris Kontokostas
Targets: Initial selection of focus nodes
• Node target
• Class instance target
• Subjects-of target
• Objects-of target
• SPARQL-based selection (advanced)
Node constraints
Constraints about the focus node itself:
• Node kind (IRI, blank, literal)
• IRI stem (namespace)
• IRI regex
• SPARQL query constraint (advanced)
Property constraints
Constraints about a certain outgoing or incoming property of the focus
node(s):
• Cardinality
• Class
• Datatype
• Node kind (IRI, blank node, literal)
• String min/max length, string regex
• Numeric min/max
• Value must match another shape
• Value must not match another shape
Other features
• Combine constraints with logical OR/any (default: AND/all)
• Property-pair comparison (=, <, >)
• Severities (Violation, Warning, Info)
• Annotations (name, description, grouping, order)
• Define additional types of constraints based on SPARQL (advanced)
Violation reports can be produced in RDF
ex:ExampleConstraintViolation
a sh:ValidationResult ;
sh:severity sh:Violation ;
sh:focusNode ex:Bob ;
sh:path ex:age ;
sh:value "twenty two" ;
sh:message "ex:age must be literal of datatype xsd:integer." ;
sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
sh:sourceShape ex:PersonShape .
Relationship to Rules
• Rules: “If someone says this, then I say that.”
• SHACL can’t do this.
• Does not replace SWRL, Jena Rules, RIF, SPIN Rules
Uses and implementations
SHACL in TopBraid Composer:
Shapes + Constraints
SHACL support is available in the TopBraid Composer Free Edition. http://www.topquadrant.com/downloads/
SHACL in TopBraid Composer: SPARQL-based constraints
SHACL in TopQuadrant’s web products (EVN, EDG)
SHACL Protégé Plugin
http://me-at-big.blogspot.de/2015/07/shacl4p-shapes-constraint-language.html
Repairing SKOS taxonomies with SHACL
Validation of SKOS with SHACL, and extension of SHACL with
specification of repair strategies.
Christian Mader and Monika Solanki, http://ceur-ws.org/Vol-1666/paper-06.pdf
Validating the “bag of crisps”…
• Validation is often not about correct/incorrect or valid/invalid
• Constraints-first (e.g., SQL)
• Well-formed vs valid (e.g., XML Schema)
• Validation is often about completeness and correctness for a specific
purpose: “This is what I produce”; “This is what I understand”
• Assumption is that there may be other statements
• Different consumers may apply different constraints
• SHACL should work well in this flexible, multi-source, multi-consumer
world.
“Anyone can say anything about anything”
RDF
SPARQL
OWL
RDFS
Statements: What is being said?
What words do
we have?
What makes logical sense to say?
What did you say
about XYZ?
OWL SHACL
Is that word used correctly?
What do you need to know from me?
You can't say that here!
I’d never say that!
richard@topquadrant.com
Backup slides
SHACL: Shaping the Big Ball of Data Mud

More Related Content

What's hot

SPARQL in a nutshell
SPARQL in a nutshellSPARQL in a nutshell
SPARQL in a nutshellFabien Gandon
 
SPARQL 사용법
SPARQL 사용법SPARQL 사용법
SPARQL 사용법홍수 허
 
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODOLinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODOChris Mungall
 
Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFSNilesh Wagmare
 
BLOGIC. (ISWC 2009 Invited Talk)
BLOGIC.  (ISWC 2009 Invited Talk)BLOGIC.  (ISWC 2009 Invited Talk)
BLOGIC. (ISWC 2009 Invited Talk)Pat Hayes
 
Linked Data의 RDF 어휘 이해하고 체험하기 - FOAF, SIOC, SKOS를 중심으로 -
Linked Data의 RDF 어휘 이해하고 체험하기 - FOAF, SIOC, SKOS를 중심으로 -Linked Data의 RDF 어휘 이해하고 체험하기 - FOAF, SIOC, SKOS를 중심으로 -
Linked Data의 RDF 어휘 이해하고 체험하기 - FOAF, SIOC, SKOS를 중심으로 -Dongbum Kim
 
Building Next-Generation Web APIs with JSON-LD and Hydra
Building Next-Generation Web APIs with JSON-LD and HydraBuilding Next-Generation Web APIs with JSON-LD and Hydra
Building Next-Generation Web APIs with JSON-LD and HydraMarkus Lanthaler
 
SPARQL-DL - Theory & Practice
SPARQL-DL - Theory & PracticeSPARQL-DL - Theory & Practice
SPARQL-DL - Theory & PracticeAdriel Café
 
Mapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping LanguageMapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping Languageandimou
 
REST vs GraphQL
REST vs GraphQLREST vs GraphQL
REST vs GraphQLSquareboat
 
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...Edureka!
 
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013Juan Sequeda
 
Wimmics Research Team Overview 2017
Wimmics Research Team Overview 2017Wimmics Research Team Overview 2017
Wimmics Research Team Overview 2017Fabien Gandon
 
Neo4j Fundamentals
Neo4j FundamentalsNeo4j Fundamentals
Neo4j FundamentalsMax De Marzi
 
Overview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsOverview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsMaxime Lefrançois
 
Graphql presentation
Graphql presentationGraphql presentation
Graphql presentationVibhor Grover
 

What's hot (20)

RDF validation tutorial
RDF validation tutorialRDF validation tutorial
RDF validation tutorial
 
SPARQL in a nutshell
SPARQL in a nutshellSPARQL in a nutshell
SPARQL in a nutshell
 
SPARQL 사용법
SPARQL 사용법SPARQL 사용법
SPARQL 사용법
 
SPARQL Tutorial
SPARQL TutorialSPARQL Tutorial
SPARQL Tutorial
 
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODOLinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
 
Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFS
 
BLOGIC. (ISWC 2009 Invited Talk)
BLOGIC.  (ISWC 2009 Invited Talk)BLOGIC.  (ISWC 2009 Invited Talk)
BLOGIC. (ISWC 2009 Invited Talk)
 
Linked Data의 RDF 어휘 이해하고 체험하기 - FOAF, SIOC, SKOS를 중심으로 -
Linked Data의 RDF 어휘 이해하고 체험하기 - FOAF, SIOC, SKOS를 중심으로 -Linked Data의 RDF 어휘 이해하고 체험하기 - FOAF, SIOC, SKOS를 중심으로 -
Linked Data의 RDF 어휘 이해하고 체험하기 - FOAF, SIOC, SKOS를 중심으로 -
 
JSON-LD
JSON-LDJSON-LD
JSON-LD
 
Building Next-Generation Web APIs with JSON-LD and Hydra
Building Next-Generation Web APIs with JSON-LD and HydraBuilding Next-Generation Web APIs with JSON-LD and Hydra
Building Next-Generation Web APIs with JSON-LD and Hydra
 
SPARQL-DL - Theory & Practice
SPARQL-DL - Theory & PracticeSPARQL-DL - Theory & Practice
SPARQL-DL - Theory & Practice
 
Mapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping LanguageMapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping Language
 
REST vs GraphQL
REST vs GraphQLREST vs GraphQL
REST vs GraphQL
 
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
 
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
 
Wimmics Research Team Overview 2017
Wimmics Research Team Overview 2017Wimmics Research Team Overview 2017
Wimmics Research Team Overview 2017
 
Neo4j Fundamentals
Neo4j FundamentalsNeo4j Fundamentals
Neo4j Fundamentals
 
Introduction to GraphQL
Introduction to GraphQLIntroduction to GraphQL
Introduction to GraphQL
 
Overview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsOverview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developments
 
Graphql presentation
Graphql presentationGraphql presentation
Graphql presentation
 

Similar to SHACL: Shaping the Big Ball of Data Mud

RDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesRDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesKurt Cagle
 
A Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic WebA Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic WebShamod Lacoul
 
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...LDBC council
 
A hands on overview of the semantic web
A hands on overview of the semantic webA hands on overview of the semantic web
A hands on overview of the semantic webMarakana Inc.
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudOntotext
 
SemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsSemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsRinke Hoekstra
 
A year on the Semantic Web @ W3C
A year on the Semantic Web @ W3CA year on the Semantic Web @ W3C
A year on the Semantic Web @ W3CIvan Herman
 
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...Dr.-Ing. Thomas Hartmann
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Dr.-Ing. Thomas Hartmann
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)Dan Brickley
 
Eclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaEclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaJeen Broekstra
 
Making the semantic web work
Making the semantic web workMaking the semantic web work
Making the semantic web workPaul Houle
 

Similar to SHACL: Shaping the Big Ball of Data Mud (20)

RDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesRDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data Frames
 
A Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic WebA Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic Web
 
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
 
KIT Graduiertenkolloquium 11.05.2016
KIT Graduiertenkolloquium 11.05.2016KIT Graduiertenkolloquium 11.05.2016
KIT Graduiertenkolloquium 11.05.2016
 
A hands on overview of the semantic web
A hands on overview of the semantic webA hands on overview of the semantic web
A hands on overview of the semantic web
 
What's New in RDF 1.1?
What's New in RDF 1.1?What's New in RDF 1.1?
What's New in RDF 1.1?
 
Linked services
Linked servicesLinked services
Linked services
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
 
SemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsSemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n Bolts
 
A year on the Semantic Web @ W3C
A year on the Semantic Web @ W3CA year on the Semantic Web @ W3C
A year on the Semantic Web @ W3C
 
SPIN and Shapes
SPIN and ShapesSPIN and Shapes
SPIN and Shapes
 
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
 
SPIN in Five Slides
SPIN in Five SlidesSPIN in Five Slides
SPIN in Five Slides
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Converting GHO to RDF
Converting GHO to RDFConverting GHO to RDF
Converting GHO to RDF
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)
 
Eclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaEclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in Java
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 
Making the semantic web work
Making the semantic web workMaking the semantic web work
Making the semantic web work
 

More from Richard Cyganiak

EDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five StarsEDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five StarsRichard Cyganiak
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsRichard Cyganiak
 
Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)Richard Cyganiak
 
Sigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integrationSigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integrationRichard Cyganiak
 
Investigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations OntologyInvestigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations OntologyRichard Cyganiak
 
How to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdfHow to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdfRichard Cyganiak
 
Self-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and GridworksSelf-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and GridworksRichard Cyganiak
 
The State of Linked Government Data
The State of Linked Government DataThe State of Linked Government Data
The State of Linked Government DataRichard Cyganiak
 
dcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data cataloguesdcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data cataloguesRichard Cyganiak
 

More from Richard Cyganiak (11)

EDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five StarsEDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five Stars
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF Datasets
 
Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)
 
How to Publish Open Data
How to Publish Open DataHow to Publish Open Data
How to Publish Open Data
 
Sigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integrationSigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integration
 
Investigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations OntologyInvestigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations Ontology
 
How to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdfHow to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdf
 
Self-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and GridworksSelf-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and Gridworks
 
The State of Linked Government Data
The State of Linked Government DataThe State of Linked Government Data
The State of Linked Government Data
 
What is SDMX-RDF?
What is SDMX-RDF?What is SDMX-RDF?
What is SDMX-RDF?
 
dcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data cataloguesdcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data catalogues
 

Recently uploaded

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

SHACL: Shaping the Big Ball of Data Mud

  • 1. Shaping the Big Ball of Data Mud W3C's Shapes Constraint Language (SHACL) Richard Cyganiak Lotico Berlin Semantic Web Meetup, 17 November 2016
  • 4. Strengths Weaknesses • Flexible can-say-anything data model • Merging data is trivial • Shared, explicit meaning thanks to URIs • Mixing and matching of schemas; partial understanding • Painstakingly developed vocabularies • “Neutral ground” for modelling • SPARQL • Overgeneralisation: works for anything, but great at nothing • “RDF tax” • Logic foundations and web foundations can be baggage • Maps poorly to common programming language data structures • Schemaless nature makes optimisation difficult • Not good at semi-structured
  • 5. Application Areas • Knowledge graphs • Publishing • Life sciences • Fraud detection & identity management • Data integration & analysis The V’s of Big Data: Volume, Velocity, Variety
  • 7.
  • 9. RDF is supposedly self-describing. RDF
  • 14. R2RML
  • 16. Why is RDFS not enough? RDF SPARQL OWL RDFS
  • 17. Why is RDFS not enough? • RDF “Schema” — and schemas are for validation, right? • It’s a misnomer; should be “RDF Vocabulary Definition Language” • Very limited expressivity • Not the right semantics for validation • ex:capital range ex:City. ex:Berlin ex:capital ex:Germany => …? • Invalid data -> infer more invalid data => ex:Germany a ex:City RDFS
  • 18. Why is OWL not enough? RDF SPARQL OWL RDFS
  • 19. Why is OWL not enough? • De facto a constraint language: logical contradiction => invalid • Very expressive • But targeted at logic modelling, not validity constraints • Not the right semantics for validation • ex:Dublin ex:inCountry ex:Ireland, ex:USA => …? • Open world assumption • No unique name assumption => ex:Ireland owl:sameAs ex:USA OWL
  • 20. ICV: OWL closed-world semantics in Stardog
  • 21. Why is SPARQL not enough? RDF SPARQL OWL RDFS
  • 22. Why is SPARQL not enough? SPARQL
  • 24. Why is SPARQL not enough? • SPARQL ASK seems ideal for constraint validation • Very expressive • Efficient implementations • But writing even simple constraints can be tedious SPARQL
  • 26. ShEx — Shape Expressions http://shex.io/
  • 29. SHACL Overview • A language for “checking RDF graphs against conditions” • Produced by W3C Data Shapes Working Group • Work in progress, some features at risk • 4th Working Draft: August 2016 • Should be done by June 2017 • Like RDFS and OWL, SHACL constraints are themselves written in RDF • SPARQL underneath (for evaluation semantics and extensibility)
  • 30. ex:PersonShape a sh:Shape ; sh:targetClass ex:Person ; sh:property [ sh:predicate ex:ssn ; sh:maxCount 1 ; sh:datatype xsd:string ; sh:pattern "^d{3}-d{2}-d{4}$" ; ] ; sh:property [ sh:predicate ex:child ; sh:class ex:Person ; sh:nodeKind sh:IRI ; ] ; sh:property [ sh:path [ sh:inversePath ex:child ] ; sh:name "parent" ; sh:maxCount 2 ; ] .
  • 31. How a Shape works Diagram: Dimitris Kontokostas
  • 32. Targets: Initial selection of focus nodes • Node target • Class instance target • Subjects-of target • Objects-of target • SPARQL-based selection (advanced)
  • 33. Node constraints Constraints about the focus node itself: • Node kind (IRI, blank, literal) • IRI stem (namespace) • IRI regex • SPARQL query constraint (advanced)
  • 34. Property constraints Constraints about a certain outgoing or incoming property of the focus node(s): • Cardinality • Class • Datatype • Node kind (IRI, blank node, literal) • String min/max length, string regex • Numeric min/max • Value must match another shape • Value must not match another shape
  • 35. Other features • Combine constraints with logical OR/any (default: AND/all) • Property-pair comparison (=, <, >) • Severities (Violation, Warning, Info) • Annotations (name, description, grouping, order) • Define additional types of constraints based on SPARQL (advanced)
  • 36. Violation reports can be produced in RDF ex:ExampleConstraintViolation a sh:ValidationResult ; sh:severity sh:Violation ; sh:focusNode ex:Bob ; sh:path ex:age ; sh:value "twenty two" ; sh:message "ex:age must be literal of datatype xsd:integer." ; sh:sourceConstraintComponent sh:DatatypeConstraintComponent ; sh:sourceShape ex:PersonShape .
  • 37. Relationship to Rules • Rules: “If someone says this, then I say that.” • SHACL can’t do this. • Does not replace SWRL, Jena Rules, RIF, SPIN Rules
  • 39. SHACL in TopBraid Composer: Shapes + Constraints SHACL support is available in the TopBraid Composer Free Edition. http://www.topquadrant.com/downloads/
  • 40. SHACL in TopBraid Composer: SPARQL-based constraints
  • 41. SHACL in TopQuadrant’s web products (EVN, EDG)
  • 42.
  • 44. Repairing SKOS taxonomies with SHACL Validation of SKOS with SHACL, and extension of SHACL with specification of repair strategies. Christian Mader and Monika Solanki, http://ceur-ws.org/Vol-1666/paper-06.pdf
  • 45.
  • 46. Validating the “bag of crisps”… • Validation is often not about correct/incorrect or valid/invalid • Constraints-first (e.g., SQL) • Well-formed vs valid (e.g., XML Schema) • Validation is often about completeness and correctness for a specific purpose: “This is what I produce”; “This is what I understand” • Assumption is that there may be other statements • Different consumers may apply different constraints • SHACL should work well in this flexible, multi-source, multi-consumer world.
  • 47. “Anyone can say anything about anything” RDF SPARQL OWL RDFS Statements: What is being said? What words do we have? What makes logical sense to say? What did you say about XYZ? OWL SHACL Is that word used correctly? What do you need to know from me? You can't say that here! I’d never say that!

Editor's Notes

  1. It’s amazing how many people have done incredible work. Massive effort shown in this pic. But there is some hype. Quite a few datasets are a sloppy conversion script, results thrown into a SPARQL store, with some haphazard links to DBpedia. Run a handful of SPARQL queries as sanity checks. But no in-depth quality control at all. Lots of data quality issues. Querying within a dataset can be hard enough, across datasets often impossible. If one dataset (e.g., DBpedia) changes, links break and often are never fixed.
  2. Talk will be about validation and SHACL, but I’d like to start by setting the scene Where is the Semantic Web on the hype cycle? Arguably, it went over the bump twice already: with focus on logic/AI around 2000, and focus on Linked Data around 2010. I helped to fan the flames of the second hype. The base standards: Today it's no longer that exciting. Overblown expectations have cooled off. It’s no longer expected to change the world. Getting stable and mature. Specific applications can be elsewhere on the cycle. See “Enterprise Taxonomy and Ontology Management”. That’s actually what TQ does.
  3. If you work with these technologies, life is pretty good these days, and still getting better. Maturing standards and tool support. And today we really understand what the technologies are good at, and what not.
  4. “Maps poorly to programming languages”: property names are not simple identifiers, every property can be multivalued; need navigability along incoming and outgoing arcs; ordering is difficult Semi-structured is important in big data
  5. We know where it works and where it doesn’t. It’s productive in a number of niches. RDF is good at dealing with Variety. (But not good enough: contextual validation, fuzzy/statistical matching for the semi-structured stuff) Variety tends to make logic approaches difficult—no single global truth—less OWL, more SPARQL
  6. Tim Berners-Lee deconstructing a bag of crisps Perfect metaphor for the strengths of the SW Different information co-exists on the packaging: the plain English “potato chips” the nutrition information on the back, standardized by the U.S. food and drug administration some allergy information that many people don’t pay any attention to, but those with allergies read very carefully. the UPC code that can be read by any retail checkout machine in the world some numbers on the bottom edge of the package that make no sense to him whatsoever. Mixing and matching of different vocabularies, standardised by different organisations, intended for different consumers. Partial understanding. Once you have agreed on an identifier for a thing *and a location for data about it*, different data producers and consumers can use it without stepping on each others' toes.
  7. The two main open source implementations of the technology stack, Jena and Sesame, are now at the Apache Foundation and at the Eclipse Foundation—big, established, mature, enterprisey organisations.
  8. So life is pretty good. Maturing technology stack, clearly understood strengths and weaknesses, productive niches, improving tools. But… We never solved validation. That’s kind of surprising. After all, each of these technologies has aspects that address these needs. Review one by one
  9. Every class and property has a URI. The URI references an ontology that defines the term. So each triple describes itself, right? One of the major strengths, right? No. Actually, most of the meaning is just not given in the ontology. Too much of the meaning is implicit, or just written down in text somewhere and cannot be automatically checked. Let me give examples.
  10. Arguably most important ontology in existence. Examples of things they want to validate in a tool for webmasters, came out of the workshop that kicked off the Data Shapes WG See https://www.w3.org/2001/sw/wiki/images/0/00/SimpleApplication-SpecificConstraintsforRDFModels.pdf https://www.w3.org/TR/shacl-ucr/#uc23-schema.org-constraints
  11. DC is widely used. It’s easy enough to agree on calling a title “dc:title” and an author “dc:creator”, but different orgs have widely differing views on what constitutes a complete metadata record. DC Application profiles as a response. DC developed its own way to represent those. Not a standard, not used apart from the DC community.
  12. I’ve been involved. We wrote constraint in prose, and added SPARQL queries to make it more formal/explicit. And yes people can copy-paste them. But still no way of just running all of them automatically against a published dataset! And no error reporting—just true/false.
  13. I’ve been involved. Mapping files are written in RDF; goal was to be very clear about what constitutes a valid mapping file. This is semi-formal. Surely this should be representable in some standard machine-readable way?
  14. Read/write Linked Data. Applications want to put constraints on the kind of data they can receive. Address book application wants to say that there should be an address in the RDF you PUT/POST. But completely punted on saying how to achieve it. “machine-readable ones facilitate better client interaction”—no shit!
  15. So, lots of initiatives that are serious about using SW in an interoperable and robust way end up just putting constraints in prose text, where it should really be in a machine-processable form. Same problem everywhere! But we have RDF Schema. SCHEMA!
  16. RDF Schema in analogy to XML Schema, but they really do very different things.
  17. So RDFS is just not powerful enough. But OWL surely gets us there?
  18. Clark&Parsia. Use OWL syntax, but switch to a semantics based on CW and UNA. This works pretty well! But can be a bit confusing—if you find some OWL, what semantics is intended? And OWL, while expressive, lacks some things that one would like to have in validation.
  19. We saw the Data Cube example where SPARQL was used to query the graph to see if it’s complete. Isn’t that enough to solve all validation issues?
  20. SPIN is a technology introduced by TQ. A bunch of things (rules written in SPARQL, templated SPARQL queries, defining custom SPARQL functions, etc.) We have used this for years and it actually works very well.
  21. Custom syntax. Somewhere between SPARQL, regular expressions, and grammar parsing. “regex for graphs.” Pretty cool. Concise. Needs new parsers.
  22. So, several good solutions around—but none has enough mindshare to take over. Meet at W3C, make a standard with the best aspects of each. (Or with the worst aspects of each—fingers crossed.)
  23. Some features and aspects are still highly controversial.
  24. When a violation occurs, the result is not just “false”. It’s a structure with info. Can process it in various ways. Just display it? Attach it to the right form field based on sh:path? Just count the violations per type in a large dataset? Different behaviour for different severitites?
  25. It’s still early days. Mostly individuals and organisations that are active in the working group.
  26. SHACL is getting really important to our products. We have made a major contribution. TQ’s Holger Knublauch is one of the editors of the spec. TBC is an SW IDE, workbench for SW professionals. At its heart is a schema/ontology editor. It supports editing of SHACL constraints through nice UI.
  27. EVN is a taxonomy and ontology management platform. EDG a data governance solution. SHACL allows our customers to add custom constraints over their own data models. Very powerful.
  28. Note the suggestions for fixing the problem. Goes beyond standard SHACL but an obvious addition and very cool.
  29. Not sure how well maintained and if it’s following the spec.
  30. Semantic Web Company. Nice application that shows using SHACL for bulk validation. Automated repair—somewhat similar to our suggestions extension.
  31. Dimitris Kontokostas (SHACL spec editor) and team at Uni Leipzig. Organising entire test suites, expressed originally in SPARQL but now with SHACL support, for data quality of large data sets. Used in context of DBpedia.
  32. So, how do the parts of the stack fit together? High-level view. Let’s run with the metaphor that anyone can say anything about anything. First we should note: Just because you can say anything about anything doesn’t mean you should! RDF is triples. But we also call them RDF statements. Each triple is a statement of some fact.
  33. It’s amazing how many people have done incredible work. Massive effort shown in this pic. But there is some hype. Quite a few datasets are a sloppy conversion script, results thrown into a SPARQL store, with some haphazard links to DBpedia. Run a handful of SPARQL queries as sanity checks. But no in-depth quality control at all. Lots of data quality issues. Querying within a dataset can be hard enough, across datasets often impossible. If one dataset (e.g., DBpedia) changes, links break and often are never fixed.