Data Quality Assessment
for Linked Data: A Survey
Amrapali Zaveri, Anisa Rula, Andrea Maurino,
Ricardo Pietrobon, Jens Lehmann, Sören Auer
Data Quality Tutorial, September 12, 2016
Outline
Survey Methodology

LDQ Dimensions and Metrics

LDQ Assessment Tools

LDQ In Practice
Survey Methodology — Steps I
• Related Surveys
• Research Questions
• Eligibility Criteria
• Search Strategy
• Title & Abstract Reviewing
Survey Methodology — Research Questions
• How can one assess the quality of Linked Data employing a
conceptual framework integrating prior approaches?

• What are the data quality problems that each approach assesses?

• Which data quality dimensions and metrics do the proposed approaches support?

• What kinds of tools are available for data quality assessment?
Survey Methodology — Eligibility Criteria
Inclusion criteria:
Must satisfy:
• published between 2002 and 2014

Should satisfy:
• data quality assessment
• trust assessment
• proposed and/or implemented an approach
• assessed the quality of LD or of information systems based on LD

Exclusion criteria:
• not peer-reviewed
• published as a poster abstract
• data quality management
• other forms of structured data
• did not propose any methodology or framework
Survey Methodology — Steps
• Remove duplicates
• Further potential articles
• Compare short-listed articles
• Quantitative analysis
• Qualitative analysis
Survey Methodology — Results
• 30 core articles: 21 conference papers, 8 journal articles and 1 master's thesis
• 18 dimensions
• 69 metrics
LDQ Dimensions & Metrics
• Data Quality: commonly conceived as a multi-dimensional construct, popularly defined as ‘fitness for use’*.
• Dimension: a characteristic of a dataset.
• Metric (or indicator): a procedure for measuring an information quality dimension.
*Juran et al., The Quality Control Handbook, 1974
18 LDQ Dimensions
LDQ Dimensions - Accessibility dimensions & metrics
• Availability - extent to which data (or some portion of it) is present, obtainable and
ready for use

• accessibility of the SPARQL endpoint and the server

• dereferenceability of the URI

• Interlinking - degree to which entities that represent the same concept are linked to
each other, be it within or between two or more data sources

• detection of the existence and usage of external URIs
• detection of all local in-links or back-links: all triples from a dataset that have the
resource’s URI as the object
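The availability and interlinking metrics above can be approximated in a few lines. This is a minimal sketch, not the survey's own method. It assumes triples are plain (subject, predicate, object) string tuples in which URI objects start with `http://` or `https://`. A URI counts as dereferenceable if a request with an RDF `Accept` header returns a 2xx response carrying one of the media types in `RDF_MEDIA_TYPES`; that set is itself an assumption. The dataset's own host name is passed in by the caller, and all function names are hypothetical.

```python
import urllib.request
from urllib.parse import urlparse

# Assumption: media types accepted as "RDF" for this sketch.
RDF_MEDIA_TYPES = {
    "text/turtle",
    "application/rdf+xml",
    "application/n-triples",
    "application/ld+json",
}


def looks_like_rdf(status, content_type):
    """True if an HTTP status / Content-Type pair indicates an RDF description."""
    media_type = content_type.split(";")[0].strip().lower()
    return 200 <= status < 300 and media_type in RDF_MEDIA_TYPES


def dereference(uri, timeout=10.0):
    """Request `uri` asking for RDF; urlopen follows redirects (e.g. 303) itself."""
    request = urllib.request.Request(
        uri, headers={"Accept": ", ".join(sorted(RDF_MEDIA_TYPES))}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return looks_like_rdf(
                response.status, response.headers.get("Content-Type", "")
            )
    except (OSError, ValueError):  # URLError, HTTPError and timeouts are OSErrors
        return False


def dereferenceability(uris, check=dereference):
    """Availability: fraction of URIs that dereference to RDF."""
    uris = list(uris)
    if not uris:
        return 0.0
    return sum(1 for uri in uris if check(uri)) / len(uris)


def back_links(triples, resource):
    """Interlinking: local in-links, i.e. triples that have `resource` as object."""
    return [t for t in triples if t[2] == resource]


def external_uri_ratio(triples, local_host):
    """Interlinking: share of URI objects whose host differs from the dataset's."""
    uri_objects = [o for _, _, o in triples if o.startswith(("http://", "https://"))]
    if not uri_objects:
        return 0.0
    external = [o for o in uri_objects if urlparse(o).netloc != local_host]
    return len(external) / len(uri_objects)
```

Passing a stub as `check` lets the availability score be computed without network access, which is convenient for testing the scoring logic separately from HTTP behaviour.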
LDQ Dimensions - Representational dimensions & metrics
• Interoperability - degree to which the format and structure of the information conforms to
previously returned information as well as data from other sources

• detection of whether existing terms from all relevant vocabularies for that particular
domain have been reused

• usage of existing vocabularies for a particular domain

• Interpretability - refers to technical aspects of the data, that is, whether information is
represented using an appropriate notation and whether the machine is able to process the
data 

• detection of invalid usage of undefined classes and properties

• detecting the use of appropriate language, symbols, units, datatypes and clear definitions
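The vocabulary-reuse and undefined-term metrics reduce to set operations over the classes and properties a dataset uses. In this sketch, the list of "established" vocabulary namespaces and the set of defined terms are assumptions supplied by the caller; a real assessment would derive them from the vocabularies actually loaded. The function names are hypothetical.

```python
# Assumption: a small, hand-picked set of established vocabulary namespaces.
KNOWN_VOCABULARIES = {
    "http://xmlns.com/foaf/0.1/",
    "http://purl.org/dc/terms/",
    "http://www.w3.org/2000/01/rdf-schema#",
    "http://schema.org/",
}


def namespace(term):
    """Split a term URI at its last '#' or '/' and return the namespace part."""
    cut = max(term.rfind("#"), term.rfind("/"))
    return term[: cut + 1]


def vocabulary_reuse(terms, known=KNOWN_VOCABULARIES):
    """Interoperability: fraction of used classes/properties from known vocabularies."""
    terms = set(terms)
    if not terms:
        return 0.0
    return sum(1 for t in terms if namespace(t) in known) / len(terms)


def undefined_terms(used_terms, defined_terms):
    """Interpretability: classes/properties used but defined in no loaded vocabulary."""
    return sorted(set(used_terms) - set(defined_terms))
```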
LDQ Dimensions - Intrinsic dimensions & metrics
• Syntactic Validity - degree to which an RDF document conforms to
the specification of the serialization format

• detecting syntax errors using (i) validators or (ii) crowdsourcing

• detecting literals that violate (i) an explicit definition of the allowed values for a datatype or (ii) syntactic rules (the type of characters allowed and/or the pattern of literal values)
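The second syntactic-validity metric (literals whose value does not fit their datatype) can be sketched with regular expressions. The patterns below are simplified versions of the XML Schema lexical spaces and are an assumption of this example; for instance, they do not reject impossible dates such as 2016-02-30. Datatypes without a pattern are skipped rather than flagged.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumption: simplified lexical patterns, not the full XML Schema rules.
LEXICAL_PATTERNS = {
    XSD + "integer": re.compile(r"[+-]?\d+"),
    XSD + "decimal": re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
    XSD + "date": re.compile(
        r"-?\d{4,}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])(Z|[+-]\d{2}:\d{2})?"
    ),
}


def malformed_literals(typed_literals):
    """Return (lexical form, datatype) pairs whose form does not match the datatype."""
    bad = []
    for lexical, datatype in typed_literals:
        pattern = LEXICAL_PATTERNS.get(datatype)
        if pattern is not None and not pattern.fullmatch(lexical):
            bad.append((lexical, datatype))
    return bad
```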

LDQ Dimensions - Intrinsic dimensions & metrics
• Completeness
  • Schema - ontology completeness
    • no. of classes and properties represented / total no. of classes and properties
  • Property - missing values for a specific property
    • no. of values represented for a specific property / total no. of values for that property
  • Population - % of all real-world objects of a particular type that are represented in the dataset
  • Interlinking - degree to which instances in the dataset are interlinked
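The schema, property and population completeness metrics share one ratio form: items represented divided by items expected. A minimal sketch, assuming a gold standard of expected items is available for each case; the `ex:` names and the sample sets are hypothetical.

```python
def coverage(observed, reference):
    """Completeness ratio: |observed ∩ reference| / |reference|.

    An empty reference is treated as fully complete (an assumption of this sketch).
    """
    reference = set(reference)
    if not reference:
        return 1.0
    return len(set(observed) & reference) / len(reference)


# Schema completeness: classes and properties represented vs. a gold schema.
gold_schema = {"ex:Person", "ex:City", "ex:name", "ex:birthPlace"}
data_schema = {"ex:Person", "ex:name", "ex:birthPlace"}
schema_completeness = coverage(data_schema, gold_schema)

# Property completeness: (entity, value) pairs present for ex:birthPlace vs. expected.
expected_birthplaces = {("ex:alice", "ex:Paris"), ("ex:bob", "ex:Rome")}
present_birthplaces = {("ex:alice", "ex:Paris")}
property_completeness = coverage(present_birthplaces, expected_birthplaces)

# Population completeness: real-world cities represented vs. a reference list.
reference_cities = {"ex:Paris", "ex:Rome", "ex:Oslo", "ex:Lima"}
dataset_cities = {"ex:Paris", "ex:Rome"}
population_completeness = coverage(dataset_cities, reference_cities)
```

The hard part in practice is the reference set, not the arithmetic: the same ratio gives very different scores depending on which gold standard is chosen.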
LDQ Dimensions - Contextual dimensions & metrics
• Understandability - refers to the ease with which data can be comprehended
without ambiguity and be used by a human information consumer
• human-readable labelling of classes, properties and entities as well as
presence of metadata

• indication of the vocabularies used in the dataset

• Timeliness - measures how up-to-date data is relative to a specific task

• freshness of datasets based on currency and volatility

• freshness of datasets based on their data source
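For the contextual dimensions, here is a sketch of two metrics: the share of entities carrying an `rdfs:label` (understandability) and a freshness score (timeliness). The formula max(0, 1 − currency/volatility), with currency taken as the time elapsed since the last modification, is an assumption of this sketch. It is one way to combine currency and volatility, not necessarily the one every surveyed approach uses. Triples are again assumed to be (subject, predicate, object) string tuples.

```python
from datetime import datetime, timedelta

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def label_coverage(triples, entities):
    """Understandability: fraction of entities with at least one rdfs:label."""
    entities = set(entities)
    if not entities:
        return 0.0
    labelled = {s for s, p, _ in triples if p == RDFS_LABEL}
    return len(entities & labelled) / len(entities)


def freshness(last_modified: datetime, now: datetime, volatility: timedelta) -> float:
    """Timeliness: max(0, 1 - currency / volatility), currency = now - last_modified."""
    currency = now - last_modified
    return max(0.0, 1.0 - currency / volatility)  # timedelta / timedelta -> float
```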
LDQ Assessment Tools
LDQ Assessment Tools - RDFUnit
http://aksw.org/Projects/RDFUnit.html
Syntactic Validity, Semantic Accuracy, Consistency
LDQ Assessment Tools - Dacura
http://dacura.cs.tcd.ie/about-dacura/
Interpretability, Semantic Accuracy, Consistency
Linked Data Quality — In Practice
Linked Data Quality:
• Methodologies
• Tools
• Use Cases
• Beyond Data
• Vocabulary
Crowdsourcing Linked Data Quality Assessment
LDQ Assessment Tools — Luzzu
http://eis-bonn.github.io/Luzzu/index.html
1 Metric → 2 Assess → 3 Clean → 4 Store → 5 Rank
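Luzzu's workflow runs from metric definition through assessment, cleaning, storage and ranking. The sketch below mimics only the metric, assess, store and rank steps in plain Python (cleaning is omitted). It is a hypothetical illustration of the idea and does not use Luzzu's actual API. Metrics are assumed to map a list of (subject, predicate, object) triples to a score in [0, 1], and ranking uses a user-supplied weighted mean; the metric and dataset names are made up.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Metric = Callable[[List[Triple]], float]  # hypothetical: dataset -> score in [0, 1]

LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def share_of_subjects_with(predicate: str) -> Metric:
    """Step 1 (Metric): build a metric scoring subjects that use `predicate`."""
    def metric(triples: List[Triple]) -> float:
        subjects = {s for s, _, _ in triples}
        hits = {s for s, p, _ in triples if p == predicate}
        return len(hits) / len(subjects) if subjects else 0.0
    return metric


def assess(datasets: Dict[str, List[Triple]],
           metrics: Dict[str, Metric]) -> Dict[str, Dict[str, float]]:
    """Steps 2 and 4 (Assess, Store): run every metric and keep scores in memory."""
    return {name: {m: fn(triples) for m, fn in metrics.items()}
            for name, triples in datasets.items()}


def rank(scores: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Step 5 (Rank): order datasets by weighted mean score, best first."""
    total = sum(weights.values())

    def weighted(name: str) -> float:
        return sum(w * scores[name][m] for m, w in weights.items()) / total

    return sorted(scores, key=weighted, reverse=True)


datasets = {
    "A": [("ex:x", LABEL, "X"), ("ex:x", SAME_AS, "ex:y")],
    "B": [("ex:u", "ex:p", "ex:v"), ("ex:w", LABEL, "W")],
}
metrics = {"labels": share_of_subjects_with(LABEL),
           "links": share_of_subjects_with(SAME_AS)}
scores = assess(datasets, metrics)
ranking = rank(scores, {"labels": 1.0, "links": 1.0})
```

Changing the weights lets different consumers rank the same stored scores differently, which is the point of separating assessment from ranking.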
LDQ Assessment Tools — LODLaundromat
http://lodlaundromat.org/
LDQ Use Cases — Open Data Portals
Neumaier et al. Automated Quality Assessment of Metadata across Open Data Portals. JDIQ 2016.
Completeness, Interoperability, Relevancy, Accuracy, Openness
LDQ Beyond Data — Mapping Quality
Dimou et al. Assessing and Refining Mappings to RDF to Improve Dataset Quality. ISWC 2015.
https://github.com/RMLio/RML-Validator
W3C Data Quality Vocabulary
https://www.w3.org/TR/vocab-dqv/
• dqv:Category, dqv:Dimension, dqv:Metric
• dqv:QualityMeasurement (a kind of qb:Observation)
• dqv:QualityMeasurementDataset (a kind of qb:DataSet)
• dqv:inDimension (Metric to Dimension), dqv:inCategory (Dimension to Category)
• dqv:isMeasurementOf (QualityMeasurement to Metric), dqv:hasQualityMeasurement (dataset to QualityMeasurement)
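To show how the DQV classes and properties fit together, the sketch below serializes a single quality measurement as Turtle using plain string formatting. The `dqv:` terms (including `dqv:computedOn` and `dqv:value`) come from the W3C vocabulary. The `ex:` namespace, the local names, the `QualityMeasurement` helper class and the choice of `xsd:double` for the value are assumptions of this example.

```python
from dataclasses import dataclass

PREFIXES = """@prefix dqv: <http://www.w3.org/ns/dqv#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .
"""


@dataclass
class QualityMeasurement:
    node: str     # hypothetical local name of the measurement node
    dataset: str  # local name of the assessed dataset
    metric: str   # local name of the dqv:Metric
    value: float


def to_turtle(m: QualityMeasurement, dimension: str, category: str) -> str:
    """Emit Category -> Dimension -> Metric and one measurement as Turtle."""
    lines = [
        PREFIXES,
        f"ex:{category} a dqv:Category .",
        f"ex:{dimension} a dqv:Dimension ; dqv:inCategory ex:{category} .",
        f"ex:{m.metric} a dqv:Metric ; dqv:inDimension ex:{dimension} .",
        f"ex:{m.dataset} dqv:hasQualityMeasurement ex:{m.node} .",
        f"ex:{m.node} a dqv:QualityMeasurement ; dqv:computedOn ex:{m.dataset} ;",
        f'    dqv:isMeasurementOf ex:{m.metric} ; dqv:value "{m.value}"^^xsd:double .',
    ]
    return "\n".join(lines)


ttl = to_turtle(
    QualityMeasurement("m1", "myDataset", "dereferenceability", 0.75),
    dimension="availability",
    category="accessibility",
)
```

For anything beyond a toy, an RDF library that handles escaping and prefixes would be preferable to string formatting.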
Challenges
• Propagation of errors

• Management/Improvement

• Usage of the standard vocabulary

• Quality-based search engines
Thank you!

Questions?
amrapali@stanford.edu

@AmrapaliZ
Quality assessment for linked data: A survey
A Zaveri, A Rula, A Maurino, R Pietrobon, J Lehmann, S Auer
Semantic Web 7 (1), 63-93

Data Quality and the FAIR principles
 
ESOF Panel 2018
ESOF Panel 2018ESOF Panel 2018
ESOF Panel 2018
 
CrowdED: Guideline for optimal Crowdsourcing Experimental Design
CrowdED: Guideline for optimal Crowdsourcing Experimental DesignCrowdED: Guideline for optimal Crowdsourcing Experimental Design
CrowdED: Guideline for optimal Crowdsourcing Experimental Design
 
MetaCrowd: Crowdsourcing Gene Expression Metadata Quality Assessment
MetaCrowd: Crowdsourcing Gene Expression Metadata Quality AssessmentMetaCrowd: Crowdsourcing Gene Expression Metadata Quality Assessment
MetaCrowd: Crowdsourcing Gene Expression Metadata Quality Assessment
 
smartAPI: Towards a more intelligent network of Web APIs
smartAPI: Towards a more intelligent network of Web APIssmartAPI: Towards a more intelligent network of Web APIs
smartAPI: Towards a more intelligent network of Web APIs
 
Introduction to Bio SPARQL
Introduction to Bio SPARQL Introduction to Bio SPARQL
Introduction to Bio SPARQL
 
LOD-SEM
LOD-SEMLOD-SEM
LOD-SEM
 
TripleCheckMate
TripleCheckMateTripleCheckMate
TripleCheckMate
 
Towards Biomedical Data Integration for Analyzing the Evolution of Cognition
Towards Biomedical Data Integration for Analyzing the Evolution of CognitionTowards Biomedical Data Integration for Analyzing the Evolution of Cognition
Towards Biomedical Data Integration for Analyzing the Evolution of Cognition
 
User-driven Quality Evaluation of DBpedia
User-driven Quality Evaluation of DBpediaUser-driven Quality Evaluation of DBpedia
User-driven Quality Evaluation of DBpedia
 
Converting GHO to RDF
Converting GHO to RDFConverting GHO to RDF
Converting GHO to RDF
 
ReDD-Observatory
ReDD-ObservatoryReDD-Observatory
ReDD-Observatory
 

Recently uploaded

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Recently uploaded (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Linked Data Quality Assessment: A Survey

  • 1. Data Quality Assessment for Linked Data: A Survey. Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann, Sören Auer. Data Quality Tutorial, September 12, 2016
  • 2. Outline: Survey Methodology; LDQ Dimensions and Metrics; LDQ Assessment Tools; LDQ In Practice
  • 4. Survey Methodology - Steps I: Related Surveys → Research Questions → Eligibility Criteria → Search Strategy → Title & Abstract Reviewing
  • 5. Survey Methodology - Research Questions
      • How can one assess the quality of Linked Data employing a conceptual framework integrating prior approaches?
      • What are the data quality problems that each approach assesses?
      • Which are the data quality dimensions and metrics supported by the proposed approaches?
      • What kinds of tools are available for data quality assessment?
  • 6. Survey Methodology - Eligibility Criteria
      • Inclusion criteria
          - Must satisfy: published between 2002 and 2014
          - Should satisfy: data quality assessment; trust assessment; proposed and/or implemented an approach; assessed the quality of LD or information systems based on LD
      • Exclusion criteria: not peer-reviewed; published as a poster abstract; data quality management; other forms of structured data; did not propose any methodology or framework
  • 7. Survey Methodology - Steps: Remove duplicates → Further potential articles → Compare shortlisted articles → Quantitative analysis → Qualitative analysis
  • 8. Survey Methodology - Results: 30 core articles (21 conference papers, 8 journal articles, 1 master's thesis); 18 dimensions; 69 metrics
  • 10. LDQ Dimensions & Metrics
      • Data Quality: commonly conceived as a multi-dimensional construct, with the popular definition ‘fitness for use’*.
      • Dimension: a characteristic of a dataset.
      • Metric (or indicator): a procedure for measuring an information quality dimension.
      *Juran et al., The Quality Control Handbook, 1974
  • 12. LDQ Dimensions - Accessibility dimensions & metrics
      • Availability - extent to which data (or some portion of it) is present, obtainable and ready for use
          - accessibility of the SPARQL endpoint and the server
          - dereferenceability of the URI
      • Interlinking - degree to which entities that represent the same concept are linked to each other, be it within or between two or more data sources
          - detection of the existence and usage of external URIs
          - detection of all local in-links or back-links: all triples from a dataset that have the resource’s URI as the object
  • 13. LDQ Dimensions - Representational dimensions & metrics
      • Interoperability - degree to which the format and structure of the information conform to previously returned information as well as to data from other sources
          - detection of whether existing terms from all relevant vocabularies for that particular domain have been reused
          - usage of existing vocabularies for a particular domain
      • Interpretability - refers to technical aspects of the data, that is, whether information is represented using an appropriate notation and whether a machine is able to process the data
          - detection of invalid usage of undefined classes and properties
          - detection of the use of appropriate language, symbols, units, datatypes and clear definitions
  • 14. LDQ Dimensions - Intrinsic dimensions & metrics
      • Syntactic Validity - degree to which an RDF document conforms to the specification of the serialization format
          - detecting syntax errors using (i) validators or (ii) crowdsourcing
          - checking literal values against (i) an explicit definition of the allowed values for a datatype or (ii) syntactic rules (the type of characters allowed and/or the pattern of literal values)
  • 15. LDQ Dimensions - Intrinsic dimensions & metrics
      • Completeness
          - Schema - ontology completeness: number of classes and properties represented / total number of classes and properties
          - Property - missing values for a specific property: number of values represented for a specific property / total number of values for that property
          - Population - percentage of all real-world objects of a particular type that are represented in the dataset
          - Interlinking - degree to which instances in the dataset are interlinked
  • 16. LDQ Dimensions - Contextual dimensions & metrics
      • Understandability - refers to the ease with which data can be comprehended without ambiguity and be used by a human information consumer
          - human-readable labelling of classes, properties and entities, as well as presence of metadata
          - indication of the vocabularies used in the dataset
      • Timeliness - measures how up-to-date data is relative to a specific task
          - freshness of datasets based on currency and volatility
          - freshness of datasets based on their data source
  • 19. LDQ Assessment Tools - RDFUnit (http://aksw.org/Projects/RDFUnit.html). Dimensions: Syntactic Validity, Semantic Accuracy, Consistency
  • 20. LDQ Assessment Tools - Dacura (http://dacura.cs.tcd.ie/about-dacura/). Dimensions: Interpretability, Semantic Accuracy, Consistency
  • 22. Linked Data Quality - In Practice: Methodologies, Tools, Use Cases, Beyond Data, Vocabulary
  • 23. Crowdsourcing Linked Data Quality Assessment
  • 24. LDQ Assessment Tools - Luzzu (http://eis-bonn.github.io/Luzzu/index.html): (1) Metric → (2) Assess → (3) Clean → (4) Store → (5) Rank
  • 25. LDQ Assessment Tools - LODLaundromat (http://lodlaundromat.org/)
  • 26. LDQ Use Cases - Open Data Portals: Neumaier et al., Automated Quality Assessment of Metadata across Open Data Portals, JDIQ 2016. Dimensions: Completeness, Interoperability, Relevancy, Accuracy, Openness
  • 27. LDQ Beyond Data - Mapping Quality: Dimou et al., Assessing and Refining Mappings to RDF to Improve Dataset Quality, ISWC 2015. https://github.com/RMLio/RML-Validator
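  • 29. W3C Data Quality Vocabulary (https://www.w3.org/TR/vocab-dqv/)
      • Classes: dqv:Category, dqv:Dimension, dqv:Metric, dqv:QualityMeasurement (subclass of qb:Observation), dqv:QualityMeasurementDataset (subclass of qb:DataSet)
      • Properties: dqv:inDimension (Metric → Dimension), dqv:inCategory (Dimension → Category), dqv:isMeasurementOf (QualityMeasurement → Metric), dqv:hasQualityMeasurement (resource → QualityMeasurement)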
  • 30. Challenges: propagation of errors; management/improvement; usage of the standard vocabulary; quality-based search engines
  • 31. Thank you! Questions? amrapali@stanford.edu, @AmrapaliZ. Reference: A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, S. Auer. Quality assessment for linked data: A survey. Semantic Web 7(1), 63-93.
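Slide 12 lists two interlinking metrics: detecting the use of external URIs, and collecting local in-links, meaning triples that have a resource's URI as their object. The following is a minimal Python sketch of both. It is not the survey's implementation, and it makes several assumptions. The dataset is held in memory as (subject, predicate, object) string tuples. URIs are written in full and literals are quoted. "External" is taken to mean "hosted outside the dataset's own host". The helpers `external_uri_ratio` and `in_links` and the example.org URIs are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: a triple is a (subject, predicate, object) tuple of strings;
# URIs are written in full and literals are quoted, e.g. '"Alice"'.
Triple = tuple[str, str, str]


def is_uri(term: str) -> bool:
    return term.startswith(("http://", "https://"))


def external_uri_ratio(triples: list[Triple], local_host: str) -> float:
    """Share of URI objects that point outside the dataset's own host."""
    uri_objects = [o for _, _, o in triples if is_uri(o)]
    if not uri_objects:
        return 0.0
    external = [o for o in uri_objects if urlparse(o).hostname != local_host]
    return len(external) / len(uri_objects)


def in_links(triples: list[Triple], resource: str) -> list[Triple]:
    """All triples in the dataset that have `resource` as their object."""
    return [t for t in triples if t[2] == resource]


triples = [
    ("http://example.org/Berlin", "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbpedia.org/resource/Berlin"),
    ("http://example.org/Alice", "http://example.org/livesIn",
     "http://example.org/Berlin"),
    ("http://example.org/Alice", "http://www.w3.org/2000/01/rdf-schema#label",
     '"Alice"'),
]
print(external_uri_ratio(triples, "example.org"))  # 0.5
print(len(in_links(triples, "http://example.org/Berlin")))  # 1
```

A real assessment would probably parse an RDF serialization and query a SPARQL endpoint rather than use in-memory tuples. The ratio form is one possible way to turn "existence and usage of external URIs" into a number.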
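Two of the representational metrics on slide 13 are sketched below. The first flags classes and properties that no vocabulary defines (interpretability). The second measures how many of the terms used come from established vocabularies (interoperability). This is a sketch under stated assumptions:
- Terms are prefixed names such as `foaf:Person`.
- The caller supplies the sets of defined classes and properties, along with the list of "established" prefixes.
- The function names are hypothetical.

```python
# Assumption: terms are prefixed names such as "foaf:Person"; the vocabulary
# definitions and the set of "established" prefixes are supplied by the caller.
RDF_TYPE = "rdf:type"


def undefined_terms(triples, defined_classes, defined_properties):
    """Classes and properties used in the data that no vocabulary defines."""
    used_properties = {p for _, p, _ in triples} - {RDF_TYPE}
    used_classes = {o for _, p, o in triples if p == RDF_TYPE}
    return (sorted(used_classes - defined_classes),
            sorted(used_properties - defined_properties))


def vocabulary_reuse(triples, established_prefixes):
    """Share of distinct classes/properties whose prefix is an established vocabulary."""
    terms = {p for _, p, _ in triples} | {o for _, p, o in triples if p == RDF_TYPE}
    if not terms:
        return 0.0
    reused = {t for t in terms if t.split(":", 1)[0] in established_prefixes}
    return len(reused) / len(terms)


triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:nme", '"Alice"'),  # misspelt property, defined nowhere
    ("ex:bob", "rdf:type", "ex:Persn"),   # misspelt class, defined nowhere
]
print(undefined_terms(triples, {"foaf:Person"}, {"foaf:name"}))
# (['ex:Persn'], ['foaf:nme'])
print(vocabulary_reuse(triples, {"rdf", "foaf"}))  # 0.75
```

The two checks are complementary. `foaf:nme` counts as "reused" because its prefix is familiar, yet it is still reported as undefined.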
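Slide 14 describes a syntactic-validity check for literal values. Each value is tested against an explicit set of allowed values or against syntactic rules such as character patterns. Below is a minimal Python sketch of both checks. The regular expressions cover a simplified subset of three XSD datatypes, not the full XSD lexical grammar. The input format and the `invalid_literals` helper are hypothetical.

```python
import re

# Simplified lexical patterns for a few XSD datatypes (a subset, not the full
# XSD grammar; e.g. the date pattern ignores time zones and month lengths).
PATTERNS = {
    "xsd:integer": re.compile(r"[+-]?\d+"),
    "xsd:boolean": re.compile(r"true|false|1|0"),
    "xsd:date": re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"),
}


def invalid_literals(literals, allowed_values=None):
    """Flag literals that break a datatype pattern or fall outside an allowed-value set.

    literals: iterable of (property, lexical_form, datatype) tuples.
    allowed_values: optional {property: set of permitted lexical forms}.
    """
    allowed_values = allowed_values or {}
    errors = []
    for prop, value, dtype in literals:
        pattern = PATTERNS.get(dtype)
        if pattern is not None and not pattern.fullmatch(value):
            errors.append((prop, value, f"not a valid {dtype} lexical form"))
        elif prop in allowed_values and value not in allowed_values[prop]:
            errors.append((prop, value, "not in the allowed values"))
    return errors


literals = [
    ("ex:age", "42", "xsd:integer"),
    ("ex:age", "forty", "xsd:integer"),
    ("ex:birthDate", "1990-13-01", "xsd:date"),
    ("ex:status", "activ", "xsd:string"),
]
for error in invalid_literals(literals, {"ex:status": {"active", "retired"}}):
    print(error)
```

`fullmatch` is used rather than `match` so that a value like `"42abc"` is rejected instead of matching on its prefix.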
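The three ratio-style completeness metrics on slide 15 (schema, property, population) all share one shape: what the dataset represents divided by what a reference says should be there. The sketch below assumes such a gold standard is available as Python sets, which is the hard part in practice. The function names and example data are hypothetical.

```python
def ratio(represented: int, total: int) -> float:
    if total <= 0:
        raise ValueError("the gold standard must not be empty")
    return represented / total


def schema_completeness(dataset_terms: set[str], gold_terms: set[str]) -> float:
    """Classes and properties represented / all classes and properties in the gold schema."""
    return ratio(len(dataset_terms & gold_terms), len(gold_terms))


def property_completeness(triples, prop: str, gold_values: set[tuple[str, str]]) -> float:
    """Values represented for `prop` / all (subject, value) pairs `prop` should have."""
    present = {(s, o) for s, p, o in triples if p == prop}
    return ratio(len(present & gold_values), len(gold_values))


def population_completeness(dataset_entities: set[str], real_world: set[str]) -> float:
    """Real-world objects of a type that the dataset represents / all such objects."""
    return ratio(len(dataset_entities & real_world), len(real_world))


gold_schema = {"ex:Person", "ex:City", "ex:name", "ex:livesIn"}
print(schema_completeness({"ex:Person", "ex:name", "ex:livesIn"}, gold_schema))  # 0.75

triples = [("ex:a", "ex:name", '"A"'), ("ex:b", "ex:name", '"B"')]
gold = {("ex:a", '"A"'), ("ex:b", '"B"'), ("ex:c", '"C"'), ("ex:d", '"D"')}
print(property_completeness(triples, "ex:name", gold))  # 0.5

print(population_completeness({"Berlin", "Paris"},
                              {"Berlin", "Paris", "Rome", "Oslo", "Lima"}))  # 0.4
```

Intersecting with the gold set, rather than just counting dataset items, keeps extra or wrong entries from inflating the score above 1.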
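Slide 16 names a labelling metric for understandability and a currency/volatility metric for timeliness. Below is a minimal Python sketch of each, with these assumptions:
- Labels are given by an `rdfs:label` predicate.
- Timeliness uses max(0, 1 - currency / volatility). This is one formulation used in the linked-data quality literature, taken here as an assumption. Currency is the data's age at observation time, and volatility is how long the data stays valid.
- The function names and dates are hypothetical.

```python
from datetime import datetime, timedelta


def label_coverage(triples, resources: set[str], label_prop: str = "rdfs:label") -> float:
    """Share of the given classes/properties/entities that carry a human-readable label."""
    if not resources:
        return 0.0
    labelled = {s for s, p, _ in triples if p == label_prop}
    return len(labelled & resources) / len(resources)


def timeliness(last_modified: datetime, observed: datetime, volatility: timedelta) -> float:
    """max(0, 1 - currency / volatility), with currency = age of the data when observed."""
    if volatility <= timedelta(0):
        raise ValueError("volatility must be positive")
    currency = observed - last_modified
    return max(0.0, 1 - currency / volatility)


triples = [
    ("ex:Person", "rdfs:label", '"Person"'),
    ("ex:knows", "rdfs:comment", '"relates two people"'),
]
print(label_coverage(triples, {"ex:Person", "ex:knows"}))  # 0.5
print(timeliness(datetime(2016, 9, 1), datetime(2016, 9, 11), timedelta(days=40)))  # 0.75
```

Dividing one `timedelta` by another yields a plain float, so the score is independent of the unit in which currency and volatility are expressed.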
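Slide 24 summarizes Luzzu's workflow as Metric → Assess → Clean → Store → Rank. The sketch below illustrates only the assess and rank ideas in Python. Metrics are plain functions, every dataset is scored by every metric, and datasets are ranked by a user-weighted sum. This is a hypothetical illustration, not Luzzu's actual API. The Clean step is omitted, "Store" is reduced to the returned score dictionary, and the weighted-sum ranking is an assumption.

```python
from typing import Callable

# Assumption: a dataset is a list of (subject, predicate, object) string tuples.
Triple = tuple[str, str, str]
Metric = Callable[[list[Triple]], float]


def assess(datasets: dict[str, list[Triple]],
           metrics: dict[str, Metric]) -> dict[str, dict[str, float]]:
    """Run every metric on every dataset; the returned scores are what would be stored."""
    return {name: {m: fn(triples) for m, fn in metrics.items()}
            for name, triples in datasets.items()}


def rank(scores: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Order datasets by a weighted sum of their metric values, best first."""
    def total(name: str) -> float:
        return sum(weights.get(m, 0.0) * v for m, v in scores[name].items())
    return sorted(scores, key=total, reverse=True)


metrics = {
    "has_labels": lambda t: float(any(p == "rdfs:label" for _, p, _ in t)),
    "non_empty": lambda t: float(bool(t)),
}
datasets = {
    "A": [("ex:a", "rdfs:label", '"A"')],
    "B": [("ex:b", "ex:p", "ex:c")],
    "C": [],
}
scores = assess(datasets, metrics)
print(rank(scores, {"has_labels": 2.0, "non_empty": 1.0}))  # ['A', 'B', 'C']
```

Keeping the per-metric scores separate from the weights means a different user can re-rank the same stored results without re-running the assessment.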
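The DQV terms on slide 29 fit together as follows. A dataset points to a dqv:QualityMeasurement. The measurement records the metric it measures (dqv:isMeasurementOf) and a value. The metric belongs to a dimension (dqv:inDimension), and the dimension belongs to a category (dqv:inCategory). The Python function below is a minimal sketch that renders one such chain as Turtle. It also uses `dqv:computedOn` and `dqv:value` from the same W3C vocabulary. The `ex:` resources, the `xsd:double` typing, and the `dqv_measurement_ttl` helper are hypothetical.

```python
def dqv_measurement_ttl(measurement: str, dataset: str, metric: str,
                        dimension: str, category: str, value: float) -> str:
    """Render one dqv:QualityMeasurement with its metric, dimension and category as Turtle."""
    return "\n".join([
        "@prefix dqv: <http://www.w3.org/ns/dqv#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "@prefix ex: <http://example.org/> .",
        "",
        f"{dataset} dqv:hasQualityMeasurement {measurement} .",
        f"{measurement} a dqv:QualityMeasurement ;",
        f"    dqv:computedOn {dataset} ;",
        f"    dqv:isMeasurementOf {metric} ;",
        f'    dqv:value "{value}"^^xsd:double .',
        f"{metric} a dqv:Metric ; dqv:inDimension {dimension} .",
        f"{dimension} a dqv:Dimension ; dqv:inCategory {category} .",
        f"{category} a dqv:Category .",
    ])


print(dqv_measurement_ttl("ex:m1", "ex:myDataset", "ex:labelCoverage",
                          "ex:understandability", "ex:contextual", 0.5))
```

Publishing scores this way, instead of in a tool-specific format, is what would let the quality-based search engines listed among the challenges on slide 30 compare datasets across tools.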