4. Survey Methodology — Steps I
• Related Surveys
• Research Questions
• Eligibility Criteria
• Search Strategy
• Title & Abstract Reviewing
5. Survey Methodology — Research Questions
• How can one assess the quality of Linked Data employing a
conceptual framework integrating prior approaches?
• What are the data quality problems that each approach assesses?
• Which are the data quality dimensions and metrics supported by
the proposed approaches?
• What kinds of tools are available for data quality assessment?
6. Survey Methodology — Eligibility Criteria
Inclusion criteria:
Must satisfy:
• published between 2002 and 2014
Should satisfy:
• data quality assessment
• trust assessment
• proposed and/or implemented an approach
• assessed the quality of LD or information systems based on LD
Exclusion criteria:
• not peer-reviewed
• published as a poster abstract
• data quality management
• other forms of structured data
• did not propose any methodology or framework
10. LDQ Dimensions & Metrics
• Data Quality: commonly conceived as a multi-dimensional construct, popularly defined as ‘fitness for use’*.
• Dimension: a characteristic of a dataset.
• Metric (or indicator): a procedure for measuring an information quality dimension.
*Juran et al., The Quality Control Handbook, 1974
12. LDQ Dimensions - Accessibility dimensions & metrics
• Availability - extent to which data (or some portion of it) is present, obtainable and ready for use
  • accessibility of the SPARQL endpoint and the server
  • dereferenceability of the URI
• Interlinking - degree to which entities that represent the same concept are linked to each other, be it within or between two or more data sources
  • detection of the existence and usage of external URIs
  • detection of all local in-links or back-links: all triples from a dataset that have the resource’s URI as the object
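The two interlinking metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any surveyed tool: it assumes the dataset is available as plain (subject, predicate, object) string tuples, and the names `TRIPLES`, `in_links` and `external_uris` as well as the `example.org` URIs are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical toy dataset: (subject, predicate, object) triples as plain strings.
TRIPLES = [
    ("http://example.org/a", "http://xmlns.com/foaf/0.1/knows", "http://example.org/b"),
    ("http://example.org/c", "http://xmlns.com/foaf/0.1/knows", "http://example.org/b"),
    ("http://example.org/a", "http://www.w3.org/2002/07/owl#sameAs", "http://dbpedia.org/resource/A"),
    ("http://example.org/b", "http://www.w3.org/2000/01/rdf-schema#label", "B"),
]

def in_links(triples, resource_uri):
    """Return all triples that have resource_uri as the object (back-links)."""
    return [t for t in triples if t[2] == resource_uri]

def external_uris(triples, local_host):
    """Return object URIs whose host differs from the dataset's own host.

    Assumption: a URI counts as 'external' when its host is not local_host.
    Plain literals (no http/https scheme) are ignored.
    """
    found = set()
    for _, _, obj in triples:
        parsed = urlparse(obj)
        if parsed.scheme in ("http", "https") and parsed.netloc != local_host:
            found.add(obj)
    return found

back_links = in_links(TRIPLES, "http://example.org/b")
outgoing = external_uris(TRIPLES, "example.org")
```

Availability metrics (endpoint accessibility, dereferenceability) need network requests and are left out of this sketch on purpose.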
13. LDQ Dimensions - Representational dimensions & metrics
• Interoperability - degree to which the format and structure of the information conforms to previously returned information as well as data from other sources
  • detection of whether existing terms from all relevant vocabularies for that particular domain have been reused
  • usage of existing vocabularies for a particular domain
• Interpretability - refers to technical aspects of the data, that is, whether information is represented using an appropriate notation and whether the machine is able to process the data
  • detection of invalid usage of undefined classes and properties
  • detecting the use of appropriate language, symbols, units, datatypes and clear definitions
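As an illustration of the "undefined classes and properties" metric above, the following sketch flags terms that are used in the data but not declared in the vocabulary. It assumes the vocabulary's declared classes and properties are already available as sets of URIs; the function name `undefined_terms` and the `example.org` vocabulary are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def undefined_terms(triples, defined_classes, defined_properties):
    """Return (undefined_classes, undefined_properties) used in the triples.

    Assumption: a class is 'used' when it appears as the object of rdf:type,
    and a property is 'used' when it appears in predicate position.
    rdf:type itself is treated as always defined.
    """
    bad_classes, bad_properties = set(), set()
    for _, predicate, obj in triples:
        if predicate == RDF_TYPE:
            if obj not in defined_classes:
                bad_classes.add(obj)
        elif predicate not in defined_properties:
            bad_properties.add(predicate)
    return bad_classes, bad_properties

# Hypothetical vocabulary and data.
classes = {"http://example.org/vocab#Person"}
properties = {"http://example.org/vocab#name"}
data = [
    ("http://example.org/a", RDF_TYPE, "http://example.org/vocab#Person"),
    ("http://example.org/a", "http://example.org/vocab#name", "Alice"),
    ("http://example.org/b", RDF_TYPE, "http://example.org/vocab#Persn"),   # misspelled class
    ("http://example.org/b", "http://example.org/vocab#nam", "Bob"),        # misspelled property
]
bad_classes, bad_properties = undefined_terms(data, classes, properties)
```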
14. LDQ Dimensions - Intrinsic dimensions & metrics
• Syntactic Validity - degree to which an RDF document conforms to the specification of the serialization format
  • detecting syntax errors using (i) validators or (ii) crowdsourcing
  • checking literal values against (i) an explicit definition of the allowed values for a datatype or (ii) syntactic rules (the type of characters allowed and/or the pattern of literal values)
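The second metric above (allowed values and syntactic rules for literals) can be sketched as follows. The rule tables are deliberately simplified assumptions and do not cover the full XSD lexical spaces; the names `PATTERNS`, `ALLOWED_VALUES`, `is_valid_literal` and `syntactic_validity` are hypothetical.

```python
import re

# Assumed, simplified syntactic rules per datatype (not the complete XSD definitions).
PATTERNS = {
    "xsd:integer": re.compile(r"[+-]?\d+"),
    "xsd:date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}

# Explicit definition of allowed values for a datatype.
ALLOWED_VALUES = {
    "xsd:boolean": {"true", "false", "1", "0"},
}

def is_valid_literal(value, datatype):
    """Check one literal against the allowed values or pattern for its datatype."""
    if datatype in ALLOWED_VALUES:
        return value in ALLOWED_VALUES[datatype]
    pattern = PATTERNS.get(datatype)
    if pattern is None:
        return True  # no rule known for this datatype, so nothing to violate
    return pattern.fullmatch(value) is not None

def syntactic_validity(literals):
    """Fraction of (value, datatype) pairs that pass their rule; 1.0 if empty."""
    if not literals:
        return 1.0
    valid = sum(1 for value, datatype in literals if is_valid_literal(value, datatype))
    return valid / len(literals)

sample = [
    ("42", "xsd:integer"),
    ("4x2", "xsd:integer"),      # invalid character
    ("2014-01-31", "xsd:date"),
    ("yes", "xsd:boolean"),      # not an allowed value
]
score = syntactic_validity(sample)
```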
15. LDQ Dimensions - Intrinsic dimensions & metrics
• Completeness
  • Schema - ontology completeness
    • no. of classes and properties represented / total no. of classes and properties
  • Property - missing values for a specific property
    • no. of values represented for a specific property / total no. of values for a specific property
  • Population - percentage of all real-world objects of a particular type that are represented in the dataset
  • Interlinking - degree to which instances in the dataset are interlinked
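The schema and property completeness ratios above translate directly into code. The sketch below is one possible reading, under two labeled assumptions: the "total" for schema completeness is the set of terms declared in the ontology, and the "total" for property completeness is the set of subjects expected to carry the property. The function names and sample data are hypothetical.

```python
def schema_completeness(used_terms, ontology_terms):
    """No. of ontology classes/properties represented / total no. in the ontology."""
    ontology_terms = set(ontology_terms)
    if not ontology_terms:
        return 1.0
    return len(set(used_terms) & ontology_terms) / len(ontology_terms)

def property_completeness(triples, prop, expected_subjects):
    """Share of expected subjects that have at least one value for prop.

    Assumption: every subject in expected_subjects should have a value.
    """
    expected = set(expected_subjects)
    if not expected:
        return 1.0
    covered = {s for s, p, _ in triples if p == prop and s in expected}
    return len(covered) / len(expected)

# Hypothetical data: four people, two of them with a birth date.
triples = [
    ("ex:alice", "ex:birthDate", "1980-05-01"),
    ("ex:bob", "ex:birthDate", "1975-11-23"),
    ("ex:carol", "ex:name", "Carol"),
]
people = ["ex:alice", "ex:bob", "ex:carol", "ex:dave"]

schema_score = schema_completeness(
    used_terms={"ex:birthDate", "ex:name"},
    ontology_terms={"ex:birthDate", "ex:name", "ex:Person", "ex:knows"},
)
property_score = property_completeness(triples, "ex:birthDate", people)
```

Population completeness is not sketched because it needs an external reference list of real-world objects, which the slides do not specify.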
16. LDQ Dimensions - Contextual dimensions & metrics
• Understandability - refers to the ease with which data can be comprehended without ambiguity and be used by a human information consumer
  • human-readable labelling of classes, properties and entities as well as presence of metadata
  • indication of the vocabularies used in the dataset
• Timeliness - measures how up-to-date data is relative to a specific task
  • freshness of datasets based on currency and volatility
  • freshness of datasets based on their data source
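The slides do not give a formula for freshness based on currency and volatility. One common formulation, assumed here for illustration only, is timeliness = max(0, 1 - currency / volatility), where currency is the age of the data at observation time and volatility is the length of time the data stays valid. The function name `timeliness` and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta

def timeliness(last_modified, observed_at, volatility):
    """Assumed formula: max(0, 1 - currency / volatility).

    currency   = observed_at - last_modified (age of the data)
    volatility = how long the data is expected to remain valid
    Returns 1.0 for brand-new data and 0.0 once the data has expired.
    """
    if volatility <= timedelta(0):
        raise ValueError("volatility must be a positive duration")
    currency = observed_at - last_modified
    return max(0.0, 1.0 - currency / volatility)

# Data modified 10 days before observation, assumed valid for 40 days.
fresh = timeliness(
    last_modified=datetime(2014, 1, 1),
    observed_at=datetime(2014, 1, 11),
    volatility=timedelta(days=40),
)

# Same data observed after its validity period has passed.
stale = timeliness(
    last_modified=datetime(2014, 1, 1),
    observed_at=datetime(2014, 6, 1),
    volatility=timedelta(days=40),
)
```

Clamping at zero keeps the score in the range 0 to 1, so it can be compared with other ratio-style metrics.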
26. LDQ Use Cases — Open Data Portals
Automated Quality Assessment of Metadata across Open Data Portals.
Neumaier et al., JDIQ 2016.
• Completeness
• Interoperability
• Relevancy
• Accuracy
• Openness
27. LDQ Beyond Data — Mapping Quality
Dimou et al. Assessing and Refining Mappings to RDF to Improve Dataset Quality.
ISWC 2015.
https://github.com/RMLio/RML-Validator