This document discusses the issue of data diversity in smart city data hubs. It argues that data can be diverse in many ways, not just content but also in attributes like licensing and policies. This diversity needs to be managed carefully. The document presents an example of WiFi sensors generating diverse data for a smart city project in Milton Keynes. It proposes that semantic representations of data flows and relationships can help manage diversity at the metadata level and enable reasoning about how attributes propagate across linked data. Formal ontologies are needed to clearly describe diverse data artifacts and their relationships in smart city applications.
Dealing with Data Diversity in a Smart City Data Hub
1. Dealing with Data Diversity in a
Smart City Data Hub
Mathieu d'Aquin - @mdaquin
slideshare.net/mdaquin
Knowledge Media Institute, The Open University
3. Why should we care about
diversity?
Because diversity is good, and what
makes data diverse is not the same as
what makes it more or less relevant
4. Why should we care about
diversity?
Because it is hard to manage
How many species of
penguins/animals/things?
How many biologists to classify them?
and that's purely static... unlike species, new data
appear all the time...
5. Why should we care about
diversity?
The
Eskimo language
has 255 different
words for
"visiting linguist"
Because we might have a lot of it, or
what we need to manage is very
granular
6. Data diversity in a Smart City
Example of the MK:Smart project in
Milton Keynes, UK (mksmart.org)
21. How do we usually deal with
this?
Data heterogeneity,
for which we use alignments, mappings, links, etc.
Example: The LinkedUp Catalogue of datasets
for education includes mappings between
the vocabularies of different datasets
data.linkededucation.org/linkedup/catalogue/
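A minimal sketch of what such a vocabulary mapping can look like in practice. The property pairs below are assumptions chosen for illustration (Dublin Core terms mapped to schema.org terms); they are not taken from the LinkedUp Catalogue, and the record and the `align` helper are hypothetical.

```python
# Assumed, illustrative mapping between property names of two vocabularies.
VOCAB_MAPPING = {
    "dcterms:title": "schema:name",
    "dcterms:creator": "schema:creator",
}


def align(record, mapping):
    """Rename the keys of `record` using `mapping`; unmapped keys are kept as-is."""
    return {mapping.get(key, key): value for key, value in record.items()}


# Hypothetical dataset description using the first vocabulary.
record = {
    "dcterms:title": "Open course catalogue",
    "dcterms:creator": "Example University",
    "ex:level": "undergraduate",
}

aligned = align(record, VOCAB_MAPPING)
```

Mappings of this kind address heterogeneity of content and schema; they say nothing about licences or policies attached to the data.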
32. DataNode
Captures the essence of dataflows rather than the process, as a basis for
meta-information propagation.
33. Propagating meta-information
across dataflows
Examples of rules:
Duties such as attribution propagate over relations of derivation, but
not necessarily over others.
Permissions such as the right to redistribute, however, do not
propagate over relations of derivation, except in specific cases (e.g.
copies).
Prohibitions such as preventing commercial exploitation propagate over
derivations.
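The three example rules can be sketched as a small lookup table. This is an illustration under stated assumptions, not the DataNode ontology or the authors' rule engine: the relation names (`derivedFrom`, `copyOf`, `describes`), the `Policy` class and the `propagate` helper are all hypothetical, and "not necessarily over others" is simplified here to "never over others".

```python
from dataclasses import dataclass

# Hypothetical relation names, chosen for illustration only.
DERIVED_FROM = "derivedFrom"  # generic derivation
COPY_OF = "copyOf"            # a copy, treated as a special case of derivation
DESCRIBES = "describes"       # a non-derivation relation

# Assumed encoding of the example rules: policy kind -> relations it crosses.
PROPAGATES_OVER = {
    "duty": {DERIVED_FROM, COPY_OF},
    "permission": {COPY_OF},
    "prohibition": {DERIVED_FROM, COPY_OF},
}


@dataclass(frozen=True)
class Policy:
    kind: str    # "duty", "permission" or "prohibition"
    action: str  # e.g. "attribute", "redistribute", "commercial-use"


def propagate(policies, relation):
    """Return the policies that carry over to a node linked by `relation`."""
    return {p for p in policies if relation in PROPAGATES_OVER.get(p.kind, set())}


# Policies attached to a hypothetical source dataset.
source = {
    Policy("duty", "attribute"),
    Policy("permission", "redistribute"),
    Policy("prohibition", "commercial-use"),
}

derived = propagate(source, DERIVED_FROM)  # duty and prohibition only
copied = propagate(source, COPY_OF)        # all three policies
described = propagate(source, DESCRIBES)   # nothing carries over
```

Keeping the rules in a table rather than in code makes it possible to reason about a long dataflow by applying `propagate` once per relation along the path.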
34. Discussion/future
A lot of the semantics for Smart Cities work focuses on data heterogeneity.
There is a need to look at data diversity at the meta-information level
(here we focus on policy related information).
How do we manage, catalogue, keep track of and manipulate a large
number of datasets with diverse rights, access, validity and scope?
How do we help users/developers in exploring and exploiting this
diversity...
36. Discussion/future
Need for a clear, semantic (i.e. ontological) foundation for describing
and defining data artefacts.
DataNode is a step towards defining their relationships. Vocabularies
such as ODRL and VoID focus on specific aspects.
More is needed to formally represent the fundamental descriptors of
data (scope, validity, policy, ...)
37. Thanks!
Mathieu d'Aquin Alessandro Adamou Enrico Daga
Shuangyan Liu Keerthi Thomas Enrico Motta