The Web is evolving into a Web of Data. In parallel, the intranets of large companies will evolve into Data Intranets based on the Linked Data principles. Linked Data has the potential to complement the SOA paradigm with a light-weight, adaptive data integration approach.
7. Creating Knowledge
out of Interlinked Data
Linked Data Lifecycle (diagram): Extraction; Storage/Querying; Manual revision/authoring; Inter-linking/Fusing; Classification/Enrichment; Quality Analysis; Evolution/Repair; Search/Browsing/Exploration
8. Creating Knowledge
out of Interlinked Data
Extraction
(Lifecycle diagram: Extraction, Store/Query, Authoring, Interlinking, Enrichment, Quality Analysis, Evolution/Repair, Exploration)
9. Creating Knowledge
out of Interlinked Data
From unstructured sources
• NLP, text mining, annotation
From semi-structured sources
• DBpedia, LinkedGeoData, DataCube
From structured sources
• RDB2RDF
Extraction
10. Creating Knowledge
out of Interlinked Data
Many different approaches, e.g. D2R, Virtuoso RDF Views, Triplify
No agreement on a formal semantics of RDB2RDF mapping
• LOD readiness, SPARQL-SQL translation
W3C RDB2RDF WG
Extraction Relational Data
Tool                Triplify                     Sparqlify                     D2RQ            Virtuoso RDF Views
Technology          Scripting languages (PHP)    Java                          Java            Whole middleware solution
SPARQL endpoint     -                            X                             X               X
Mapping language    SQL                          SPARQL CONSTRUCT Views + SQL  RDF based       RDF based
Mapping generation  Manual                       Semi-automatic                Semi-automatic  Manual
Scalability         Medium-high (but no SPARQL)  Very high                     Medium          High
Malhotra, Auer, Erling, Hausenblas: W3C RDB2RDF Incubator Group Report. W3C RDB2RDF Incubator Group, 2009.
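The basic idea behind mapping relational data to RDF can be sketched in a few lines. The following is a minimal illustration only, not the mapping semantics of any of the tools compared above: it assumes a hypothetical base URI (http://example.org/), a made-up `part` table, and a naive convention of one subject per row and one predicate per column.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI, not from the slides


def rows_to_ntriples(conn, table, key, columns):
    """Emit one N-Triples line per non-NULL cell: subject = row, predicate = column."""
    query = f"SELECT {key}, {', '.join(columns)} FROM {table} ORDER BY {key}"
    lines = []
    for row in conn.execute(query):
        subject = f"<{BASE}{table}/{row[0]}>"
        for col, value in zip(columns, row[1:]):
            if value is None:  # NULL cells produce no triple
                continue
            predicate = f"<{BASE}vocab/{col}>"
            # Escape backslashes and quotes so the literal stays valid N-Triples
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


# Made-up example data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (id INTEGER PRIMARY KEY, name TEXT, plant TEXT)")
conn.executemany(
    "INSERT INTO part VALUES (?, ?, ?)",
    [(1, "axle", "Bremen"), (2, "gearbox", None)],
)

triples = rows_to_ntriples(conn, "part", "id", ["name", "plant"])
for line in triples:
    print(line)
```

This materialises triples up front. Tools that offer a SPARQL endpoint instead rewrite queries against the database at request time.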
11. Creating Knowledge
out of Interlinked Data
• Rationale: exploit existing formalisms (SQL, SPARQL CONSTRUCT) as much as possible
• Flexible and versatile mapping language
• Translates one SPARQL query into exactly one efficiently executable SQL query
• Solid theoretical formalization based on SPARQL-relational algebra transformations
• Extremely scalable through an elaborate view candidate selection mechanism
• Used to publish 20B triples for LinkedGeoData
Sparqlify
Stadler, Unbehauen, Auer, Lehmann: Sparqlify – Very Large Scale Linked Data Publication from Relational Databases.
Submitted to VLDB-Journal.
(Diagram: SPARQL Construct, Bridge, SQL View)
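The principle of rewriting a SPARQL query into exactly one SQL query can be illustrated with a single triple pattern. This is a toy sketch under stated assumptions, not Sparqlify's algorithm or mapping language: the `VIEWS` dictionary, the `translate_pattern` function, and the example.org URIs are all hypothetical, and only patterns with a constant predicate are handled.

```python
import sqlite3

# Hypothetical view definitions:
# predicate URI -> (table, key column, value column, subject URI prefix)
VIEWS = {
    "http://example.org/vocab/name": ("part", "id", "name", "http://example.org/part/"),
}


def translate_pattern(subject, predicate, obj):
    """Rewrite one triple pattern into a single parameterised SQL query.

    Terms starting with "?" are variables, anything else is a constant.
    The predicate must be a constant that has a view definition.
    Returns (sql, params), or None if the view cannot produce the subject.
    """
    table, key, col, prefix = VIEWS[predicate]
    where, params = [f"{col} IS NOT NULL"], [prefix]
    if not subject.startswith("?"):
        if not subject.startswith(prefix):
            return None  # constant subject lies outside this view
        where.append(f"{key} = ?")
        params.append(subject[len(prefix):])
    if not obj.startswith("?"):
        where.append(f"{col} = ?")
        params.append(obj)
    sql = f"SELECT ? || {key} AS s, {col} AS o FROM {table} WHERE {' AND '.join(where)}"
    return sql, params


# Made-up example data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO part VALUES (?, ?)", [(1, "axle"), (2, "gearbox")])

# Pattern: ?s <http://example.org/vocab/name> "axle"
sql, params = translate_pattern("?s", "http://example.org/vocab/name", "axle")
bindings = conn.execute(sql, params).fetchall()
print(sql)
print(bindings)
```

The subject URI is assembled inside the SQL query, so the database does the filtering and no triples need to be materialised beforehand.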
12. Creating Knowledge
out of Interlinked Data
Storage and Querying
14. Creating Knowledge
out of Interlinked Data
1. Semantic (Text) Wikis
• Authoring of semantically annotated texts
2. Semantic Data Wikis
• Direct authoring of structured information (i.e. RDF, RDF-Schema, OWL)
Two Kinds of Semantic Wikis
15. Creating Knowledge
out of Interlinked Data
The situation at Daimler (€97.76 billion revenue, 250,000 employees):
• 3,000 heterogeneous IT systems
• Different units (car, bus, truck etc.) with very different views
• No common language
• Inability to identify crucial entities (parts, locations etc.) enterprise-wide
There is no (and cannot be a) single Enterprise Information Model
A distributed, iterative, bottom-up integration approach such as Linked Data might be able to help (pay-as-you-go).
Can Linked Data help to solve the EII problem in a Fortune 500 company?
19. Creating Knowledge
out of Interlinked Data
Management of Enterprise Taxonomies with OntoWiki
Based on the W3C SKOS standard
Corporate Language Management at Daimler: 500k concepts in 20 languages
20. Creating Knowledge
out of Interlinked Data
Search after
Showing recommendations from the knowledge base, integrating car model data and the enterprise taxonomy
21. Creating Knowledge
out of Interlinked Data
You can search for "Kombi" (station wagon) and find T-Models (the Daimler term for station wagon)
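How a SKOS-based taxonomy makes this kind of search possible can be sketched as follows. The concept URIs, labels and the two helper functions are illustrative assumptions, not Daimler data or the OntoWiki API: a query term is matched against preferred and alternative labels in all languages, and the hit is shown under its preferred label.

```python
# Toy SKOS-like taxonomy; URIs and labels are made up for illustration.
CONCEPTS = {
    "ex:StationWagon": {
        "prefLabel": {"en": "T-Model", "de": "T-Modell"},
        "altLabel": {"en": ["station wagon", "estate car"], "de": ["Kombi"]},
    },
    "ex:Sedan": {
        "prefLabel": {"en": "Sedan", "de": "Limousine"},
        "altLabel": {},
    },
}


def find_concepts(term):
    """Return URIs of concepts whose preferred or alternative label equals the term."""
    needle = term.strip().lower()
    hits = []
    for uri, labels in CONCEPTS.items():
        candidates = list(labels["prefLabel"].values())
        for alternatives in labels["altLabel"].values():
            candidates.extend(alternatives)
        if any(needle == label.lower() for label in candidates):
            hits.append(uri)
    return hits


def preferred_label(uri, lang):
    """Return the preferred label of a concept in the given language."""
    return CONCEPTS[uri]["prefLabel"][lang]


hits = find_concepts("Kombi")
print([(uri, preferred_label(uri, "en")) for uri in hits])
```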
24. Creating Knowledge
out of Interlinked Data
In an uncontrolled environment such as the Data Web, there will be a proliferation of equivalent or similar entity identifiers
Manual link discovery:
• Sindice integration into UIs
• Semantic Pingback
Semi-automatic:
• SILK
• LIMES
Automatic/supervised:
• RAVEN [1]
Linking Entities on the Data Web
[1] Ngonga, Lehmann, Auer, Höffner: RAVEN -- Active Learning of Link Specifications, OM@ISWC, 2011.
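Semi-automatic link discovery typically compares entity descriptions with a similarity measure and accepts pairs above a threshold. The sketch below shows only that general idea with Python's standard `difflib`. It is not the SILK or LIMES link specification language, and the identifiers, labels and the threshold of 0.8 are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def discover_links(source, target, threshold=0.8):
    """Compare all label pairs and return (source_id, target_id, score) above the threshold."""
    links = []
    for s_id, s_label in source.items():
        for t_id, t_label in target.items():
            score = SequenceMatcher(None, s_label.lower(), t_label.lower()).ratio()
            if score >= threshold:
                links.append((s_id, t_id, score))
    return links


# Made-up entity labels from two hypothetical datasets
source = {"a:Leipzig": "Leipzig", "a:Bonn": "Bonn"}
target = {"b:1": "leipzig", "b:2": "Berlin"}

links = discover_links(source, target)
for s_id, t_id, score in links:
    # Accepted pairs would be published as owl:sameAs links
    print(f"{s_id} owl:sameAs {t_id} .  # score {score:.2f}")
```

Comparing every pair is quadratic in the number of entities. Real link discovery frameworks reduce the number of comparisons, and a supervised approach additionally learns the measure and threshold from user feedback.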
25. Creating Knowledge
out of Interlinked Data
Enrichment
26. Creating Knowledge
out of Interlinked Data
Linked Data is mainly instance data!!!
The ORE (Ontology Repair and Enrichment) tool helps to improve an OWL ontology by fixing inconsistencies and suggesting further axioms.
• Ontology Debugging: uses OWL reasoning to detect inconsistencies and unsatisfiable classes and to identify the most likely sources of the problems. The user can create a repair plan while maintaining full control.
• Ontology Enrichment: uses the DL-Learner framework to suggest definitions and super classes for existing classes in the KB. This works if instance data is available and helps to harmonise schema and data.
http://aksw.org/Projects/ORE
Enrichment & Repair
Lehmann, Auer, Tramp: Class Expression Learning for Ontology Engineering. Journal of Web Semantics (JWS), 2011.
27. Creating Knowledge
out of Interlinked Data
Quality Analysis
CC BY SA Wikipedia
28. Creating Knowledge
out of Interlinked Data
Quality on the Data Web varies a lot
• Hand-crafted or expensively curated knowledge bases (e.g. DBLP, UMLS) vs. knowledge extracted from text or Web 2.0 sources (e.g. DBpedia)
Research Challenge
• Establish measures for assessing the authority, provenance and reliability of Data Web resources
Opportunity for EII: employ crowd-sourced knowledge from the Data Web in the enterprise
Linked Data Quality Analysis
FP7-IP DIACHRON Managing the Evolution and Preservation of the Data Web
Started April 2013
30. Creating Knowledge
out of Interlinked Data
Exploration
31. Creating Knowledge
out of Interlinked Data
An ecosystem of LOD visualizations
LOD Exploration Widgets
• Spatial faceted browsing
• Faceted browsing
• Statistical visualization
• Entity-/facet-based browsing
• Domain-specific visualizations
• …
Choreography layer
• Dataset analysis (size, vocabularies, property histograms etc.)
• Selection of suitable visualization widgets
LOD Datasets
Brunetti, Auer, García: The Linked Data Visualization Model. To appear in IJSWIS, 2012.
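The two tasks of the choreography layer, dataset analysis and widget selection, can be sketched roughly as follows. This is an illustrative assumption about how such a selection could work, not the Linked Data Visualization Model itself: the rule table, the function names and the sample triples are all hypothetical.

```python
from collections import Counter

# Hypothetical rules: a widget is suggested when all listed properties occur in the dataset.
WIDGET_RULES = [
    ("spatial faceted browsing", {"geo:lat", "geo:long"}),
    ("statistical visualization", {"qb:dataSet", "sdmx-measure:obsValue"}),
]


def property_histogram(triples):
    """Dataset analysis: count how often each property is used."""
    return Counter(predicate for _, predicate, _ in triples)


def suggest_widgets(histogram):
    """Widget selection: generic faceted browsing plus every widget whose rule is satisfied."""
    present = set(histogram)
    widgets = ["faceted browsing"]
    widgets += [name for name, required in WIDGET_RULES if required <= present]
    return widgets


# Made-up sample dataset as (subject, predicate, object) tuples
triples = [
    ("ex:Leipzig", "rdfs:label", "Leipzig"),
    ("ex:Leipzig", "geo:lat", "51.34"),
    ("ex:Leipzig", "geo:long", "12.37"),
    ("ex:Bonn", "rdfs:label", "Bonn"),
    ("ex:Bonn", "geo:lat", "50.73"),
    ("ex:Bonn", "geo:long", "7.10"),
]

histogram = property_histogram(triples)
print(histogram.most_common())
print(suggest_widgets(histogram))
```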
32. Creating Knowledge
out of Interlinked Data
LOD Life-(Washing-)cycle supported by the Debian-based LOD2 Stack
http://stack.lod2.eu
33. Creating Knowledge
out of Interlinked Data
Linked Enterprise Intra Data Webs fill the gap
between Intra-/Extranets and EIS/ERP
Unstructured Information
Management
Structured Information
Management
Support the long tail of enterprise information domains
• Human resources
• Requirements engineering
• Supply chains
34. Creating Knowledge
out of Interlinked Data
• Linked Data is a promising technology for closing the gap between SOA and unstructured information management
• The wealth of knowledge available as LOD can be leveraged as background knowledge for enterprise applications
• The application of Linked Data in the enterprise is still largely unexplored (opportunity)
• Linked Data will make Enterprise Information Integration more flexible, iterative and cost-effective
Take home messages
Auer, Frischmuth, Klímek, Tramp, Unbehauen, Holzweißig, Marquardt: Linked Data in Enterprise Information Integration
Submitted to Semantic Web Journal.
35. Creating Knowledge
out of Interlinked Data
Thanks for your attention!
Sören Auer
http://www.informatik.uni-leipzig.de/~auer | http://aksw.org | http://lod2.org
auer@cs.uni-bonn.de