Empirical Semantics
modelling knowledge as it is,
not as it should be
Frank van Harmelen
Vrije Universiteit Amsterdam
Creative Commons License
CC BY 3.0:
Allowed to copy, redistribute
remix & transform
But must attribute
Many thanks to all at
KR&R@VU: Wouter Beek, Joe
Raad, Peter Bloem, Stefan
Schlobach, Zhisheng Huang,
and many others over the years
The ‘K’ in ‘Semantic Web’
stands for ‘Knowledge’
Prescriptive Semantics
versus
Descriptive Semantics
Formal Semantics
versus
Empirical Semantics
OWL Semantics fits on one A4
• The world consists of
– Objects (“individuals”)
– Sets of objects (“types”)
– Pairs of objects (“relations”)
• The world can be described by operations of
these sets: 𝑇1 ∪ 𝑇2, 𝑇1 ∩ 𝑇2, 𝑇1 T2
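A minimal sketch of this set-based view, assuming nothing beyond plain Python sets. All individual, type and relation names below are made up for illustration; the helper `has_some` is a hypothetical name, written in the spirit of an existential restriction.

```python
# Individuals are plain strings (illustrative names only).
alice, bob, lab = "alice", "bob", "lab"

# Types are sets of individuals.
Person = {alice, bob}
Researcher = {alice}
Place = {lab}

# Relations are sets of pairs of individuals.
works_at = {(alice, lab)}

# New types are built with ordinary set operations.
person_or_place = Person | Place        # union, T1 ∪ T2
researching_person = Person & Researcher  # intersection, T1 ∩ T2


def has_some(relation, target_type):
    """Individuals related by `relation` to at least one member of `target_type`."""
    return {s for (s, o) in relation if o in target_type}


# Everyone who works at some Place.
place_workers = has_some(works_at, Place)
```

Under this reading, every class expression is just another set, which is why the whole semantics can be stated so compactly.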
Empirical Semantics
requires:
observing knowledge
at scale
So we need an
observational
instrument
LOD Laundromat
Beek & Rietveld et al. 2014,
LOD laundromat: a uniform way of
publishing other people's dirty data
http://lodlaundromat.org/pdf/lodlaundry.pdf
HDT
Fernández & Martínez-Prieto &
Gutiérrez, 2013, Binary RDF
representation for publication and
exchange (HDT)
LDF
Verborgh & Vander Sande et al.
2014, Web-Scale Querying through
Linked Data Fragments
LOD-a-lot
1 file
28,362,198,927 unique triples
>650K data documents
LDF queries in real time
Surprisingly efficient
524 GB of disk space
16 GB of RAM
Only 144 secs loading time
Only €305 hardware cost
Meta-Data for a lot of LOD
http://www.semantic-web-journal.net/content/meta-data-lot-lod-2
http://lod-a-lot.lod.labs.vu.nl/
Insights from
Empirical Semantics:
1. Identity correctness
Joe Raad, Wouter Beek
owl:sameAs is not optional
But in practice
it’s broken under
the formal semantics
Meet our observatory:
http://SameAs.cc
• 559 million owl:sameAs statements
(we created an HDT file in 4 hours on 1 CPU core:
4.5 GB + 2.2 GB index)
• 50 million equivalence classes after inference
(5 hours on 2 CPU cores; 9.3 GB, disk only(!), RocksDB)
The largest equivalence class has 177,749 entities
and contains:
• Albert Einstein
• all countries of the world
• the empty string
Formal Semantics says:
This is obviously broken…
Refl: ∀𝑥: (𝑥 = 𝑥)
Symm: ∀𝑥, 𝑦: (𝑥 = 𝑦) → (𝑦 = 𝑥)
Trans: ∀𝑥, 𝑦, 𝑧: (𝑥 = 𝑦 ∧ 𝑦 = 𝑧) → (𝑥 = 𝑧)
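The three axioms above turn a bag of owl:sameAs statements into equivalence classes. A minimal sketch of that closure, assuming a union-find structure over in-memory pairs (the function name and the toy identifiers in the usage note are made up; the real observatory works on disk at a very different scale):

```python
def equivalence_classes(same_as_pairs):
    """Group identifiers into classes under reflexive, symmetric, transitive closure."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)            # reflexivity: x starts in its own class
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups cheap
            x = parent[x]
        return x

    for a, b in same_as_pairs:             # direction is ignored: symmetry
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b        # merging whole classes: transitivity

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())
```

For example, `equivalence_classes([("a", "b"), ("b", "c"), ("d", "e")])` yields two classes, {a, b, c} and {d, e}. A single wrong link between two classes merges them completely, which is how one class can end up holding a person, countries and the empty string.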
Oldest known knowledge graph
(Psst, this is not a new problem…)
[Figure: diagram with nodes Father, Son, Holy Spirit]
A modern example: Barack Obama
Community 0
1. dbpedia.org/resource/B_hussein_obama
2. dbpedia.org/resource/Barack_H_Obama,_Jr
3. dbpedia.org/resource/Barak_hussein_obama
4. dbpedia.org/resource/President_Barack
5. dbpedia.org/resource/Senator_Barack_Obama
6. dbpedia.org/resource/Obama
…
99. dbpedia.org/resource/Hussein_Obama
Community 3
1. dbpedia.org/resource/Presidency_of_Barack_Obama
2. dbpedia.org/resource/Barack_Obama_Administration
3. dbpedia.org/resource/Barack_Obama_Cabinet
4. dbpedia.org/resource/Obama_White_House
5. dbpedia.org/resource/Obama_regime
6. dbpedia.org/resource/America_under_Obama
…
52. dbpedia.org/resource/Presidential_transition_of_Barack_Obama
Debugging identity
by community detection
Communities correspond to roles:
- Person
- Senator
- President
- Government
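A rough sketch of the idea, not the method used for the results above: as a simple stand-in for a proper community detection algorithm, it drops every sameAs link whose two endpoints share no common neighbour (a likely "bridge" between roles) and returns the connected groups that remain. The function name and the short identifiers in the usage note are assumptions for illustration.

```python
from collections import defaultdict


def communities(same_as_links):
    """Split one equivalence class into densely linked groups (toy heuristic)."""
    adjacent = defaultdict(set)
    for a, b in same_as_links:
        adjacent[a].add(b)
        adjacent[b].add(a)

    # Keep a link only if its endpoints have at least one neighbour in common.
    strong = {node: set() for node in adjacent}
    for a in adjacent:
        for b in adjacent[a]:
            if adjacent[a] & adjacent[b]:
                strong[a].add(b)

    # Connected components over the remaining links.
    seen, groups = set(), []
    for start in sorted(strong):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(strong[node] - seen)
        groups.append(group)
    return groups
```

With a triangle of person identifiers, a triangle of presidency identifiers, and one link between the two triangles, this returns the two triangles as separate communities, mirroring the Person versus Government split.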
Message from Empirical
Semantics
It’s not the users that got owl:sameAs wrong,
It’s the formal semantics that got reality wrong
Challenge:
What alternative semantic model of equality
would fit the empirically observed usage better?
Insights from
Empirical Semantics:
2. Meaningful names
Steven de Rooij, Peter Bloem, Wouter Beek (ISWC 2016)
http://www.cs.vu.nl/~frankh/postscript/ISWC2016.pdf
Symbols or words?
(or: blasphemy for logicians)
Formal Semantics says:
Symbol names are supposed to be meaningless
[Figure: graph with nodes Aspirin, headache, analgesic, pain, drug, symptom and two "treats" edges]
Measure mutual information content
between URL-string and semantics
E(x) = length of an efficient encoding of x
If x ≈ y then E(x+y) ≈ E(x), else E(x+y) ≈ E(x) + E(y)
Mutual information content
M(x,y) =E(x) + E(y) – E(x+y)
Take x = the name of a symbol, as a string
Take 𝑦1 = the types of that symbol (≈ its semantics)
Take 𝑦2 = the properties of that symbol (≈ its semantics)
Calculate M(x, 𝑦1) and M(x, 𝑦2) for all symbols
in 600k datasets
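A small sketch of the measurement, assuming zlib compression as a stand-in for the "efficient encoding" E (the encoding actually used in the study is not specified here). The function names and the example strings are illustrative only.

```python
import zlib


def encoded_length(data: bytes) -> int:
    """E(x): size in bytes of a compressed encoding of x (zlib is an assumption)."""
    return len(zlib.compress(data, 9))


def mutual_information(x: bytes, y: bytes) -> int:
    """M(x, y) = E(x) + E(y) - E(x + y): bytes saved by encoding x and y together."""
    return encoded_length(x) + encoded_length(y) - encoded_length(x + y)


# Illustrative inputs: a symbol name and a serialisation of its types.
name = b"http://example.org/resource/Hydrographic_Structure_Reservoir_Lake"
types = b"HydrographicStructure Reservoir Lake"

# A higher score suggests the name shares more information with its types.
score = mutual_information(name, types)
```

On short strings the compressor overhead makes single scores noisy, which is one reason to look at the statistics over many documents rather than at any one symbol.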
But URL-strings do encode meaning!
Fraction of datasets with redundancy for types/predicates
at significance level > 0.99
(BTW, this is 600,000 data points, i.e. RDF docs)
[Figure: two panels, Properties and Types]
Message from Empirical
Semantics
Users shouldn’t stop using meaningful names,
Formal semantics should capture their meaning
Challenge:
What alternative semantic models
could capture meaningful names?
Zhisheng Huang
(ISWC 2008)
Insights from
Empirical Semantics:
3. Meaningful names
for local consistency
Knowledge will be inconsistent
Because of:
• Homonyms
• Different ontological models
• Migration from legacy data
• Integration of multiple sources
• ….
Inconsistency through migration
DICE terminology,
in daily use at Amsterdam Medical Centre
for registration of Intensive Care patients
• Brain ⊑ CentralNervousSystem
• Brain ⊑ BodyPart
• CentralNervousSystem ⊑ NervousSystem
• BodyPart ⊑ ¬NervousSystem
Inconsistency through automated learning
• Reservoir ⊑ Lake
• Lake ⊑ WaterRegion
• Reservoir ⊑ HydrographicStructure
• HydrographicStructure ⊑ Facility
• Disjoint(WaterRegion, Facility)
100% expert agreement
on this disjointness….
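The clash in the Reservoir example can be found mechanically. A minimal sketch, assuming the axioms are held as a plain dictionary of direct superclasses plus a list of disjoint pairs (the data layout and function names are assumptions; strictly speaking it detects unsatisfiable classes, which become an outright inconsistency once such a class has an instance):

```python
# Direct superclasses, transcribed from the example axioms.
SUBCLASS_OF = {
    "Reservoir": {"Lake", "HydrographicStructure"},
    "Lake": {"WaterRegion"},
    "HydrographicStructure": {"Facility"},
}
DISJOINT = [("WaterRegion", "Facility")]


def superclasses(cls, subclass_of):
    """All classes reachable from `cls` via subclass links, including `cls`."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(subclass_of.get(current, ()))
    return seen


def unsatisfiable_classes(subclass_of, disjoint):
    """Classes that fall under both sides of some disjointness axiom."""
    classes = set(subclass_of) | {s for sups in subclass_of.values() for s in sups}
    clashing = set()
    for cls in classes:
        ups = superclasses(cls, subclass_of)
        if any(a in ups and b in ups for a, b in disjoint):
            clashing.add(cls)
    return clashing
```

Here only Reservoir clashes: it is a WaterRegion via Lake and a Facility via HydrographicStructure, and those two are declared disjoint.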
Inconsistency through merging
SUMO(1000) + CYC(1.6M) → 6000 inconsistencies…
Local consistency
s(T,𝜙,0) ⊆ s(T,𝜙,1) ⊆ s(T,𝜙,2)
But… how to define s(T,𝜙,n)?
Symbols as words
[Figure: the symbols WaterRegion, basin, Lake, Reservoir, H. structure and Facility laid out as words]
Google Distance
(symbols as words!)
Reservoir ⊑ Lake
Lake ⊑ WaterRegion
Reservoir ⊑ HydrographicStructure
HydrographicStructure ⊑ Facility
Disjoint(WaterRegion, Facility)
Google Distance for selection function in
local consistency reasoning
ISWC08
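A hedged sketch of how a word distance can drive a selection function. `ngd` follows the standard Normalized Google Distance formula over hit counts; everything else is an assumption for illustration: the `select` signature, the use of a radius in place of the step number n, and above all the numbers in `TOY_DISTANCE`, which are invented and not measured.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: pages containing each term; fxy: pages containing both; n: index size.
    """
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Invented distances between symbol names, read as words (NOT real measurements).
TOY_DISTANCE = {
    frozenset(("Reservoir", "Lake")): 0.2,
    frozenset(("Reservoir", "WaterRegion")): 0.4,
    frozenset(("Reservoir", "HydrographicStructure")): 0.6,
    frozenset(("Reservoir", "Facility")): 0.8,
}


def distance(a, b):
    """Look up the toy distance; unknown pairs count as maximally distant."""
    return 0.0 if a == b else TOY_DISTANCE.get(frozenset((a, b)), 1.0)


# Subclass axioms as (subclass, superclass) pairs.
AXIOMS = [
    ("Reservoir", "Lake"),
    ("Lake", "WaterRegion"),
    ("Reservoir", "HydrographicStructure"),
    ("HydrographicStructure", "Facility"),
]


def select(axioms, query_symbols, radius):
    """Axioms whose symbols all lie within `radius` of some query symbol."""
    def near(symbol):
        return any(distance(symbol, q) <= radius for q in query_symbols)
    return [axiom for axiom in axioms if all(near(s) for s in axiom)]
```

With these toy numbers, `select(AXIOMS, {"Reservoir"}, 0.5)` keeps only the Lake and WaterRegion axioms, a consistent fragment, while a radius of 1.0 pulls in all four. Growing the radius only ever adds axioms, so the selections are nested.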
Formal
Semantics
says: this isn’t
supposed to
work!
Insight from
Empirical Semantics
Users shouldn’t stop using meaningful names,
Formal semantics should capture their meaning
Challenge:
What alternative semantic models
would capture meaningful names?
Challenge for
Empirical Semantics:
4. network structures
for different predicates
Tobias Kuhn, Wouter Beek
http://ceur-ws.org/Vol-1946/paper-05.pdf
skos:exactMatch
foaf:knows
osspr:contains
Geopolitics:hasborderWith
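One way to make such differences visible is to treat each predicate as its own graph and compare a few structural statistics. A minimal sketch; the function name and the choice of statistics are assumptions, not necessarily the measures used in the paper cited above.

```python
from collections import defaultdict


def predicate_profiles(triples):
    """Per-predicate graph statistics for (subject, predicate, object) triples."""
    edges = defaultdict(set)
    for s, p, o in triples:
        edges[p].add((s, o))

    profiles = {}
    for p, pairs in edges.items():
        nodes = {n for pair in pairs for n in pair}
        out_degree = defaultdict(int)
        for s, _ in pairs:
            out_degree[s] += 1
        # Fraction of edges whose reverse edge is also present.
        reciprocal = sum(1 for (s, o) in pairs if (o, s) in pairs)
        profiles[p] = {
            "nodes": len(nodes),
            "edges": len(pairs),
            "max_out_degree": max(out_degree.values()),
            "reciprocity": reciprocal / len(pairs),
        }
    return profiles
```

A symmetric relation such as a border relation would be expected to show high reciprocity, while a containment relation would show low reciprocity and a tree-like fan-out.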
Message from
Empirical Semantics
None of these patterns have any semantic impact
(you can’t even detect them under the traditional semantics)
Challenge:
What alternative semantic models would
take such different patterns into account?
So what…
So what #1 (pragmatic)
• We now have larger KBs than ever before
• We now have the instruments
to observe and analyse these very large KBs
• We can use these insights for better tools:
– query & inference
– publish & maintain
– visualise & explain
– …
So what #2 (pretentious)
My secret hope is that this will help us
to understand the patterns of knowledge:
Not a prescriptive theory of
what knowledge should be,
But a descriptive theory of
what knowledge is actually like

 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 

Empirical Semantics

  • 1. Empirical Semantics: modelling knowledge as it is, not as it should be. Frank van Harmelen, Vrije Universiteit Amsterdam. Creative Commons License CC BY 3.0: allowed to copy, redistribute, remix & transform, but must attribute. Many thanks to all at KR&R@VU: Wouter Beek, Joe Raad, Peter Bloem, Stefan Schlobach, Zhisheng Huang, and many others over the years.
  • 2. The ‘K’ in ‘Semantic Web’ stands for ‘Knowledge’
  • 5. OWL Semantics fits on one A4
  • 6. OWL Semantics fits on one A4 • The world consists of – Objects (“individuals”) – Sets of objects (“types”) – Pairs of objects (“relations”) • The world can be described by operations on these sets: 𝑇1 ∪ 𝑇2, 𝑇1 ∩ 𝑇2, 𝑇1 T2
  • 10. So we need an observational instrument
  • 12. LOD Laundromat: Beek & Rietveld et al. 2014, LOD laundromat: a uniform way of publishing other people's dirty data, http://lodlaundromat.org/pdf/lodlaundry.pdf. HDT: Fernández & Martínez-Prieto & Gutiérrez, 2013, Binary RDF representation for publication and exchange (HDT). LDF: Verborgh & Vander Sande et al. 2014, Web-Scale Querying through Linked Data Fragments.
  • 13. LOD-a-lot: 1 file; 28,362,198,927 unique triples; >650K data documents; LDF queries in real time. Surprisingly efficient: 524 GB of disk space; 16 GB of RAM; only 144 secs loading time; only €305,- hardware cost. Meta-Data for a lot of LOD: http://www.semantic-web-journal.net/content/meta-data-lot-lod-2 http://lod-a-lot.lod.labs.vu.nl/
  • 14. Insights from Empirical Semantics: 1. Identity correctness. Joe Raad, Wouter Beek
  • 15. owl:sameAs is not optional. But in practice it’s broken under the formal semantics
  • 16. Meet our observatory: http://SameAs.cc • 559 million owl:sameAs statements (we created an HDT file in 4 hours on 1 CPU core = 4.5 GB + 2.2 GB index) • 50 million equivalence classes after inference (5 hours on 2 CPU cores; only(!) 9.3 GB on disk, RocksDB)
  • 17. The largest equivalence class has 177,749 entities and contains: • Albert Einstein • all countries of the world • the empty string. Formal Semantics says: This is obviously broken… Refl: ∀𝑥: (𝑥 = 𝑥). Symm: ∀𝑥, 𝑦: (𝑥 = 𝑦) → (𝑦 = 𝑥). Trans: ∀𝑥, 𝑦, 𝑧: (𝑥 = 𝑦 ∧ 𝑦 = 𝑧) → (𝑥 = 𝑧)
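
The step from owl:sameAs statements to equivalence classes on slides 16 and 17 can be illustrated with a small union-find sketch. This is not the SameAs.cc pipeline (which the slides say uses HDT and RocksDB); the function name and the `ex:` identifiers are made up for illustration. It only shows how the reflexive, symmetric and transitive reading of equality forces terms into one class.

```python
from collections import defaultdict


def equivalence_classes(same_as_pairs):
    """Group terms into classes under the reflexive, symmetric and
    transitive closure of the given (a, b) owl:sameAs pairs."""
    parent = {}

    def find(x):
        # Every term starts out as the root of its own class (reflexivity).
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited term directly at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Merging roots makes the direction of the link irrelevant
            # (symmetry) and chains links together (transitivity).
            parent[root_b] = root_a

    classes = defaultdict(set)
    for term in list(parent):
        classes[find(term)].add(term)
    # Largest class first, ties broken alphabetically.
    return sorted((sorted(c) for c in classes.values()),
                  key=lambda c: (-len(c), c))


if __name__ == "__main__":
    links = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:d", "ex:e")]
    print(equivalence_classes(links))
    # One additional (possibly wrong) link merges both classes into one.
    print(equivalence_classes(links + [("ex:c", "ex:d")]))
```

Under this reading a single erroneous link is enough to merge two otherwise separate classes, which is one way a very large class such as the one on slide 17 can arise.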
  • 18. Oldest known knowledge graph :-) (Pssss, this is not a new problem…) [Figure: graph with nodes Father, Son, Holy Spirit]
  • 19. A modern example: Barack Obama
  • 20. A modern example: Barack Obama
  • 21. Debugging identity by community detection. Community 0: 1. dbpedia.org/resource/B_hussein_obama 2. dbpedia.org/resource/Barack_H_Obama,_Jr 3. dbpedia.org/resource/Barak_hussein_obama 4. dbpedia.org/resource/President_Barack 5. dbpedia.org/resource/Senator_Barack_Obama 6. dbpedia.org/resource/Obama … 99. dbpedia.org/resource/Hussein_Obama. Community 3: 1. dbpedia.org/resource/Presidency_of_Barack_Obama 2. dbpedia.org/resource/Barack_Obama_Administration 3. dbpedia.org/resource/Barack_Obama_Cabinet 4. dbpedia.org/resource/Obama_White_House 5. dbpedia.org/resource/Obama_regime 6. dbpedia.org/resource/America_under_Obama … 52. dbpedia.org/resource/Presidential_transition_of_Barack_Obama. Communities correspond to roles: - Person - Senator - President - Government
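
The idea on slide 21 (split one oversized equivalence class into communities of densely linked terms) can be sketched as follows. The slides do not say which community detection algorithm was used, so this sketch swaps in a deliberately simple heuristic: drop every link whose two endpoints share no common neighbour, then take connected components. The link structure between the DBpedia names below is invented for illustration.

```python
from collections import defaultdict


def communities(edges):
    """Split an undirected link graph into communities.

    Stand-in heuristic (an assumption, not the method from the slides):
    a link is kept only if its endpoints have at least one neighbour in
    common; communities are the connected components of what remains.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    strong = {node: set() for node in adj}
    for a in list(adj):
        for b in adj[a]:
            if adj[a] & adj[b]:  # shared neighbour: the link is "embedded"
                strong[a].add(b)

    seen, result = set(), []
    for start in sorted(strong):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(strong[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


if __name__ == "__main__":
    # Hypothetical links: two tightly linked groups joined by one stray link.
    person = ["B_hussein_obama", "Obama", "Senator_Barack_Obama"]
    government = ["Barack_Obama_Cabinet", "Obama_White_House", "Obama_regime"]
    links = [(person[0], person[1]), (person[1], person[2]), (person[0], person[2]),
             (government[0], government[1]), (government[1], government[2]),
             (government[0], government[2]),
             ("Obama", "Obama_White_House")]  # the suspicious bridge
    for community in communities(links):
        print(community)
```

Under the formal semantics all six names would be equal; the heuristic instead separates the person from the government, because the single bridging link has no supporting context.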
  • 22. Message from Empirical Semantics It’s not the users that got owl:sameAs wrong, It’s the formal semantics that got reality wrong Challenge: What alternative semantic model of equality would fit the empirically observed usage better?
  • 23. Insights from Empirical Semantics: 2. Meaningful names. Steven de Rooij, Peter Bloem, Wouter Beek (ISWC 2016) http://www.cs.vu.nl/~frankh/postscript/ISWC2016.pdf
  • 24. Symbols or words? (or: blasphemy for logicians) Formal Semantics says: Symbol names are supposed to be meaningless. [Figure: graph with nodes Aspirin, headache, analgesic, pain, drug and symptom, and two edges labelled treats]
  • 25. Measure mutual information content between URL-string and semantics. E(x) = length of an efficient encoding of x. If x ≈ y then E(x+y) ≈ E(x), else E(x+y) ≈ E(x) + E(y). Mutual information content: M(x,y) = E(x) + E(y) − E(x+y). Take x = symbol name of x as a string; take 𝑦1 = types of x (≈ semantics of x); take 𝑦2 = properties of x (≈ semantics of x). Calculate M(x, 𝑦1) and M(x, 𝑦2) for all symbols in 600k datasets
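
The measure on slide 25 can be approximated with any general-purpose compressor. The sketch below uses `zlib` as a stand-in for the "efficient encoding" E; the slides do not name the encoder that was actually used, and the example URIs are hypothetical.

```python
import zlib


def encoded_length(data: bytes) -> int:
    """E(x): length in bytes of a zlib-compressed encoding of x.

    zlib is an assumption here, chosen only because it ships with Python.
    """
    return len(zlib.compress(data, 9))


def mutual_information(x: bytes, y: bytes) -> int:
    """M(x, y) = E(x) + E(y) - E(x + y), as on slide 25."""
    return encoded_length(x) + encoded_length(y) - encoded_length(x + y)


if __name__ == "__main__":
    # Hypothetical symbol name and two candidate descriptions of its semantics.
    name = b"http://example.org/resource/Lake_Geneva"
    types = b"http://example.org/ontology/Lake"
    noise = b"qzxwvkjy-0193-7482-5566-ffee"
    print("M(name, types) =", mutual_information(name, types))
    print("M(name, noise) =", mutual_information(name, noise))
```

For very short strings the fixed overhead of the compressed format dominates, so values are only meaningful in comparison with each other, not in absolute terms.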
  • 26. But URL-strings do encode meaning! Fraction of datasets with redundancy for types/predicates at significance level > 0.99. BTW, this is 600,000 datapoints (RDF docs). [Figure: two panels, Properties and Types]
  • 27. Message from Empirical Semantics Users shouldn’t stop using meaningful names, Formal semantics should capture their meaning Challenge: What alternative semantic models could capture meaningful names?
  • 28. Insights from Empirical Semantics: 3. Meaningful names for local consistency. Zhisheng Huang (ISWC 2008)
  • 29. Knowledge will be inconsistent because of: • Homonyms • Different ontological models • Migration from legacy data • Integration of multiple sources • …
  • 30. Inconsistency through migration. DICE terminology, in daily use at Amsterdam Medical Centre for registration of Intensive Care patients • Brain ⊆ CentralNervousSystem • Brain ⊆ BodyPart • CentralNervousSystem ⊆ NervousSystem • BodyPart ⊆ NervousSystem
  • 31. Inconsistency through automated learning • Reservoir ⊆ Lake • Lake ⊆ WaterRegion • Reservoir ⊆ HydrographicStructure • HydrographicStructure ⊆ Facility • Disjoint(WaterRegion, Facility), 100% expert agreement on this disjointness… Inconsistency through merging: SUMO(1000) + CYC(1.6M) → 6000 inconsistencies…
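
Why the five axioms on slide 31 clash can be checked mechanically. The sketch below is a minimal illustration, not a description logic reasoner: it follows the subclass axioms upwards from each class and reports any class that ends up below two classes declared disjoint, so that it cannot have instances. The function names are made up.

```python
def superclasses(cls, subclass_of):
    """Return cls together with everything reachable via subclass axioms."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def clashing_classes(subclass_of, disjoint_pairs):
    """Map each class to the disjointness axioms it violates, if any."""
    clashes = {}
    for cls in subclass_of:
        supers = superclasses(cls, subclass_of)
        violated = [(a, b) for a, b in disjoint_pairs
                    if a in supers and b in supers]
        if violated:
            clashes[cls] = violated
    return clashes


if __name__ == "__main__":
    # The axioms from slide 31.
    subclass_of = {
        "Reservoir": ["Lake", "HydrographicStructure"],
        "Lake": ["WaterRegion"],
        "HydrographicStructure": ["Facility"],
    }
    disjoint_pairs = [("WaterRegion", "Facility")]
    print(clashing_classes(subclass_of, disjoint_pairs))
```

Reservoir is the only class reported: it is both a WaterRegion (via Lake) and a Facility (via HydrographicStructure), and those two are declared disjoint.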
  • 33. Symbols as words. Google Distance (symbols as words!) [Figure: node labels Waterregion, basin, Lake, Reservoir, H. structure, Facility]
  • 34. Reservoir ⊆ Lake; Lake ⊆ WaterRegion; Reservoir ⊆ HydrographicStructure; HydrographicStructure ⊆ Facility; Disjoint(WaterRegion, Facility). Google Distance for selection function in local consistency reasoning (ISWC08). Formal Semantics says: this isn’t supposed to work!
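
A sketch of how a Google Distance could drive a selection function follows. It uses the Normalized Google Distance of Cilibrasi and Vitányi, computed from page-hit counts; whether slide 34 uses exactly this variant is an assumption, and all hit counts below are invented for illustration.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD from hit counts: fx and fy for each term alone, fxy for both
    terms together, n for the (assumed) total number of indexed pages."""
    if fxy == 0:
        return math.inf  # the terms never co-occur
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


def rank_by_distance(query, candidates, hits, pair_hits, n):
    """Order candidate class names by their distance to the query name."""
    def distance(term):
        together = pair_hits.get(frozenset((query, term)), 0)
        return normalized_google_distance(hits[query], hits[term], together, n)
    return sorted(candidates, key=distance)


if __name__ == "__main__":
    # Hypothetical counts, not real search-engine data.
    hits = {"Reservoir": 1000, "Lake": 1000, "Facility": 1000}
    pair_hits = {frozenset(("Reservoir", "Lake")): 100,
                 frozenset(("Reservoir", "Facility")): 10}
    print(rank_by_distance("Reservoir", ["Facility", "Lake"], hits, pair_hits, 10**6))
```

A selection function built on such a ranking would pull in the axioms about the nearest names first and stop extending the selection before it becomes inconsistent.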
  • 35. Insight from Empirical Semantics Users shouldn’t stop using meaningful names, Formal semantics should capture their meaning Challenge: What alternative semantic models would capture meaningful names?
  • 36. Challenge for Empirical Semantics: 4. network structures for different predicates Tobias Kuhn Wouter Beek http://ceur-ws.org/Vol-1946/paper-05.pdf
  • 41. Message from Empirical Semantics None of these patterns have any semantic impact (you can’t even detect them under the traditional semantics) Challenge: What alternative semantic models would take such different patterns into account?
  • 43. So what #1 (pragmatic) • We now have larger KBs than ever before • We now have the instruments to observe and analyse these very large KBs • We can use these insights for better tools: – query & inference – publish & maintain – visualise & explain – …
  • 44. So what #2 (pretentious): My secret hope is that this will help us to understand the patterns of knowledge: not a prescriptive theory of what knowledge should be, but a descriptive theory of what knowledge is actually like