Contextualized Knowledge Graph
from two perspectives
Semantic Web and Graph Database
with an application in PubChem
Presenter: Vinh Nguyen
What is a Knowledge Graph?
10/25/2018
What is a Contextualized Knowledge Graph?
A contextualized knowledge graph is a knowledge graph in which
every fact is qualified with a set of contextual properties.
Motivation Scenario
Facts:
Subject | Predicate | Object
Bob Dylan | marriedTo | Sara Lownds
Bob Dylan | marriedTo | Carolyn Dennis
Facts with contextual properties:
Subject | Predicate | Object | Starts | Ends
Bob Dylan | marriedTo | Sara Lownds | 1965-11-22 | 1977-06-29
Bob Dylan | marriedTo | Carolyn Dennis | 1986-06-## | 1992-10-##
Meta Queries:
Query type | Sample query
Provenance | P1. Where is this fact from?
Provenance | P2. When was it created?
Provenance | P3. Who created this fact?
Time | T1. When did this fact occur?
Time | T2. What is the time span of this fact?
Time | T3. Which events happened in the same year?
Location | L1. What is the location associated with this fact?
Location | L2. Which events happened at the same place?
Certainty | C1. What is the author's confidence in this fact?
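The time-related meta queries (T2 and T3) can be sketched in Python once each fact carries its contextual properties. This is a minimal illustration, not the presenter's implementation: the `ContextualFact` class and both helper functions are hypothetical names, and only the fully dated marriage is included because the second fact has unknown days.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ContextualFact:
    """A fact (subject, predicate, object) qualified by contextual properties."""
    subject: str
    predicate: str
    obj: str
    context: dict = field(default_factory=dict)


facts = [
    ContextualFact(
        "Bob Dylan", "marriedTo", "Sara Lownds",
        {"starts": date(1965, 11, 22), "ends": date(1977, 6, 29)},
    ),
]


def time_span_days(fact: ContextualFact) -> int:
    """T2: length of the fact's validity interval in days."""
    return (fact.context["ends"] - fact.context["starts"]).days


def started_in_year(all_facts, year: int):
    """T3: facts whose 'starts' date falls in the given year."""
    return [f for f in all_facts if f.context["starts"].year == year]
```

Provenance, location, and certainty queries would follow the same pattern with additional keys in `context`.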
Contextualized Knowledge Graph from the Semantic Web perspective
2973 datasets with 149 billion triples
Linked Data principles
• Use URIs as names for things
• Use HTTP URIs so that those names can be looked up
• When a URI is looked up, provide useful information using the standards
• Include links to other URIs so that more things can be discovered
Form of Triples: RDF Reification
Time-aware Facts:
Subject | Predicate | Object | Starts | Ends
Bob Dylan | marriedTo | Sara Lownds | 1965-11-22 | 1977-06-29
RDF Reification:
Subject | Predicate | Object
Bob Dylan | marriedTo | Sara Lownds
#stmt1 | type | Statement
#stmt1 | hasSubject | BobDylan
#stmt1 | hasProperty | marriedTo
#stmt1 | hasObject | Sara Lownds
#stmt1 | starts | 1965-11-22
#stmt1 | ends | 1977-06-29
Pros:
1. Intuitive, easy to understand
Cons:
1. Takes 3N triples (4N if including Statement typing) to represent N statements => Not scalable
2. No formal semantics defined => Semantics is unclear
3. Discouraged in LOD!
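The triple overhead of reification can be made concrete with a short Python sketch. It uses plain tuples rather than an RDF library; the `reify` function is a hypothetical helper, and the property names (`hasSubject`, `hasProperty`, `hasObject`) are the informal ones used on the slide.

```python
def reify(stmt_id, subject, predicate, obj, metadata=None, typed=True):
    """Return the reification triples for one statement plus its metadata."""
    triples = []
    if typed:
        # Optional typing triple: this is the difference between 3N and 4N.
        triples.append((stmt_id, "type", "Statement"))
    triples += [
        (stmt_id, "hasSubject", subject),
        (stmt_id, "hasProperty", predicate),
        (stmt_id, "hasObject", obj),
    ]
    for key, value in (metadata or {}).items():
        triples.append((stmt_id, key, value))
    return triples


triples = reify(
    "#stmt1", "BobDylan", "marriedTo", "SaraLownds",
    metadata={"starts": "1965-11-22", "ends": "1977-06-29"},
)
```

With typing, one statement costs four structural triples before any metadata is attached, and the original triple is still stored separately.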
RDF Reification vs. Singleton Property
Time-aware Facts:
Subject | Predicate | Object | Starts | Ends
Bob Dylan | marriedTo | Sara Lownds | 1965-11-22 | 1977-06-29
RDF Reification:
Subject | Predicate | Object
Bob Dylan | marriedTo | Sara Lownds
#stmt1 | type | Statement
#stmt1 | hasSubject | BobDylan
#stmt1 | hasProperty | marriedTo
#stmt1 | hasObject | Sara Lownds
#stmt1 | starts | 1965-11-22
#stmt1 | ends | 1977-06-29
Singleton Property:
Subject | Predicate | Object
marriedTo#1 | rdf:sp | marriedTo
BobDylan | marriedTo#1 | Sara Lownds
marriedTo#1 | starts | 1965-11-22
marriedTo#1 | ends | 1977-06-29
Vinh Nguyen, Olivier Bodenreider, and Amit Sheth. "Don't like RDF reification? Making statements about statements using singleton property." In Proceedings of the 23rd International Conference on World Wide Web, pp. 759-770. ACM, 2014.
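A small Python sketch shows how a singleton property carries the context of a single fact. It is an illustration with plain tuples, assuming the `predicate#n` naming and the `rdf:sp` link shown on the slide; `singleton_triples` and `metadata_of` are hypothetical helpers.

```python
from itertools import count

_counter = count(1)


def singleton_triples(subject, predicate, obj, metadata):
    """Represent one contextualized fact with a unique (singleton) property."""
    sp = f"{predicate}#{next(_counter)}"
    triples = [
        (sp, "rdf:sp", predicate),  # link the singleton to its generic property
        (subject, sp, obj),         # the fact itself, asserted via the singleton
    ]
    triples += [(sp, key, value) for key, value in metadata.items()]
    return triples


def metadata_of(triples, subject, predicate, obj):
    """Look up the metadata attached to the fact (subject, predicate, obj)."""
    singletons = {s for s, p, o in triples if p == "rdf:sp" and o == predicate}
    used = {p for s, p, o in triples
            if s == subject and o == obj and p in singletons}
    return {p: o for s, p, o in triples if s in used and p != "rdf:sp"}


sp_triples = singleton_triples(
    "BobDylan", "marriedTo", "SaraLownds",
    {"starts": "1965-11-22", "ends": "1977-06-29"},
)
```

Two structural triples plus one triple per contextual property are enough here, compared with the four structural triples and the original triple in the reification table.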
Form of Triples: PaCE
Provenance-aware Facts:
Subject | Predicate | Object | Source | DateExtracted
Bob Dylan | marriedTo | Sara Lownds | wikipage:Bob_Dylan | 2009-06-07
Provenance-aware Context Entity:
Subject | Predicate | Object
BobDylan_wp | rdf:type | Bob Dylan
SaraLownds_wp | rdf:type | Sara Lownds
BobDylan_wp | marriedTo | SaraLownds_wp
BobDylan_wp | hasSource | wiki:Bob_Dylan
BobDylan_wp | hasDateExt | 2009-06-07
Pros:
1. Uses ~50% fewer triples than reification by reusing the subject, predicate, and object
Cons:
1. Not intuitive, hard to understand
2. Limited expressiveness
Satya S. Sahoo, Olivier Bodenreider, Pascal Hitzler, Amit Sheth, and Krishnaprasad Thirunarayan. "Provenance context entity (PaCE): scalable provenance tracking for scientific RDF data." In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM'10), 2010.
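The PaCE table can be reproduced with a few lines of Python. The slide only shows the resulting triples, so the function below is one possible reading of how they are produced: `pace_triples` and the `source_tag` suffix are hypothetical, and the provenance is attached to the context-specific subject as in the table.

```python
def pace_triples(subject, predicate, obj, source_tag, provenance):
    """Mint source-specific entities and attach provenance to the subject entity."""
    s_ctx = f"{subject}_{source_tag}"
    o_ctx = f"{obj}_{source_tag}"
    triples = [
        (s_ctx, "rdf:type", subject),  # context entity is an instance of the original
        (o_ctx, "rdf:type", obj),
        (s_ctx, predicate, o_ctx),     # the fact, stated between context entities
    ]
    triples += [(s_ctx, key, value) for key, value in provenance.items()]
    return triples


pace = pace_triples(
    "BobDylan", "marriedTo", "SaraLownds", "wp",
    {"hasSource": "wiki:Bob_Dylan", "hasDateExt": "2009-06-07"},
)
```

Because the provenance hangs off the entity rather than the statement, every fact about `BobDylan_wp` shares the same source, which is one way to read the "limited expressiveness" drawback.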
PaCE vs. Singleton Property
Facts and Provenance:
Subject | Predicate | Object | Source | DateExtracted
Bob Dylan | marriedTo | Sara Lownds | wikipage:Bob_Dylan | 2009-06-07
Provenance-aware Context Entity:
Subject | Predicate | Object
BobDylan_wp | rdf:type | Bob Dylan
SaraLownds_wp | rdf:type | Sara Lownds
BobDylan_wp | marriedTo | SaraLownds_wp
BobDylan_wp | hasSource | wiki:Bob_Dylan
BobDylan_wp | hasDateExt | 2009-06-07
Singleton Property:
Subject | Predicate | Object
marriedTo#1 | rdf:sp | marriedTo
BobDylan | marriedTo#1 | Sara Lownds
marriedTo#1 | hasSource | wp:Bob_Dylan
marriedTo#1 | hasDateExt | 2009-06-07
Form of Quadruples: Named Graph
Time-aware Facts:
Subject | Predicate | Object | Starts | Ends
Bob Dylan | marriedTo | Sara Lownds | 1965-11-22 | 1977-06-29
Named Graph:
Subject | Predicate | Object | NG
Bob Dylan | marriedTo | Sara Lownds | ng_1
ng_1 | starts | 1965-11-22 | Prov_graph
ng_1 | ends | 1977-06-29 | Prov_graph
Pros:
1. Intuitive: create one named graph per source
2. Attach metadata to a set of triples
3. SPARQL supported
Cons:
1. Defined for provenance only
2. Ambiguous semantics when associating different types of metadata at the triple level
* Carroll, Jeremy J., et al. "Named graphs, provenance and trust." Proceedings of the 14th International Conference on World Wide Web. ACM, 2005.
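The quadruple form can be sketched in Python as 4-tuples, with the metadata about each named graph kept in a separate graph. This is an illustration only: it assumes both the start and end dates are attached to `ng_1`, and `graph_metadata` and `facts_with_context` are hypothetical helpers.

```python
from collections import defaultdict

quads = [
    ("BobDylan", "marriedTo", "SaraLownds", "ng_1"),
    ("ng_1", "starts", "1965-11-22", "Prov_graph"),
    ("ng_1", "ends", "1977-06-29", "Prov_graph"),
]


def graph_metadata(all_quads, meta_graph="Prov_graph"):
    """Collect the metadata that the meta graph states about each named graph."""
    meta = defaultdict(dict)
    for s, p, o, g in all_quads:
        if g == meta_graph:
            meta[s][p] = o
    return dict(meta)


def facts_with_context(all_quads, meta_graph="Prov_graph"):
    """Pair every fact with the metadata of the named graph that contains it."""
    meta = graph_metadata(all_quads, meta_graph)
    return [
        ((s, p, o), meta.get(g, {}))
        for s, p, o, g in all_quads
        if g != meta_graph
    ]
```

Every triple placed in the same named graph inherits the same metadata, so per-fact context requires one named graph per fact.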
Named Graph vs. Singleton Property
Time-aware Facts:
Subject | Predicate | Object | Starts | Ends
Bob Dylan | marriedTo | Sara Lownds | 1965-11-22 | 1977-06-29
Named Graph:
Subject | Predicate | Object | NG
Bob Dylan | marriedTo | Sara Lownds | ng_1
ng_1 | starts | 1965-11-22 | Prov_graph
ng_1 | ends | 1977-06-29 | Prov_graph
Singleton Property:
Subject | Predicate | Object
marriedTo#1 | rdf:sp | marriedTo
Bob Dylan | marriedTo#1 | Sara Lownds
marriedTo#1 | starts | 1965-11-22
marriedTo#1 | ends | 1977-06-29
Form of Quintuples: RDF+
Facts and Temporal Information:
Subject | Predicate | Object | Starts | Ends
Bob Dylan | marriedTo | Sara Lownds | 1965-11-22 | 1977-06-29
RDF+:
Subject | Predicate | Object | Meta Property | Meta Value
Bob Dylan | marriedTo | Sara Lownds | starts | 1965-11-22
Bob Dylan | marriedTo | Sara Lownds | ends | 1977-06-29
Cons:
1. The representation is not in the form of RDF. Statement identifiers are used internally, so mappings from RDF to RDF+ and vice versa are required.
2. The SPARQL query syntax and semantics need to be extended to support RDF+.
* Dividino, Renata, et al. "Querying for provenance, trust, uncertainty and other meta knowledge in RDF." Web Semantics: Science, Services and Agents on the World Wide Web 7.3 (2009): 204-219.
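The quintuple form can be illustrated with 5-tuples that are grouped back into one record per statement. This Python sketch only mirrors the table layout; it does not model the internal statement identifiers or the query extensions that RDF+ defines, and `group_by_statement` is a hypothetical helper.

```python
quintuples = [
    ("BobDylan", "marriedTo", "SaraLownds", "starts", "1965-11-22"),
    ("BobDylan", "marriedTo", "SaraLownds", "ends", "1977-06-29"),
]


def group_by_statement(rows):
    """Collect the meta properties of each (subject, predicate, object)."""
    grouped = {}
    for s, p, o, meta_property, meta_value in rows:
        grouped.setdefault((s, p, o), {})[meta_property] = meta_value
    return grouped


statements = group_by_statement(quintuples)
```

The base triple is repeated once per meta property, which is why a mapping step is needed before the data can be exchanged as ordinary RDF.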
Experiment: BKR with Provenance
All datasets are available at http://wiki.knoesis.org/index.php/Singleton_Property
• Five data sets generated from the same seed BKR
  - Singleton Property (SP)
  - Reification (R)
  - PaCE C1 (C1)
  - PaCE C2 (C2)
  - PaCE C3 (C3)
Experiment Results
(A) Random-value queries vs. fixed-value queries, in msec.
(B) Query length and execution time, in msec.
External Evaluation
• Gang Fu, Evan Bolton, Núria Queralt Rosinach, Laura I Furlong, Vinh Nguyen, Amit Sheth, Olivier Bodenreider, Michel Dumontier. "Exposing provenance metadata using different RDF models." In Proceedings of Semantic Web Applications and Tools for Life Science (SWAT4LS), 2016. https://pubchem.ncbi.nlm.nih.gov/
• Daniel Hernández, Aidan Hogan, and Markus Krötzsch. "Reifying RDF: What works well with Wikidata?" SSWS@ISWC 1457 (2015): 32-47.
• Johannes Frey, Kay Müller, Sebastian Hellmann, Erhard Rahm, and Maria-Esther Vidal. "Evaluation of Metadata Representations in RDF stores."
• Daniel Hernández, Aidan Hogan, Cristian Riveros, Carlos Rojas, Enzo Zerega. "Querying Wikidata: Comparing SPARQL, Relational and Graph Databases." International Semantic Web Conference (2) 2016: 88-103.
Exposing provenance metadata using different RDF models
Gang Fu, Evan Bolton, Núria Queralt Rosinach, Laura I Furlong, Vinh Nguyen, Amit Sheth, Olivier Bodenreider, Michel Dumontier
Subject | Predicate | Object | Source | FromDataset | Confidence
CID5280961 (Genistein) | inhibits | GID2100 (ESR2) | PMID12502307 | ChEMBL |
CID5757 (Estradiol) | activates | GID2100 (ESR2) | PMID19128016 | ChEMBL |
PubChem
• Five data sets generated from the same seed
  - N-ary with cardinal assertion (Model I)
  - N-ary without cardinal assertion (Model II)
  - Singleton property with cardinal assertion (Model III)
  - Singleton property without cardinal assertion (Model IV)
  - NanoPublication (Model V)
• Comparing sizes of generated datasets
  Model I | Model II | Model III | Model IV | Model V
  22,787,218 | 21,445,348 | 19,575,298 | 17,239,427 | 27,605,782
  - SP datasets are the most compact ones
Source: Fu et al., "Exposing provenance metadata using different RDF models," SWAT4LS 2016.
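The size comparison can be checked with a few lines of Python. The numbers are the dataset sizes reported on the slide (the slide does not state the unit); `relative_size` is a hypothetical helper, and the choice of Model V as baseline is only for illustration.

```python
# Dataset sizes as reported on the slide, keyed by model name.
sizes = {
    "Model I": 22_787_218,
    "Model II": 21_445_348,
    "Model III": 19_575_298,
    "Model IV": 17_239_427,
    "Model V": 27_605_782,
}


def relative_size(model, baseline="Model V"):
    """Size of a model's dataset as a fraction of the baseline dataset."""
    return sizes[model] / sizes[baseline]


# Models ordered from most to least compact.
ranking = sorted(sizes, key=sizes.get)
smallest = ranking[0]

for model in ranking:
    print(f"{model}: {sizes[model]:,} ({relative_size(model):.1%} of Model V)")
```

The two singleton property models come first in the ranking, which matches the slide's statement that the SP datasets are the most compact.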
PubChem
• Query performance in secs
  - SP models (III and IV) outperform other models in Virtuoso
Wikidata
• Four data sets generated from the same seed
  - Standard Reification (SR)
  - N-ary relation (NR)
  - Singleton property (SP)
  - Named Graph (NG)
• Comparing sizes of generated datasets
  - SP dataset is the most compact one
Source: Hernández, Hogan, and Krötzsch, "Reifying RDF: What works well with Wikidata?" (2015).
Wikidata
• Query performance in 4store and GraphDB
  - SP models are not supported by 4store and GraphDB
• Query performance in Virtuoso and Blazegraph
  - Reification and NG are well supported by Virtuoso and Blazegraph
  - SP is slightly faster than NR in Virtuoso but slower in Blazegraph
Wikidata
• Six data sets generated from the same seed
  - Standard Reification (stdreif)
  - N-ary relation (naryrel)
  - Singleton property (sgprop)
  - Companion property (cpprop)
  - Named Graph (ngraphs)
  - RDF* (rdr)
• Comparing sizes of generated datasets
  - SP dataset is the most compact triple representation
  - Fastest loading time for Wikidata
  - Best query performance in Stardog in all cases
  - Slowest in Virtuoso for Wikidata queries, but not by much
  - No performance issues encountered with SP
Source: Frey et al., "Evaluation of Metadata Representations in RDF stores."
Experimental Comparison
• Dataset size
  - SP offers the most concise representation in all cases
• Query performance
  - SP performs reasonably well in Virtuoso, best in Stardog, and acceptably in Blazegraph
  - SP may gain further performance if query engines support and optimize for it
Is SP representation optimal?
Contextualized Knowledge Graph from the Graph Database perspective
Property Graph
Facts:
Subject | Predicate | Object
Bob Dylan | marriedTo | Sara Lownds
Bob Dylan | marriedTo | Carolyn Dennis
Facts with contextual properties:
Subject | Predicate | Object | Starts | Ends
Bob Dylan | marriedTo | Sara Lownds | 1965-11-22 | 1977-06-29
Bob Dylan | marriedTo | Carolyn Dennis | 1986-06-## | 1992-10-##
Property graph nodes:
1 | Name: BobDylan
2 | Name: CarolynDennis
3 | Name: SaraLownds
Property graph edges:
BobDylan | marriedTo | SaraLownds | Starts: 1965-11-22 | Ends: 1977-06-29
BobDylan | marriedTo | CarolynDennis | Starts: 1986-06-## | Ends: 1992-10-##
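In a property graph the contextual properties live directly on the edges. The following Python sketch models that with plain dictionaries; it is not tied to any graph database, the node ids are assumed from the slide's labels, and `outgoing` is a hypothetical helper.

```python
# Node ids and property names follow the slide; the id-to-name mapping is assumed.
nodes = {
    1: {"Name": "BobDylan"},
    2: {"Name": "CarolynDennis"},
    3: {"Name": "SaraLownds"},
}

# Each edge carries its own key/value properties (the context of the fact).
edges = [
    {"source": 1, "target": 3, "label": "marriedTo",
     "properties": {"Starts": "1965-11-22", "Ends": "1977-06-29"}},
    {"source": 1, "target": 2, "label": "marriedTo",
     "properties": {"Starts": "1986-06-##", "Ends": "1992-10-##"}},
]


def outgoing(name, label):
    """Return (target name, edge properties) for each matching outgoing edge."""
    ids = {node_id for node_id, props in nodes.items() if props["Name"] == name}
    return [
        (nodes[edge["target"]]["Name"], edge["properties"])
        for edge in edges
        if edge["source"] in ids and edge["label"] == label
    ]


marriages = outgoing("BobDylan", "marriedTo")
```

No extra statement identifiers or singleton properties are needed, because two edges with the same label between different node pairs are already distinct objects.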
Contextualized Knowledge Graph with an application in PubChem
Neighbor: only available through REST interface
PubChem Neighbor
Current PubChem Neighbor
• Number of links
  - 92,000,000 * 92,000,000 / 2 = 4.232 * 10^15
  - About 4 quadrillion
• Challenges
  ⨯ The number of triples grows to quadrillions
  ⨯ SPARQL query processing over quadrillions of triples
• Is it worth it?
  - Chemical similarity is one of the most important concepts in chemoinformatics
  - Similar compounds have similar properties
Current PubChem Neighbor
Subject | Predicate | Object
nbr:CID1_CID2_2DSim | has_measurement_value | nbr:CID1_CID2_2DTanimotoScore
nbr:CID1_CID2_2DSim | refers_to | compound:CID1
nbr:CID1_CID2_2DSim | refers_to | compound:CID2
nbr:CID1_CID2_2DSim | type | pcvocab:PC2D_structural_similarity
nbr:CID1_CID2_2DTanimotoScore | has_value | 0.91^^xsd:float
nbr:CID1_CID2_2DTanimotoScore | Is_output_of | sio:CHEMINF_000333
nbr:CID1_CID2_2DTanimotoScore | type | pcvocab:PC2D_Fingerprint_TanimotorScore
1 neighbor link: 7 triples
compound:CID1 | sio:CHEMINF_000482 | compound:CID2
4 quadrillion x 7 = 28 quadrillion triples
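The link and triple counts can be reproduced with integer arithmetic in Python. The compound count and the seven triples per link come from the slides; the variable names are illustrative.

```python
compounds = 92_000_000

# Number of compound pairs, using the slide's n * n / 2 approximation.
links = compounds * compounds // 2

# Current model: seven triples describe one neighbor link.
triples_per_link = 7
total_triples = links * triples_per_link

print(f"links:   {links:.3e}")
print(f"triples: {total_triples:.3e}")
```

The exact product is about 29.6 quadrillion triples; the slide rounds the link count down to 4 quadrillion first, which gives the quoted 28 quadrillion.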
PubChem Neighbor using CKG Model
Current model: the 7 triples listed above for each neighbor link.
CKG model:
Subject | Predicate | Object
compound:CID1 | has_structural_similarity?sp=1&ds=pc&is_output_of=sio:CHEMINF_000333&has_2d_tanimoto_score=0.91^^xsd | compound:CID2
1 neighbor link: 1 triple
4 quadrillion x 1 = 4 quadrillion triples
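The CKG triple packs the context of a neighbor link into the predicate as key=value pairs. The Python sketch below shows one way such a predicate could be built and decoded again; the helper names are hypothetical, and the score value is written as `0.91^^xsd:float`, which is an assumption taken from the earlier table because the slide text is cut off after `xsd`.

```python
def ckg_predicate(base, context):
    """Encode contextual properties into a predicate as key=value pairs."""
    query = "&".join(f"{key}={value}" for key, value in context.items())
    return f"{base}?{query}"


def decode_predicate(predicate):
    """Split a CKG predicate back into its base property and context."""
    base, _, query = predicate.partition("?")
    context = dict(pair.split("=", 1) for pair in query.split("&")) if query else {}
    return base, context


context = {
    "sp": "1",
    "ds": "pc",
    "is_output_of": "sio:CHEMINF_000333",
    "has_2d_tanimoto_score": "0.91^^xsd:float",  # assumed full datatype
}

predicate = ckg_predicate("has_structural_similarity", context)
triple = ("compound:CID1", predicate, "compound:CID2")
```

One triple now carries what previously took seven, at the cost of having to parse the predicate to reach the score.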
< 20 billion CKG triples

Weitere ähnliche Inhalte

Ähnlich wie Contextualized Knowledge Graph from two perspectives: Semantic Web and Graph Database with an application in PubChem

IASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesIASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with Triples
Dr.-Ing. Thomas Hartmann
 
Lecture linked data cloud & sparql
Lecture linked data cloud & sparqlLecture linked data cloud & sparql
Lecture linked data cloud & sparql
Dhavalkumar Thakker
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
Sören Auer
 
Prateek Jain dissertation defense, Kno.e.sis, Wright State University
Prateek Jain dissertation defense, Kno.e.sis, Wright State UniversityPrateek Jain dissertation defense, Kno.e.sis, Wright State University
Prateek Jain dissertation defense, Kno.e.sis, Wright State University
Prateek Jain
 

Ähnlich wie Contextualized Knowledge Graph from two perspectives: Semantic Web and Graph Database with an application in PubChem (20)

NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...
NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...
NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...
 
Radically Open Cultural Heritage Data on the Web
Radically Open Cultural Heritage Data on the WebRadically Open Cultural Heritage Data on the Web
Radically Open Cultural Heritage Data on the Web
 
CEDAR & PRELIDA Preservation of Linked Socio-Historical Data
CEDAR & PRELIDA Preservation of Linked Socio-Historical DataCEDAR & PRELIDA Preservation of Linked Socio-Historical Data
CEDAR & PRELIDA Preservation of Linked Socio-Historical Data
 
Importing life science at a into Neo4j
Importing life science at a into Neo4jImporting life science at a into Neo4j
Importing life science at a into Neo4j
 
Lifting the Lid on Linked Data
Lifting the Lid on Linked DataLifting the Lid on Linked Data
Lifting the Lid on Linked Data
 
IASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesIASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with Triples
 
Linked data experiments at the National Library of Scotland / Alexandra De Pr...
Linked data experiments at the National Library of Scotland / Alexandra De Pr...Linked data experiments at the National Library of Scotland / Alexandra De Pr...
Linked data experiments at the National Library of Scotland / Alexandra De Pr...
 
Lecture linked data cloud & sparql
Lecture linked data cloud & sparqlLecture linked data cloud & sparql
Lecture linked data cloud & sparql
 
(PROJEKTURA) Big Data Open Data story for TGG
(PROJEKTURA) Big Data Open Data story for TGG(PROJEKTURA) Big Data Open Data story for TGG
(PROJEKTURA) Big Data Open Data story for TGG
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
RDF presentation at DrupalCon San Francisco 2010
RDF presentation at DrupalCon San Francisco 2010RDF presentation at DrupalCon San Francisco 2010
RDF presentation at DrupalCon San Francisco 2010
 
Perspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from textPerspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from text
 
The Semantic Web - Interacting with the Unknown
The Semantic Web - Interacting with the UnknownThe Semantic Web - Interacting with the Unknown
The Semantic Web - Interacting with the Unknown
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
The Next Decade in Web Design
The Next Decade in Web DesignThe Next Decade in Web Design
The Next Decade in Web Design
 
Prateek Jain dissertation defense, Kno.e.sis, Wright State University
Prateek Jain dissertation defense, Kno.e.sis, Wright State UniversityPrateek Jain dissertation defense, Kno.e.sis, Wright State University
Prateek Jain dissertation defense, Kno.e.sis, Wright State University
 
Open data and linked data
Open data and linked dataOpen data and linked data
Open data and linked data
 
Big data search
Big data search Big data search
Big data search
 
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
 
NCompass Live: RDA: Are We There Yet?
NCompass Live: RDA: Are We There Yet?NCompass Live: RDA: Are We There Yet?
NCompass Live: RDA: Are We There Yet?
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Contextualized Knowledge Graph from two perspectives: Semantic Web and Graph Database with an application in PubChem

  • 1. Contextualized Knowledge Graph from two perspectives Semantic Web and Graph Database with an application in Presenter: Vinh Nguyen
  • 2. 2
  • 3. 3
  • 4. What is Knowledge Graph? 10/25/2018 4
  • 5. What is Knowledge Graph? 10/25/2018 5
  • 6. What is Contextualized Knowledge Graph? 10/25/2018 6 A contextualized knowledge graph is a knowledge graph in which every fact is qualified with a set of contextual properties.
  • 7. Subject Predicate Object Starts Ends Bob Dylan marriedTo Sarah Lownds 1965-11-22 1977-06-29 Bob Dylan marriedTo Carolyn Dennis 1986-06-## 1992-10-## Motivation Scenario Facts: Meta Queries: Query type Sample query Provenance P1. Where is this fact from? P2. When was it created? P3. Who created this fact? Time T1. When did this fact occur? T2. What is the time span of this fact? T3. Which events happened in the same year? Location L1. What is the location associated with this fact? L2. Which events happened at the same place? Certainty C1. What is the author confidence of this fact? 7 Subject Predicate Object Bob Dylan marriedTo Sarah Lownds Bob Dylan marriedTo Carolyn Dennis
  • 9. 9 2973 datasets with 149 billion triples Linked Data principles Use URIs as names Use HTTP URLs to be looked up URI provides useful info using standard Include links to other URIs to discover more
  • 10. 10
  • 11. Subject Predicate Object Starts Ends Bob Dylan marriedTo Sarah Lownds 1965-11-22 1977-06-29 RDF Reification Form of Triples: RDF Reification Pros: 1. Intuitive, easy to understand Cons: 1. Takes 3N triples (4N if including Statement typing) to represent a statement => Not scalable 2. No formal semantics defined => Semantics is unclear 3. Discouraged in LOD! Time-aware Facts: 11 Subject Predicate Object #stmt1 type Statement #stmt1 hasSubject BobDylan #stmt1 hasProperty marriedTo #stmt1 hasObject Sara Lownds Bob Dylan marriedTo Sarah Lownds #stmt1 starts 1965-11-22 #stmt1 ends 1977-06-29
  • 12. Subject Predicate Object Starts Ends Bob Dylan marriedTo Sarah Lownds 1965-11-22 1977-06-29 RDF Reification RDF Reification vs. Singleton Property Time-aware Facts: Subject Predicate Object #stmt1 type Statement #stmt1 hasSubject BobDylan #stmt1 hasProperty marriedTo #stmt1 hasObject Sara Lownds Bob Dylan marriedTo Sarah Lownds #stmt1 starts 1965-11-22 #stmt1 ends 1977-06-29 Subject Predicate Object marriedTo#1 rdf:sp marriedTo BobDylan marriedTo#1 Sarah Lownds marriedTo#1 starts 1965-11-22 marriedTo#1 ends 1977-06-29 Singleton Property 12 Vinh Nguyen, Olivier Bodenreider, and Amit Sheth. "Don't like RDF reification?: making statements about statements using singleton property." In Proceedings of the 23rd international conference on World wide web, pp. 759-770. ACM, 2014.
  • 13. Subject Predicate Object Source DateExtracted Bob Dylan marriedTo Sarah Lownds wikipage:Bob_Dylan 2009-06-07 Form of Triples: PaCE Pros: 1. Save ~50% number of triples compared to reification thanks to the repeated subject, predicate, and object. Cons: 1. Not intuitive, hard to understand 2. Limited expressiveness Provenance-aware Facts: 13 Provenance-aware Context Entity Subject Predicate Object BobDylan_wp rdf:type Bob Dylan SaraLownds_wp rdf:type Sara Lownds BobDylan_wp marriedTo SaraLownds_wp BobDylan_wp hasSource wiki:Bob_Dylan BobDylan_wp hasDateExt 2009-06-07 Satya S. Sahoo, Olivier Bodenreider, Pascal Hitzler, Amit Sheth, and Krishnaprasad Thirunarayan. 2010. Provenance context entity (PaCE): scalable provenance tracking for scientific RDF data. In Proceedings of the 22nd international conference on Scientific and statistical database management (SSDBM'10),
  • 14. Subject Predicate Object Source DateExtracted Bob Dylan marriedTo Sarah Lownds wikipage:Bob_Dylan 2009-06-07 Provenance-aware Context Entity Subject Predicate Object BobDylan_wp rdf:type Bob Dylan SaraLownds_wp rdf:type Sara Lownds BobDylan_wp marriedTo SaraLownds_wp BobDylan_wp hasSource wiki:Bob_Dylan BobDylan_wp hasDateExt 2009-06-07 Facts and Provenance: 14 PaCE vs. Singleton Property Subject Predicate Object marriedTo#1 rdf:sp marriedTo BobDylan marriedTo#1 Sarah Lownds marriedTo#1 hasSource wp:Bob_Dylan marriedTo#1 hasDateExt 2009-06-07 Singleton Property
  • 15. Form of Quadruples: Named Graph Pros: 1. Intuitive --creating # named graphs for # sources 2. Attach metadata for a set of triples 3. SPARQL supported Cons: 1. Defined for provenance only 2. Ambiguous semantics while associating different types of metadata at triple level Time-aware Facts: * Carroll, Jeremy J., et al. "Named graphs, provenance and trust." Proceedings of the 14th international conference on World Wide Web. ACM, 2005. 15 Subject Predicate Object Starts Ends Bob Dylan marriedTo Sarah Lownds 1965-11-22 1977-06-29 Named Graph Subject Predicate Object NG Bob Dylan marriedTo Sarah Lownds ng_1 ng_1 starts 1965-11-22 Prov_graph ng_2 ends 1977-06-29 Prov_graph
  • 16. Named Graph Subject Predicate Object NG Bob Dylan marriedTo Sarah Lownds ng_1 ng_1 starts 1965-11-22 Prov_graph ng_2 ends 1977-06-29 Prov_graph Time-aware Facts: Subject Predicate Object Starts Ends Bob Dylan marriedTo Sarah Lownds 1965-11-22 1977-06-29 Named Graph vs. Singleton Property Subject Predicate Object marriedTo#1 rdf:sp marriedTo Bob Dylan marriedTo#1 Sarah Lownds marriedTo#1 starts 1965-11-22 marriedTo#1 ends 1977-06-29 16 Singleton Property
  • 17. RDF+: Subject Predicate Object Meta Property Meta value Bob Dylan marriedTo Sarah Lownds starts 1965-11-22 Bob Dylan marriedTo Sarah Lownds ends 1977-06-29 Form of Quintuples: RDF+ Cons: 1. The representation is not in the form of RDF. Statement identifiers are used internally. Require the mappings from RDF to RDF+ and vice versa. 2. The SPARQL query syntax and semantics need to be extended to support RDF+ Facts and Temporal Information: * Dividino, Renata, et al. "Querying for provenance, trust, uncertainty and other meta knowledge in RDF." Web Semantics: Science, Services and Agents on the World Wide Web 7.3 (2009): 204-219. 17 Subject Predicate Object Starts Ends Bob Dylan marriedTo Sarah Lownds 1965-11-22 1977-06-29
  • 18. Experiment: BKR with Provenance All datasets are available at http://wiki.knoesis.org/index.php/Singleton_Property 20 • Five data sets generated from the same seed BKR  Singleton Property (SP)  Reification (R)  PaCE C1 (C1)  PaCE C2 (C2)  PaCE C3 (C3)
  • 19. Experiment Results (A) random-value queries vs. fixed-value queries in msec. (B) query length and execution time in msec. 21
  • 20. • Gang Fu, Evan Bolton, Núria Queralt Rosinach, Laura I Furlong, Vinh Nguyen, Amit Sheth, Olivier Bodenreider, Michel Dumontier. Exposing provenance metadata using different RDF models. In Proceedings of Semantic Web Applications and Tools for Life Science (SWAT4LS), 2016. https://pubchem.ncbi.nlm.nih.gov/ • Hernández, Daniel, Aidan Hogan, and Markus Krötzsch. "Reifying RDF: What works well with wikidata?." SSWS@ ISWC 1457 (2015): 32-47. • Frey, Johannes, Kay Müller, Sebastian Hellmann, Erhard Rahm, and Maria-Esther Vidal. "Evaluation of Metadata Representations in RDF stores.” • Daniel Hernández, Aidan Hogan, Cristian Riveros, Carlos Rojas, Enzo Zerega: Querying Wikidata: Comparing SPARQL, Relational and Graph Databases. International Semantic Web Conference (2) 2016: 88-103 22 External Evaluation
  • 21. Subject Predicate Object Source FromDataset Confidence CID5280961(Genistein) inhibits GID2100(ESR2) PMID12502307 ChemBL CID5757(Estradiol) activates GID2100(ESR2) PMID19128016 ChemBL 10/25/2018 Exposing provenance metadata using different RDF models Gang Fu, Evan Bolton, Núria Queralt Rosinach, Laura I Furlong, Vinh Nguyen, Amit Sheth, Olivier Bodenreider, Michel Dumontier
  • 22. Model I Model II Model III Model IV Model V 22,787,218 21,445,348 19,575,298 17,239,427 27,605,782 24 PubChem • Five data sets generated from the same seed  N-ary with cardinal assertion (Model I)  N-ary without cardinal assertion (Model II)  Singleton property with cardinal assertion (Model III)  Singleton property without cardinal assertion (Model IV)  NanoPublication (Model V) • Comparing sizes of generated datasets  SP datasets are the most compact ones Gang Fu, Evan Bolton, Núria Queralt Rosinach, Laura I Furlong, Vinh Nguyen, Amit Sheth, Olivier Bodenreider, Michel Dumontier. Exposing provenance metadata using different RDF models. In Proceedings of Semantic Web Applications and Tools for Life Science (SWAT4LS), 2016.
  • 23. PubChem • Query performance in secs  SP models (III and IV) outperform the other models in Virtuoso
  • 25. WikiData • Four data sets generated from the same seed  Standard Reification (SR)  N-ary relation (NR)  Singleton property (SP)  Named Graph (NG) • Comparing sizes of generated datasets  SP dataset is the most compact one. Hernández, Daniel, Aidan Hogan, and Markus Krötzsch. "Reifying RDF: What works well with wikidata?" SSWS@ISWC 1457 (2015): 32-47.
  • 26. WikiData • Query performance in 4store and GraphDB  SP models are not supported by 4store and GraphDB • Query performance in Virtuoso and BlazeGraph  Reification and NG are well supported by Virtuoso and BlazeGraph  SP is slightly faster than NR in Virtuoso, but slower in BlazeGraph
  • 27. WikiData • Six data sets generated from the same seed  Standard Reification (stdreif)  N-ary relation (naryrel)  Singleton property (sgprop)  Companion property (cpprop)  Named Graph (ngraphs)  RDF* (rdr) • Comparing sizes of generated datasets  SP dataset is the most compact triple representation  Fastest loading time for WikiData  Best query performance in StarDog in all cases  Slowest in Virtuoso for WikiData queries, but not by much  No performance issues encountered with SP. Frey, Johannes, Kay Müller, Sebastian Hellmann, Erhard Rahm, and Maria-Esther Vidal. "Evaluation of Metadata Representations in RDF stores."
  • 28. Experimental Comparison • Dataset size  SP offers the most concise representation in all cases • Query performance  SP performs reasonably well in Virtuoso, best in StarDog, and acceptably in BlazeGraph  SP has the potential for performance gains if query engines support and optimize for it • Is the SP representation optimal?
  • 30. Property Graph
    Subject | Predicate | Object | Starts | Ends
    Bob Dylan | marriedTo | Sarah Lownds | 1965-11-22 | 1977-06-29
    Bob Dylan | marriedTo | Carolyn Dennis | 1986-06-## | 1992-10-##
    Facts:
    Subject | Predicate | Object
    Bob Dylan | marriedTo | Sarah Lownds
    Bob Dylan | marriedTo | Carolyn Dennis
    [Figure: property graph with three numbered nodes (Name: BobDylan, Name: SaraLownds, Name: CarolynDennis) and two marriedTo edges. One edge carries Starts: 1965-11-22, Ends: 1977-06-29; the other carries Starts: 1986-06-##, Ends: 1992-10-##.]
  • 32. Neighbor: only available through REST interface
  • 34. Current PubChem Neighbor • Number of links  92,000,000 * 92,000,000 / 2 = 4.232 * 10^15  About 4 quadrillion • Challenges ⨯ The number of triples grows into the quadrillions ⨯ SPARQL query processing over quadrillions of triples • Is it worth it?  Chemical similarity is one of the most important concepts in chemoinformatics  Similar compounds have similar properties
  • 36. Current PubChem Neighbor
    Subject | Predicate | Object
    nbr:CID1_CID2_2DSim | has_measurement_value | nbr:CID1_CID2_2DTanimotoScore
    nbr:CID1_CID2_2DSim | refers_to | compound:CID1
    nbr:CID1_CID2_2DSim | refers_to | compound:CID2
    nbr:CID1_CID2_2DSim | type | pcvocab:PC2D_structural_similarity
    nbr:CID1_CID2_2DTanimotoScore | has_value | 0.91^^xsd:float
    nbr:CID1_CID2_2DTanimotoScore | is_output_of | sio:CHEMINF_000333
    nbr:CID1_CID2_2DTanimotoScore | type | pcvocab:PC2D_Fingerprint_TanimotorScore
    compound:CID1 | sio:CHEMINF_000482 | compound:CID2
    1 neighbor link: 7 triples
    4 quadrillion x 7 = 28 quadrillion triples
  • 37. PubChem Neighbor using CKG Model
    Subject | Predicate | Object
    nbr:CID1_CID2_2DSim | has_measurement_value | nbr:CID1_CID2_2DTanimotoScore
    nbr:CID1_CID2_2DSim | refers_to | compound:CID1
    nbr:CID1_CID2_2DSim | refers_to | compound:CID2
    nbr:CID1_CID2_2DSim | type | pcvocab:PC2D_structural_similarity
    nbr:CID1_CID2_2DTanimotoScore | has_value | 0.91^^xsd:float
    nbr:CID1_CID2_2DTanimotoScore | is_output_of | sio:CHEMINF_000333
    nbr:CID1_CID2_2DTanimotoScore | type | pcvocab:PC2D_Fingerprint_TanimotorScore
    Subject | Predicate | Object
    compound:CID1 | has_structural_similarity?sp=1&ds=pc&is_output_of=sio:CHEMINF_000333&has_2d_tanimoto_score=0.91^^xsd | compound:CID1
    1 neighbor link: 1 triple
    4 quadrillion x 1 = 4 quadrillion triples
  • 38. < 20 billion CKG triples
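
The arithmetic and the two encodings on the PubChem Neighbor slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PubChem pipeline: the function names are hypothetical, triples are plain tuples of strings, and the `^^xsd:float` suffix in the CKG predicate is assumed from the 7-triple form because the slide text is cut off after `^^xsd`.

```python
N_COMPOUNDS = 92_000_000  # compound count used on the slide


def pair_count(n: int) -> int:
    """Number of neighbor links as estimated on the slide: n * n / 2."""
    return n * n // 2


def reified_neighbor_triples(cid1: str, cid2: str, score: float) -> list:
    """The 7-triple pattern from the 'Current PubChem Neighbor' slide."""
    sim = f"nbr:{cid1}_{cid2}_2DSim"
    val = f"nbr:{cid1}_{cid2}_2DTanimotoScore"
    return [
        (sim, "has_measurement_value", val),
        (sim, "refers_to", f"compound:{cid1}"),
        (sim, "refers_to", f"compound:{cid2}"),
        (sim, "type", "pcvocab:PC2D_structural_similarity"),
        (val, "has_value", f"{score}^^xsd:float"),
        (val, "is_output_of", "sio:CHEMINF_000333"),
        (val, "type", "pcvocab:PC2D_Fingerprint_TanimotorScore"),
    ]


def ckg_neighbor_triple(subject: str, obj: str, score: float) -> tuple:
    """One triple whose predicate carries the context as key=value pairs."""
    context = [
        ("sp", "1"),
        ("ds", "pc"),
        ("is_output_of", "sio:CHEMINF_000333"),
        # Datatype suffix is an assumption; the slide shows only "^^xsd".
        ("has_2d_tanimoto_score", f"{score}^^xsd:float"),
    ]
    predicate = "has_structural_similarity?" + "&".join(
        f"{key}={value}" for key, value in context
    )
    return (f"compound:{subject}", predicate, f"compound:{obj}")


links = pair_count(N_COMPOUNDS)
reified = reified_neighbor_triples("CID1", "CID2", 0.91)
ckg = ckg_neighbor_triple("CID1", "CID2", 0.91)

print(f"links: {links:.3e}")
print(f"reified triples for all links: {links * len(reified):.3e}")
print(f"CKG triples for all links: {links * 1:.3e}")
print(ckg[1])
```

The slide rounds the link count to 4 quadrillion before multiplying, which is why it reports 28 quadrillion triples for the 7-triple form; the unrounded product is slightly larger.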

Editor's notes

  1. Semantic Web Technology, enhanced by a massive use of open linked data, plays a crucial role in the overall Deep QA architecture
  2. CEO Sundar Pichai led the charge here, noting that Google's Knowledge Graph (the easily accessible information that pops up under the search bar for certain queries) now encompasses 70 billion facts
  3. 1163 datasets using Semantic Web technologies. 149,423,660,620 triples from 2973 datasets (retrieved Dec 14). Linked Data principles: use URIs as names for things; use HTTP URIs so that people can look up those names; when someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL); include links to other URIs, so that they can discover more things.
  4. Five datasets
  5. One slide shows the graph database approach. One slide compares the SP and property graph.
  6. One slide shows the schema. One slide shows the similarity score file. One slide shows the numbers in the schema. One slide shows the numbers for all approaches.
  7. nbr:CID1_CID2_2DSim has_measurement_value nbr:CID1_CID2_2dTani . nbr:CID1_CID2_2DSim refers_to compound:CID1 . nbr:CID1_CID2_2DSim refers_to compound:CID2 . nbr:CID1_CID2_2DSim type pcvocab:PC2D_structural_similarity . nbr:CID1_CID2_2dTani has_value 0.91^^xsd:float . nbr:CID1_CID2_2dTani is_output_of sio:CHEMINF_000333 . nbr:CID1_CID2_2dTani type pcvocab:PC2D_Fingerprint_TanimotorScore
  9. 96,280,729,533 / 7 ≈ 13,754,389,933
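
The division in the last note can be checked with integer arithmetic. The reading below is an assumption: the dividend is taken to be a triple count under the 7-triples-per-link encoding, and the quotient the number of neighbor links, which is also the triple count when each link is a single CKG triple. The variable names are illustrative only.

```python
total_triples = 96_280_729_533  # figure from the note
triples_per_link = 7            # 7-triple encoding of one neighbor link

# divmod returns the integer quotient and the remainder in one step.
links, remainder = divmod(total_triples, triples_per_link)

print(f"links: {links:,}")          # 13,754,389,933
print(f"remainder: {remainder}")    # 2, so the division is not exact

# One CKG triple per link stays under the 20 billion mark from the last slide.
print(links < 20_000_000_000)       # True
```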