SlideShare ist ein Scribd-Unternehmen logo
1 von 43
Transparency in the Data Supply Chain

Paul Groth (@pgroth)
Web & Media Group
Department of Computer Science
VU University Amsterdam
http://www.few.vu.nl/~pgroth
Outline
• Data integration for analysis
– i.e. remixing data

• The need for transparency
• Two solutions
• The future
http://openphacts.org
pmu@openphacts.org
@Open_PHACTS
Public Domain Drug Discovery Data:
Pharma are accessing, processing, storing & re-processing

Literature Genbank
Patents PubChem

Data Integration

Databases

Data Analysis

Downloads

x

Repeat @
each
company

Firewalled Databases

Why?
Prioritised Research Questions
Number

sum

Nr of 1

Question

15

12

9

All oxido,reductase inhibitors active <100nM in both human and mouse

18

14

8

Given compound X, what is its predicted secondary pharmacology? What are the on and off,target safety
concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor,
KOL) for findings associated with a compound?

24

13

8

Given a target find me all actives against that target. Find/predict polypharmacology of actives. Determine
ADMET profile of actives.

32

13

8

For a given interaction profile, give me compounds similar to it.

37

13

8

The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine
protease assays for molecules that contain substructure X.

38

13

8

41

13

8

44

13

8

46

13

8

59

14

8

Retrieve all experimental and clinical data for a given list of compounds defined by their chemical
structure (with options to match stereochemistry or not).
A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to
modulate the target directly? What are the compounds that may modulate the target directly? i.e. return
all cmpds active in assays where the resolution is at least at the level of the target family (i.e. PKC) both
from structured assay databases and the literature.
Give me all active compounds on a given target with the relevant assay data
Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease)
Identify all known protein-protein interaction inhibitors
www.openphacts.org
Research question 15: All oxido reductase inhibitors active < 100nM in both human and mouse

From Mabel Loza - USC team
From Mabel Loza - USC team
From Mabel Loza - USC team
From Mabel Loza - USC team
Research question 15: All oxido reductase inhibitors active < 100nM in both human and mouse

ChEMBL:
Search target Oxidoreductase: 481 targets from different species

Selection of all the oxidoreductases and filtering bioactivities with the
criteria IC50 < 100 (no units could be selected): 11497 data obtained
Table exported to a excel spreadsheet and manually filtered

From Mabel Loza - USC team
5 people
Working 6 hours
Problem: Data Integration
Queries

Queries

Data Warehouse

Mediator

Extract
Transform
Load

Query
Reformulation

Data
Source

Data
Source

Data
Source

Data
Source
Using the Power of Open PHACTS, London, 22-23 April 2013

Core Platform

Applications
Identity
Resolution
Service
Identifier
Management
Service

“Adenosine
receptor 2a”

Linked Data API (RDF/XML, TTL, JSON)

P12374
EC2.43.4
CS4532

Semantic Workflow Engine
Chemistry
Registration
Normalisation
& Q/C

Data Cache
(Virtuoso Triple Store)

VoID

VoID

VoID

Nanopub

Public
Ontologies

Db

Domain
Specific
Services

Db

Public Content

VoID
Nanopub

Db

VoID
Nanopub

Db

Commercial

User
Annotations

index
Open PHACTS Explorer

15
Open PHACTS Explorer

?
16
Credits: Curt Tilmes, Peter Fox
Tilmes, C.; Fox, P.; Ma, X.; McGuinness, D.L.; Privette, A.P.; Smith, A.; Waple, A.;
Zednik, S.; Zheng, J.G., "Provenance Representation for the National Climate
Assessment in the Global Change Information System," Geoscience and Remote
Sensing, IEEE Transactions on , vol.51, no.11, pp.5160,5168, Nov. 2013
Problem: I don’t trust your assessment what is
it based on?
Tension:

Integrated &
Summarized
Data

Transparency
& Trust
Solution
Integrating and exposing provenance
provided by multiple sources
provbook.org
National Climate Change Assessment Provenance
PROV the database as a black box

Q
Goal
• the capability to trace back, for each query
result, the complete list of sources and how they
were combined to deliver a result.
Implement In a Graph Database at
Scale
Marcin Wylot
Philippe Cudré-Mauroux
Exascale Lab
University of Fribourg

http://diuf.unifr.ch/main/xi/diplodocus
TriplePROV [WWW2014]
Provenance Polynomials
Test on large messy data
• Billion Triple Challenge
– Crawled from the linked open data cloud

• Web Data Commons
– RDFa, Microdata extracted from common crawl

• 115 million triples (25 GB)
• 8 Queries defined for BTC
– T. Neumann and G. Weikum. Scalable join processing
on very large rdf graphs. In Proceedings of the 2009
ACM SIGMOD International Conference on
Management of data, pages 627–640. ACM, 2009.
External + Internal Provenance
• Unified queries over external and database
provenance
• Adapting query results based on provenance
• Performance improvements
FUTURE
60 % of time is spent on data
preparation
Big Data is often lots of small data

http://www.data2semantics.org/prov-reconstruction-challenge/
Questions?
• More info:
–
–
–
–

openphacts.org
data2semantics.org
provbook.org
Paul Groth, "Transparency and Reliability in the Data Supply
Chain," IEEE Internet Computing, vol. 17, no. 2, pp. 69-71, MarchApril, 2013
– Paul Groth, "The Knowledge-Remixing Bottleneck," Intelligent
Systems, IEEE , vol.28, no.5, pp.44,48, Sept.-Oct. 2013
– Marcin Wylot, Philippe Cudré-Mauroux and Paul Groth.
TripleProv: Efficient Processing of Lineage Queries over a Native
RDF Store. WWW 2014
Backup
Hack Sparql
What’s the overhead? Setup

Source and complete trace (i.e. triple level)
Annotations:
Propagate
annotations through
the query processing
pipeline
What’s the overhead?

Weitere ähnliche Inhalte

Was ist angesagt?

Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks
Results Vary: The Pragmatics of Reproducibility and Research Object FrameworksResults Vary: The Pragmatics of Reproducibility and Research Object Frameworks
Results Vary: The Pragmatics of Reproducibility and Research Object FrameworksCarole Goble
 
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Alejandra Gonzalez-Beltran
 
BigDataEurope - Big Data & Health
BigDataEurope - Big Data & HealthBigDataEurope - Big Data & Health
BigDataEurope - Big Data & HealthBigData_Europe
 
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...Chris Southan
 
Exploring Chemical and Biological Knowledge Spaces with PubChem
Exploring Chemical and Biological Knowledge Spaces with PubChemExploring Chemical and Biological Knowledge Spaces with PubChem
Exploring Chemical and Biological Knowledge Spaces with PubChemPaul Thiessen
 
DataFAIRy bioassays pilot -- lessons learned and future outlook
DataFAIRy bioassays pilot -- lessons learned and future outlookDataFAIRy bioassays pilot -- lessons learned and future outlook
DataFAIRy bioassays pilot -- lessons learned and future outlookIsabella Feierberg
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Carole Goble
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...Carole Goble
 
RARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsRARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsCarole Goble
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps. Richard Layton
 
Metabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie KeesMetabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie Keesthehyve
 
Patent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEsPatent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEsChris Southan
 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...Carole Goble
 
Antimalarial drug dscovery data disclosure
Antimalarial drug dscovery data disclosureAntimalarial drug dscovery data disclosure
Antimalarial drug dscovery data disclosureChris Southan
 
Scott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingScott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingGigaScience, BGI Hong Kong
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Carole Goble
 
GtoPdb and GtoImmuPdb in context
GtoPdb and GtoImmuPdb in contextGtoPdb and GtoImmuPdb in context
GtoPdb and GtoImmuPdb in contextChris Southan
 

Was ist angesagt? (20)

Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks
Results Vary: The Pragmatics of Reproducibility and Research Object FrameworksResults Vary: The Pragmatics of Reproducibility and Research Object Frameworks
Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks
 
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
 
BigDataEurope - Big Data & Health
BigDataEurope - Big Data & HealthBigDataEurope - Big Data & Health
BigDataEurope - Big Data & Health
 
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
 
Exploring Chemical and Biological Knowledge Spaces with PubChem
Exploring Chemical and Biological Knowledge Spaces with PubChemExploring Chemical and Biological Knowledge Spaces with PubChem
Exploring Chemical and Biological Knowledge Spaces with PubChem
 
DataFAIRy bioassays pilot -- lessons learned and future outlook
DataFAIRy bioassays pilot -- lessons learned and future outlookDataFAIRy bioassays pilot -- lessons learned and future outlook
DataFAIRy bioassays pilot -- lessons learned and future outlook
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
 
RARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsRARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research Objects
 
UKON 2014
UKON 2014UKON 2014
UKON 2014
 
Beyond the PDF 2, 2013
Beyond the PDF 2, 2013Beyond the PDF 2, 2013
Beyond the PDF 2, 2013
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps.
 
Clicking Past Google
Clicking Past GoogleClicking Past Google
Clicking Past Google
 
Metabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie KeesMetabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie Kees
 
Patent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEsPatent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEs
 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
 
Antimalarial drug dscovery data disclosure
Antimalarial drug dscovery data disclosureAntimalarial drug dscovery data disclosure
Antimalarial drug dscovery data disclosure
 
Scott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingScott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data Publishing
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
 
GtoPdb and GtoImmuPdb in context
GtoPdb and GtoImmuPdb in contextGtoPdb and GtoImmuPdb in context
GtoPdb and GtoImmuPdb in context
 

Andere mochten auch

Telling your research story with (alt)metrics
Telling your research story with (alt)metricsTelling your research story with (alt)metrics
Telling your research story with (alt)metricsPaul Groth
 
"Don't Publish, Release" - Revisited
"Don't Publish, Release" - Revisited "Don't Publish, Release" - Revisited
"Don't Publish, Release" - Revisited Paul Groth
 
Altmetrics: painting a broader picture of impact
Altmetrics: painting a broader picture of impactAltmetrics: painting a broader picture of impact
Altmetrics: painting a broader picture of impactPaul Groth
 
Decoupling Provenance Capture and Analysis from Execution
Decoupling Provenance Capture and Analysis from ExecutionDecoupling Provenance Capture and Analysis from Execution
Decoupling Provenance Capture and Analysis from ExecutionPaul Groth
 
Data for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchersData for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchersPaul Groth
 
Information architecture at Elsevier
Information architecture at ElsevierInformation architecture at Elsevier
Information architecture at ElsevierPaul Groth
 
Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkPaul Groth
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsPaul Groth
 
Structured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialStructured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialPaul Groth
 
Knowledge Graphs at Elsevier
Knowledge Graphs at ElsevierKnowledge Graphs at Elsevier
Knowledge Graphs at ElsevierPaul Groth
 
Open PHACTS API Walkthrough
Open PHACTS API WalkthroughOpen PHACTS API Walkthrough
Open PHACTS API WalkthroughPaul Groth
 
Tradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance CaptureTradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance CapturePaul Groth
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging EnvironmentsPaul Groth
 
Knowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaKnowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaPaul Groth
 
Youth Announcements 02-07-2010
Youth  Announcements 02-07-2010Youth  Announcements 02-07-2010
Youth Announcements 02-07-2010realifesigma
 
Comenius Project Results Quest. Both Pres Italy
Comenius Project Results Quest. Both  Pres ItalyComenius Project Results Quest. Both  Pres Italy
Comenius Project Results Quest. Both Pres ItalyPetros Michailidis
 
Final dinsmore-cd-kw-presentation
Final dinsmore-cd-kw-presentationFinal dinsmore-cd-kw-presentation
Final dinsmore-cd-kw-presentationCarolineDinsmore
 

Andere mochten auch (20)

Telling your research story with (alt)metrics
Telling your research story with (alt)metricsTelling your research story with (alt)metrics
Telling your research story with (alt)metrics
 
"Don't Publish, Release" - Revisited
"Don't Publish, Release" - Revisited "Don't Publish, Release" - Revisited
"Don't Publish, Release" - Revisited
 
Altmetrics: painting a broader picture of impact
Altmetrics: painting a broader picture of impactAltmetrics: painting a broader picture of impact
Altmetrics: painting a broader picture of impact
 
Decoupling Provenance Capture and Analysis from Execution
Decoupling Provenance Capture and Analysis from ExecutionDecoupling Provenance Capture and Analysis from Execution
Decoupling Provenance Capture and Analysis from Execution
 
Data for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchersData for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchers
 
Information architecture at Elsevier
Information architecture at ElsevierInformation architecture at Elsevier
Information architecture at Elsevier
 
Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic Framework
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization Systems
 
Structured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialStructured Data & the Future of Educational Material
Structured Data & the Future of Educational Material
 
Knowledge Graphs at Elsevier
Knowledge Graphs at ElsevierKnowledge Graphs at Elsevier
Knowledge Graphs at Elsevier
 
Open PHACTS API Walkthrough
Open PHACTS API WalkthroughOpen PHACTS API Walkthrough
Open PHACTS API Walkthrough
 
Tradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance CaptureTradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance Capture
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging Environments
 
Knowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaKnowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPedia
 
11
1111
11
 
Youth Announcements 02-07-2010
Youth  Announcements 02-07-2010Youth  Announcements 02-07-2010
Youth Announcements 02-07-2010
 
Comenius Project Results Quest. Both Pres Italy
Comenius Project Results Quest. Both  Pres ItalyComenius Project Results Quest. Both  Pres Italy
Comenius Project Results Quest. Both Pres Italy
 
Ad e qui_a_peur_du_b_b_r
Ad e qui_a_peur_du_b_b_rAd e qui_a_peur_du_b_b_r
Ad e qui_a_peur_du_b_b_r
 
Foi and open data
Foi and open dataFoi and open data
Foi and open data
 
Final dinsmore-cd-kw-presentation
Final dinsmore-cd-kw-presentationFinal dinsmore-cd-kw-presentation
Final dinsmore-cd-kw-presentation
 

Ähnlich wie Transparency in the Data Supply Chain

2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAGopen_phacts
 
Opening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs apiOpening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs apiChris Evelo
 
Open PHACTS (Sept 2013) EBI Industry Programme
Open PHACTS (Sept 2013) EBI Industry ProgrammeOpen PHACTS (Sept 2013) EBI Industry Programme
Open PHACTS (Sept 2013) EBI Industry ProgrammeSciBite Limited
 
Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium Kerstin Forsberg
 
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BigData_Europe
 
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...European Data Forum
 
2014-03-20 Open PHACTS - A Data Platform for Drug Discovery
2014-03-20 Open PHACTS - A Data Platform for Drug Discovery2014-03-20 Open PHACTS - A Data Platform for Drug Discovery
2014-03-20 Open PHACTS - A Data Platform for Drug Discoveryopen_phacts
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Ian Foster
 
Mashing Up Drug Discovery
Mashing Up Drug DiscoveryMashing Up Drug Discovery
Mashing Up Drug DiscoverySciBite Limited
 
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsMining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsSean Ekins
 
OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...Barry Hardy
 
Multi-omics methods and resources for Bioconductor
Multi-omics methods and resources for BioconductorMulti-omics methods and resources for Bioconductor
Multi-omics methods and resources for BioconductorLevi Waldron
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Robert Grossman
 
Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...Maulik Kamdar
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...GigaScience, BGI Hong Kong
 
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...ReVeaLD: A user-driven domain-specific interactive search platform for biomed...
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...Maulik Kamdar
 

Ähnlich wie Transparency in the Data Supply Chain (20)

Practical semantics in the pharmaceutical industry - the Open PHACTS project
Practical semantics in the pharmaceutical industry - the Open PHACTS projectPractical semantics in the pharmaceutical industry - the Open PHACTS project
Practical semantics in the pharmaceutical industry - the Open PHACTS project
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG
 
Opening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs apiOpening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs api
 
Open PHACTS (Sept 2013) EBI Industry Programme
Open PHACTS (Sept 2013) EBI Industry ProgrammeOpen PHACTS (Sept 2013) EBI Industry Programme
Open PHACTS (Sept 2013) EBI Industry Programme
 
Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium
 
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
 
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
 
2014-03-20 Open PHACTS - A Data Platform for Drug Discovery
2014-03-20 Open PHACTS - A Data Platform for Drug Discovery2014-03-20 Open PHACTS - A Data Platform for Drug Discovery
2014-03-20 Open PHACTS - A Data Platform for Drug Discovery
 
OpenTox Europe 2013
OpenTox Europe 2013OpenTox Europe 2013
OpenTox Europe 2013
 
Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009
 
Mashing Up Drug Discovery
Mashing Up Drug DiscoveryMashing Up Drug Discovery
Mashing Up Drug Discovery
 
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsMining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
 
OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...
 
Online Resources to Support Open Drug Discovery Systems
Online Resources to Support Open Drug Discovery SystemsOnline Resources to Support Open Drug Discovery Systems
Online Resources to Support Open Drug Discovery Systems
 
Multi-omics methods and resources for Bioconductor
Multi-omics methods and resources for BioconductorMulti-omics methods and resources for Bioconductor
Multi-omics methods and resources for Bioconductor
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
 
Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
 
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...ReVeaLD: A user-driven domain-specific interactive search platform for biomed...
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...
 

Mehr von Paul Groth

Data Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIData Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIPaul Groth
 
Content + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningContent + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningPaul Groth
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Paul Groth
 
Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-cziPaul Groth
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph MaintenancePaul Groth
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph FuturesPaul Groth
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph MaintenancePaul Groth
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper ProvenancePaul Groth
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of DataPaul Groth
 
End-to-End Learning for Answering Structured Queries Directly over Text
End-to-End Learning for  Answering Structured Queries Directly over Text End-to-End Learning for  Answering Structured Queries Directly over Text
End-to-End Learning for Answering Structured Queries Directly over Text Paul Groth
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data ShowcasingPaul Groth
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphPaul Groth
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for SciencePaul Groth
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?Paul Groth
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsPaul Groth
 
Progressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationProgressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationPaul Groth
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsPaul Groth
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsPaul Groth
 
The need for a transparent data supply chain
The need for a transparent data supply chainThe need for a transparent data supply chain
The need for a transparent data supply chainPaul Groth
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicinePaul Groth
 

Mehr von Paul Groth (20)

Data Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIData Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AI
 
Content + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningContent + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learning
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
 
Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-czi
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph Futures
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
 
End-to-End Learning for Answering Structured Queries Directly over Text
End-to-End Learning for  Answering Structured Queries Directly over Text End-to-End Learning for  Answering Structured Queries Directly over Text
End-to-End Learning for Answering Structured Queries Directly over Text
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data Showcasing
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge Graph
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for Science
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domains
 
Progressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationProgressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computation
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge Graphs
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
 
The need for a transparent data supply chain
The need for a transparent data supply chainThe need for a transparent data supply chain
The need for a transparent data supply chain
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
 

Kürzlich hochgeladen

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 

Kürzlich hochgeladen (20)

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 

Transparency in the Data Supply Chain

  • 1. Transparency in the Data Supply Chain Paul Groth (@pgroth) Web & Media Group Department of Computer Science VU University Amsterdam http://www.few.vu.nl/~pgroth
  • 2.
  • 3. Outline • Data integration for analysis – i.e. remixing data • The need for transparency • Two solutions • The future
  • 5. Public Domain Drug Discovery Data: Pharma are accessing, processing, storing & re-processing Literature Genbank Patents PubChem Data Integration Databases Data Analysis Downloads x Repeat @ each company Firewalled Databases Why?
  • 6. Prioritised Research Questions Number sum Nr of 1 Question 15 12 9 All oxido,reductase inhibitors active <100nM in both human and mouse 18 14 8 Given compound X, what is its predicted secondary pharmacology? What are the on and off,target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound? 24 13 8 Given a target find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives. 32 13 8 For a given interaction profile, give me compounds similar to it. 37 13 8 The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X. 38 13 8 41 13 8 44 13 8 46 13 8 59 14 8 Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not). A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? i.e. return all cmpds active in assays where the resolution is at least at the level of the target family (i.e. PKC) both from structured assay databases and the literature. Give me all active compounds on a given target with the relevant assay data Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease) Identify all known protein-protein interaction inhibitors www.openphacts.org
  • 7. Research question 15: All oxido reductase inhibitors active < 100nM in both human and mouse From Mabel Loza - USC team
  • 8. From Mabel Loza - USC team
  • 9. From Mabel Loza - USC team
  • 10. From Mabel Loza - USC team
  • 11. Research question 15: All oxido reductase inhibitors active < 100nM in both human and mouse ChEMBL: Search target Oxidoreductase: 481 targets from different species Selection of all the oxidoreductases and filtering bioactivities with the criteria IC50 < 100 (no units could be selected): 11497 data obtained Table exported to a excel spreadsheet and manually filtered From Mabel Loza - USC team
  • 13. Problem: Data Integration Queries Queries Data Warehouse Mediator Extract Transform Load Query Reformulation Data Source Data Source Data Source Data Source
  • 14. Using the Power of Open PHACTS, London, 22-23 April 2013 Core Platform Applications Identity Resolution Service Identifier Management Service “Adenosine receptor 2a” Linked Data API (RDF/XML, TTL, JSON) P12374 EC2.43.4 CS4532 Semantic Workflow Engine Chemistry Registration Normalisation & Q/C Data Cache (Virtuoso Triple Store) VoID VoID VoID Nanopub Public Ontologies Db Domain Specific Services Db Public Content VoID Nanopub Db VoID Nanopub Db Commercial User Annotations index
  • 17. Credits: Curt Tilmes, Peter Fox Tilmes, C.; Fox, P.; Ma, X.; McGuinness, D.L.; Privette, A.P.; Smith, A.; Waple, A.; Zednik, S.; Zheng, J.G., "Provenance Representation for the National Climate Assessment in the Global Change Information System," Geoscience and Remote Sensing, IEEE Transactions on , vol.51, no.11, pp.5160,5168, Nov. 2013
  • 18.
  • 19. Problem: I don’t trust your assessment what is it based on?
  • 21. Solution Integrating and exposing provenance provided by multiple sources
  • 22.
  • 24.
  • 25. National Climate Change Assessment Provenance
  • 26.
  • 27.
  • 28. PROV the database as a black box Q
  • 29. Goal • the capability to trace back, for each query result, the complete list of sources and how they were combined to deliver a result.
  • 30. Implement In a Graph Database at Scale Marcin Wylot Philippe Cudré-Mauroux Exascale Lab University of Fribourg http://diuf.unifr.ch/main/xi/diplodocus
  • 33. Test on large messy data • Billion Triple Challenge – Crawled from the linked open data cloud • Web Data Commons – RDFa, Microdata extracted from common crawl • 115 million triples (25 GB) • 8 Queries defined for BTC – T. Neumann and G. Weikum. Scalable join processing on very large rdf graphs. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pages 627–640. ACM, 2009.
  • 34. External + Internal Provenance • Unified queries over external and database provenance • Adapting query results based on provenance • Performance improvements
  • 36. 60 % of time is spent on data preparation
  • 37. Big Data is often lots of small data http://www.data2semantics.org/prov-reconstruction-challenge/
  • 38. Questions? • More info: – – – – openphacts.org data2semantics.org provbook.org Paul Groth, "Transparency and Reliability in the Data Supply Chain," IEEE Internet Computing, vol. 17, no. 2, pp. 69-71, MarchApril, 2013 – Paul Groth, "The Knowledge-Remixing Bottleneck," Intelligent Systems, IEEE , vol.28, no.5, pp.44,48, Sept.-Oct. 2013 – Marcin Wylot, Philippe Cudré-Mauroux and Paul Groth. TripleProv: Efficient Processing of Lineage Queries over a Native RDF Store. WWW 2014
  • 41. What’s the overhead? Setup Source and complete trace (i.e. triple level)