SlideShare ist ein Scribd-Unternehmen logo
1 von 25
FAIR Data-centric
Information Architecture:
It's all fun and games until
someone uses an “I”
Ben Gardner
Data Standards & Interoperability, Data Office, Data
Science & Artificial Intelligence, R&D, AstraZeneca,
Cambridge, United Kingdom
September 2023
What is Data Centricity?
2
Data-Centricity puts
data at the centre of
the enterprise.
Applications are
optional visitors to the
data. (Data-centric manifesto)
Data-centricity involves structuring our data around the science that we do rather
than the systems that we use. It promotes data reusability over system-centric design.
Defining our data-centric transformation using FAIR metrics
3
L1
•Data is catalogued in situ
•Raw data in an
unconformed format,
external to a data lake
•Requires expert technical
and subject matter
knowledge to access and
use
L2
•Raw data catalogued and
accessible with
governance in a data lake,
unconformed.
•Requires average
technical and expert
subject matter
knowledge to use
L3
•Conformed data
catalogued and available
in a data lake in
accordance with AZ
patterns
•Data is conformed on a
system by system basis
and may not map to a
domain data model
•Controls on access at the
system level
•Requires average level
technical and subject
matter knowledge to use:
Data Scientist.
L4 – Domain
level FAIR
•Data conformed,
integrated, processed and
audited to support
specific analytics
patterns/enquiries
•Data is conformed using a
domain level data model
•Data embeds local master
and reference data
•Controls on access at the
data level
• Creation of analytics
ready 'Marts' enabling
self service analytics:
Citizen Data Scientist
L5 – Enterprise
Level FAIR
•Extend L4 further by:
•Data is described in a
cross domain data or
industry model and is
integrated in a Data Mesh
•Data embeds enterprise
master and reference
data
•Enterprise Metadata
standard applied
•URI’s and PURLs are
implemented
•Cross domain Analytics
enabled: Citizen Data
Scientist using all relevant
AZ data
L6 – Knowledge
Level FAIR
•Extend L5 further by:
•Data is fully described
using a knowledge
language or ontology
•Data is aggregated by
business concepts and
users can navigate from
concept to concept
•Automated AI enabled: AI
and semantic solutions
can act directly on data
sets without need for
interpretation
•Knowledge enabled
citizens
Evolution of System Centric Thinking Evolution of Data Centric Thinking to Knowledge Centric
Sweet spot for many AZ domains –
where deep science and/or broad
data integration is not required
Some data domains will require additional investment to support scientific
use cases. This maximises use of Enterprise data and fully supports AI
Requires strong and increasing linkage of FAIR with TRUST
System-centric Catalogs and API’s only, giving minimal FAIR capabilities
Credit to Information Architecture and Ben Gardner
The value of data centricity
4
Junk Yard
Car Boot Sale
Moroccan Street
Market
Local Market
Hyper Market
Amazon Prime
• Random stuff by car
• Ordered by seller
• Sellers have assigned pitches
• No insights about what other sellers have
• Random level of quality i.e. first edition vs
latest edition book
• Randomly distributed stuff
• Lots of effort to find stuff
• Easy to miss what you are looking for
• Only the 'owner' knows where stuff might be at best
• Grouping by product category i.e. spices
vs carpets vs etc
• Improved level of quality
• Seller can explain the provenance of the
product
• Good intentions but not really scalable
• Well organised
• Lots of categories
• Well labelled items including
provenance (made where)
• Limited produce
• Scale and organisation at the
next level
• Everything in once place
• Store guide/map
• Optimised for
society/community
• Specialist by product category
• Data driven display/groupings -
I.e. Christmas, summer vs winter;
mining shopping patterns,
context sensitivity
• Click and collect service
• Price comparison with other
retailers
• Digital, no longer physically
constrained
• Recommendation engines -
other categories/products
related to this, push
information to you vs pull
provides information on
request
• Scale of products and
variety of manufactures
• Fuzzy search - helps me find
what I might be looking for -
I'll know it when I find it
searches
• Amazon subscription
services - schedule delivery,
Dash buttons
• Alexa - AI guided
• Services - Music and video
• Market place for other
vendors
Think about your shopping
experience….
Credit to Graham Henry, Sandra McGarry, Kerstin Forsberg and Ben Gardner
Building Knowledge Graphs
A three legged stool
5
“Start with meaning”
Dave McComb, Semantic Arts
Stool built by Derek Scuffell
Ontology Architecture overview
6
A conceptual model that
provides the minimum linking
of shared entities used across
domain and application
models.
Entity Ontologies describe
individual concepts and act
as building blocks for
importing into application
ontologies.
Application Ontologies are
built to support a specific
application.
Honeycombs used to
communicate concept of
knowledge map, illustrate use
case coverage and organic
evolution.
Credit to Jon Ison
Conceptual Model Entity Ontology Application Ontology Honeycomb
Honeycomb motif inspired by Dan Gschwend - Amgen’s Enterprise Data Fabric, DCAF 2021
Anatomy of an Entity Ontology
A model of a high level concept within a knowledge domain
7
Relationships
Controlled Vocabularies
Information
(Data Properties)
Strategy to organically grow a knowledge map
Increasing the breadth, depth and complexity of questions enabled
8
Model evolves based on emerging new scientific use cases
Foundational CV’s deliver Quality all the way up
9
Silo’ed picklists
Collections
Local CV
Foundational CV
Uncoordinated list of strings
used by an application
Well designed atomic controlled
vocabulary built to a common
standard
Narrow coverage for a
domain/application
Decided by use case specific
governance, enabled by editorial
capability & appropriate tools
Well designed atomic controlled
vocabulary built to a common
standard
Broad coverage for multiple
domains/applications
Decided by SMEs, enabled by
specialist curators
A collection of terms derived
from existing atomic controlled
vocabularies that meet specific
application needs/use cases
SKOS-XL
https://pid.astrazeneca.net/ref/cvname/{ID}
Credit to Varsha Khodiyar
Controlled Vocabularies
SKOS-XL supports a multitude of labels and signifiers
A pool of lexical labels exist for each concept. They are common use OR attributed to systems and
vocabularies. AZ curators decide which one will be preferred (for AZ) and whether other labels will be
alternative or hidden. Each label should be further characterized by a signifier.
Alternative Labels
Well defined non-case variant, alternative
labels that are used for this concept. –
some may call these “synonyms”
Preferred Labels
The preferred label for AstraZeneca, that
resonates with the business vernacular
Hidden Labels
Common mis-representations (spelling
mistakes, etc) of the concept that exist and
we don’t want used by humans. Often used
to support NLP and AI activity
“ctDNA”@en
C113243
Pref Label
NCI Thesaurus
Identifier Scheme
“Circulating Tumor DNA” ->[NCIT]
Alt Label (attributed)
“circulating tumor DNA”@en-us
Alt Label
“cell-free circulating
tumour DNA”@en
Alt Label
“ctdDNA”@en
Alt Label
“cDNA” -> Spelling
Hidden Label
“cell-free circulating
tumor DNA”@en-us
Alt Label
we choose you! (as our
preferred label)
Credit to Nathalie Conte, Derek Scuffell & Jon Ison – OWG Summer project team
Data
Domain level FAIR (L4) a great first step
11
Data Catalogue
Credit to – Data Office Collibra team
Find - Data registered in Collibra and tagged with CMM
Access – Access controlled via Collibra request service
Reuse – Documentation captured against data
But we are FAR from FAIR
We MUST go further
12
Find - Data registered in Collibra
and tagged with CMM
Access – Access controlled via
Collibra request service
Reuse – Documentation captured
against data
Find - Data registered in Collibra and
tagged with CMM
Access – Access controlled via Collibra
request service
Interoperable – Data enhanced with
shared CV and PIDs
Reuse – Documentation captured
against data, includes data dictionary, etc
Data record discoverable in Collibra
Data enriched to create interoperability
Data is Machine Readable
FAIR metrics (Level 4) FAIR metrics (Level 5)
Data record discoverable in Collibra
Data Catalog
(Limited Scope)
FAIR
reference
data
FAIR
master
data
Purl
Server
Controlled
Vocabularies
Data Catalog
(Limited Scope)
Enterprise Level FAIR (L5)
A FAIRe(nough) data product
13
Data
Controlled
Vocabularies
FAIR Data Product
Findable Accessible Interoperable Reusable
Registered and
discoverable in a Data
Catalogue
Mechanism for requesting
and receiving data
The data has been aligned
to AZ standards where
they exist
Documentation describing
the constraints associated
with using the data
The PID for each instance
in the standard is included
to make the data machine
readable
Documentation describing
the data i.e. data
dictionary, schema, etc
The minimum viable FAIR Data standard should deliver
Catalogue
Data and Controlled Vocabularies
Putting Interoperability into FAIR
14
Data
Transform
Controlled Vocabularies
Term prefLabel
& PID
Dirty data Interoperable data
prefLabel PID
• Inconsistent identifiers & terms
• Column values can be concatenated
• etc
• Shared Controlled Vocabularies
• Enrich with preferred label and PIDs
• Uses common files format CSV, JSON, etc
• Is machine readable, graph enabling and
relationally world friendly
• Well documented - Data Dictionary/Data
Schema/etc
L5 FAIR Data Products benefit all
Inclusion of PIDs simplifies data integration irrespective of target data model
15
L5 FAIR Data Product
Relational
Graph
Aligning entities with controlled vocabularies is key
16
Entity
Ontology
Application
Ontology
Build from
Foundational
Vocabularies
Build from
Collections
Ontologies
(things and relationships)
Vocabularies
(terms that categorise things)
A model built from
one or more entity
ontologies and
extended as required
based on use case
Discrete models of
things within a domain
that provide the
foundational building
blocks
Atomic sets of terms that
are used to describe the
things identified within
entity ontologies
The collection of term sets
and terms from one or more
controlled vocabularies that
are required to support a use
case
Emerging CV demand drives evolution of Entity Ontologies
Critical Data Element and DDWG provide another path to enhancing our model
17
Credit to – Jon Ison, Arinjay Jadeja and Caroline Hellawell
Model and Data
Strings to things a mapping standard
18
Credit to – Jon Ison, Nicola Ellingham and Arinjay Jadeja
Knowledge Level FAIR (L6)
A FAIR data product minimises the gap between relational and graph worlds
19
L4 FAIR Data Controlled
Vocabularies
Reference Model
RDF Serialisation
Knowledge Map
L5 FAIR Data Product
• An enterprise standard
• Adds value for everyone
• Two thirds of the way to graph without
impacting non-graph consumers
Model, Controlled Vocabularies and Data
In a data-centric world the PIDs just snap the data and model together
20
Bringing it all back together
21
Controlled
Vocabularies
Reference
Model
FAIR Data
Product
Ontology Architecture
• Scalable and sustainable
• Reusable library of entity
ontologies
• Organic evolution of
reference ontology
Controlled Vocabularies
• RDM service implemented
• SKOS-XL based CV framework & workflow
designed
• Editorial controls in place
• Prioritisation & creation of CV underway
FAIR Data
• FAIR metrics goal 80%
Data FAIR by 2025
• Standards, resources,
tooling and governance
• FAIR assessment in place
• PID server and standard
patterns agreed
Omics Imaging Patient
Knowledge Map
(Data-centric)
Data is FAIR and aggregated by
business concept (semantic)
Platform Data Products
(Platform-centric)
Data is FAIR but fragmented
across domains
From System-centric to Data-centric
Reference Architecture
Application Data
(System-centric)
Data is aligned around
processes
What should I do?
Focus on utilising our Controlled Vocabularies and enriching your data
23
Collections
Local CV
Foundational CV
Catalogue your data Commit to reusing &
improving our CV estate
Enrich your data with
prefLabel & PID
Data Catalogue Controlled Vocabularies L5 FAIR Data Product
It takes village ……
24
DF+I
Daniel Roythorne
Jon Ison
Nathalie Conte
Nicola Ellingham
Arun Balaji
Induja Mohan
Arinjay Jadeja
Ben Gardner
Hans Ienasescu
Bhavna Khilnani
Michael Neylon
Mathew Woodwark
Rob Hernandez
Collaborators
Derek Scuffell
Varsha Khodiyar
Pablo Porras Millan
John Berrisford
Bijay Jassal
Rafa Jimenez
Philippe Rocca-Serra
Victor Kim
Alex Wood
Linda Zander-Balderud
Antonio Fabregat
Mark Reuter
Tom Plasterer
James Holman
Martina Devoti
Stacy Mather
Di Elvers
Colin Wood
Sandra Mc Garry
Gareth Henry
Kerstin Freberg
Calle Nordmark
Confidentiality Notice
This file is private and may contain confidential and proprietary information. If you have received this file in error, please notify us and remove
it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the
contents of this file is not permitted and may be unlawful. AstraZeneca PLC, 1 Francis Crick Avenue, Cambridge Biomedical Campus,
Cambridge, CB2 0AA, UK, T: +44(0)203 749 5000, www.astrazeneca.com
25

Weitere ähnliche Inhalte

Was ist angesagt?

Data Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceData Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceDenodo
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
 
DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsEllen Friedman
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph IntroductionSören Auer
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data FabricAlan McSweeney
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWSGary Stafford
 
Cloud Native, Cloud First and Hybrid: How Different Organizations are Approac...
Cloud Native, Cloud First and Hybrid: How Different Organizations are Approac...Cloud Native, Cloud First and Hybrid: How Different Organizations are Approac...
Cloud Native, Cloud First and Hybrid: How Different Organizations are Approac...Amazon Web Services
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as ProductDATAVERSITY
 
Apache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, ConfluentApache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, ConfluentHostedbyConfluent
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
Simplifying And Accelerating Data Access for Python With Dremio and Apache ArrowSimplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
Simplifying And Accelerating Data Access for Python With Dremio and Apache ArrowPyData
 
Strategic imperative the enterprise data model
Strategic imperative the enterprise data modelStrategic imperative the enterprise data model
Strategic imperative the enterprise data modelDATAVERSITY
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data MeshLibbySchulze
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptxWasm1953
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
How to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleHow to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleDATAVERSITY
 

Was ist angesagt? (20)

Data Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceData Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and Governance
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 
DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven Organizations
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Cloud Native, Cloud First and Hybrid: How Different Organizations are Approac...
Cloud Native, Cloud First and Hybrid: How Different Organizations are Approac...Cloud Native, Cloud First and Hybrid: How Different Organizations are Approac...
Cloud Native, Cloud First and Hybrid: How Different Organizations are Approac...
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
 
Apache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, ConfluentApache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, Confluent
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
Simplifying And Accelerating Data Access for Python With Dremio and Apache ArrowSimplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
 
Strategic imperative the enterprise data model
Strategic imperative the enterprise data modelStrategic imperative the enterprise data model
Strategic imperative the enterprise data model
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
How to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleHow to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at Scale
 

Ähnlich wie FAIR Data-centric Information Architecture.pptx

FAIRy stories: tales from building the FAIR Research Commons
FAIRy stories: tales from building the FAIR Research CommonsFAIRy stories: tales from building the FAIR Research Commons
FAIRy stories: tales from building the FAIR Research CommonsCarole Goble
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)Denodo
 
Customer value analysis of big data products
Customer value analysis of big data productsCustomer value analysis of big data products
Customer value analysis of big data productsVikas Sardana
 
Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)
Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)
Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)Denodo
 
Enterprise Data Marketplace: A Centralized Portal for All Your Data Assets
Enterprise Data Marketplace: A Centralized Portal for All Your Data AssetsEnterprise Data Marketplace: A Centralized Portal for All Your Data Assets
Enterprise Data Marketplace: A Centralized Portal for All Your Data AssetsDenodo
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSPhilip Filleul
 
Supply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AISupply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AITigerGraph
 
Identity and User Access Management.pptx
Identity and User Access Management.pptxIdentity and User Access Management.pptx
Identity and User Access Management.pptxirfanullahkhan64
 
Connected development data
Connected development dataConnected development data
Connected development dataRob Worthington
 
Eu gdpr technical workflow and productionalization neccessary w privacy ass...
Eu gdpr technical workflow and productionalization   neccessary w privacy ass...Eu gdpr technical workflow and productionalization   neccessary w privacy ass...
Eu gdpr technical workflow and productionalization neccessary w privacy ass...Steven Meister
 
PatSeer Overview
PatSeer OverviewPatSeer Overview
PatSeer OverviewGridlogics
 
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?Denodo
 
How a Logical Data Fabric Enhances the Customer 360 View
How a Logical Data Fabric Enhances the Customer 360 ViewHow a Logical Data Fabric Enhances the Customer 360 View
How a Logical Data Fabric Enhances the Customer 360 ViewDenodo
 
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)Denodo
 

Ähnlich wie FAIR Data-centric Information Architecture.pptx (20)

FAIRy stories: tales from building the FAIR Research Commons
FAIRy stories: tales from building the FAIR Research CommonsFAIRy stories: tales from building the FAIR Research Commons
FAIRy stories: tales from building the FAIR Research Commons
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
 
Customer value analysis of big data products
Customer value analysis of big data productsCustomer value analysis of big data products
Customer value analysis of big data products
 
Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)
Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)
Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)
 
Enterprise Data Marketplace: A Centralized Portal for All Your Data Assets
Enterprise Data Marketplace: A Centralized Portal for All Your Data AssetsEnterprise Data Marketplace: A Centralized Portal for All Your Data Assets
Enterprise Data Marketplace: A Centralized Portal for All Your Data Assets
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 
Data Domain-Driven Design
Data Domain-Driven DesignData Domain-Driven Design
Data Domain-Driven Design
 
Supply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AISupply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AI
 
Apache Solr vs Oracle Endeca
Apache Solr vs Oracle EndecaApache Solr vs Oracle Endeca
Apache Solr vs Oracle Endeca
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Identity and User Access Management.pptx
Identity and User Access Management.pptxIdentity and User Access Management.pptx
Identity and User Access Management.pptx
 
Connected development data
Connected development dataConnected development data
Connected development data
 
IT webinar 2016
IT webinar 2016IT webinar 2016
IT webinar 2016
 
Eu gdpr technical workflow and productionalization neccessary w privacy ass...
Eu gdpr technical workflow and productionalization   neccessary w privacy ass...Eu gdpr technical workflow and productionalization   neccessary w privacy ass...
Eu gdpr technical workflow and productionalization neccessary w privacy ass...
 
PatSeer Overview
PatSeer OverviewPatSeer Overview
PatSeer Overview
 
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?
 
How a Logical Data Fabric Enhances the Customer 360 View
How a Logical Data Fabric Enhances the Customer 360 ViewHow a Logical Data Fabric Enhances the Customer 360 View
How a Logical Data Fabric Enhances the Customer 360 View
 
Dma unit 1
Dma unit   1Dma unit   1
Dma unit 1
 
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)
 

Mehr von Ben Gardner

Delivering a Linked Data warehouse and realising the power of graphs
Delivering a Linked Data warehouse and realising the power of graphsDelivering a Linked Data warehouse and realising the power of graphs
Delivering a Linked Data warehouse and realising the power of graphsBen Gardner
 
Semantic blockchain
Semantic blockchainSemantic blockchain
Semantic blockchainBen Gardner
 
What AI is and examples of how it is used in legal
What AI is and examples of how it is used in legalWhat AI is and examples of how it is used in legal
What AI is and examples of how it is used in legalBen Gardner
 
From Search to Semantics
From Search to SemanticsFrom Search to Semantics
From Search to SemanticsBen Gardner
 
Practical semantics - An introduction
Practical semantics - An introductionPractical semantics - An introduction
Practical semantics - An introductionBen Gardner
 
From the Unknown to the Known
From the Unknown to the KnownFrom the Unknown to the Known
From the Unknown to the KnownBen Gardner
 
Enterprise wiki's: Does one size fit all?
Enterprise wiki's: Does one size fit all?Enterprise wiki's: Does one size fit all?
Enterprise wiki's: Does one size fit all?Ben Gardner
 
Stratergies for the intergration of information (IPI_ConfEX)
Stratergies for the intergration of information (IPI_ConfEX)Stratergies for the intergration of information (IPI_ConfEX)
Stratergies for the intergration of information (IPI_ConfEX)Ben Gardner
 

Mehr von Ben Gardner (9)

Delivering a Linked Data warehouse and realising the power of graphs
Delivering a Linked Data warehouse and realising the power of graphsDelivering a Linked Data warehouse and realising the power of graphs
Delivering a Linked Data warehouse and realising the power of graphs
 
Semantic blockchain
Semantic blockchainSemantic blockchain
Semantic blockchain
 
What AI is and examples of how it is used in legal
What AI is and examples of how it is used in legalWhat AI is and examples of how it is used in legal
What AI is and examples of how it is used in legal
 
From Search to Semantics
From Search to SemanticsFrom Search to Semantics
From Search to Semantics
 
Practical semantics - An introduction
Practical semantics - An introductionPractical semantics - An introduction
Practical semantics - An introduction
 
From the Unknown to the Known
From the Unknown to the KnownFrom the Unknown to the Known
From the Unknown to the Known
 
Enterprise wiki's: Does one size fit all?
Enterprise wiki's: Does one size fit all?Enterprise wiki's: Does one size fit all?
Enterprise wiki's: Does one size fit all?
 
meet Jessica
meet Jessicameet Jessica
meet Jessica
 
Stratergies for the intergration of information (IPI_ConfEX)
Stratergies for the intergration of information (IPI_ConfEX)Stratergies for the intergration of information (IPI_ConfEX)
Stratergies for the intergration of information (IPI_ConfEX)
 

Kürzlich hochgeladen

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Kürzlich hochgeladen (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

FAIR Data-centric Information Architecture.pptx

  • 1. FAIR Data-centric Information Architecture: It's all fun and games until someone uses an “I” Ben Gardner Data Standards & Interoperability, Data Office, Data Science & Artificial Intelligence, R&D, AstraZeneca, Cambridge, United Kingdom September 2023
  • 2. What is Data Centricity? 2 Data-Centricity puts data at the centre of the enterprise. Applications are optional visitors to the data. (Data-centric manifesto) Data-centricity involves structuring our data around the science that we do rather than the systems that we use. It promotes data reusability over system-centric design.
  • 3. Defining our data-centric transformation using FAIR metrics 3 L1 •Data is catalogued in situ •Raw data in an unconformed format, external to a data lake •Requires expert technical and subject matter knowledge to access and use L2 •Raw data catalogued and accessible with governance in a data lake, unconformed. •Requires average technical and expert subject matter knowledge to use L3 •Conformed data catalogued and available in a data lake in accordance with AZ patterns •Data is conformed on a system by system basis and may not map to a domain data model •Controls on access at the system level •Requires average level technical and subject matter knowledge to use: Data Scientist. L4 – Domain level FAIR •Data conformed, integrated, processed and audited to support specific analytics patterns/enquiries •Data is conformed using a domain level data model •Data embeds local master and reference data •Controls on access at the data level • Creation of analytics ready 'Marts' enabling self service analytics: Citizen Data Scientist L5 – Enterprise Level FAIR •Extend L4 further by: •Data is described in a cross domain data or industry model and is integrated in a Data Mesh •Data embeds enterprise master and reference data •Enterprise Metadata standard applied •URI’s and PURLs are implemented •Cross domain Analytics enabled: Citizen Data Scientist using all relevant AZ data L6 – Knowledge Level FAIR •Extend L5 further by: •Data is fully described using a knowledge language or ontology •Data is aggregated by business concepts and users can navigate from concept to concept •Automated AI enabled: AI and semantic solutions can act directly on data sets without need for interpretation •Knowledge enabled citizens Evolution of System Centric Thinking Evolution of Data Centric Thinking to Knowledge Centric Sweet spot for many AZ domains – where deep science and/or broad data integration is not required Some data domains will require additional investment to support scientific use cases. This maximises use of Enterprise data and fully supports AI Requires strong and increasing linkage of FAIR with TRUST System-centric Catalogs and API’s only, giving minimal FAIR capabilities Credit to Information Architecture and Ben Gardner
  • 4. The value of data centricity 4 Junk Yard Car Boot Sale Moroccan Street Market Local Market Hyper Market Amazon Prime • Random stuff by car • Ordered by seller • Sellers have assigned pitches • No insights about what other sellers have • Random level of quality i.e. first edition vs latest edition book • Randomly distributed stuff • Lots of effort to find stuff • Easy to miss what you are looking for • Only the 'owner' knows where stuff might be at best • Grouping by product category i.e. spices vs carpets vs etc • Improved level of quality • Seller can explain the provenance of the product • Good intentions but not really scalable • Well organised • Lots of categories • Well labelled items including provenance (made where) • Limited produce • Scale and organisation at the next level • Everything in once place • Store guide/map • Optimised for society/community • Specialist by product category • Data driven display/groupings - I.e. Christmas, summer vs winter; mining shopping patterns, context sensitivity • Click and collect service • Price comparison with other retailers • Digital, no longer physically constrained • Recommendation engines - other categories/products related to this, push information to you vs pull provides information on request • Scale of products and variety of manufactures • Fuzzy search - helps me find what I might be looking for - I'll know it when I find it searches • Amazon subscription services - schedule delivery, Dash buttons • Alexa - AI guided • Services - Music and video • Market place for other vendors Think about your shopping experience…. Credit to Graham Henry, Sandra McGarry, Kerstin Forsberg and Ben Gardner
  • 5. Building Knowledge Graphs A three legged stool 5 “Start with meaning” Dave McComb, Semantic Arts Stool built by Derek Scuffell
  • 6. Ontology Architecture overview 6 A conceptual model that provides the minimum linking of shared entities used across domain and application models. Entity Ontologies describe individual concepts and act as building blocks for importing into application ontologies. Application Ontologies are built to support a specific application. Honeycombs used to communicate concept of knowledge map, illustrate use case coverage and organic evolution. Credit to Jon Ison Conceptual Model Entity Ontology Application Ontology Honeycomb Honeycomb motif inspired by Dan Gschwend - Amgen’s Enterprise Data Fabric, DCAF 2021
  • 7. Anatomy of an Entity Ontology A model of a high level concept within a knowledge domain 7 Relationships Controlled Vocabularies Information (Data Properties)
  • 8. Strategy to organically grow a knowledge map Increasing the breadth, depth and complexity of questions enabled 8 Model evolves based on emerging new scientific use cases
  • 9. Foundational CV’s deliver Quality all the way up 9 Silo’ed picklists Collections Local CV Foundational CV Uncoordinated list of strings used by an application Well designed atomic controlled vocabulary built to a common standard Narrow coverage for a domain/application Decided by use case specific governance, enabled by editorial capability & appropriate tools Well designed atomic controlled vocabulary built to a common standard Broad coverage for multiple domains/applications Decided by SMEs, enabled by specialist curators A collection of terms derived from existing atomic controlled vocabularies that meet specific application needs/use cases SKOS-XL https://pid.astrazeneca.net/ref/cvname/{ID} Credit to Varsha Khodiyar
  • 10. Controlled Vocabularies SKOS-XL supports a multitude of labels and signifiers A pool of lexical labels exist for each concept. They are common use OR attributed to systems and vocabularies. AZ curators decide which one will be preferred (for AZ) and whether other labels will be alternative or hidden. Each label should be further characterized by a signifier. Alternative Labels Well defined non-case variant, alternative labels that are used for this concept. – some may call these “synonyms” Preferred Labels The preferred label for AstraZeneca, that resonates with the business vernacular Hidden Labels Common mis-representations (spelling mistakes, etc) of the concept that exist and we don’t want used by humans. Often used to support NLP and AI activity “ctDNA”@en C113243 Pref Label NCI Thesaurus Identifier Scheme “Circulating Tumor DNA” ->[NCIT] Alt Label (attributed) “circulating tumor DNA”@en-us Alt Label “cell-free circulating tumour DNA”@en Alt Label “ctdDNA”@en Alt Label “cDNA” -> Spelling Hidden Label “cell-free circulating tumor DNA”@en-us Alt Label we choose you! (as our preferred label) Credit to Nathalie Conte, Derek Scuffell & Jon Ison – OWG Summer project team
  • 11. Data Domain level FAIR (L4) a great first step 11 Data Catalogue Credit to – Data Office Collibra team Find - Data registered in Collibra and tagged with CMM Access – Access controlled via Collibra request service Reuse – Documentation captured against data
  • 12. But we are FAR from FAIR We MUST go further 12 Find - Data registered in Collibra and tagged with CMM Access – Access controlled via Collibra request service Reuse – Documentation captured against data Find - Data registered in Collibra and tagged with CMM Access – Access controlled via Collibra request service Interoperable – Data enhanced with shared CV and PIDs Reuse – Documentation captured against data, includes data dictionary, etc Data record discoverable in Collibra Data enriched to create interoperability Data is Machine Readable FAIR metrics (Level 4) FAIR metrics (Level 5) Data record discoverable in Collibra Data Catalog (Limited Scope) FAIR reference data FAIR master data Purl Server Controlled Vocabularies Data Catalog (Limited Scope)
  • 13. Enterprise Level FAIR (L5) A FAIRe(nough) data product 13 Data Controlled Vocabularies FAIR Data Product Findable Accessible Interoperable Reusable Registered and discoverable in a Data Catalogue Mechanism for requesting and receiving data The data has been aligned to AZ standards where they exist Documentation describing the constraints associated with using the data The PID for each instance in the standard is included to make the data machine readable Documentation describing the data i.e. data dictionary, schema, etc The minimum viable FAIR Data standard should deliver Catalogue
  • 14. Data and Controlled Vocabularies Putting Interoperability into FAIR 14 Data Transform Controlled Vocabularies Term prefLabel & PID Dirty data Interoperable data prefLabel PID • Inconsistent identifiers & terms • Column values can be concatenated • etc • Shared Controlled Vocabularies • Enrich with preferred label and PIDs • Uses common files format CSV, JSON, etc • Is machine readable, graph enabling and relationally world friendly • Well documented - Data Dictionary/Data Schema/etc
  • 15. L5 FAIR Data Products benefit all Inclusion of PIDs simplifies data integration irrespective of target data model 15 L5 FAIR Data Product Relational Graph
  • 16. Aligning entities with controlled vocabularies is key 16 Entity Ontology Application Ontology Build from Foundational Vocabularies Build from Collections Ontologies (things and relationships) Vocabularies (terms that categorise things) A model built from one or more entity ontologies and extended as required based on use case Discrete models of things within a domain that provide the foundational building blocks Atomic sets of terms that are used to describe the things identified within entity ontologies The collection of term sets and terms from one or more controlled vocabularies that are required to support a use case
  • 17. Emerging CV demand drives evolution of Entity Ontologies Critical Data Element and DDWG provide another path to enhancing our model 17 Credit to – Jon Ison, Arinjay Jadeja and Caroline Hellawell
  • 18. Model and Data Strings to things a mapping standard 18 Credit to – Jon Ison, Nicola Ellingham and Arinjay Jadeja
  • 19. Knowledge Level FAIR (L6) A FAIR data product minimises the gap between relational and graph worlds 19 L4 FAIR Data Controlled Vocabularies Reference Model RDF Serialisation Knowledge Map L5 FAIR Data Product • An enterprise standard • Adds value for everyone • Two thirds of the way to graph without impacting non-graph consumers
  • 20. Model, Controlled Vocabularies and Data In a data-centric world the PIDs just snap the data and model together 20
  • 21. Bringing it all back together 21 Controlled Vocabularies Reference Model FAIR Data Product Ontology Architecture • Scalable and sustainable • Reusable library of entity ontologies • Organic evolution of reference ontology Controlled Vocabularies • RDM service implemented • SKOS-XL based CV framework & workflow designed • Editorial controls in place • Prioritisation & creation of CV underway FAIR Data • FAIR metrics goal 80% Data FAIR by 2025 • Standards, resources, tooling and governance • FAIR assessment in place • PID server and standard patterns agreed
  • 22. Omics Imaging Patient Knowledge Map (Data-centric) Data is FAIR and aggregated by business concept (semantic) Platform Data Products (Platform-centric) Data is FAIR but fragmented across domains From System-centric to Data-centric Reference Architecture Application Data (System-centric) Data is aligned around processes
  • 23. What should I do? Focus on utilising our Controlled Vocabularies and enriching your data 23 Collections Local CV Foundational CV Catalogue your data Commit to reusing & improving our CV estate Enrich your data with prefLabel & PID Data Catalogue Controlled Vocabularies L5 FAIR Data Product
  • 24. It takes village …… 24 DF+I Daniel Roythorne Jon Ison Nathalie Conte Nicola Ellingham Arun Balaji Induja Mohan Arinjay Jadeja Ben Gardner Hans Ienasescu Bhavna Khilnani Michael Neylon Mathew Woodwark Rob Hernandez Collaborators Derek Scuffell Varsha Khodiyar Pablo Porras Millan John Berrisford Bijay Jassal Rafa Jimenez Philippe Rocca-Serra Victor Kim Alex Wood Linda Zander-Balderud Antonio Fabregat Mark Reuter Tom Plasterer James Holman Martina Devoti Stacy Mather Di Elvers Colin Wood Sandra Mc Garry Gareth Henry Kerstin Freberg Calle Nordmark
  • 25. Confidentiality Notice This file is private and may contain confidential and proprietary information. If you have received this file in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the contents of this file is not permitted and may be unlawful. AstraZeneca PLC, 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA, UK, T: +44(0)203 749 5000, www.astrazeneca.com 25