Scaling the mirrorworld with knowledge graphs

Alan Morrison
SWC Webinar
23 October 2019

Agenda
The mirrorworld vision
How graphs will begin to underlay the mirrorworld
Why semantic graphs are more efficient
Use cases that point the way forward
Conclusion: Efficiencies throughout the data lifecycle

Definitions and the
mirrorworld vision

Definitions
Content = Meaningful, human-readable data + logic in the
form of text, images, audio, video (or combinations of these)
Knowledge graphs = Meaningful, machine readable data +
logic in the form of any-to-any connected, contextualized
entities, their properties and relationships
Content can be modeled and then read by machines the same
way as other data + logic. The same techniques can apply.
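
The definition of a knowledge graph above can be made concrete with a minimal sketch. It assumes a plain subject-predicate-object triple representation; every identifier (pump_17, plant_A, acme_energy and so on) is hypothetical and does not come from the talk.

```python
# Each fact is a (subject, predicate, object) triple.
# All identifiers below are hypothetical, chosen only for illustration.
triples = [
    ("pump_17", "is_a", "Pump"),
    ("pump_17", "located_in", "plant_A"),
    ("pump_17", "has_twin", "pump_17_twin"),
    ("plant_A", "is_a", "Plant"),
    ("plant_A", "operated_by", "acme_energy"),
]

def neighbors(graph, entity):
    """Return every (predicate, other_entity, direction) touching `entity`."""
    found = []
    for s, p, o in graph:
        if s == entity:
            found.append((p, o, "out"))
        if o == entity:
            found.append((p, s, "in"))
    return found

# Any-to-any: plant_A is reachable both as a subject and as an object.
plant_links = neighbors(triples, "plant_A")
```

Because entities, properties and relationships share one shape, the same lookup works for any entity without a per-table schema.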

In the mirrorworld,
everything will have a
paired twin.
Kevin Kelly in Wired
Feb 12, 2019
June 2019

What’s a digital twin? Depends on who you ask
GE: “At its core, the Digital Twin consists of sophisticated models or system
of models based on deep domain knowledge of specific industrial assets.
The Digital Twin is informed by a massive amount of design,
manufacturing, inspection, repair, online sensor and operational data.”
Goals: Predictive analytics, knowledge representation, etc.
From “What is a digital twin?” GE Digital, 2019
Finger Food: “We Are Industry-leading Digital Twin Holographic Service
Providers….
Imagine taking all of your disparate data sets from multiple spreadsheets
and diagrams and combining them into one live-streaming visual
holographic representation of your data – at full scale.”
Goals: “We can take your data from your spreadsheets and turn it into
clear, actionable context like never before…”
From “Digital Twin Solutions to Improve your Bottom Line,” Finger Food
Advanced Technology Group, 2019

Consider how long it took to build out the world’s oil &
gas infrastructure.
Now think about where we are with traditional data
management:
• How do we free ourselves from legacy IT?
• How do we build sharable digital twins?
• How do we scale a shared data infrastructure?
The mirrorworld poses a
massive global data
infrastructure challenge

Why treating smart data as a strategic asset is so critical right now
Challenge of the 2020s: Feeding your AIs enough
relevant, quality data
• Emerging tech often gets adopted just in pockets.
• That’s particularly the case with AI.
• Retraining, hiring new people, or buying more tools
isn’t enough.
• Many never figure out how to take advantage of
important AI-enabling tech. They’ll just use it in ad hoc
projects or subscribe to AI-enhanced apps.
• But the impact on decision making will be minimal
without an industrial-scale approach to data and
flow.
Opportunity of the 2020s:
Pipelines, distribution networks and
volumes of quality, contextualized
smart data flowing to the point of
need
The challenge we face is the same
one the oil and gas industry faced in
the 1920s:
• Collecting enough raw material
• Refining and enriching it
• Distributing it to the places that
need it most
• Creating enough supply to
generate massive demand and
drive down the cost of AI

How graphs will
begin to underlay the
mirrorworld

Emerging techs – How are all these things interrelated?
Are they addressable too?
Knowledge graphs, the manifestation of a data-centric
architecture, can empower the other
technologies in these ways:
1. Accelerate machine learning training set
development
2. Enable multi-domain virtual
assistants/chatbots
3. Add reasoning to conversational AI platforms
4. Become a means of sharing and interoperation
of digital twins

Emerging markets — related to most relevant hype cycle techs
Total projected revenue: $58.2 billion (2021)
Source: Tractica, Grandview Research and PwC analysis, 2019

Summary: A very large available market, but of course there’s a catch….
Pie chart: Summary of global target markets for knowledge graph technology, 2021
Segments: Digital twins; PaaS (data mgmt.); DaaS (org. domain); Virtual assistants; Conversational AI; Deep learning; PaaS (integration, orchestration); Info mgmt software; Integration software; DBMS software
Segment shares as extracted (segment order not preserved): 4%, 5%, 5%, 8%, 8%, 9%, 14%, 13%, 8%, 26%
Total: $205 billion
Sources: Gartner (hype cycle only), IDC, Tractica, PwC analysis, 2019

Why semantic graphs
are more efficient

Why traditional data management doesn’t scale
1. Relational databases don’t treat relationship
data as a first-class citizen
2. As a result, most companies have buried or are
missing the relationship data they need for
contextualization
3. Tables alone don’t help you dynamically model
your data or share the models
4. Managing large numbers of tables soon gets
unwieldy
5. Limiting your database resources to tabular
methods ensures you won’t take full advantage
of today’s compute, networking and storage
Figure (PwC, 2016): data models on a spectrum from relationship sparseness to relationship richness
Relational: Row and column headers and up-front taxonomies
Document: Nested, cumulative hierarchies
Graph: Any-to-any relationships
Characteristics shown along the spectrum:
• Static, selective, fragmented, labor intensive
• Additive, index friendly, immutable, versioning possible
• More dynamic, more inclusive, more integrated, more machine assisted
When overused, RDBMSes perpetuate the provincial data mentality of the 1980s, back when computing didn’t scale.
Lots of data is missing from relational datasets, namely the contextual clues needed for disambiguation via entity resolution and, therefore, large-scale integration.
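
A minimal sketch of the point above: in tables, the relationship lives only in a foreign-key column, while a graph states it explicitly, which makes entity resolution possible. The tables, column names and the crude name normalization are assumptions made for illustration, not a method described in the talk.

```python
# Two hypothetical relational tables; the column names are assumptions.
customers = [
    {"id": 1, "name": "Acme Corp."},
    {"id": 2, "name": "ACME CORP"},   # same real-world company, entered twice
]
orders = [
    {"id": 10, "customer_id": 1, "product": "valve"},
    {"id": 11, "customer_id": 2, "product": "pump"},
]

def normalize(name):
    """Crude normalization used as a stand-in for real entity resolution."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

# Map each customer row to one canonical entity key.
canonical = {row["id"]: normalize(row["name"]) for row in customers}

# Lift the implicit foreign-key relationship into explicit triples.
triples = set()
for order in orders:
    order_node = f"order_{order['id']}"
    triples.add((order_node, "placed_by", canonical[order["customer_id"]]))
    triples.add((order_node, "contains", order["product"]))

# After resolution, both orders attach to a single customer entity.
orders_for_acme = sorted(s for s, p, o in triples
                         if p == "placed_by" and o == "acmecorp")
```

Real entity resolution would use far more context than a normalized name; the sketch only shows where that context would attach once relationships are first-class data.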

The consequence of logic and data siloing – App-centric system-level complexity
and disconnectedness spinning out of control (Result – Table and code sprawl)
Timeline figure: system stack layers by era (Pre 1970, 1973-1990s, Early 1990s, Late 1990s, 2000s, 2010s, 2020s)
Stacks as extracted (era order not preserved):
• Hardware, DBMS, OS, custom code
• Hardware, lots of OSes, 1,000+ SQL/NoSQL DBs, custom code, ERP+ suites
• Hardware, a few more OSes, more DBMSes, custom code, ERP+ suites
• Hardware, lots more OSes, 5,000+ databases, componentized suites, custom code, cloud layer
• Hardware, more types of OSes, 10,000+ DBs + blockchains, multicloud layer, suites as services, various SaaSes, custom code
• Hardware, a few DBMSes, a few OSes, ERP+ suites, custom code
Callout: Threat of more application-centric sprawl

Implications of semantic knowledge graphs
• Data modeling in graph form can become dynamic, reusable, and scalable.
• The same data model can be readily used conceptually, logically and physically,
in a write-once, use-anywhere fashion, and can be reused as semantic
metadata.
• Semantic metadata is both machine- and human-readable, can be encoded as
data and can live with the rest of the data at the data layer.
• With the help of knowledge graphs, techniques born in the world of web content
can be applied to other data + logic, across boundaries.
• Logic does not have to be trapped in applications, but can remain universally
accessible and callable via the data layer as part of reusable data models.
• Knowledge graph-driven development in this world can become a highly efficient
and scalable means of development, eliminating application and data silo sprawl.
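
One way to picture semantic metadata living with the rest of the data is sketched below. It assumes a plain set of triples in which model statements and instance facts sit side by side; the predicate names loosely echo RDFS but are ordinary strings here, and the class and entity names are hypothetical.

```python
# Model (schema) and instance facts live in the same triple set.
# Predicate names are plain strings, not calls into a real RDF library.
graph = {
    # model
    ("Pump", "subclass_of", "Equipment"),
    ("Equipment", "subclass_of", "Asset"),
    # instance data
    ("pump_17", "is_a", "Pump"),
}

def superclasses(graph, cls):
    """Follow subclass_of links transitively."""
    result, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == "subclass_of" and o not in result:
                result.add(o)
                frontier.append(o)
    return result

def types_of(graph, entity):
    """Asserted types plus those implied by the model."""
    asserted = {o for s, p, o in graph if s == entity and p == "is_a"}
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(graph, cls)
    return asserted | inferred

pump_types = types_of(graph, "pump_17")
```

The logic that pump_17 is also an Asset is not coded into any one application; any consumer of the graph can derive it from the model statements.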

Data-centric design at the micro level brings humans and machines together, with
the humans helping the machines build and scale relationship data
Relationship logic to be shared at scale needs to be created in human-machine feedback
loops and embedded in a standard form at the data layer for full reuse, not trapped in
app silos
Relationship-sparse, but highly articulated data context that humans need to help machines refine and enrich
Relationship-rich smart data that uses description or predicate logic to scale integration, context and interoperation

The key opportunity – Large-scale integration and model-driven intelligence in
a de-siloed and de-duplicated way
Previously dominant
Rule-based systems (includes KR)
“Handcrafted knowledge” is the term DARPA uses; rule-based programming + procedure replication in process automation, + some knowledge representation (KR)
• Strong on logical reasoning in specific concrete contexts
- Procedural + declarative programming + set theory, etc.
- Deterministic
• Can’t learn or abstract
• Still exceptionally common and useful
Example: Consumer tax software
On the rise and rapidly improving
Statistical machine learning
• Probabilistic
• From Bayesian algorithms to neural nets (yes, deep learning also)
• Strong on perceiving and learning (classifying, predicting)
• Weak on abstracting and reasoning
• Quite powerful in the aggregate but individually (instance by instance) unreliable
• Can require lots of data
Example: Facial recognition using deep learning/neural nets
Nascent, just beginning
Contextualized, model-driven approach
• Contextualized modeling approach allows efficiency, precision and certainty
• Combines power of deterministic, probabilistic and description logic
• Allows explanations to be added to decisions
• Accelerates the training process with the help of specific, contextual human input
• Takes less data
Example: Explains first how handwritten letters are formed so machines can decide; less data needed, more transparency.
Chart dimensions rated for each approach: Perceiving, Learning, Abstracting, Reasoning
Source: John Launchbury of DARPA (https://www.youtube.com/watch?v=N2L8AqkEDLs), Estes Park Group and PwC research, 2017
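
As a rough illustration of combining probabilistic output with a deterministic, contextual model and attaching an explanation, consider the sketch below. The scores, the context and the plausibility rules are all hypothetical; this is not the DARPA or PwC method, only a toy instance of the idea.

```python
# Hypothetical classifier output: label -> probability.
scores = {"cat": 0.48, "dog": 0.47, "car": 0.05}

# Hypothetical contextual model: what is known about the scene.
context = {"location": "kennel"}
# Deterministic rule expressed as data: labels plausible per location.
plausible = {"kennel": {"dog"}, "garage": {"car"}}

def decide(scores, context, plausible):
    """Pick the highest-scoring label that the context model allows,
    and return a human-readable explanation with it."""
    allowed = plausible.get(context["location"], set(scores))
    candidates = {k: v for k, v in scores.items() if k in allowed}
    if not candidates:          # the model rules out everything: fall back
        candidates = scores
    label = max(candidates, key=candidates.get)
    reason = (f"chose '{label}' (p={scores[label]:.2f}); "
              f"context location={context['location']} allows {sorted(allowed)}")
    return label, reason

label, reason = decide(scores, context, plausible)
```

On scores alone the near-tie would go to "cat"; the contextual rule tips the decision and the returned reason records why.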

The solution – Data-centric architecture reduces both application and
database sprawl
Application centric versus Data centric
Application centric: Trapped app code and databases
Data centric: Semantic model/rules; data lake or hub; applets; applications for execution only; models exposed with the data

Use cases that point
the way forward

Largest changes in market cap by global company, cross industry, 2018
1. Change in market cap from IPO date
2. Market cap at IPO date
Source: Bloomberg and PwC analysis
• Other major tech, FS and pharma cos. are also working on cross-enterprise knowledge graphs
• Many have cross-enterprise knowledge graph ambitions, but most are focused on a single use case
• S&P does cross-enterprise data management using relational tech
Rank | Company name | Location | Industry | Change in market cap 2009 – 2018 ($bn) | Market cap 2018 ($bn)
1 | Apple | United States | Technology | 757 | 851
2 | Amazon.Com | United States | Consumer Services | 670 | 701
3 | Alphabet | United States | Technology | 609 | 719
4 | Microsoft Corp | United States | Technology | 540 | 703
5 | Tencent Holdings | China | Technology | 483 | 496
6 | Facebook | United States | Technology | 383 (note 1) | 464
7 | Berkshire Hathaway | United States | Financials | 358 | 492
8 | Alibaba | China | Consumer Services | 302 (note 1) | 470
9 | JPMorgan Chase | United States | Financials | 275 | 375
10 | Bank of America | United States | Financials | 263 | 307
Callouts on the table: Known knowledge graph builders; Operator of Taobao and AliBot, KG builder; Known KG builders
The most value-creating companies in the world are using knowledge graphs

State of the art knowledge graph – Blue Brain Nexus (1 of 2)
How do scientists record the provenance of, curate, share as open
source and collaborate on what they’ve documented using 3D
imaging techniques generated with the help of a supercomputer,
such as the slices of a rat’s brain?
From the EPFL Blue Brain Portal Gallery, https://portal.bluebrain.epfl.ch/gallery-2/

Morgan Stanley’s operational risk model (simplified)
Jason Marburg, Morgan Stanley, and Michael Uschold, Semantic Arts, “Representing Operational Risk in an RDF Graph,” presented at Graphorum, October 16, 2019.
This simplified diagram illustrates some of the main concepts and relationships articulated in Morgan Stanley’s Operational Risk Ontology (ORO), which consists of 350 classes, 350 properties, and 800 relationships.
Semantic Arts, a PwC partner, led the development of the ORO. PwC advised Morgan Stanley on risk strategy and information governance.
Concepts shown: 3P vendor/supplier, 3P service, Process, Technology asset, Risk & control self-assessment, Risk in context of a process, Control, Incident, Issue, Action plan
Relationship labels shown: Is realization of; Is assessment of (twice); Depends upon (three times); Is part of; Pertains to failure of; Provided by; Remediates; Is identified issue with (twice); Has root cause
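
To show what a graph of this kind makes easy, the sketch below walks dependency links to find everything a process is exposed to. The concept names and relationship labels come from the slide, but which concepts each relationship connects is an assumption made for this example and should not be read as the actual ORO; plain tuples stand in for RDF.

```python
# Relationship labels come from the slide; the endpoints of each edge are
# assumptions made for this sketch, not a reading of the ORO.
edges = [
    ("Process", "depends upon", "Technology asset"),
    ("Process", "depends upon", "3P service"),
    ("3P service", "provided by", "3P vendor/supplier"),
    ("Incident", "is realization of", "Risk in context of a process"),
    ("Action plan", "remediates", "Issue"),
]

def reachable(edges, start, predicates):
    """Everything reachable from `start` following only the given predicates."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for s, p, o in edges:
            if s == node and p in predicates and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen

# What does the process ultimately rely on, directly or indirectly?
exposure = reachable(edges, "Process", {"depends upon", "provided by"})
```

The traversal needs no join logic specific to vendors or assets; adding a new kind of dependency is a matter of adding edges.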

A semantic knowledge graph could enable the model-driven organization (a digital
twin) at the data layer
Step One: Model the relevant
elements of the organization, how
they relate to one another
and interoperate
Step Two: Embed the model where
it lives as machine-readable data
Step Three: Integrate the source
datasets as a target knowledge
graph with model-driven mappings
Step Four: Browse, query,
disambiguate, detect and discover
via the resulting knowledge graph
Entities in the model diagram: Capability, Company, Competence, Information, Milestone, Person, Portfolio, Process, Prog/proj, Risk, Role, Strategy, Technology, Work package
Relationships in the model diagram:
• Capability enables process
• Capability uses information
• Capability uses technology
• Company employs person
• Company has portfolio
• Company has prog/proj
• Company has role
• Information uses technology
• Person creates information
• Person has competence
• Person identified risk
• Person uses capability
• Person uses information
• Person uses process
• Person uses technology
• Portfolio has person
• Portfolio has prog/proj
• Process uses information
• Prog/proj creates information
• Prog/proj creates technology
• Prog/proj delivers capability
• Prog/proj delivers strategy
• Prog/proj has milestone
• Prog/proj has parent prog/proj
• Prog/proj has person
• Prog/proj has risk
• Prog/proj has role
• Prog/proj outputs work package
• Prog/proj supports process
• Prog/proj uses information
• Prog/proj uses technology
• Risk owned by person
• Role needs competence
• Strategy has milestone
• Technology supports process
• Work package has person
• Work package needs competence
https://virtualdutchman.com/2018/10/14/moving-to-a-model-based-enterprise-the-business-model/
Clearvision, 2019. Used with permission.
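
The four steps on this slide can be sketched end to end in a few lines. The three model relationships are taken from the diagram (Company employs person, Person has competence, Person uses technology); the source rows, column names, mapping format and helper functions are hypothetical, introduced only to make the flow runnable.

```python
# Steps One and Two: a slice of the slide's model, stored as data.
# Each entry: (subject type, relationship, object type).
model = {
    ("Company", "employs", "Person"),
    ("Person", "has", "Competence"),
    ("Person", "uses", "Technology"),
}

# Hypothetical source datasets (names and values are illustrative only).
hr_rows = [{"employer": "Acme", "employee": "Dana", "skill": "graph modeling"}]
it_rows = [{"user": "Dana", "tool": "tool_x"}]

# Step Three: model-driven mappings from source columns to model relationships.
# (subject column, subject type, relationship, object column, object type)
mappings = {
    "hr": [("employer", "Company", "employs", "employee", "Person"),
           ("employee", "Person", "has", "skill", "Competence")],
    "it": [("user", "Person", "uses", "tool", "Technology")],
}

def integrate(sources, mappings, model):
    """Build one graph from all sources, rejecting mappings the model lacks."""
    graph = set()
    for name, rows in sources.items():
        for s_col, s_type, rel, o_col, o_type in mappings[name]:
            if (s_type, rel, o_type) not in model:
                raise ValueError(f"mapping not in model: {s_type} {rel} {o_type}")
            for row in rows:
                graph.add(((s_type, row[s_col]), rel, (o_type, row[o_col])))
    return graph

kg = integrate({"hr": hr_rows, "it": it_rows}, mappings, model)

# Step Four: query across what used to be separate datasets.
def technologies_used_at(kg, company):
    people = {o for s, p, o in kg if s == ("Company", company) and p == "employs"}
    return sorted(o[1] for s, p, o in kg if s in people and p == "uses")

tools = technologies_used_at(kg, "Acme")
```

The query joins HR and IT data through the shared Person entity, and the model check in integrate keeps every mapping tied to a relationship the organization has actually defined.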

Conclusion:
Efficiencies
throughout the data
lifecycle

Knowledge graphs complete the picture of your transformed data lifecycle and how
it’s managed

pwc.com
Thanks for attending!
© 2019 PwC. All rights reserved. PwC refers to the US member firm or one of its subsidiaries or affiliates, and may sometimes refer to the PwC network. Each member firm is a separate legal entity.
Please see www.pwc.com/structure for further details.
Alan Morrison
PwC | Emerging Tech | Sr. Research Fellow
+1 (408) 205 5109
alan.s.morrison@pwc.com
https://www.linkedin.com/in/alanmorrison/
https://twitter.com/AlanMorrison
https://www.quora.com/profile/Alan-Morrison
