Overview of GSK Machine Learning and Artificial Intelligence activities, by Kim Branson, SVP and Head of AI at GSK Pharma, November 3rd, 2021. AI methods are becoming widely used due to the exponential nature of data generation. AI is used to collect the data, process it, and derive causal relations. AI is also being used to help design the next experiment efficiently (RL, bandits, ...). The exponential nature of data improves AI in a virtuous cycle. Target discovery: integration of functional genomic, genetic and other data sources for target discovery.
Companion Software: for each asset we will generate software for stratification and individual response prediction.
Fundamental AI Research: fundamental research into causal machine learning, automated machine learning, and multi-modal data combination. We are developing a feedback loop for each AI system we build. We have best-in-industry, fully automated discovery biology robotics. We ask the model what data it needs. We only know what to do with 15% of the genetic variants we obtain from genetic association studies. How do we unlock all the value of our investments in genetic data? We build AI for Variant to Gene Prediction: it transforms a complex genetic locus into a ranked list of candidate genes with confidence bounds, which are tested experimentally through functional genomics. Variant to Gene AI: a multi-AI system for solving the variant to gene problem. Teaching our AI what we know about the world: internal and external data, a custom NLP model for biomedical data developed by the GSK AI team, and a knowledge graph of all data. Data becomes a critical factor for AI success. Private data sources: generated data allows us to determine the value of other public / private sources. Models trained on private and public data are unique. Common public data sources. Moving beyond medical records for cohort definition: Image Derived Phenotype (IDP) discovery and generation using AI/ML. Computational companion diagnostics and learning from clinical trials. Focusing on computational pathology: applying the advances in AI for image analysis. Tissues are collected as part of the biopsy for pathology. Digital versions of these H&E slides serve as a tool for diagnosis/prognosis by human pathologists. What else can we do with this image data? Genetic differences are not human discernible; homologous repair deficiency (HRD) status is currently determined by sequencing the tumor. Should we be constrained by human ability? AI can determine HRD genetic status from image.
AI at GSK_Kim Branson_mHealth Israel
1. AI at GSK
Dr Kim Branson
SVP Global Head of Artificial
Intelligence and machine
learning
2. GSK Machine Learning and Artificial Intelligence
– GSK AI Team Established in 2019
– Distributed between San Francisco, London, Boston,
Heidelberg, Philadelphia, Tel Aviv..
– See our website at gsk.ai
– Total size ~120 team members
– GSK.ai Fellowship
– Strategic relations for computation
3. AI can assist in almost every aspect of the process
– AI methods are becoming widely used due to the exponential nature of data generation
– AI is used to collect the data, process it, derive causal relations
– AI is being used to help design the next experiment efficiently (RL, bandits, ...)
– The exponential nature of data improves AI in a virtuous cycle.
3D super-resolution microscopy, DeepSequence for variant calling, 3D cell segmentation and tracking animal behaviour through video analysis..
4. The GSK AI group has 3 main areas of focus
Target discovery: integration of functional
genomic, genetic and other data
sources for target discovery.
Companion Software: for each asset we
will generate software for stratification
and individual response prediction
Fundamental AI Research: Fundamental
research into causal machine learning,
automated machine learning, and multi
modal data combination.
5. Active learning for model development.
But we can’t just generate all the data.
Functional
Genomics
Update AI Model
Make Predictions
Generate Data
– We are developing a feedback
loop for each AI system we build.
– We have best-in-industry, fully
automated discovery biology
robotics
– We ask the model what data it
needs.
Experiments as code
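The loop on this slide (make predictions, generate data, update the model) can be sketched as pool-based active learning with uncertainty sampling. This is only a minimal illustration, not GSK's system: the logistic model, the `run_experiment` callback, the `hidden_rule` stand-in for the wet lab and the synthetic candidates are all assumptions made for the example.

```python
import math
import random

def predict_proba(weights, x):
    """Logistic model: probability that experiment x gives a positive readout."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def update(weights, x, y, lr=0.5):
    """One gradient step of logistic regression on a single labelled example."""
    p = predict_proba(weights, x)
    return [w + lr * (y - p) * xi for w, xi in zip(weights, x)]

def most_uncertain(weights, pool):
    """Index of the candidate whose predicted probability is closest to 0.5."""
    return min(range(len(pool)),
               key=lambda i: abs(predict_proba(weights, pool[i]) - 0.5))

def active_learning_loop(pool, run_experiment, rounds):
    """Repeatedly ask the model which experiment it wants, run it, and retrain."""
    weights = [0.0] * len(pool[0])
    pool = list(pool)          # copy, so the caller's list is left untouched
    labelled = []
    for _ in range(min(rounds, len(pool))):
        x = pool.pop(most_uncertain(weights, pool))  # make predictions, pick a query
        y = run_experiment(x)                        # generate data
        labelled.append((x, y))
        for xi, yi in labelled:                      # update the model
            weights = update(weights, xi, yi)
    return weights, labelled

# Synthetic candidates: (bias term, feature 1, feature 2). Purely illustrative.
rng = random.Random(0)
candidates = [(1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]

def hidden_rule(x):
    """Hypothetical stand-in for running the real experiment."""
    return 1 if x[1] + x[2] > 0 else 0

weights, labelled = active_learning_loop(candidates, hidden_rule, rounds=10)
print(f"ran {len(labelled)} of {len(candidates)} possible experiments")
```

The point of the sketch is the selection step: the model is queried for the candidate it is least sure about, so only a small part of the pool has to be measured.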
6. Target Discovery
The variant to gene problem
We created an AI to solve this problem
Gene A Gene B Gene C Gene C
Protein variant
Which Gene (C, or A) ?
Problem
• We only know what to do with 15% of the genetic variants we obtain from genetic
association studies.
• How do we unlock all the value of our investments in genetic data ?
protein region
regulatory
7. We build AI for Variant to Gene Prediction
It transforms a
complex genetic
locus
To a ranked list of candidate
genes with confidence
bounds
That are tested
experimentally through
Functional Genomics
Experimental feedback loop to validate the predictions
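One simple way to picture "a ranked list of candidate genes with confidence bounds" is to score each gene at a locus from several evidence sources and attach a bootstrap interval to the score. The sketch below is an assumption-laden toy, not the Variant to Gene model itself: the gene names, the evidence scores and the helper names `bootstrap_interval` and `rank_genes` are all made up for illustration.

```python
import random
import statistics

def bootstrap_interval(scores, n_boot=1000, alpha=0.1, seed=0):
    """Percentile bootstrap interval for the mean of a list of evidence scores."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def rank_genes(evidence):
    """Return (gene, mean score, lower bound, upper bound), best gene first."""
    rows = []
    for gene, scores in evidence.items():
        lo, hi = bootstrap_interval(scores)
        rows.append((gene, statistics.fmean(scores), lo, hi))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical per-source evidence scores for three genes at one locus.
evidence = {
    "GENE_A": [0.2, 0.3, 0.25, 0.4],
    "GENE_B": [0.1, 0.05, 0.2, 0.1],
    "GENE_C": [0.8, 0.7, 0.9, 0.6],
}

for gene, mean, lo, hi in rank_genes(evidence):
    print(f"{gene}: score={mean:.2f}  90% interval=({lo:.2f}, {hi:.2f})")
```

Genes whose intervals overlap heavily are natural candidates for the experimental follow-up the slide describes, since the ranking between them is not yet settled.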
8. A multi AI system for solving the variant to gene problem
What is the Variant to Gene AI ?
ranked list of genes
Variant to Gene Model
coding variant model knowledge graph model
functional representation node embeddings chromatin representation
hand-engineered features
hand-curated features
DNA stacked embedding
ATCCGTATAACCCGTGGATACG
causality models
DeepFxGWAS basenji
brontosaurus
causal features
causal GWAS priors
cell similarity matrix chromatin representation
Feedback loop
9. Teaching our AI what we know about the world
Represent all external and internal biological knowledge
Knowledge Graph of all data
Poly(ADP-ribose) polymerase
is implicated in DNA repair
and transcription regulation.
protein
function
PARP
DNA repair
has_function
(PARP, has_function, DNA repair)
Scale: ~500B triples
Scale: ~35m articles
Internal and external data
Parse new data daily GSK NLP Model
[Bar chart: NER F1 on the MedMentions dataset. SciSpacy: 0.530; BioBERT v1.1: 0.683; GSK-BERT v1.0: 0.782]
GSK AI team developed a custom NLP
model for biomedical data
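The (subject, predicate, object) form shown on this slide can be illustrated with a tiny in-memory triple store. This is a hedged sketch only: the `TripleStore` class and its methods are hypothetical, nothing here reflects how the GSK knowledge graph is actually stored, and the second and third triples are illustrative readings of the example sentence rather than entries taken from the slide.

```python
from collections import defaultdict

class TripleStore:
    """Minimal set of (subject, predicate, object) triples with pattern matching."""

    def __init__(self):
        self._triples = set()
        self._by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Store one triple and index it by subject."""
        triple = (subject, predicate, obj)
        self._triples.add(triple)
        self._by_subject[subject].add(triple)

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None acts as a wildcard."""
        if subject is not None:
            candidates = self._by_subject.get(subject, set())
        else:
            candidates = self._triples
        return sorted(
            t for t in candidates
            if (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )

kg = TripleStore()
kg.add("PARP", "has_function", "DNA repair")                # triple from the slide
kg.add("PARP", "has_function", "transcription regulation")  # illustrative
kg.add("PARP", "is_a", "protein")                           # illustrative

print(kg.match(subject="PARP", predicate="has_function"))
```

At the scale quoted on the slide a dedicated graph or triple database would be needed; the sketch only shows the data shape that an NLP extraction step would feed.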
10. Data becomes a critical factor for AI success
Without unique data algorithmic advances are the only differentiator
Common Public data sources
Models trained on private and public
Data are unique
Private Data Sources
Generated data allows us to determine the
value of other public / private sources
11. Moving Beyond medical records for cohort definition
Image Derived Phenotype (IDP) discovery & generation using AI/ML
HPC
Structural CT / MRI
Image Data (DICOM)
IDP Generation
using CNN / RNN
Image QC +
Preprocessing
Post image analysis
validation criteria
12. Software for every asset
D/RNA Seq Imaging Medical History Pathology Microbiome
Integration layer
Probability of response to therapy
Integration of multi modal data with a differing temporal dimension.
Computational companion diagnostics and learning from clinical trials
13. Focusing on Computational Pathology
Applying the advances in AI for image analysis
– Tissues are collected as part of the biopsy for pathology
– Digital versions of these H&E slides are used as a tool for diagnosis/prognosis by human pathologists
– What else can we do with this image data?
Data Size
Low res: 1.5GB
Full res: 3.75 TB
32 second scan time at 40x
magnification (Aperio GT450)
14. Genetic differences are not human discernible.
Negative sample (HRP) Positive sample (HRD)
Homologous repair deficiency is a genetic status of the tumor
Currently determined by sequencing the tumor
15. Are the vertical lines parallel? Are the horizontal lines parallel?
Should we be constrained by human ability?
A simple test: determining parallel lines.
16. AI can determine HRD genetic status from image
XXXX 0.73
Drug Trial Positive Predictive Value
Perfect
classifier
Random
Classifier
What is the AI thinking ?
73% of the time when we say a slide is HRD+ve we are correct
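The statement above is the definition of positive predictive value: of all slides the model calls HRD positive, the fraction that really are. A minimal sketch of that calculation follows; the labels are synthetic and chosen only so that the result matches the 0.73 quoted on the slide, and the function name is an assumption for this example.

```python
def positive_predictive_value(y_true, y_pred):
    """PPV = true positives / (true positives + false positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    if tp + fp == 0:
        raise ValueError("no positive calls, so PPV is undefined")
    return tp / (tp + fp)

# Synthetic example: 100 slides called HRD+ (73 truly HRD+, 27 not),
# followed by 50 slides called HRD- (45 truly negative, 5 missed positives).
y_pred = [1] * 100 + [0] * 50
y_true = [1] * 73 + [0] * 27 + [0] * 45 + [1] * 5

ppv = positive_predictive_value(y_true, y_pred)
print(f"PPV = {ppv:.2f}")
```

PPV depends on how common HRD is in the population being tested, so a value measured in one trial cohort does not automatically carry over to another.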
17. The GSK AI Team
A globally distributed organization
SF, Philly, Boston, London, Heidelberg, Tel Aviv
Key Leaders of the work presented
Dr Katie Aiello (Dr)
Dr Nick Person (Snr Dr)
Dr Shane Lewin (VP)
Dr Anne Cocos (Dr)
Dr Jeremy England (Snr Dr)
Dr Jiang Zhu (Dr)
Dr Kalin Vestigan (Dr)
Dr Jiajie Zhang (Snr Dr)
Dr Hagen Trindl (Mgr)
Dr Stephen Young (Dr)
Steve Crossan (VP)
Patrick Schwab (Dr)
Dr Lena Granovsky (Dr)
Petr Votava (Dr)
Editor's notes
More data.
Help interpret
Help do the experiment.
Design of experiment
3D super-resolution microscopy, DeepSequence for variant calling, 3D cell segmentation and tracking animal behaviour through video analysis. These tasks involve high-dimensional input data, an agreed taxonomy of labels, and large volumes of data that are becoming cheaper to produce at scale. As such, ML methods can likely help classify, cluster and generate novel data points for these tasks. If we sample Nature, Nature Methods, Nature Biotechnology and Nature Medicine papers over the last 5 years, we can see that over 80% of papers mentioning “artificial intelligence” or “deep learning” were published in the last 2 years alone!
What's the key message? We can't sample the whole world.
We are currently building an internal pipeline to generate NAFLD / NASH IDPs by training liver segmentation & QC models with AI/ML and combining with map generation using UK Biobank imaging data & labels
IDPs can range from organ / lesion segmentation, image maps to scalar quantities derived from raw imaging data
This clinical imaging pipeline building capability can be generalized and extended to large-scale neuroimaging studies for PD / AD; for example, UK Biobank has large-scale brain imaging studies