See the 2,456 pharmacies on the National E-Pharmacy Platform
Â
2020-03-08 Atul Butte's keynote for the AMIA Virtual Informatics Summit
1. Precisely Practicing Medicine from
700 Trillion Points of
University of California Health Data
Atul Butte, MD, PhD
Chief Data Scientist, University of California Health (UC Health)
Priscilla Chan and Mark Zuckerberg Distinguished Professor
Director, Bakar Computational Health Sciences Institute, UCSF
atul.butte@ucsf.edu â–Ş @atulbutte
2. Conflicts of Interest
• Scientific founder and
advisory board membership
– Genstruct
– NuMedii
– Personalis
– Carmenta
• Honoraria for talks
– Lilly
– Pfizer
– Siemens
– Bristol Myers Squibb
– AstraZeneca
– Roche
– Genentech
– Warburg Pincus
– CRG
– AbbVie
– Westat
• Past or present consultancy
– Lilly
– Johnson and Johnson
– Roche
– NuMedii
– Genstruct
– Tercica
– Ecoeos
– Helix
– Ansh Labs
– uBiome
– Prevendia
– Samsung
– Assay Depot
– Regeneron
– Verinata
– Pathway Diagnostics
– Geisinger Health
– Covance
– Wilson Sonsini Goodrich & Rosati
– Orrick
– 10X Genomics
– GNS Healthcare
– Gerson Lehman Group
– Coatue Management
• Corporate Relationships
– Northrop Grumman
– Genentech
– Optum
– Aptalis
– Allergan
– Astellas
– Thomson Reuters
– Intel
– SAP
– SV Angel
– Progenity
– Illumina
• Speakers’ bureau
– None
• Companies started by students
– Carmenta
– Serendipity
– Stimulomics
– NunaHealth
– Praedicat
– MyTime
– Flipora
– Tumbl.in
– Polyglot
– Iota Health
– Ongevity Health
9. Artificial Intelligence and Machine Learning
• Artificial Intelligence: aspects of human intelligence modeled
by computers
• Machine Learning: implementing aspects of AI through
processing data
– Supervised or unsupervised learning
• Deep Learning: one type of ML, modeling brain architecture
with layers of individual classifiers, adding non-linearity
9
10.
11. Why is AI back in biomedicine?
• Great hardware from video gamers
• Software libraries with novel machine learning methods
• More big data sets to train with
– Molecular data, genomics on more people, more cases
– Electronic health records
• Hard unsolved questions need answers
– Patient predictions
– Efficient drug discovery
– Population modeling
15. 227 million substances x
1.3 million assays
More than a billion measurements
within a grid of 300 trillion cells
71 million meet Lipinski 5
1.2 million active substances
16. The next big open data: clinical trials
Download 300+ studies today
Drug repositioning, new patient subsets,
digital comparative effectiveness, more!
immport.org
Sanchita Bhattacharya
Zicheng Hu
Elizabeth Thomson
Patrick Dunn
17. The United States is spending $billions on electronic
health records, and too few are using any of this data
18. University of California
• 10 campuses and 3 national labs
• ~200,000 employees, ~250,000 students/yr
UC Health
• 18 health professional schools (6 med schools)
• Train half the medical students and residents in
California
• ~$2 billion NIH funding
• $12+ billion clinical operating revenue
• 5000 faculty physicians, 12000 nurses
• UCSF and UCLA are in US News top 10
• 5 NCI Comprehensive Cancer Centers, 5 NIH CTSA
19.
20. UC Health Data Analytics Platform
Combining healthcare data from across the
six University of California medical schools and systems
Health Data Warehouse
21. • Combined EHR data from UCSF, UCLA, UC Irvine, UC Davis,
UC San Diego, and UC Riverside
• Total 15 million patients with an MRN, basic diagnosis data
• Central database built using OMOP (not Epic) as a data backend
– Structured data from 2012 to the present day
– 5.3 million patients with “modern” data
– 125M encounters, 416M procedures, 626M med orders, 627M diagnosis
codes, 1.3B lab tests and vital signs,
– OSHPD data, pathology and radiology text elements, death index
– Claims data from our self-funded plans now included
– Continually harmonizing elements
• Quality and performance dashboards
The University of California has an
incredible view of the medical system
22. Why Use OMOP?
• Optimized for analytics rather than
operations
• Data harmonized to national standards
• LOINC, RxNORM, SNOMED-CT, ICD 10, CPT
• Scalable, extensible, backwards
compatible data model
• Supports easy code sharing between sites
• Tool Set and Standard Queries available
from OHDSI
• Achilles: Data Quality Assessment Tool
• No further reliance on Epic
• The same OMOP-driven system is used for
operational and research uses
Aiden Barin
Ayan Patel
Charles Wilson
Lisa Dahm
23. Data Platform
- Tableau – Data Visualization
- Dashboards, Data Exploration, Trending
- Collibra – Data Governance
- Data Definitions, Terminology Mapping
- Value Set Management with Quality Measures
- OMOP (Observational Medical Outcomes Partnership)
- Observational Health Data Science and Informatics
- Standards-based model to harmonize data
- Consortium of industry, government, and academia
- Extract and convert data from Epic Clarity
- Microsoft SQL Server – Database
- Azure – Cloud-based user platform
- Statistics – R, Jupyter, Julia, and commercial sw
Aiden Barin
Ayan Patel
Charles Wilson
Lisa Dahm
24. All five University of California academic medical centers provide
health data to patients through FHIR (and Apple Health)
bit.ly/ucappleh
25.
26. “She has played a leading role working on
health privacy matters for the FTC, including
organizing the FTC’s seminar on Consumer
Generated and Controlled Health Data… Her
law enforcement actions include the
Commission’s settlement with Facebook.”
Our new University of California Health
Chief Health Data Officer: Cora Han
33. Post-approval safety
Updating side effect rates
Discovering novel side effects
Supporting regulatory approval
Single-arm experimental trials
”Digital approvals”
Biosimilar development
Informing clinical trials design
Better patient selection
Trimming the trials: more efficient data collection
Continually establishing efficacy
Assessing the efficacy-effectiveness gap
Searching for efficacy in specific populations
Effect modifiers and precision medicine
Long-term, post trial outcomes
Comparative effectiveness
Integrating costs with comparative effectiveness
Understanding effects of pharmacy practices on healthcare utilization
Studying novel on-label pharmaceuticals versus older off-label drugs
Studying the practice of medicine
Quality of practice, medical errors
Standardizing care and care delivery
The effect of payors on medical care
Are new-generation diagnostics improving outcomes?
Data driven decision support
Clinical-Decision Support: the Provider Perspective
Clinical-Decision Support: the Patient Perspective
Clinical-Decision Support: the Community Perspective
Twenty-one uses for Real World Data
Vivek Rudrapatna
bit.ly/jci_RWD
34. How will research access work?
• Researchers should first write and optimize OMOP SQL queries
locally, on their own campuses
• When ready to scale (and authorized), we spin up a virtual machine
for the researcher, populated with common tools
– Electronically sign a UC Health data use agreement
– R, Tableau, Jupyter Notebooks, Julia, SQL, Windows or Linux available
• Upload your scripts and run, but cannot download data
• Safe, respectful, regulated research use of clinical data
Fully identified
“Legally
deidentified”Limited data sets
35. Enabling UC researchers and patients to go beyond… machine
learning in a safe, respectful, fair, equitable way in medicine
go.nature.com/2Gfa84U4
38. What could we do with clinical data?
• Clinical researcher at UCLA could run a genome wide association study across UC Health
• Mobile health researcher at UCSD can enable patients to contribute data for research
• Community activist and researcher UC Merced can study environmental factors
contributing to health and disease
• Transplant patient at UC Irvine can download all their data across UC Health
• Data scientist at UC Santa Barbara can model development of Alzheimer's disease and build
a multi-modal predictor
• App designer at UC Riverside can show patients their choices with chronic disease
• CMO at UCSF can build predictive models for readmission, test, share across UC Health
• AI researcher at UC Berkeley can build deep-learning models for image-based diagnostics
• Health services researcher at UC Davis can build predictive models for drug efficacy, and
maybe enable pay-for-performance
• Cancer genomics researcher at UCSC can study all our clinical cancer genomes
42. AI/ML lessons I’ve learned over 20 years
• Get the question right; solve problems that health care professionals need solved, don't guess
– And verify good questions and good unmet needs with more than one doc
– Build a great diagnostic vs. understanding the biology
• Perfectly lassoed variables may miss the big picture biology
– Medical professionals (and biologists) really love explanations over black boxes
• Watch out for input limiting models
– Patients might not type in the right codes for their symptoms
– They barely enter their own race/ethnicity
– And docs?
• Learn what IRB, HIPAA, BAA are. Learn what ICD-10 and CPT codes are. CLIA and CAP.
– And learn patience.
– Not all of us are cloud allergic.
• Not everything needs deep learning
• Having all data on everyone is super rare: genomics, images, and longitudinal EHR data
• Data integration and harmonization can happen if there is a business reason for it
• $1 trillion dollars is spent on Medicare and Medicaid. Would your parents would use it?
• Health care inefficiency is not about friction, it’s about resistance to change
44. Collaborators
• Elizabeth Thomson, Patrick Dunn / Northrop
Grumman
• Quan Chen, Dawei Lin / NIAID
• Andrei Goga / UCSF Oncology
• Mallar Bhattacharya / UCSF Pulmonary
• Minnie Sarwal / UCSF Nephrology
• Geoffrey Gurtner / UCSF Surgery
• Roberta Diaz Brinton / Arizona
• Carol Bult / Jackson Labs
• Alejandro Sweet-Cordero, Julien Sage /
Pediatric Oncology
• Takashi Kadowaki, Momoko Horikoshi, Kazuo
Hara, Hiroshi Ohtsu / U Tokyo
• Kyoko Toda, Satoru Yamada, Junichiro Irie /
Kitasato Univ and Hospital
• Shiro Maeda / RIKEN
• David Stevenson, Gary Shaw / Stanford
Neonatology
• Mark Davis, C. Garrison Fathman / Stanford
Immunology
• Russ Altman, Steve Quake / Stanford
Bioengineering
• Euan Ashley, Joseph Wu / Stanford Cardiology
• Mike Snyder, Carlos Bustamante, Anne Brunet
/ Stanford Genetics
• Jay Pasricha / Stanford Gastroenterology
• Rob Tibshirani, Brad Efron / Stanford Statistics
• Hannah Valantine, Kiran Khush/ Stanford
Cardiology
• Mark Musen, Nigam Shah / National Center for
Biomedical Ontology
• Sam So, Ken Weinberg, David Miklos /
Stanford Oncology
45. Support
Admin and Tech Staff
• Andrew Jan
• Mounira Kenaani
• Kevin Kaier
• Boris Oskotsky
• Mae Moredo
• Ada Chen
• University of California, San Francisco
• Priscilla Chan and Mark Zuckerberg
• Barbara and Gerson Bakar Foundation
• NIH: NIAID, NLM, NIGMS, NCI, NHLBI, OD; NIDDK, NHGRI, NIA, NCATS, NICHD
• Intervalien Foundation, Leon Lowenstein Foundation
• March of Dimes, Juvenile Diabetes Research Foundation
• California Governor’s Office of Planning and Research
• Howard Hughes Medical Institute, California Institute for Regenerative Medicine
• Hewlett Packard, L’Oreal, Progenity, Genentech
• Scleroderma Research Foundation
• Clayville Research Fund, PhRMA Foundation, Stanford Cancer Center, Bio-X, SPARK
• Tarangini Deshpande
• Kimayani Butte
• Talmadge King and Mark Laret and Carrie Byington
• Sam Hawgood and Keith Yamamoto
• Isaac Kohane