Mongo Boston 2013
Accelerate Pharmaceutical R&D with
Big Data and MongoDB

Jason Tetrault
Architect - AstraZeneca
AstraZeneca at a glance
We are a global, innovation led biopharmaceutical company
with a mission to make a meaningful difference to patient health
through great medicines and a belief that health connects us all

Global
• 57,000 people
• Sales in 100 countries
• Manufacturing in 16
• R&D across 3 continents
• $4 bn invested in R&D
• $33 bn sales in 2011
Constantly anticipating and adapting to the needs of a changing world.

Targeted
• Cancer
• Cardiovascular
• Gastrointestinal
• Infection
• Neuroscience
• Respiratory & inflammation
Driving continued innovation where we can make the most difference.

Collaborative
• HCPs
• Patients
• Payers
• Regulators
• Partners
• Local communities
Connecting with others to achieve common goals in improving healthcare.

Committed to driving business success responsibly
Architect: R&D Information
What does this mean?

• Support the Researchers
• AstraZeneca has multiple iMeds that are focused on different areas of R&D
• Specifically, I work with the Oncology and Infection iMeds here in Waltham
• Support different software and system builds and/or purchases
• Looking to apply new technologies to enable Researchers

• Core Focus:
  • Next Generation Sequencing Scaling
  • IaaS
  • Big Data Pilots and Exploration
Introduction of Disruptive Technology:
Step 1: Introduce Concepts

• What
  • Unstructured Data
  • NoSQL
  • Categories (Document, Key Value, Graph)
  • Hadoop
  • Map Reduce
  • Horizontal Scalability
  • Cloud (IaaS and SaaS)

• How
  • Lunch and Learns
  • Examples (Craigslist uses this)
  • “Big Cookies for Big Data”
  • Demonstrations
Introduction of Disruptive Technology:
Step 2: Pilots

• Goals:
• We needed to show what “Unstructured Data” actually means.
• We needed to prove what these technologies can and cannot do for us.
• Find something difficult and make it easy!
• We needed to find the best way to enable researchers.
Iterative Agile Analytics
How quickly can I make indirect associations between gene sequence features and structural fingerprints?
Now scale up to 4M compounds, 20K assays… and more decoration – 5 to 50 TB

[Diagram: data flow through four stages: Gather, Aggregate, Analyze, Decorate]
• Data sources: Compound with Fingerprints, Gene sequence, Target mappings, Assay results
• Flow labels: JSON, Pivot, Map Reduce, Matrix
• Data sets and sizes, in the order labeled: Compound; AssayResults (300K compounds) – 200 GB; GeneCatalog; (1.4M fingerprints) – 1 GB; Fingerprint with compounds; (500M pairs) – 81 GB
• Outputs: Tanimoto matrix, Gene matrix, Target mappings

• Easily convert to JSON and import an initial cut of data from different sources (e.g. spreadsheets, RDBMS, …)
• Embrace unstructured data, massage it into a more useful format: Rinse, Wash, Repeat!
• Ability to decorate data, adding fields and additional datastores quickly
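
The pivot and Tanimoto steps named on this slide can be sketched in plain JavaScript. This is a minimal illustration under stated assumptions, not the pilot's code: the document shape (`compoundId`, `fingerprints`), the sample values, and the idea that a fingerprint is a list of feature identifiers are all hypothetical.

```javascript
// Hypothetical compound documents; field names and values are illustrative only.
const compounds = [
  { compoundId: "C1", fingerprints: ["f1", "f2", "f3"] },
  { compoundId: "C2", fingerprints: ["f2", "f3", "f4"] },
  { compoundId: "C3", fingerprints: ["f5"] },
];

// Pivot: invert compound -> fingerprints into fingerprint -> compounds.
// This mirrors a map step (emit fingerprint, compoundId) followed by a
// reduce step (collect the ids emitted for each fingerprint).
function pivotByFingerprint(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const fp of doc.fingerprints) {
      if (!index.has(fp)) index.set(fp, []);
      index.get(fp).push(doc.compoundId);
    }
  }
  return index;
}

// Tanimoto coefficient of two feature lists: |A ∩ B| / |A ∪ B|.
function tanimoto(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  for (const item of setA) {
    if (setB.has(item)) shared += 1;
  }
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

const byFingerprint = pivotByFingerprint(compounds);
const similarity = tanimoto(compounds[0].fingerprints, compounds[1].fingerprints);
console.log(byFingerprint.get("f2"), similarity);
```

Computing the coefficient for every pair of compounds is what would fill a Tanimoto matrix; at the scale quoted on the slide that pairwise step is the part that needs a distributed approach.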
Introduction of Disruptive Technology:
Pilot Findings

• Tech Findings:
• GSON can help with weird character conversions.
• Per-node write limits (500 per second), but you can save a batch of documents at once (change to bulk insert).
• Users think that even though they could do it relationally, this was way quicker.
• Using arrays for multiple results in a doc can be interesting.
• JSON and JavaScript are fairly natural to technical researchers (Python).

• We are not alone…
  • Davy Suvee
  • tranSMART
  • Seven Bridges
  • …
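
The bulk-insert finding above can be sketched as a small batching helper. This is an illustration under assumptions rather than the pilot's loader: the batch size of 1000 and the `insertBatch` callback are hypothetical, and the database call is replaced by an in-memory stand-in so the example runs without a server.

```javascript
// Split an array of documents into batches of at most `size` items.
function toBatches(docs, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < docs.length; i += size) {
    batches.push(docs.slice(i, i + size));
  }
  return batches;
}

// Send each batch through a caller-supplied insert function.
// `insertBatch` is a placeholder: with the MongoDB Node.js driver it could be
// (batch) => collection.insertMany(batch, { ordered: false }).
async function bulkLoad(docs, size, insertBatch) {
  let written = 0;
  for (const batch of toBatches(docs, size)) {
    await insertBatch(batch);
    written += batch.length;
  }
  return written;
}

// Demo with an in-memory stand-in for the database call.
const batchSizes = [];
const sampleDocs = Array.from({ length: 2500 }, (_, i) => ({ n: i }));
bulkLoad(sampleDocs, 1000, async (batch) => { batchSizes.push(batch.length); })
  .then((written) => console.log(written, batchSizes));
```

One round trip per batch instead of one per document is the point of the change; the best batch size depends on document size and would need measuring.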
Next Generation Sequencing:
Driving Question:

Can we predict which drug is
most effective against
specific tumors?

How many other cancer types
that I have processed have the
same variation as the cancer
type I am working on?
Fairly Inaccurate Overview of Genetics Processing
A 2-Minute Oversimplification of a Really Hard Problem

• Sequencing
• Alignment (HG19)
• Down Stream Processing (Variant) (HG19)
Can I Process 88 Whole Human Genomes?
Researcher: I would like to process 88 public genomic samples from cancer patients. They are whole human genomes. Each patient has 2 genomic sequences, one from the tumor and one from a normal cell.

Tech:
• 200 GB raw uncompressed FASTQ per experiment
• 176 genome pipelines to process
• Each “pipeline” runs on an m1.xlarge
• We ran 4 runs of ~3.5 days on 50 nodes
• Total processed data in the pipeline may be 5X per experiment
• Could expand to 10X or more for more complex pipelines
• ~86 GB result average to save
• Stored in S3 / Glacier
• Totals:
• ~171 TB Total Processed Storage
• ~14,784 hours of processing
• ~15 TB of results
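
The totals above can be reproduced with a few lines of arithmetic. The sketch below rests on two assumptions that the slide does not state: each pipeline runs for about 3.5 days (84 hours), and the TB figures use 1024 GB per TB. Under those assumptions the numbers come out at 176 pipelines, 14,784 hours, about 171.9 TB processed and about 14.8 TB of results.

```javascript
// Inputs taken from the slide.
const patients = 88;
const sequencesPerPatient = 2;   // tumor + normal
const rawGbPerExperiment = 200;  // uncompressed FASTQ
const expansionFactor = 5;       // processed data per experiment
const resultGbPerPipeline = 86;  // average result kept

// Assumptions (not stated on the slide): each pipeline runs about 3.5 days,
// and the TB totals use 1024 GB per TB.
const hoursPerPipeline = 3.5 * 24;
const gbPerTb = 1024;

const pipelines = patients * sequencesPerPatient;
const processingHours = pipelines * hoursPerPipeline;
const processedTb = (pipelines * rawGbPerExperiment * expansionFactor) / gbPerTb;
const resultTb = (pipelines * resultGbPerPipeline) / gbPerTb;

console.log(pipelines, processingHours, processedTb.toFixed(1), resultTb.toFixed(1));
```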

[Diagram: Elastic HPC Infrastructure on Amazon]
• Shared Storage: scripts, programs, reference
• Compute: StarCluster, Elastic Node Expansion, Local Storage, Processing
• Result offload to S3, transition to Glacier
A Possible Vision for Experiment Management
[Diagram: NGS data types and research goals]
• NGS Data: Explants; Tumors – FFPE; Tumors – fresh frozen; Cell lines
• Expression: RNASeq
• Variants: Amplicon, DNASeq, Whole exome, Whole genome; coding and non-coding variants; coding variants
• Goals: Patient stratification; Biomarkers for prognosis, drug response, safety; Mechanism of drug action; Mechanism of disease; New Target ID

[Diagram: platform components]
• Inbound
• Partners: Seven Bridges, GenePattern
• Storage
• Big Data Store
• Experiment Management / Metadata Management
• Services: Genome Upload / Curation, Pipeline Engines, Long Term Storage, Partner Integration, Big Data Storage and Analytics
Let's look at a Variant …
Another Area Mongo May Help
VCF Format
##fileformat=VCFv4.1
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x>
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS     ID        REF ALT    QUAL FILTER INFO                              FORMAT      NA00001        NA00002        NA00003
20     14370   rs6054257 G   A      29   PASS   NS=3;DP=14;AF=0.5;DB;H2           GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20     17330   .         T   A      3    q10    NS=3;DP=11;AF=0.017               GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3   0/0:41:3
20     1110696 rs6040355 A   G,T    67   PASS   NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2   2/2:35:4
20     1230237 .         T   .      47   PASS   NS=3;DP=13;AA=T                   GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
20     1234567 microsat1 GTC G,GTCT 50   PASS   NS=3;DP=9;AA=G                    GT:GQ:DP    0/1:35:4       0/2:17:2       1/1:40:3
VCF as JSON
Header and Variant Information
Header document:
{
    "_id" : ObjectId("52617b613004b77f64efed67"),
    "phasing" : "partial",
    "fileformat" : "VCFv4.1",
    "fileDate" : "20090805",
    "source" : "myImputationProgramV3.1",
    "FORMAT" : {
        "Description" : "\"Haplotype Quality\"",
        "Type" : "Integer",
        "Number" : "2",
        "ID" : "HQ"
    },
    "__vcfid" : "40770f6f-165a-4930-8092-05e98e4e0b27",
    "contig" : {
        "species" : "\"Homo sapiens\"",
        "assembly" : "B36",
        "md5" : "f126cdf8a6e0c7f379d618ff66beb2da",
        "length" : "62435964",
        "ID" : "20",
        "taxonomy" : "x"
    },
    "INFO" : {
        "Description" : "\"HapMap2 membership\"",
        "Type" : "Flag",
        "Number" : "0",
        "ID" : "H2"
    },
    "reference" : "file:///seq/references/1000GenomesPilot-NCBI36.fasta",
    "FILTER" : {
        "Description" : "\"Less than 50% of samples have data\"",
        "ID" : "s50"
    }
}

Variant document:
{
    "_id" : ObjectId("52617b613004b77f64efed62"),
    "ALT" : [
        "A"
    ],
    "QUAL" : "29",
    "NA00001" : "0|0:48:1:51,51",
    "POS" : 14370,
    "NA00002" : "1|0:48:8:51,51",
    "FILTER" : "PASS",
    "CHROM" : "20",
    "NA00003" : "1/1:43:5:.,.",
    "FORMAT" : "GT:GQ:DP:HQ",
    "__vcfid" : "40770f6f-165a-4930-8092-05e98e4e0b27",
    "ID" : "rs6054257",
    "INFO" : {
        "DP" : "14",
        "AF" : "0.5",
        "NS" : "3"
    },
    "REF" : "G"
}
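
A conversion from one VCF data line to a variant document of this shape could look roughly like the sketch below. It is not the code from the talk's repository: the function names, the `example-id` value, and the choice to store INFO flags such as `DB` as `true` (the slide's document omits them) are assumptions made for illustration.

```javascript
// Names of the fixed VCF columns, in file order.
const FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"];

// Parse "NS=3;DP=14;AF=0.5;DB;H2" into an object. Values stay strings, as on
// the slide; flag entries without "=" are stored as true (an assumption).
function parseInfo(text) {
  const info = {};
  if (text === ".") return info;
  for (const entry of text.split(";")) {
    const eq = entry.indexOf("=");
    if (eq === -1) info[entry] = true;
    else info[entry.slice(0, eq)] = entry.slice(eq + 1);
  }
  return info;
}

// Convert one tab-separated data line into a document. `sampleNames` comes
// from the #CHROM header line; `vcfId` links the variant to its header document.
function vcfLineToDoc(line, sampleNames, vcfId) {
  const cols = line.split("\t");
  const doc = { __vcfid: vcfId };
  FIXED.forEach((name, i) => { doc[name] = cols[i]; });
  doc.POS = Number(doc.POS);     // numeric so range queries compare as numbers
  doc.ALT = doc.ALT.split(",");  // one array entry per alternate allele
  doc.INFO = parseInfo(doc.INFO);
  sampleNames.forEach((name, i) => { doc[name] = cols[FIXED.length + i]; });
  return doc;
}

const line = ["20", "14370", "rs6054257", "G", "A", "29", "PASS",
  "NS=3;DP=14;AF=0.5;DB;H2", "GT:GQ:DP:HQ",
  "0|0:48:1:51,51", "1|0:48:8:51,51", "1/1:43:5:.,."].join("\t");
const doc = vcfLineToDoc(line, ["NA00001", "NA00002", "NA00003"], "example-id");
console.log(JSON.stringify(doc, null, 2));
```

Storing `ALT` as an array means a query such as `"ALT": "A"` also matches multi-allelic rows like `G,T`, because MongoDB matches a scalar against array elements.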
Query
Search Variant Ranges
// Here is our range definition
var begin = 10000;
var end = 10200;
// The chromosome field is fuzzy in format, so we use a regex
var chromosome = ".*17$";
var variant = "A";

// Query for range and chromosome.
db.publicvariants.find({
    "POS": {$gte: begin, $lt: end},
    "CHROM": {$regex: chromosome}
});
db.variants.find({
    "POS": {$gte: begin, $lt: end},
    "CHROM": {$regex: chromosome}
});

// Query for a specific variant in a range
db.publicvariants.find({
    "POS": {$gte: begin, $lt: end},
    "CHROM": {$regex: chromosome},
    "ALT": variant
});
db.variants.find({
    "POS": {$gte: begin, $lt: end},
    "CHROM": {$regex: chromosome},
    "ALT": variant
});
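
One possible refinement of the range query: a pattern like `.*17$` has no leading `^`, so it generally cannot narrow an index scan on `CHROM`. Normalizing chromosome labels at load time would allow an exact match instead. The sketch below is a suggestion under assumptions, not part of the talk: the helper names are hypothetical, and it assumes the labels differ only by a `chr` prefix, case, or surrounding whitespace.

```javascript
// Normalize chromosome labels such as "chr17", "Chr17" or " 17 " to "17".
// The accepted spellings are an assumption about what the source files contain.
function normalizeChrom(raw) {
  return String(raw).trim().replace(/^chr/i, "").toUpperCase();
}

// Build the range filter from the slide with an exact chromosome match.
// `alt` is optional; when given, the filter also requires that ALT value.
function rangeFilter(chrom, begin, end, alt) {
  const filter = {
    CHROM: normalizeChrom(chrom),
    POS: { $gte: begin, $lt: end },
  };
  if (alt !== undefined) filter.ALT = alt;
  return filter;
}

const exampleFilter = rangeFilter("chr17", 10000, 10200, "A");
console.log(JSON.stringify(exampleFilter));
```

With labels normalized the same way on insert, the filter could be passed to `db.variants.find(...)`, and a compound index such as `db.variants.createIndex({ CHROM: 1, POS: 1 })` would be a natural candidate to support it.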
Wrap Up and Panel
• Panel
• Deniz Kural: Founder and CEO – SevenBridges

• Code:
• https://github.com/jjtetrault/bio-mongo

• Thanks
• Todd Nelson, Rajan Desai
• Sebastien Lefebvre, Robin Brouwer
• Sara Dempster
The Panel
…

Confidentiality Notice
This file is private and may contain confidential and proprietary information. If you have received this file in error, please notify us and
remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or
disclosure of the contents of this file is not permitted and may be unlawful. AstraZeneca PLC, 2 Kingdom Street, London, W2 6BD, UK,
T: +44(0)20 7604 8000, F: +44 (0)20 7604 8151, www.astrazeneca.com

Insights from Building the Future of Drug Discovery with Apache Spark with Lu...Insights from Building the Future of Drug Discovery with Apache Spark with Lu...
Insights from Building the Future of Drug Discovery with Apache Spark with Lu...
 

Accelerate Pharmaceutical R&D with Big Data and MongoDB

  • 1. Mongo Boston 2013 Accelerate Pharmaceutical R&D with Big Data and MongoDB Jason Tetrault Architect - AstraZeneca
  • 2. AstraZeneca at a glance: We are a global, innovation-led biopharmaceutical company with a mission to make a meaningful difference to patient health through great medicines and a belief that health connects us all.
      • Global: 57,000 people; sales in 100 countries; manufacturing in 16; R&D across 3 continents; $4 bn invested in R&D; $33 bn sales in 2011. Constantly anticipating and adapting to the needs of a changing world.
      • Targeted: cancer, cardiovascular, gastrointestinal, infection, neuroscience, respiratory & inflammation. Driving continued innovation where we can make the most difference.
      • Collaborative: HCPs, patients, payers, regulators, partners, local communities. Connecting with others to achieve common goals in improving healthcare.
      • Committed to driving business success responsibly
  • 3. Architect: R&D Information. What does this mean?
      • Support the researchers
      • AstraZeneca has multiple iMeds that are focused on different areas of R&D
      • Specifically, I work with the Oncology and Infection iMeds here in Waltham
      • Support different software and system builds and/or purchases
      • Looking to apply new technologies to enable researchers
      • Core focus:
        • Next Generation Sequencing scaling
        • IaaS
        • Big Data pilots and exploration
  • 4. Introduction of Disruptive Technology: Step 1: Introduce Concepts
      • What
        • Unstructured Data
        • NoSQL
          • Categories (Document, Key Value, Graph)
        • Hadoop
        • Map Reduce
        • Horizontal Scalability
        • Cloud (IaaS and SaaS)
      • How
        • Lunch and Learns
        • Examples (Craigslist uses this)
        • “Big Cookies for Big Data”
        • Demonstrations
  • 5. Introduction of Disruptive Technology: Step 2: Pilots
      • Goals:
        • We needed to show what “Unstructured Data” actually means.
        • We needed to prove what these technologies can and cannot do for us.
        • Find something difficult and make it easy!
        • We needed to find the best way to enable researchers.
  • 6. Iterative Agile Analytics
      • How quickly can I make indirect associations between gene sequence features and structural fingerprints?
      • Now scale up to 4M compounds, 20K assays… and more decoration (5 to 50 TB)
      • Data sources: compound with fingerprints, gene sequence, target mappings, assay results
      • Diagram stages: Gather, Aggregate, Analyze, Decorate
      • Diagram labels: Compound JSON; Pivot; Map Reduce; Matrix; AssayResults (300K compounds, 200 GB); GeneCatalog (1.4M fingerprints, 1 GB); Fingerprint with compounds (500M pairs, 81 GB); Tanimoto matrix; Gene matrix; Target mappings
      • Easily convert to JSON and import an initial cut of data from different sources (e.g. spreadsheets, RDBMS, …)
      • Embrace unstructured data, massage it into a more useful format: Rinse, Wash, Repeat!
      • Ability to decorate data, adding fields and additional datastores quickly
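The "Tanimoto matrix" label on slide 6 refers to pairwise fingerprint similarity. As a minimal sketch of that measure (not the pilot's actual map-reduce job), the snippet below assumes each fingerprint is stored as an array of "on" bit positions; the compound names and the `fingerprint` field are hypothetical.

```javascript
// Tanimoto (Jaccard) similarity: shared bits divided by the union of bits.
// Assumption: a fingerprint is an array of the bit positions that are set.
function tanimoto(bitsA, bitsB) {
  const a = new Set(bitsA);
  const b = new Set(bitsB);
  let shared = 0;
  for (const bit of a) {
    if (b.has(bit)) shared += 1;
  }
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Build the upper triangle of a similarity matrix as a flat list of pairs.
function similarityPairs(docs) {
  const pairs = [];
  for (let i = 0; i < docs.length; i++) {
    for (let j = i + 1; j < docs.length; j++) {
      pairs.push({
        a: docs[i].name,
        b: docs[j].name,
        tanimoto: tanimoto(docs[i].fingerprint, docs[j].fingerprint),
      });
    }
  }
  return pairs;
}

// Hypothetical compound documents, for illustration only.
const exampleCompounds = [
  { name: "cmpd-1", fingerprint: [1, 4, 7, 9] },
  { name: "cmpd-2", fingerprint: [1, 4, 8] },
  { name: "cmpd-3", fingerprint: [2, 3] },
];
const examplePairs = similarityPairs(exampleCompounds);
```

The number of pairs grows quadratically with the number of compounds, which is presumably why the slide pairs this step with Map Reduce.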
  • 7. Introduction of Disruptive Technology: Pilot Findings
      • Tech findings:
        • GSON can help with weird character conversions.
        • There are per-node write limits (500 per second), but you can save a bunch of documents at once (change to bulk insert).
        • Users think that even though they could do it relationally, this was way quicker.
        • Using arrays for multiple results in a doc can be interesting.
        • JSON and JavaScript are fairly natural to technical researchers (Python).
      • We are not alone…
        • Davy Suvee
        • tranSMART
        • Seven Bridges
        • …
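The bulk-insert finding on slide 7 comes down to sending documents in groups rather than one write per document. A minimal sketch of the grouping step follows; `toBatches` and the batch size of 1000 are illustrative assumptions, not values from the talk.

```javascript
// Split an array of documents into fixed-size batches so that each batch
// can be sent to the database as a single bulk insert.
function toBatches(docs, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    batches.push(docs.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical documents: 2,500 placeholder assay results.
const pendingDocs = Array.from({ length: 2500 }, (_, i) => ({ assayResult: i }));
const docBatches = toBatches(pendingDocs, 1000);

// 3 round trips instead of 2,500 single-document writes.
const roundTrips = docBatches.length;
```

With the current MongoDB drivers or shell, each batch could then be handed to `insertMany` on the target collection; that call is left out here so the sketch runs without a server.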
  • 8. Next Generation Sequencing: Driving Question: Can we predict which drug is most effective against specific tumors? How many other cancer types that I have processed have the same variation as the cancer type I am working on?
  • 9. Fairly Inaccurate Overview of Genetics Processing: A 2-Minute Oversimplification of a Really Hard Problem
  • 10. Fairly Inaccurate Overview of Genetics Processing: Sequencing
  • 11. Fairly Inaccurate Overview of Genetics Processing: Sequencing
  • 12. Fairly Inaccurate Overview of Genetics Processing: Alignment (HG19)
  • 13. Fairly Inaccurate Overview of Genetics Processing: Downstream Processing (Variant) (HG19)
  • 14. Can I Process 88 Whole Human Genomes?
      • Researcher: I would like to process 88 public genomic samples from cancer patients. They are whole human genomes. Each patient has 2 genomic sequences, one from the tumor and one from a normal cell.
      • Tech:
        • 200 GB raw uncompressed fastq per experiment
        • 176 genome pipelines to process
        • Each “pipeline” runs on an m1.xlarge
        • We ran 4 runs of ~3.5 days on 50 nodes
        • Total processed data in the pipeline may be 5X per experiment
        • Could expand to 10X or more for more complex pipelines
        • ~86 GB result average to save
        • Stored in S3 / Glacier
      • Totals:
        • ~171 TB total processed storage
        • ~14,784 hours of processing
        • ~15 TB of results
      • Diagram (Elastic HPC Infrastructure): scripts, programs, reference; shared storage; compute; Amazon; StarCluster; elastic node expansion; local storage; processing; result offload to S3; transition to Glacier
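The totals on slide 14 can be reproduced from the per-pipeline figures. The sketch below is one reading that matches the slide, under two assumptions not stated in the talk: 1 TB is taken as 1024 GB, and each pipeline occupies its node for the full ~3.5 days of a run.

```javascript
// Inputs taken from the slide.
const patients = 88;
const genomesPerPatient = 2;        // one tumor, one normal
const rawGbPerPipeline = 200;       // raw uncompressed fastq
const expansionFactor = 5;          // processed data may be 5X the raw input
const resultGbPerPipeline = 86;     // average result size
const runs = 4;
const nodesPerRun = 50;
const daysPerRun = 3.5;

// Assumption: binary conversion, 1 TB = 1024 GB.
const GB_PER_TB = 1024;

const pipelines = patients * genomesPerPatient;                                       // 176
const processedTb = (pipelines * rawGbPerPipeline * expansionFactor) / GB_PER_TB;     // 171.875
// Assumption: every pipeline holds one node for the whole run.
const processingHours = pipelines * daysPerRun * 24;                                  // 14784
const resultTb = (pipelines * resultGbPerPipeline) / GB_PER_TB;                       // 14.78125
const nodeSlots = runs * nodesPerRun;                                                 // 200

console.log({ pipelines, processedTb, processingHours, resultTb, nodeSlots });
```

Four runs on 50 nodes give 200 node slots, enough for the 176 pipelines at one pipeline per node.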
  • 15. A Possible Vision for Experiment Management
      • Samples: explants; tumors (FFPE); tumors (fresh frozen); cell lines
      • NGS data: expression (RNASeq); variants (amplicon DNASeq, whole exome, whole genome); coding and non-coding variants; coding variants
      • Outcomes: patient stratification; biomarkers for prognosis, drug response, safety; mechanism of drug action; mechanism of disease; new target ID
      • Diagram labels: Inbound; Seven Bridges; GenePattern; Storage Partners; Big Data Store; Experiment Management / Metadata Management Services; Genome Upload / Curation; Pipeline Engines; Long Term Storage; Partner Integration; Big Data Storage and Analytics
  • 16. Let's look at a Variant … Another Area Mongo May Help
  • 17. VCF Format

        ##fileformat=VCFv4.1
        ##fileDate=20090805
        ##source=myImputationProgramV3.1
        ##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
        ##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x>
        ##phasing=partial
        ##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
        ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
        ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
        ##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
        ##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
        ##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
        ##FILTER=<ID=q10,Description="Quality below 10">
        ##FILTER=<ID=s50,Description="Less than 50% of samples have data">
        ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
        ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
        ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
        ##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
        #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
        20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
        20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
        20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
        20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
        20 1234567 microsat1 GTC G,GTCT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3
  • 18. VCF as JSON: Header and Variant Information
      Variant:
      {
        "_id" : ObjectId("52617b613004b77f64efed62"),
        "ALT" : [ "A" ],
        "QUAL" : "29",
        "NA00001" : "0|0:48:1:51,51",
        "POS" : 14370,
        "NA00002" : "1|0:48:8:51,51",
        "FILTER" : "PASS",
        "CHROM" : "20",
        "NA00003" : "1/1:43:5:.,.",
        "FORMAT" : "GT:GQ:DP:HQ",
        "__vcfid" : "40770f6f-165a-4930-8092-05e98e4e0b27",
        "ID" : "rs6054257",
        "INFO" : { "DP" : "14", "AF" : "0.5", "NS" : "3" },
        "REF" : "G"
      }
      Header:
      {
        "_id" : ObjectId("52617b613004b77f64efed67"),
        "phasing" : "partial",
        "fileformat" : "VCFv4.1",
        "fileDate" : "20090805",
        "source" : "myImputationProgramV3.1",
        "FORMAT" : { "Description" : "\"Haplotype Quality\"", "Type" : "Integer", "Number" : "2", "ID" : "HQ" },
        "__vcfid" : "40770f6f-165a-4930-8092-05e98e4e0b27",
        "contig" : { "species" : "\"Homo sapiens\"", "assembly" : "B36", "md5" : "f126cdf8a6e0c7f379d618ff66beb2da", "length" : "62435964", "ID" : "20", "taxonomy" : "x" },
        "INFO" : { "Description" : "\"HapMap2 membership\"", "Type" : "Flag", "Number" : "0", "ID" : "H2" },
        "reference" : "file:///seq/references/1000GenomesPilot-NCBI36.fasta",
        "FILTER" : { "Description" : "\"Less than 50% of samples have data\"", "ID" : "s50" }
      }
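The mapping from a VCF data line to a variant document of this shape can be sketched as below. This is an illustrative reconstruction, not the loader used in the talk: the function name `vcfRecordToDocument` is hypothetical, fields are split on whitespace (real VCF files are tab-delimited), and flag-style INFO entries such as `DB` are kept as `true`, whereas the document on the slide omits them. `POS` is stored as a number and `QUAL` is left as a string, matching the slide.

```javascript
// Turn the "#CHROM ..." header line and one data line into a document shaped
// like the variant document on the slide. Hypothetical helper for illustration.
function vcfRecordToDocument(headerLine, dataLine) {
  const columns = headerLine.replace(/^#/, "").trim().split(/\s+/);
  const values = dataLine.trim().split(/\s+/);

  // Pair each column name (CHROM, POS, ..., sample IDs) with its value.
  const doc = {};
  columns.forEach(function (name, i) {
    doc[name] = values[i];
  });

  doc.POS = parseInt(doc.POS, 10); // numeric, so $gte / $lt range queries work
  doc.ALT = doc.ALT.split(",");    // one array entry per alternate allele

  // INFO is a semicolon-separated list of key=value pairs and bare flags.
  const info = {};
  doc.INFO.split(";").forEach(function (entry) {
    const eq = entry.indexOf("=");
    if (eq === -1) {
      info[entry] = true;          // assumption: keep flags such as DB as true
    } else {
      info[entry.slice(0, eq)] = entry.slice(eq + 1);
    }
  });
  doc.INFO = info;
  return doc;
}
```

Storing `ALT` as an array is what lets a query such as `"ALT": "A"` match records with several alternate alleles, since MongoDB matches a scalar against any element of an array field.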
  • 19. Query: Search Variant Ranges
      // Here is our range definition
      var begin = 10000;
      var end = 10200;
      // The chromosome value is fuzzy in format, so we use a regex
      var chromosome = ".*17$";
      var variant = "A";
      // Query for range and chromosome.
      db.publicvariants.find({ "POS": { $gte: begin, $lt: end }, "CHROM": { $regex: chromosome } })
      db.variants.find({ "POS": { $gte: begin, $lt: end }, "CHROM": { $regex: chromosome } })
      // Query for a specific variant in a range
      db.publicvariants.find({ "POS": { $gte: begin, $lt: end }, "CHROM": { $regex: chromosome }, "ALT": variant })
      db.variants.find({ "POS": { $gte: begin, $lt: end }, "CHROM": { $regex: chromosome }, "ALT": variant })
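The regex in these queries is there because chromosome labels vary between files (for example `17` versus `chr17`). One alternative, offered here only as a sketch and not as what the talk's code does, is to normalize the label once and query with an exact match, which is generally friendlier to an index than a suffix regex like `.*17$`. The helper names `normalizeChrom` and `rangeQuery` are hypothetical, and the sketch assumes `CHROM` was stored in the same normalized form at load time.

```javascript
// Normalize chromosome labels such as "chr17", "Chr17" or " 17 " to "17".
// Assumption: only a leading "chr" prefix and letter case vary between sources.
function normalizeChrom(label) {
  return String(label).trim().replace(/^chr/i, "").toUpperCase();
}

// Build a filter for db.<collection>.find(): a half-open position range
// [begin, end) on one chromosome, optionally restricted to one alternate allele.
function rangeQuery(chrom, begin, end, alt) {
  const filter = {
    CHROM: normalizeChrom(chrom),
    POS: { $gte: begin, $lt: end }
  };
  if (alt !== undefined) {
    filter.ALT = alt;
  }
  return filter;
}
```

In the mongo shell this would be used as `db.variants.find(rangeQuery("chr17", 10000, 10200, "A"))`, with the same call repeated against `db.publicvariants` to compare private and public variants over the same range.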
  • 20. Wrap Up and Panel • Panel • Deniz Kural: Founder and CEO – SevenBridges • Code: • https://github.com/jjtetrault/bio-mongo • Thanks • Todd Nelson, Rajan Desai • Sebastien Lefebvre, Robin Brouwer • Sara Dempster

Editor's Notes

  1. A chat about introducing disruptive technology. Mention the panel. Please hold questions till the end.
  2. I am an Architect focusing on Drug R&D. I support Researchers, specifically in Oncology and Infection R&D. I focus on Next Generation Sequencing and IAAS. Also running some Big Data pilots.
  3. Introduce disruptive concepts. Stress: Unstructured Data; NoSQL + Map Reduce; Document Store and Mongo. How: Lunch and Learns, Big Cookies for Big Data.
  4. Make a good pilot. Show what Unstructured data is. See what you can do. Make something difficult easy.
  5. Run through tech findings. We are not alone… Mention the panel. Ask if anyone else is using Mongo in this space.
  6. Focus on closing out Introduction of Disruptive Technology. Genome sequencing is getting cheaper. More public samples are available. Big Data tools can be used to sift through some of these bigger questions.
  7. Make joke about “All set, ready to work on this?” A Whole Human Genome is the whole thing. They tend to be 10–20x the size of a “Standard Human Experiment”. Rattle off numbers.
  8. Make sure you talk about Metadata and the Big Data Store. Wrap up the vision. Now, let's talk about VCF.
  9. Now let's look at a variant. Remember that question I had earlier: how do I ask my driving question?