How to interpret your
own genome.
C. Titus Brown
ctbrown@ucdavis.edu
@ctitusbrown
http://ivory.idyll.org/blog/
Second in my ongoing attempt to explain what I actually do to Terry Peppers.
Some basic facts about
DNA
The primary DNA sequence consists of strings of A, C, G, and T.
Most human cells contain approximately 6 billion of these.
They are divided into 23 chromosome pairs.
These chromosomes are the primary unit of heredity.
http://classes.biology.ucsd.edu/bimm110.SP07/lectures_WEB/L08.05_Cytogenetics.htm
How DNA is interpreted –
“It’s complicated.”
http://www.exploringnature.org/db/detail.php?dbID=106&detID=2454
How inheritance & generation
of variation works
http://genetics.thetech.org/ask/ask435
+ approximately 300-600 mutations per generation
If we knew a person’s genome
sequence perfectly…
We still wouldn’t know all that much!
We could correlate variation between genomes with
diseases.
We could identify parentage and genetic inheritance.
We could probably identify ethnic origin.
We could find known “mistakes” or problems.
But… why wouldn’t we know
that much?? Isn’t the genome
the person?
Let’s ignore environmental factors, first of all…
Imagine…
…you’re locked in a room, with feral lawyers roaming
around outside;
You have a bunch of source code on a stack of CDs
to understand;
And you’ve been given a Windows 98 machine with
Python installed.
(see David Beazley, “Discovering Python”, PyCon
2014)
This talk came partly from listening to his talk…
This “locked room” problem is a
pretty good analogy to genomics!
“Here are 3 billion characters of DNA! Go
figure out what it all means!”
It’s like the previous locked room problem, and:
The code is all written in Perl 8, for which neither a
specification nor a software interpreter exists.
But you have access to the Internet and a world-wide
collection of other scientists, and (some of) their data and
papers.
Oh, and: the answers hold the keys to life and death.
Genomes are still useful! How
do we find sequence?
Primary approach for human genomes is: spend a lot of money
sequencing one, or a few; use that as reference.
Initial cost: $2.7 bn (in 1991 dollars)
Current human genome reference is from 13 anonymous
volunteers in Buffalo, NY (Wikipedia ;)
Older technology: identify points of variation, then target for
further investigation.
Current technology: sequence. (The rest of this talk.)
Next technology: longer reads. (Sequence more, better.)
Working with short read
sequencing - overview
Sequence => Map => Call variants => Interpret
Working with short read
sequencing - sequencing
Need about 250 ng of DNA at 2 ng/ul.
“Under $1,000 dollars”
http://biome.biomedcentral.com/welcome-to-the-1000-genome/
…some up front investment required :)
Working with short read
sequencing - sequencing
@D00360:18:H8VC6ADXX:1:1103:1434:46766/1
AACCCCCTCCCCATGCTTACAAGCAAGTACAGCAATCAACCCTCAACTATCACACA
+
@@@DDDDDFHHFHHIIIBHGIIDGIA;EDGD@CG@FDDEFFB@DCGHGGIG8CHGD
Raw data looks something like this (x 2 bn)
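A minimal sketch of reading one such four-line FASTQ record in Python. It assumes the common Phred+33 quality encoding (the slide does not say which encoding is used), and the function name parse_fastq_record is made up for illustration.

```python
def parse_fastq_record(lines):
    """Split one 4-line FASTQ record into (name, sequence, Phred scores)."""
    header, sequence, separator, quality = (line.strip() for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Assumption: Phred+33 encoding, i.e. score = ASCII code - 33.
    scores = [ord(ch) - 33 for ch in quality]
    return header[1:], sequence, scores


# The record shown on the slide.
record = [
    "@D00360:18:H8VC6ADXX:1:1103:1434:46766/1",
    "AACCCCCTCCCCATGCTTACAAGCAAGTACAGCAATCAACCCTCAACTATCACACA",
    "+",
    "@@@DDDDDFHHFHHIIIBHGIIDGIA;EDGD@CG@FDDEFFB@DCGHGGIG8CHGD",
]
name, sequence, scores = parse_fastq_record(record)
print(name, len(sequence), min(scores), max(scores))
```

Each quality character encodes how confident the sequencer was in the base at the same position; higher is better.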
Mapping: locate sequences in
reference
http://en.wikipedia.org/wiki/File:Mapping_Reads.png
FASTQ => Map => BAM
Variant detection after mapping
http://www.kenkraaijeveld.nl/genomics/bioinformatics/
BAM => Call variants => VCF
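VCF is tab-delimited text, so a single data line can be picked apart with plain Python. The sketch below is illustrative only: the record is hypothetical (invented here, not taken from the HG002 data), and parse_vcf_line is a made-up helper that handles just one sample column.

```python
def parse_vcf_line(line):
    """Parse one VCF data line with a single sample into a small dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, _info, fmt, sample = fields[:10]
    # FORMAT names the per-sample keys; the sample column holds their values.
    sample_data = dict(zip(fmt.split(":"), sample.split(":")))
    alleles = [ref] + alt.split(",")
    # GT holds allele indexes (0 = REF), separated by "/" or "|".
    indexes = sample_data["GT"].replace("|", "/").split("/")
    genotype = [alleles[int(i)] for i in indexes if i != "."]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt.split(","),
        "genotype": genotype,
    }


# Hypothetical record, invented for illustration only.
example = "chr1\t12345\t.\tG\tA\t50\tPASS\t.\tGT:DP\t0/1:48"
call = parse_vcf_line(example)
print(call)
```

A genotype of 0/1 means one reference allele and one alternate allele, i.e. heterozygous.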
Working with short-read
sequencing – annotate variants
Is it a variant known to have an effect?
Is it in a gene?
Is it in a gene and does it have some “obvious” effect (e.g.
breaking the gene)?
Has it been associated with some effect?
Pipeline, approaches, formats,
technologies.
Sequence: Illumina
Map: BWA, Samtools
Call variants: FreeBayes
Interpret: VEP, SNPedia, Gemini
Pipeline automation: bcbio
See http://ivory.idyll.org/blog/2015-pycon-talk.html for details.
~1500 hours ~12 hours ~100 hours
An example data set
Sequences from a “trio” (son, father, mother) of Ashkenazi
Jews are available, together with medical records (see links
in blog post).
The Ashkenazim branched off from other Jews ~2500 years
ago, flourished during the Roman Empire, then “went through a
'severe bottleneck' as they dispersed, reducing a population
of several million to just 400 families who left Northern Italy
around the year 1000.”
http://en.wikipedia.org/wiki/Ashkenazi_Jews#Genetics
“Raw” human data:
BAM file: 108 GB
(contains sequences + quality scores)
+ human genome (~3 GB or so)
+ lots of databases of varying size.
Full instructions at:
http://ivory.idyll.org/blog/2015-pycon-talk.html
Working with short-read
sequencing – mapping.
Software such as BWA takes in a reference genome and a
set of reads and yields tab-delimited output:
D00360:37:HA3HMADXX:1:2104:14000:62852 163 chr22
16050001 15 87S8M1I10M1D41M1S =
16050476 621 CCA…. 3((…
This contains information about where each read maps, how
well it maps, etc.
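Two of the fields in the record above are compact encodings: the second column (163) is a bit flag, and the sixth (87S8M1I10M1D41M1S) is a CIGAR string describing the alignment. A minimal sketch of decoding both, using the flag bits and CIGAR operations from the SAM specification; the helper names are made up for illustration.

```python
import re

# Flag bits as defined by the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def cigar_lengths(cigar):
    """Return (bases consumed in the read, bases consumed in the reference)."""
    read_len = ref_len = 0
    for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in "MIS=X":  # operations that consume read bases
            read_len += int(count)
        if op in "MDN=X":  # operations that consume reference bases
            ref_len += int(count)
    return read_len, ref_len


print(decode_flag(163))
print(cigar_lengths("87S8M1I10M1D41M1S"))
```

For this read, most of the bases are soft-clipped (the S operations), which fits the low mapping quality of 15 in the fifth column.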
Most parts of the genome are
sampled many times (~50,
here)
HG002 data set
Calling variants w/FreeBayes
https://github.com/ekg/freebayes
Working with short-read
sequencing – annotate variants
HG002 data set
Variants annotated with VEP using Gemini.
Most differences are
~uninterpretable!
Total variants: 5,562,545
Between genes (intergenic): 3,032,670
Within genes but between exons (introns): 2,014,962
Remaining: 514,913
(Only ~2% of the human genome codes for protein; maybe ~5% of
the genome is thought to be functional.)
HG002 data set
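The arithmetic behind the "remaining" count can be checked directly. The variable names are illustrative, and reading the "between exons" category as intronic is an assumption.

```python
# Counts from the slide (HG002 data set).
total = 5_562_545
intergenic = 3_032_670  # between genes
intronic = 2_014_962    # assumption: "between exons" means within introns

remaining = total - intergenic - intronic
print(remaining)
print(f"{remaining / total:.1%} of variants fall outside those two categories")
```

So even after discarding the variants that are hardest to interpret, roughly half a million are left to look at.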
OK, you’ve got your variants –
now what??
HT to Slate Star Codex,
http://slatestarcodex.com/2014/11/12/how-to-use-23andme-irresponsibly/
Chasing down a disease-related variant:
Canavan disease.
http://www.snpedia.com/index.php/Rs12948217
chr17:3397702 (hg19) in HG002 sample (son)
The son and both parents are heterozygous (1/2) for
this variant – they are carriers, but not afflicted with the
disease.
On average, ¼ of the parents’ children would be
homozygous for the allele and probably be affected
by Canavan disease:
“Children who inherit two
copies of the gene
appear normal at birth,
but between three and
nine months of age they
begin to show symptoms
... These children cannot
sit, crawl, or talk, and few
live past age 10.”
http://www.snpedia.com/index.php/Can
ease
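The ¼ figure comes from simple Mendelian inheritance: each carrier parent passes on one of their two alleles with equal probability. A small sketch that enumerates the combinations; the allele labels "N" and "d" are chosen here purely for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent_a, parent_b):
    """Enumerate equally likely child genotypes at one site."""
    # Sort each pair so that "Nd" and "dN" count as the same genotype.
    counts = Counter(
        "".join(sorted(pair)) for pair in product(parent_a, parent_b)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# "N" = unaffected allele, "d" = disease allele (illustrative labels).
probabilities = offspring_genotypes("Nd", "Nd")
print(probabilities)
```

Two carrier parents give ¼ unaffected non-carriers, ½ carriers, and ¼ children with two copies of the disease allele.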
Challenges in actually
interpreting – “version hell”.
The variant is actually a T.
SNPedia says A is the problematic variant, but that’s on
hg38.
On hg19, which is what the variants were called on, the relevant
gene is on the reverse strand, so T => A.
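The strand flip is just base complementation: a T reported on one strand is an A on the other. A minimal sketch, with helper names made up for illustration:

```python
# Map each base to its complement, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Complement each base without changing the order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Sequence of the opposite strand, read in its own 5' to 3' direction."""
    return complement(seq)[::-1]


print(complement("T"))
print(reverse_complement("AACCG"))
```

For a single-base variant only the complement matters; for longer sequences the opposite strand also reads in the reverse direction.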
Human migrations into Europe (~40kya – fall of Roman Empire)
Veeramah and Novembre, doi:10.1101/cshperspect.a008516
Veeramah and Novembre, doi:10.1101/cshperspect.a008516
Human genetic comparisons overlaid on map of Europe.
Predicting new disease
variants:
Can we find associations between variants and diseases?
“Genome Wide Association Study (GWAS)”
Wellcome Trust Case Control Consortium, 2007,
doi:10.1038/nature05911
…cautions of GWAS:
Need to account for relatedness in samples;
Large sample sizes needed;
Complex statistics needed & “multiple testing” issues;
Different identifier/database mixtures;
Correlation is not causation;
Large effects are rare – typically many small signals
combined.
The data science problem from hell!
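To make the "multiple testing" issue concrete: if you test a million variants at p < 0.05, you expect tens of thousands of hits by chance alone. One simple and conservative fix is the Bonferroni correction, sketched below. The slides do not name a specific method, so this is an illustration rather than what any particular study used, and the p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Flag which p-values survive correction for len(p_values) tests."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical p-values, invented for illustration only.
p_values = [0.04, 0.001, 0.2, 0.012]
print(significant(p_values))

# With about a million tested variants the per-variant bar gets very strict.
print(bonferroni_threshold(0.05, 1_000_000))
```

This is part of why large sample sizes are needed: small effects rarely clear such a strict per-variant threshold.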
Where next?
Short-term: next 2-5 years
Medium-term: 10 years
Long-term: 20 years+
Short term
Lots more data! “Millions to billions of human
genomes” coming.
Individual data – est 300,000 human genomes
sequenced in 2014.
Tumor and somatic data.
Time course data (“narcissome”) - Mike Snyder
Newer sequencing data types – e.g. longer reads.
see: http://www.nature.com/news/the-rise-of-the-narciss-ome-1.10240
Short-term software
problems
Increasingly many open source Python projects
(bcbio, Gemini);
Help with integration between tools (dependency
hell, versioning hell);
Optimization of specific approaches not so
important.
Lack of concordance => technical problem.
General speed ~meh
Flexible and robust libraries still maturing.
Medium term
We’ll be sequencing everything all the time (but still
won’t really know what it means); => data integration
and data mining.
Large scale sequencing is rapidly being extended to
agriculture, ecology, and veterinary medicine.
We will soon be able to “edit” whatever genomes we
want (check out CRISPR), but will not have a good
idea of what to actually edit (cf. Perl 8 analogy,
above).
Read up on “gene drive” if you want the bejeezus scared out of you:
http://news.sciencemag.org/biology/2015/03/chain-reaction-spreads-gene-through-insects
Longer term
No one knows.
We’ve only had large scale sequencing & the human
genome for ~15 years!!
Free associate the following:
cheap sequencing; quantified self; Internet of Things.
How to get involved?
A lot of the software is open source!
(bwa, samtools, etc. etc.)
…but:
Warning: genomics is large, and deep, and largely invisible, and
has its own culture.
Sadly, your best bet is probably to come do a PhD with someone like me, for
free.
(just kidding! …)
bcbio and Gemini
Help with:
Gemini: SQLite to PostgreSQL conversion;
Gemini: “bigwig” parsing performance;
bcbio: improving use & cleanliness of Cloud port
bcbio: moving to Common Workflow Language (note,
reference implementation in Python)
See talk blog post at http://ivory.idyll.org/blog/2015-pycon-talk.html
for more info.
How can you sequence your
own genome?
Most genetic testing services (23andMe, etc.) don’t
actually sequence your 6 billion bases of DNA; they
instead use a more targeted approach and look at
common variants or known disease variants.
If it costs < $1000, they’re not actually sequencing you :)
DNA extraction, etc, is fairly straightforward if you have
access to a lab and the necessary expertise.
Main suggestion: see http://www.personalgenomes.org/
Thanks for coming!
Please see links to data, instructions, and more reading at
http://ivory.idyll.org/blog/2015-pycon-talk.html

Weitere ähnliche Inhalte

Was ist angesagt?

Crash. Burn. Roast the Marshmallows.
Crash. Burn. Roast the Marshmallows.Crash. Burn. Roast the Marshmallows.
Crash. Burn. Roast the Marshmallows.Yaniv Erlich
 
Bio263 Who is our Closest Relative
Bio263 Who is  our Closest RelativeBio263 Who is  our Closest Relative
Bio263 Who is our Closest RelativeMark Pallen
 
Introduction to Biotechnology
Introduction to BiotechnologyIntroduction to Biotechnology
Introduction to BiotechnologyDoug Jones
 
Genome Evolution Chromosomes Heslop-Harrison ICC Prague
Genome Evolution Chromosomes Heslop-Harrison ICC PragueGenome Evolution Chromosomes Heslop-Harrison ICC Prague
Genome Evolution Chromosomes Heslop-Harrison ICC PraguePat (JS) Heslop-Harrison
 
Bio263 Lecture 2: Becoming human
Bio263 Lecture 2: Becoming humanBio263 Lecture 2: Becoming human
Bio263 Lecture 2: Becoming humanMark Pallen
 
Dna of human and great ape
Dna of human and great apeDna of human and great ape
Dna of human and great apeLekshmiJohnson
 
Why we should clone extinct animals
Why we should clone extinct animalsWhy we should clone extinct animals
Why we should clone extinct animalsMorganScience
 
L14 human genome
L14 human genomeL14 human genome
L14 human genomeMUBOSScz
 
Using of dt40 chicken cell line as a reverse genetic tool to study human disease
Using of dt40 chicken cell line as a reverse genetic tool to study human diseaseUsing of dt40 chicken cell line as a reverse genetic tool to study human disease
Using of dt40 chicken cell line as a reverse genetic tool to study human diseaseTassanee Lerksuthirat
 
The Genographic Project 2015
The Genographic Project 2015The Genographic Project 2015
The Genographic Project 2015Family Tree DNA
 
Chromosomes, Crops and Superdomestication - Pat Heslop-Harrison Malaysia
Chromosomes, Crops and Superdomestication - Pat Heslop-Harrison MalaysiaChromosomes, Crops and Superdomestication - Pat Heslop-Harrison Malaysia
Chromosomes, Crops and Superdomestication - Pat Heslop-Harrison MalaysiaPat (JS) Heslop-Harrison
 
Xenotransplantion
 Xenotransplantion Xenotransplantion
XenotransplantionAchyut Bora
 
Superdomestication, feed-forward breeding and climate proofing crops
Superdomestication, feed-forward breeding and climate proofing cropsSuperdomestication, feed-forward breeding and climate proofing crops
Superdomestication, feed-forward breeding and climate proofing cropsPat (JS) Heslop-Harrison
 
Update version of the SMBE/SESBE Lecture on ENCODE & junk DNA (Graur, Decembe...
Update version of the SMBE/SESBE Lecture on ENCODE & junk DNA (Graur, Decembe...Update version of the SMBE/SESBE Lecture on ENCODE & junk DNA (Graur, Decembe...
Update version of the SMBE/SESBE Lecture on ENCODE & junk DNA (Graur, Decembe...Dan Graur
 
The language of life (all the subtitles)first ppt 2 bimester
The language of life (all the subtitles)first ppt 2 bimesterThe language of life (all the subtitles)first ppt 2 bimester
The language of life (all the subtitles)first ppt 2 bimesterSofia Paz
 
Domestication, polyploidy and genomics of crops #PAGXXV Heslop-Harrison
Domestication, polyploidy and genomics of crops #PAGXXV Heslop-HarrisonDomestication, polyploidy and genomics of crops #PAGXXV Heslop-Harrison
Domestication, polyploidy and genomics of crops #PAGXXV Heslop-HarrisonPat (JS) Heslop-Harrison
 

Was ist angesagt? (20)

Crash. Burn. Roast the Marshmallows.
Crash. Burn. Roast the Marshmallows.Crash. Burn. Roast the Marshmallows.
Crash. Burn. Roast the Marshmallows.
 
Bio263 Who is our Closest Relative
Bio263 Who is  our Closest RelativeBio263 Who is  our Closest Relative
Bio263 Who is our Closest Relative
 
Introduction to Biotechnology
Introduction to BiotechnologyIntroduction to Biotechnology
Introduction to Biotechnology
 
Genome Evolution Chromosomes Heslop-Harrison ICC Prague
Genome Evolution Chromosomes Heslop-Harrison ICC PragueGenome Evolution Chromosomes Heslop-Harrison ICC Prague
Genome Evolution Chromosomes Heslop-Harrison ICC Prague
 
Bio263 Lecture 2: Becoming human
Bio263 Lecture 2: Becoming humanBio263 Lecture 2: Becoming human
Bio263 Lecture 2: Becoming human
 
Dna of human and great ape
Dna of human and great apeDna of human and great ape
Dna of human and great ape
 
Why we should clone extinct animals
Why we should clone extinct animalsWhy we should clone extinct animals
Why we should clone extinct animals
 
Project powerpoint
Project powerpointProject powerpoint
Project powerpoint
 
Heterologous expression lecture
Heterologous expression lectureHeterologous expression lecture
Heterologous expression lecture
 
L14 human genome
L14 human genomeL14 human genome
L14 human genome
 
Human genome project
Human genome projectHuman genome project
Human genome project
 
Using of dt40 chicken cell line as a reverse genetic tool to study human disease
Using of dt40 chicken cell line as a reverse genetic tool to study human diseaseUsing of dt40 chicken cell line as a reverse genetic tool to study human disease
Using of dt40 chicken cell line as a reverse genetic tool to study human disease
 
Bliss
BlissBliss
Bliss
 
The Genographic Project 2015
The Genographic Project 2015The Genographic Project 2015
The Genographic Project 2015
 
Chromosomes, Crops and Superdomestication - Pat Heslop-Harrison Malaysia
Chromosomes, Crops and Superdomestication - Pat Heslop-Harrison MalaysiaChromosomes, Crops and Superdomestication - Pat Heslop-Harrison Malaysia
Chromosomes, Crops and Superdomestication - Pat Heslop-Harrison Malaysia
 
Xenotransplantion
 Xenotransplantion Xenotransplantion
Xenotransplantion
 
Superdomestication, feed-forward breeding and climate proofing crops
Superdomestication, feed-forward breeding and climate proofing cropsSuperdomestication, feed-forward breeding and climate proofing crops
Superdomestication, feed-forward breeding and climate proofing crops
 
Update version of the SMBE/SESBE Lecture on ENCODE & junk DNA (Graur, Decembe...
Update version of the SMBE/SESBE Lecture on ENCODE & junk DNA (Graur, Decembe...Update version of the SMBE/SESBE Lecture on ENCODE & junk DNA (Graur, Decembe...
Update version of the SMBE/SESBE Lecture on ENCODE & junk DNA (Graur, Decembe...
 
The language of life (all the subtitles)first ppt 2 bimester
The language of life (all the subtitles)first ppt 2 bimesterThe language of life (all the subtitles)first ppt 2 bimester
The language of life (all the subtitles)first ppt 2 bimester
 
Domestication, polyploidy and genomics of crops #PAGXXV Heslop-Harrison
Domestication, polyploidy and genomics of crops #PAGXXV Heslop-HarrisonDomestication, polyploidy and genomics of crops #PAGXXV Heslop-Harrison
Domestication, polyploidy and genomics of crops #PAGXXV Heslop-Harrison
 

Andere mochten auch

Luxury presentation
Luxury presentationLuxury presentation
Luxury presentationlmeneley
 
Tendencias En Comunicacion Digital Eyeblaster Oded Lida Ded09
Tendencias En Comunicacion Digital  Eyeblaster Oded Lida Ded09Tendencias En Comunicacion Digital  Eyeblaster Oded Lida Ded09
Tendencias En Comunicacion Digital Eyeblaster Oded Lida Ded09Eyeblaster Spain
 
Celebrating 30 years
Celebrating 30 yearsCelebrating 30 years
Celebrating 30 yearskfitzsy
 
How to do windows movie maker?
How to do windows movie maker?How to do windows movie maker?
How to do windows movie maker?jessecadelina
 
Eyeblaster Analytics Bulleting Online Video
Eyeblaster  Analytics  Bulleting  Online VideoEyeblaster  Analytics  Bulleting  Online Video
Eyeblaster Analytics Bulleting Online VideoEyeblaster Spain
 
e-book: Social Business Now
e-book: Social Business Nowe-book: Social Business Now
e-book: Social Business NowSanne Heerink
 
Qualitative reconstruction of the camera and geometry of a scene, as a key to...
Qualitative reconstruction of the camera and geometry of a scene, as a key to...Qualitative reconstruction of the camera and geometry of a scene, as a key to...
Qualitative reconstruction of the camera and geometry of a scene, as a key to...Alexander Lavrov
 
Point Dynamics Our Story
Point Dynamics   Our StoryPoint Dynamics   Our Story
Point Dynamics Our Storyguestc8ec941c
 
Presentation Flazznet
Presentation FlazznetPresentation Flazznet
Presentation FlazznetSusy Rizky
 
Getting results when working with english result
Getting results when working with english resultGetting results when working with english result
Getting results when working with english resultemege68
 
Ondernemen in de toekomst
Ondernemen in de toekomstOndernemen in de toekomst
Ondernemen in de toekomstPiet van Vugt
 
Ondernemen kwf 26 nov 2012
Ondernemen kwf 26 nov 2012Ondernemen kwf 26 nov 2012
Ondernemen kwf 26 nov 2012Piet van Vugt
 
Passivhuse: Udfordringer og muligheder
Passivhuse: Udfordringer og mulighederPassivhuse: Udfordringer og muligheder
Passivhuse: Udfordringer og mulighederBertel Bolt-Jørgensen
 
MicroMedia B2B / case Rocla
MicroMedia B2B / case RoclaMicroMedia B2B / case Rocla
MicroMedia B2B / case RoclaAntti81
 
Loco Legacy Mini-Update
Loco Legacy Mini-UpdateLoco Legacy Mini-Update
Loco Legacy Mini-Updateguest2cd8a3
 
Personnel Planning &amp; Recruiting
Personnel Planning &amp; RecruitingPersonnel Planning &amp; Recruiting
Personnel Planning &amp; Recruitingabir014
 

Andere mochten auch (20)

Morsø erhversråd energimærkning
Morsø erhversråd   energimærkningMorsø erhversråd   energimærkning
Morsø erhversråd energimærkning
 
Luxury presentation
Luxury presentationLuxury presentation
Luxury presentation
 
Tendencias En Comunicacion Digital Eyeblaster Oded Lida Ded09
Tendencias En Comunicacion Digital  Eyeblaster Oded Lida Ded09Tendencias En Comunicacion Digital  Eyeblaster Oded Lida Ded09
Tendencias En Comunicacion Digital Eyeblaster Oded Lida Ded09
 
Celebrating 30 years
Celebrating 30 yearsCelebrating 30 years
Celebrating 30 years
 
How to do windows movie maker?
How to do windows movie maker?How to do windows movie maker?
How to do windows movie maker?
 
Eyeblaster Analytics Bulleting Online Video
Eyeblaster  Analytics  Bulleting  Online VideoEyeblaster  Analytics  Bulleting  Online Video
Eyeblaster Analytics Bulleting Online Video
 
e-book: Social Business Now
e-book: Social Business Nowe-book: Social Business Now
e-book: Social Business Now
 
Qualitative reconstruction of the camera and geometry of a scene, as a key to...
Qualitative reconstruction of the camera and geometry of a scene, as a key to...Qualitative reconstruction of the camera and geometry of a scene, as a key to...
Qualitative reconstruction of the camera and geometry of a scene, as a key to...
 
Point Dynamics Our Story
Point Dynamics   Our StoryPoint Dynamics   Our Story
Point Dynamics Our Story
 
Presentation Flazznet
Presentation FlazznetPresentation Flazznet
Presentation Flazznet
 
Getting results when working with english result
Getting results when working with english resultGetting results when working with english result
Getting results when working with english result
 
Ondernemen in de toekomst
Ondernemen in de toekomstOndernemen in de toekomst
Ondernemen in de toekomst
 
Ondernemen kwf 26 nov 2012
Ondernemen kwf 26 nov 2012Ondernemen kwf 26 nov 2012
Ondernemen kwf 26 nov 2012
 
Passivhuse: Udfordringer og muligheder
Passivhuse: Udfordringer og mulighederPassivhuse: Udfordringer og muligheder
Passivhuse: Udfordringer og muligheder
 
Vizerra 2010
Vizerra 2010Vizerra 2010
Vizerra 2010
 
MicroMedia B2B / case Rocla
MicroMedia B2B / case RoclaMicroMedia B2B / case Rocla
MicroMedia B2B / case Rocla
 
Br10 sommerhus
Br10 sommerhusBr10 sommerhus
Br10 sommerhus
 
إلى
إلىإلى
إلى
 
Loco Legacy Mini-Update
Loco Legacy Mini-UpdateLoco Legacy Mini-Update
Loco Legacy Mini-Update
 
Personnel Planning &amp; Recruiting
Personnel Planning &amp; RecruitingPersonnel Planning &amp; Recruiting
Personnel Planning &amp; Recruiting
 

Ähnlich wie 2015 pycon-talk

2014 whitney-public-talk
2014 whitney-public-talk2014 whitney-public-talk
2014 whitney-public-talkc.titus.brown
 
A voyage-inward-02
A voyage-inward-02A voyage-inward-02
A voyage-inward-02Raman Kannan
 
Instructions for Written Assignment 2For the second (and final.docx
Instructions for Written Assignment 2For the second (and final.docxInstructions for Written Assignment 2For the second (and final.docx
Instructions for Written Assignment 2For the second (and final.docxmaoanderton
 
2012 hpcuserforum talk
2012 hpcuserforum talk2012 hpcuserforum talk
2012 hpcuserforum talkc.titus.brown
 
TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)jmoore89
 
Dan Graur - Can the human genome be 100% functional?
Dan Graur - Can the human genome be 100% functional?Dan Graur - Can the human genome be 100% functional?
Dan Graur - Can the human genome be 100% functional?Andrei Afanasiev
 
Marzillier_09052014.pdf
Marzillier_09052014.pdfMarzillier_09052014.pdf
Marzillier_09052014.pdf7006ASWATHIRR
 
PAPER 3.1 ~ HUMAN GENOME PROJECT
PAPER 3.1 ~  HUMAN GENOME PROJECTPAPER 3.1 ~  HUMAN GENOME PROJECT
PAPER 3.1 ~ HUMAN GENOME PROJECTNusrat Gulbarga
 
Data analytics challenges in genomics
Data analytics challenges in genomicsData analytics challenges in genomics
Data analytics challenges in genomicsmikaelhuss
 
Complete assignment on human Genome Project
Complete assignment on human Genome ProjectComplete assignment on human Genome Project
Complete assignment on human Genome Projectaafaq ali
 

Ähnlich wie 2015 pycon-talk (20)

2014 whitney-public-talk
2014 whitney-public-talk2014 whitney-public-talk
2014 whitney-public-talk
 
2014 naples
2014 naples2014 naples
2014 naples
 
A voyage-inward-02
A voyage-inward-02A voyage-inward-02
A voyage-inward-02
 
2013 alumni-webinar
2013 alumni-webinar2013 alumni-webinar
2013 alumni-webinar
 
2014 ucl
2014 ucl2014 ucl
2014 ucl
 
2014 villefranche
2014 villefranche2014 villefranche
2014 villefranche
 
Genomic Data Analysis
Genomic Data AnalysisGenomic Data Analysis
Genomic Data Analysis
 
Instructions for Written Assignment 2For the second (and final.docx
Instructions for Written Assignment 2For the second (and final.docxInstructions for Written Assignment 2For the second (and final.docx
Instructions for Written Assignment 2For the second (and final.docx
 
Human genome project 1
Human genome project 1Human genome project 1
Human genome project 1
 
HGP.ppt
HGP.pptHGP.ppt
HGP.ppt
 
2012 hpcuserforum talk
2012 hpcuserforum talk2012 hpcuserforum talk
2012 hpcuserforum talk
 
2014 sage-talk
2014 sage-talk2014 sage-talk
2014 sage-talk
 
2015 illinois-talk
2015 illinois-talk2015 illinois-talk
2015 illinois-talk
 
TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)
 
Dan Graur - Can the human genome be 100% functional?
Dan Graur - Can the human genome be 100% functional?Dan Graur - Can the human genome be 100% functional?
Dan Graur - Can the human genome be 100% functional?
 
Human encodeproject
Human encodeprojectHuman encodeproject
Human encodeproject
 
Marzillier_09052014.pdf
Marzillier_09052014.pdfMarzillier_09052014.pdf
Marzillier_09052014.pdf
 
PAPER 3.1 ~ HUMAN GENOME PROJECT
PAPER 3.1 ~  HUMAN GENOME PROJECTPAPER 3.1 ~  HUMAN GENOME PROJECT
PAPER 3.1 ~ HUMAN GENOME PROJECT
 
Data analytics challenges in genomics
Data analytics challenges in genomicsData analytics challenges in genomics
Data analytics challenges in genomics
 
Complete assignment on human Genome Project
Complete assignment on human Genome ProjectComplete assignment on human Genome Project
Complete assignment on human Genome Project
 

Mehr von c.titus.brown

2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorialc.titus.brown
 
2015 aem-grs-keynote
2015 aem-grs-keynote2015 aem-grs-keynote
2015 aem-grs-keynotec.titus.brown
 
2015 msu-code-review
2015 msu-code-review2015 msu-code-review
2015 msu-code-reviewc.titus.brown
 
2015 opencon-webcast
2015 opencon-webcast2015 opencon-webcast
2015 opencon-webcastc.titus.brown
 
2015 vancouver-vanbug
2015 vancouver-vanbug2015 vancouver-vanbug
2015 vancouver-vanbugc.titus.brown
 
2015 ohsu-metagenome
2015 ohsu-metagenome2015 ohsu-metagenome
2015 ohsu-metagenomec.titus.brown
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformaticsc.titus.brown
 
2014 anu-canberra-streaming
2014 anu-canberra-streaming2014 anu-canberra-streaming
2014 anu-canberra-streamingc.titus.brown
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibilityc.titus.brown
 

Mehr von c.titus.brown (20)

2016 bergen-sars
2016 bergen-sars2016 bergen-sars
2016 bergen-sars
 
2016 davis-plantbio
2016 davis-plantbio2016 davis-plantbio
2016 davis-plantbio
 
2016 davis-biotech
2016 davis-biotech2016 davis-biotech
2016 davis-biotech
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial
 
2015 aem-grs-keynote
2015 aem-grs-keynote2015 aem-grs-keynote
2015 aem-grs-keynote
 
2015 msu-code-review
2015 msu-code-review2015 msu-code-review
2015 msu-code-review
 
2015 mcgill-talk
2015 mcgill-talk2015 mcgill-talk
2015 mcgill-talk
 
2015 opencon-webcast
2015 opencon-webcast2015 opencon-webcast
2015 opencon-webcast
 
2015 vancouver-vanbug
2015 vancouver-vanbug2015 vancouver-vanbug
2015 vancouver-vanbug
 
2015 osu-metagenome
2015 osu-metagenome2015 osu-metagenome
2015 osu-metagenome
 
2015 ohsu-metagenome
2015 ohsu-metagenome2015 ohsu-metagenome
2015 ohsu-metagenome
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformatics
 
2015 pag-chicken
2015 pag-chicken2015 pag-chicken
2015 pag-chicken
 
2015 pag-metagenome
2015 pag-metagenome2015 pag-metagenome
2015 pag-metagenome
 
2014 nyu-bio-talk
2014 nyu-bio-talk2014 nyu-bio-talk
2014 nyu-bio-talk
 
2014 bangkok-talk
2014 bangkok-talk2014 bangkok-talk
2014 bangkok-talk
 
2014 anu-canberra-streaming
2014 anu-canberra-streaming2014 anu-canberra-streaming
2014 anu-canberra-streaming
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibility
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
 

Kürzlich hochgeladen

OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...navyadasi1992
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxRitchAndruAgustin
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares


2015 pycon-talk

  • 1. How to interpret your own genome. C. Titus Brown ctbrown@ucdavis.edu @ctitusbrown http://ivory.idyll.org/blog/ Second in my ongoing attempt to explain what I actually do to Terry Peppers.
  • 2. Some basic facts about DNA The primary DNA sequence consists of strings of A, C, G, and T. Most human cells contain approximately 6 billion of these. They are divided into 23 chromosome pairs. These chromosomes are the primary unit of heredity. http://classes.biology.ucsd.edu/bimm110.SP07/lectures_WEB/L08.05_Cytogenetics.htm
  • 3. How DNA is interpreted – “It’s complicated.” http://www.exploringnature.org/db/detail.php?dbID=106&detID=2454
  • 4. How inheritance & generation of variation works http://genetics.thetech.org/ask/ask435 + approximately 300- 600 mutations per generation
  • 5. If we knew a person’s genome sequence perfectly… We still wouldn’t know all that much! We could correlate variation between genomes with diseases. We could identify parentage and genetic inheritance. We could probably identify ethnic origin. We could find known “mistakes” or problems.
  • 6. But… why wouldn’t we know that much?? Isn’t the genome the person? Let’s ignore environmental factors, first of all…
  • 7. Imagine… …you’re locked in a room, with feral lawyers roaming around outside; You have a bunch of source code on a stack of CDs to understand; And you’ve been given a Windows 98 machine with Python installed. (see David Beazley, “Discovering Python”, PyCon 2014) This talk came partly from listening to his talk…
  • 8. This “locked room” problem is a pretty good analogy to genomics! “Here are 3 billion characters of DNA! Go figure out what it all means!” It’s like the previous locked room problem, and: The code is all written in Perl 8, for which neither a specification nor a software interpreter exists. But you have access to the Internet and a world-wide collection of other scientists, and (some of) their data and papers. Oh, and: the answers hold the keys to life and death.
  • 9. Genomes are still useful! How do we find sequence? Primary approach for human genomes is: spend a lot of money sequencing one, or a few; use that as reference. Initial cost: $2.7 bn (in 1991) Current human genome reference is from 13 anonymous volunteers in Buffalo, NY (Wikipedia ;) Older technology: identify points of variation, then target for further investigation. Current technology: sequence. (The rest of this talk.) Next technology: longer reads. (Sequence more, better.)
  • 10. Working with short read sequencing - overview Sequence Map Call variants Interpret
  • 11. Working with short read sequencing - sequencing Need about 250 ng of DNA at 2 ng/ul. “Under $1,000 dollars” http://biome.biomedcentral.com/welcome-to-the-1000-genome/ …some up front investment required :) Sequence Map Call variants Interpret
  • 12. Working with short read sequencing - sequencing Sequence Map Call variants Interpret @D00360:18:H8VC6ADXX:1:1103:1434:46766/1 AACCCCCTCCCCATGCTTACAAGCAAGTACAGCAATCAACCCTCAACTATCACACA + @@@DDDDDFHHFHHIIIBHGIIDGIA;EDGD@CG@FDDEFFB@DCGHGGIG8CHGD Raw data looks something like this (x 2 bn)
  • 13. Mapping: locate sequences in reference http://en.wikipedia.org/wiki/File:Mapping_Reads.png Sequence Map Call variants Interpret FASTQ => BAM
  • 14.
  • 15. Variant detection after mapping http://www.kenkraaijeveld.nl/genomics/bioinformatics/ Sequence Map Call variants Interpret BAM => VCF
  • 16.
  • 17. Working with short-read sequencing – annotate variants Is it a variant known to have an effect? Is it in a gene? Is it in a gene and does it have some “obvious” effect (e.g. breaking the gene)? Has it been associated with some effect? Sequence Map Call variants Interpret
  • 18. Pipeline, approaches, formats, technologies. Sequence Map Call variants Interpret Illumina BWA Samtools FreeBayes VEP SNPedia Gemini bcbio See http://ivory.idyll.org/blog/2015-pycon-talk.html for details. ~1500 hours ~12 hours ~100 hours
  • 19. An example data set Sequences from a “trio” (son, father, mother) of Ashkenazi Jews are available, together with medical records (see links in blog post). The Ashkenazim branched off from other Jews ~2500 years ago, flourished during the Roman Empire, then “went through a 'severe bottleneck' as they dispersed, reducing a population of several million to just 400 families who left Northern Italy around the year 1000.” http://en.wikipedia.org/wiki/Ashkenazi_Jews#Genetics
  • 20. “Raw” human data: BAM file: 108 GB (contains sequences + quality scores) + human genome (~3 GB or so) + lots of databases of varying size. Full instructions at: http://ivory.idyll.org/blog/2015-pycon-talk.html
  • 21. Working with short-read sequencing – mapping. Software such as BWA takes in a reference genome and a set of reads and yields tab-delimited output: D00360:37:HA3HMADXX:1:2104:14000:62852 163 chr22 16050001 15 87S8M1I10M1D41M1S = 16050476 621 CCA…. 3((… This contains information about where each read maps, how well it maps, etc. Sequence Map Call variants Interpret
  • 22. Most parts of the genome are sampled many times (~50, here) HG002 data set Sequence Map Call variants Interpret
  • 24. Working with short-read sequencing – annotate variants HG002 data set. Variants annotated with VEP using Gemini. Sequence Map Call variants Interpret
  • 25. Most differences are ~uninterpretable! Total variants: 5,562,545 Between genes: 3,032,670 Between parts of genes (exons): 2,014,962 Remaining: 514,913 (Only 2% of human genome makes genes; maybe ~5% of genome thought to be functional) HG002 data set
  • 26. OK, you’ve got your variants – now what?? HT to Slate Star Codex, http://slatestarcodex.com/2014/11/12/how-to-use-23andme-irresponsibly/
  • 27. Chasing down a disease-related variant: Canavan disease. http://www.snpedia.com/index.php/Rs12948217
  • 28. chr17:3397702 (hg19) in HG002 sample (son) The son and both parents are heterozygous (1/2) for this – they are carriers, but not afflicted with disease. ¼ of the children of two carriers would be homozygous for this allele and probably be affected by Canavan’s Disease: “Children who inherit two copies of the gene appear normal at birth, but between three and nine months of age they begin to show symptoms ... These children cannot sit, crawl, or talk, and few live past age 10.” http://www.snpedia.com/index.php/Can ease
  • 29. Challenges in actually interpreting – “version hell”. Variant is actually a T. Snpedia says A is the problematic variant, but that’s on hg38. On hg19, which is what variants were called on, relevant gene is on reverse strand so T => A.
  • 30. Human migrations into Europe (~40kya – fall of Roman Empire) Veeramah and Novembre, doi:10.1101/cshperspect.a008516
  • 31. Veeramah and Novembre, doi:10.1101/cshperspect.a008516 Human genetic comparisons overlaid on map of Europe.
  • 32. Predicting new disease variants: Can we find associations between variants and diseases? “Genome Wide Association Study (GWAS)” Wellcome Trust CCT, 2007, doi:10.1038/nature05911
  • 33. …cautions of GWAS: Need to account for relatedness in samples; Large sample sizes needed; Complex statistics needed & “multiple testing” issues; Different identifier/database mixtures; Correlation is not causation; Large effects are rare – typically many small signals combined. The data science problem from hell!
  • 34. Where next? Short-term: next 2-5 years Medium-term: 10 years Long-term: 20 years+
  • 35. Short term Lots more data! “Millions to billions of human genomes” coming. Individual data – est 300,000 human genomes sequenced in 2014. Tumor and somatic data. Time course data (“narcissome”) - Mike Snyder Newer sequencing data types – e.g. longer reads. see: http://www.nature.com/news/the-rise-of-the-narciss-ome-1.10240
  • 36. Short-term software problems Increasingly many open source Python projects (bcbio, Gemini); Help with integration between tools (dependency hell, versioning hell); Optimization of specific approaches not so important. Lack of concordance => technical problem. General speed ~meh Flexible and robust libraries still maturing.
  • 37. Medium term We’ll be sequencing everything all the time (but still won’t really know what it means); => data integration and data mining. Large scale sequencing is rapidly being extended to agriculture, ecology, and veterinary medicine. We will soon be able to “edit” whatever genomes we want (check out CRISPR), but will not have a good idea of what to actually edit (cf. Perl 8 analogy, above). Read up on “gene drive” if you want the bejeezus scared out of you: http://news.sciencemag.org/biology/2015/03/chain-reaction-spreads-gene-through-insects
  • 38. Longer term No one knows. We’ve only had large scale sequencing & the human genome for ~15 years!! Free associate the following: cheap sequencing; quantified self; Internet of Things.
  • 39. How to get involved? A lot of the software is open source! (bwa, samtools, etc. etc.) …but: Warning: genomics is large, and deep, and largely invisible, and has its own culture. Sadly, your best bet is probably to come do a PhD with someone like me, for free. (just kidding! …)
  • 40. bcbio and Gemini Help with: Gemini: SQLite to PostgreSQL conversion; Gemini: “bigwig” parsing performance; bcbio: improving use & cleanliness of Cloud port; bcbio: moving to Common Workflow Language (note, reference implementation in Python) See talk blog post at http://ivory.idyll.org/2015-pycon-talk.html for more info.
  • 41. How can you sequence your own genome? Most genetic testing services (23andme, etc.) don’t actually sequence your 6 billion bases of DNA; they instead use a more targeted approach and look at common variants or known disease variants. If it costs < $1000, they’re not actually sequencing you :) DNA extraction, etc, is fairly straightforward if you have access to a lab and the necessary expertise. Main suggestion: see http://www.personalgenomes.org/
  • 42. Thanks for coming! Please see links to data, instructions, and more reading at http://ivory.idyll.org/blog/2015-pycon-talk.html
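
Slide 12 shows one raw read as a four-line FASTQ record: a name line starting with `@`, the bases, a `+` separator, and one quality character per base. The following is a minimal sketch of reading such records in plain Python. It assumes the common Phred+33 quality encoding (the slides do not state the encoding), and the names `FastqRecord` and `read_fastq` are made up for illustration rather than taken from any library.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-lines-per-read FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line carries no data here
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Assumption: Phred+33, i.e. score = ASCII code of the character - 33.
        scores = [ord(ch) - 33 for ch in quality]
        yield FastqRecord(header[1:], sequence, scores)


# The single read shown on slide 12.
EXAMPLE = (
    "@D00360:18:H8VC6ADXX:1:1103:1434:46766/1\n"
    "AACCCCCTCCCCATGCTTACAAGCAAGTACAGCAATCAACCCTCAACTATCACACA\n"
    "+\n"
    "@@@DDDDDFHHFHHIIIBHGIIDGIA;EDGD@CG@FDDEFFB@DCGHGGIG8CHGD\n"
)

records = list(read_fastq(io.StringIO(EXAMPLE)))
for rec in records:
    print(rec.name, len(rec.sequence), min(rec.qualities))
```

Reading fixed groups of four lines, rather than splitting on `@`, matters because `@` is also a valid quality character, as the example read itself shows.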
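
Slide 21 shows one line of tab-delimited mapper output (the SAM text format). As a rough sketch of what "where each read maps, how well it maps" means in practice, the code below decodes the flag and the CIGAR string from that line. The helper names are hypothetical, and the read's sequence and quality columns are left out because the slide elides them.

```python
import re
from typing import List, Tuple

# One CIGAR operation is a count followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Bit meanings of the FLAG column, as defined by the SAM specification.
FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped",
    0x8: "mate_unmapped", 0x10: "reverse", 0x20: "mate_reverse",
    0x40: "first_in_pair", 0x80: "second_in_pair", 0x100: "secondary",
    0x200: "qc_fail", 0x400: "duplicate", 0x800: "supplementary",
}


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '8M1I10M' into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def read_length(ops: List[Tuple[int, str]]) -> int:
    """Bases of the read consumed: matches, insertions, soft clips."""
    return sum(n for n, op in ops if op in "MIS=X")


def reference_span(ops: List[Tuple[int, str]]) -> int:
    """Bases of the reference covered: matches, deletions, skips."""
    return sum(n for n, op in ops if op in "MDN=X")


def decode_flag(flag: int) -> List[str]:
    """List the names of the bits set in a SAM flag, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# First nine columns of the line on slide 21 (SEQ and QUAL are elided there).
line = "\t".join([
    "D00360:37:HA3HMADXX:1:2104:14000:62852", "163", "chr22", "16050001",
    "15", "87S8M1I10M1D41M1S", "=", "16050476", "621",
])
qname, flag, rname, pos, mapq, cigar = line.split("\t")[:6]
ops = parse_cigar(cigar)
print(rname, pos, "MAPQ", mapq, decode_flag(int(flag)))
print("read bases:", read_length(ops), "reference bases:", reference_span(ops))
```

For this read the CIGAR says most of the read is soft-clipped (`87S`), which fits the low mapping quality of 15 on the slide: the mapper placed it, but not confidently.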
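
The "¼" figure on slide 28 is ordinary Mendelian arithmetic: each carrier parent passes on one of their two alleles with equal probability. A small sketch that enumerates the combinations, using `A` for the common allele and `a` for the disease allele (labels chosen here for illustration, not taken from the slides):

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from typing import Dict


def offspring_genotypes(parent1: str, parent2: str) -> Dict[str, Fraction]:
    """Return the expected fraction of each child genotype.

    Each parent is a two-character string of alleles, e.g. 'Aa' for a
    carrier. Every pairing of one allele from each parent is equally likely.
    """
    counts = Counter(
        "".join(sorted(pair)) for pair in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Both parents are carriers (heterozygous), as in the HG002 trio.
cross = offspring_genotypes("Aa", "Aa")
for genotype, fraction in sorted(cross.items()):
    print(genotype, fraction)
```

This only models a single site with simple recessive inheritance and ignores everything else the talk warns about (penetrance, environment, calling errors).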
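
The "T => A" on slide 29 comes from strand orientation: a base reported on one strand of the DNA corresponds to its complement on the other. A minimal sketch of that conversion, with hypothetical helper names:

```python
# Translation table for Watson-Crick base pairing (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA string."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the sequence as read along the opposite strand."""
    return complement(seq)[::-1]


# A single-base variant called as T on one strand is an A on the other.
print(complement("T"))
print(reverse_complement("AACG"))
```

For a single base the complement is all that is needed; for longer sequences the order also reverses, which is why the two functions are kept separate.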