Sequencing genes and genomes in biology. The most important technique available to the molecular biologist is DNA sequencing, by which the precise order of nucleotides in a piece of DNA can be determined
2. • The most important technique available to the molecular biologist is
DNA sequencing, by which the precise order of nucleotides in a piece of
DNA can be determined.
• The techniques in use today can be divided into two categories:
1. Chain-termination method
2. Next-generation sequencing
2
4. 10.1.1. Chain termination DNA sequencing in outline
Chain termination DNA sequencing is based on the principle that single-
stranded DNA molecules that differ in length by just a single nucleotide
can be separated from one another by polyacrylamide gel electrophoresis.
. short oligonucleotide + template = primer
. deoxyribonucleotide triphosphates (dNTPs—dATP, dCTP, dGTP, and
dTTP)
. dideoxynucleotides (ddNTPs—ddATP, ddCTP, ddGTP, and ddTTP)
4
6. 10.1.2. Not all DNA polymerases can be used for sequencing
• Many DNA polymerases have a mixed enzymatic activity, being able to
degrade as well as synthesize DNA.
• Degradation can occur in either the 5′→3′ or 3′→5′ direction.
• The 3′→5′ activity could have the same effect, but more importantly will
remove a dideoxynucleotide that has just been added at the 3′ end,
preventing chain termination from occurring.
• In the original method Klenow polymerase = (5′→3′ ) ,
processivity
• Taq polymerase processivity , exonuclease
6
7. 10.1.3. Chain-termination sequencing with
Taq polymerase
• thermal cycle sequencing
• similar to PCR
• four dideoxynucleotides chain termination
10.1.4. Limitations of chain-termination sequencing
• It is necessary to sequence each region of a genome multiple times, in order to identify errors present in individual sequence reads
• One of the goals of personalized medicine is to use individual genome sequences to make accurate diagnoses of a person’s risk of
developing a disease, and to use that person’s genetic characteristics to plan effective therapies and treatment regimes.
From Sanger sequencing to genome databases and beyond
• personalized medicine has huge potential with both diagnoses and treatment options being driven by different factors associated with
an individual
• Ultimately, it is the advancements in the abovementioned NGS technologies that enable the construction of large open access
databases. The availability of these databases will allow patients to gain access to more sophisticated tests that provide important
genetic information, allowing for more targeted medical care.
7
8. . A large library made up of thousands or millions of DNA fragments is
sequenced in a single experiment
10.2. Next-generation sequencing
The difference between chain-
termination and next-generation
sequencing
8
9. 10.2.1 Preparation of a next-generation
sequencing library
• Breakage of DNA
• Immobilization
• Amplification
DNA fragmentation
Sonication random positions
Immobilization of the library
Glass slide / small metallic beads
Amplification of the library
The final step in library preparation
9
10. Immobilization of the DNA fragments in a
sequencing library by base pairing to metallic
beads. (a) Each DNA fragment is attached to a
single bead via a streptavidin–biotin linkage.
(b) Individual beads, with their attached DNA
fragments, are placed within water droplets in
an oil–water emulsion.
Immobilization of the DNA fragments in a sequencing
library by base pairing to oligonucleotides on a glass side
10
11. 10.2.2. Next-generation sequencing methods
There is no electrophoresis or any other fragment separation step.
1. Reversible terminator sequencing
Modified nucleotides / The termination step is reversible / Maximum length of 300 bp / Illumina
sequencing
2. Pyrosequencing
Does not require electrophoresis / More rapid / Detection of pulses of chemiluminescence /
Pyrophosphate / Sulfurylase
In the reaction mixture so that if a deoxynucleotide is not incorporated into the polynucleotide
then it is rapidly degraded before the next one is added = apyrase
Sanger and Next Generation Sequencing Approaches to Evaluate HIV-1 Virus in Blood Compartments
11
12. Pyrosequencing
Illumina sequencing. The fragments are attached to
primers immobilized on a solid surface, performing
a bridge-amplification and generating clusters of
DNA with identical molecules.
12
13. 10.2.3. Third-generation sequencing
. Real-time sequencing also enables read lengths to be longer
. Best methods = single molecule real-time sequencing (SMRT)
. Read lengths of up to 20 000 bp
Real-time DNA sequencing with each
nucleotide addition detected with a zero-mode
waveguide
13
14. 10.2.4. Directing next-generation sequencing at
specific sets of genes
• All parts of the genome are sequenced at the same time
• Target enrichment method
Target enrichment. (a) Baits are used to capture DNA
fragments representing genes of interest. (b) Only the
captured DNA fragments are sequenced
14
15. 10.3 How to sequence a genome
• Bacteriophage ϕX174 , SV40 virus, pBR322.
• The first chromosome sequence, for chromosome III of the yeast
Saccharomyces cerevisiae, was published in 1992, and the entire yeast
genome was completed in 1996
• The main challenge lies with sequence assembly
• The simplest way of doing this is the shotgun approach
15
16. 10.3.1 Shotgun sequencing of prokaryotic genomes
Shotgun sequencing of the Haemophilus influenzae genome
• Sonicate / cloned in plasmid and M13 vectors / two chain-termination DNA sequences
• After cloning, 28,643 chain termination sequencing experiments were carried out with 19,687 of the
clones.
• A few of these sequences—4339 in all—were rejected because they were less than 400 bp in length.
• The remaining 24 304 sequences were entered into a computer, which then spent 30 hours searching
for overlaps among the sequences.
• The result was 140 contiguous sequences or contigs, each made up of a series of overlapping
sequence reads.
• It might have been possible to continue sequencing more of the sonicated fragments in order
eventually to close the gaps between the individual contigs.
• The most successful involving hybridization analysis of a clone library prepared in a λ vector
16
17. A schematic of the key steps in the H.
influenza genome sequencing project
Using oligonucleotide
hybridization to close gaps in
the H. influenzae genome
sequence. Oligonucleotides 2
and 5 both hybridize to the
same λ clone, indicating that
contigs I and III are adjacent.
The gap between them can be
closed by sequencing the
appropriate part of the λ clone.
17
18. 10.3.2. Sequencing of eukaryotic genomes
One problem with the shotgun approach. An incorrect
overlap is made between two sequence reads that both
terminate within a repeated element. The result is that
a segment of the DNA molecule is absent from the
DNA sequence
18
19. The hierarchical shotgun approach
• Pre-sequencing phase
• These fragments then cloned into a high-capacity vector such as a BAC
• Massive task
• Chromosome walking
• The weakness of chromosome walking is that it begins at a fixed starting point and
builds up the clone contig step by step, and hence slowly, from that fixed point.
• The more rapid techniques for clone contig assembly do not use a fixed starting point
and instead aim to identify pairs of overlapping clones.
• The various techniques that can be used to identify overlaps are collectively known
as clone fingerprinting. Clone fingerprinting is based on the identification of
sequence features that are shared by a pair of clones.
19
20. Hierarchical shotgun sequencing can avoid problems
with repeat sequences. Sequence assembly of the reads
from clone 2 could result in the segment between the two
repeats being lost. However, the sequence of clone 3
enables the mistake to be recognized and corrected
Building up a series of overlapping clones using a
clone fingerprinting technique
20
21. Shotgun sequencing of eukaryotic genomes
• The most successful strategy is to use two or more sequencing libraries,
with one of the libraries containing fragments that are longer than the
longest repeat sequences in the genome being studied.
• Drosophila melanogaster
The strategy used to ensure that repetitive DNA
regions were assembled correctly when the
Drosophila melanogaster genome was sequenced.
Two identical 8-kb repeats are adjacent in the
genome. The end-sequences of the 10-kb fragment
that spans one of these repeats can be checked
against the master sequence to ensure that the
segment between the two repeats has not been lost. 21
22. What do we mean by ‘genome sequence’?
• It is important to recognize that a ‘complete’ genome sequence is currently an
impossibility for a eukaryotic genome.
• The standard being that at least 95% of the euchromatin should be sequenced and all
except the most refractory gaps filled.
• N50 size = the total length of all the contigs added together
22
24. Channels used by the Oxford Nanopore
platform to sequence DNA.
Polymerase fixed to the bottom of an individual
well. In the Pacific Bioscience platform, the
DNA moves generate signals because of the
incorporation of phosphate-labeled nucleotides.
24
25. In the Ion Torrent platform, the chip is the
sequencer. Each well of the chip acts as a pH meter
that is able to detect changes in the concentration
of H+ produced in DNA polymerization.
25
26. DNA Sequencing Sensors
Nowadays, sequencing sensors based on genetic material have little to do with those used by Sanger. The emergence of mass
sequencing sensors, or new generation sequencing (NGS) meant a quantitative leap both in the volume of genetic material that
was able to be sequenced in each trial, as well as in the time per run and its cost. One can envisage that incoming technologies,
already known as fourth generation sequencing, will continue to cheapen the trials by increasing DNA reading lengths in each
run. All of this would be impossible without sensors and detection systems becoming smaller and more precise.
Massively parallel sequencing techniques for forensics
DNA sequencing, starting with Sanger's chain termination method in 1977 and evolving into the next generation sequencing
(NGS) techniques of today that employ massively parallel sequencing (MPS), has become essential in application areas such as
biotechnology, virology, and medical diagnostics. these techniques have also gained attention in the forensic field. Relevance to
the forensic analysis is that besides generation of standard STR-profiles, DNA repeats can also be sequenced to look for
polymorphisms. Furthermore, additional SNPs can be sequenced to acquire information on ancestry, paternity or phenotype. The
current MPS systems are also very helpful in cases where only a limited amount of DNA or highly degraded DNA has been
secured from a crime scene. If enough autosomal DNA is not present, mitochondrial DNA can be sequenced for maternal lineage
analysis.
26
27. Comparison of the different sequencing platforms. The data
shown refer to the most favorable conditions for each platform
27
28. The Third Revolution in Sequencing Technology
Forty years ago the advent of Sanger sequencing was revolutionary as it allowed complete genome sequences to be deciphered
for the first time. A second revolution came when next-generation sequencing (NGS) technologies appeared, which made genome
sequencing much cheaper and faster. However, NGS methods have several drawbacks and pitfalls, most notably their short reads.
Recently, third-generation/long-read methods appeared, which can produce genome assemblies of unprecedented quality.
Moreover, these technologies can directly detect epigenetic modifications on native DNA and allow whole-transcript sequencing
without the need for assembly.
Overview of Next Generation Sequencing Technologies
Sequencing platforms that are smaller, require less power, less reagents and maintenance will be utilized in medical, agricultural,
ecological and other settings. At the front end, advances in robotics, liquid handling, sample processing (nucleic acid
preparation) will contribute to these advancements. Equally important will be advances in faster and more accurate bioinformatic
data analysis as well as data transfer and storage.
Genetic evolution analysis of 2019 novel coronavirus and coronavirus from other species
However, so far, there are still controversies about the source of the virus and its intermediate host. Here, we found the novel
coronavirus was closely related to coronaviruses derived from five wild animals. However, genome and ORF1a homology show
that the virus is not the same coronavirus as the coronavirus derived from these five animals, whereas the virus has the highest
homology with Bat coronavirus isolate RaTG13.
28
29. Full-genome evolutionary analysis of the novel corona virus (2019-nCoV) rejects the hypothesis of
emergence as a result of a recent recombination event
Our analysis suggests that the 2019-nCoV although closely related to BatCoV RaTG13 sequence throughout the genome
(sequence similarity 96.3%), shows discordant clustering with the Bat_SARS-like coronavirus sequences. The levels of genetic
similarity between the 2019-nCoV and RaTG13 suggest that the latter does not provide the exact variant that caused the outbreak
in humans, but the hypothesis that 2019-nCoV has originated from bats is very likely.
Characterization and Clinical Significance of Natural Variability in Hepatitis B Virus Reverse
Transcriptase in Treatment-Naive Chinese Patients by Sanger Sequencing and Next-Generation
Sequencing
HBV RT sequences were analyzed in 427 patients by Sanger sequencing and in 66 patients by next-generation sequencing.
Primary or secondary NA resistance (NAr) mutations were not found, except A181T in RT (rtA181T) by Sanger sequencing, but
they were detected by next-generation sequencing. Mutations were found in 56 RT amino acid (aa) sites by Sanger sequencing, 36
of which had mutations that could lead to changes in B or T cell epitopes in the RT or S protein. The present study demonstrates
that next-generation sequencing (NGS) was more suitable than Sanger sequencing to monitor NAr mutations at a low rate in the
treatment-naive patients, and that mutations in the RT region might be involved in the progression to ALD.
Simple protocol for population (Sanger) sequencing for Zika virus
genomic regions
The present study provided a simple and low-cost Sanger protocol to sequence relevant genes of the ZIKVgenome. The identity of
Sanger generated sequences with published consensus NGS support the use of Sanger method for ZIKV population studies. Primer
sets were designed in order to conduct a nested RT-PCR and a Sanger sequencing protocols.
29
30. Next-Generation Sequencing Technologies and their Application to the Study and Control of
Bacterial Infections
It is a challenge that the comparability of the sequence data generated on different platforms with different error profiles using
different library preparation methods has still not been comprehensively assessed and validated. On the positive side, the cost
of sequencing, data transfer and analysis will continue to decrease and the DNA purification including library preparation and
the actual sequencing will become faster and more efficient. The Achilles heel of the current long read technologies, the high
error rates, will continue to improve until they become as precise as the short-read technologies at which time the latter will
become obsolete.
30
31. • 1. DNA Sequencing Sensors: An Overview
Jose Antonio Garrido-Cardenas 1, Federico Garcia-Maroto 2, Jose Antonio Alvarez-Bermejo 3, Francisco Manzano-Agugliaro 4
PMID: 28335417 PMCID: PMC5375874 DOI: 10.3390/s17030588
• 2. Massively parallel sequencing techniques for forensics: A review
Brigitte Bruijns 1 2, Roald Tiggelaar 1 3, Han Gardeniers 1 PMID: 30101986 PMCID: PMC6282972 DOI: 10.1002/elps.201800082
• 3. From Sanger sequencing to genome databases and beyond
Jenny Straiton, Tristan Free, Abigail Sawyer, Joseph Martin PMID: 30744413 DOI: 10.2144/btn-2019-0011
• 4. Sanger and Next Generation Sequencing Approaches to Evaluate HIV-1 Virus in Blood Compartments
Andrea Arias 1, Pablo López 2, Raphael Sánchez 3, Yasuhiro Yamamura 4, Vanessa Rivera-Amill 5 PMID: 30096879 PMCID: PMC6122037 DOI:
10.3390/ijerph15081697
• 5. Detection of Rare Mutations in EGFR-ARMS-PCR-Negative Lung Adenocarcinoma by Sanger Sequencing
Chaoyue Liang # 1, Zhuolin Wu # 2, Xiaohong Gan 3, Yuanbin Liu 4 5, You You 4 5, Chenxian Liu 1, Chengzhi Zhou 4 5, Ying Liang 4, Haiyun Mo 6, Allen M
Chen 3 7, Jiexia Zhang 4 8 PMID: 29214771 PMCID: PMC5725350 DOI: 10.3349/ymj.2018.59.1.13
• 6. Simple protocol for population (Sanger) sequencing for Zika virus genomic regions
Gabriela Bastos Cabral 1, João Leandro de Paula Ferreira 1, Renato Pereira de Souza 2, Mariana Sequetin Cunha 2, Adriana Luchs 3, Cristina Adelaide
Figueiredo 4, Luís Fernando de Macedo Brígido 1 PMID: 29185594 PMCID: PMC5719533 DOI: 10.1590/0074-02760170248
• 7. Characterization and Clinical Significance of Natural Variability in Hepatitis B Virus Reverse Transcriptase in Treatment-Naive Chinese Patients by Sanger
Sequencing and Next-Generation Sequencing
Ya Fu 1 2, Yongbin Zeng 1 2, Tianbin Chen 1 2, Huijuan Chen 1 2, Ni Lin 1 2, Jinpiao Lin 1 2, Xiaofeng Liu 1 2, Er Huang 1 2, Songhang Wu 1 2, Shu Wu 1 2,
Siyi Xu 1 2, Long Wang 1 2, Qishui Ou 3 2 PMID: 31189581 PMCID: PMC6663897 DOI: 10.1128/JCM.00119-19
• 8. Full-genome evolutionary analysis of the novel corona virus (2019-nCoV) rejects the hypothesis of emergence as a result of a recent recombination event
D Paraskevis 1, E G Kostaki 2, G Magiorkinis 2, G Panayiotakopoulos 3, G Sourvinos 4, S Tsiodras 5 PMID: 32004758 PMCID: PMC7106301 DOI:
10.1016/j.meegid.2020.104212
• 9. Genetic evolution analysis of 2019 novel coronavirus and coronavirus from other species
Chun Li 1, Yanling Yang 2, Linzhu Ren 3 PMID: 32169673 PMCID: PMC7270525 DOI: 10.1016/j.meegid.2020.104285
31
32. • 10. The Third Revolution in Sequencing Technology
Erwin L van Dijk 1, Yan Jaszczyszyn 2, Delphine Naquin 2, Claude Thermes 2 PMID: 29941292 DOI: 10.1016/j.tig.2018.05.008
• 11. Overview of Next Generation Sequencing Technologies
Barton E Slatko 1, Andrew F Gardner 1, Frederick M Ausubel 2 PMID: 29851291 PMCID: PMC6020069 DOI: 10.1002/cpmb.59 2019
• 12. . Next-generation sequencing technologies and their application to the study and control of bacterial infections
J Besser 1, H A Carleton 1, P Gerner-Smidt 2, R L Lindsey 1, E Trees 1 PMID: 29074157 PMCID: PMC5857210 DOI: 10.1016/j.cmi.2017.10.013
• 13. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd
R D Fleischmann 1, M D Adams, O White, R A Clayton, E F Kirkness, A R Kerlavage, C J Bult, J F Tomb, B A Dougherty, J M Merrick, et al.
PMID: 7542800 DOI: 10.1126/science.7542800
• 14. Visual mapping by high resolution FISH M Heiskanen 1, L Peltonen, A Palotif PMID: 8909124 DOI: 10.1016/0168-9525(96)30083-8
• 15. Genome sequencing in microfabricated high-density picolitre reactors
Marcel Margulies 1, Michael Egholm, William E Altman, Said Attiya, Joel S Bader, Lisa A Bemben, Jan Berka, Michael S Braverman, Yi-Ju Chen,
Zhoutao Chen, Scott B Dewell, Lei Du, Joseph M Fierro, PMID: 16056220 PMCID: PMC1464427 DOI: 10.1038/nature03959
Nature. 2006 May 4;441(7089):120. Ho, Chun He [corrected to Ho, Chun Heen]
• 16. A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides
J M Prober 1, G L Trainor, R J Dam, F W Hobbs, C W Robertson, R J Zagursky, A J Cocuzza, M A Jensen, K Baumeister PMID: 2443975 DOI:
10.1126/science.2443975
33. • 17. A sequencing method based on real-time pyrophosphate
M Ronaghi 1, M Uhlén, P Nyrén PMID: 9705713 DOI: 10.1126/science.281.5375.363
• 18. CircumVent thermal cycle sequencing and alternative manual and automated DNA sequencing protocols using the highly thermostable VentR
(exo-) DNA polymerase
L E Sears 1, L S Moran, C Kissinger, T Creasey, H Perry-O'Keefe, M Roskey, E Sutherland, B E Slatko PMID: 1476733
• 19. A method for constructing radiation hybrid maps of whole genomes
M A Walter 1, D J Spillett, P Thomas, J Weissenbach, P N Goodfellow PMID: 8075634 DOI: 10.1038/ng0594-22