Genomics
20
BIOLOGICAL SCIENCE
FOURTH EDITION
SCOTT FREEMAN
Lectures by Stephanie Scher Pandolfi
© 2011 Pearson Education, Inc.
Introduction
• The complete DNA sequence of an organism is its genome. The
human genome sequence was published in February 2001 as part of
the Human Genome Project.
• Genomics is the scientific effort to sequence, interpret, and
compare whole genomes.
• Genomics provides a list of the genes present in an organism.
Functional genomics looks at when those genes are expressed and
how their products interact.
Whole-Genome Sequencing
• Improved automation has increased the speed and reduced the cost
of DNA sequencing.
• The primary international repositories for DNA sequence data now
contain over 194 billion nucleotides.
• With about 3 billion nucleotides, humans have the largest haploid
genome sequenced to date.
• The size of the database increases by about 30 percent every year.
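Growth of 30 percent per year compounds, so it is easy to underestimate. The sketch below computes the doubling time that rate implies and projects forward from the 194-billion-nucleotide figure above. It assumes perfectly steady growth at exactly 30 percent, which real databases only approximate; the function names are illustrative.

```python
import math


def doubling_time(annual_growth: float) -> float:
    """Years for a quantity growing by `annual_growth` per year
    (0.30 means 30 percent) to double, assuming steady compound growth."""
    return math.log(2) / math.log(1 + annual_growth)


def projected_size(start: float, annual_growth: float, years: int) -> float:
    """Size after `years` of compound growth at `annual_growth` per year."""
    return start * (1 + annual_growth) ** years


# Figures from the slide: about 194 billion nucleotides, growing about 30 percent per year.
print(f"Doubling time: {doubling_time(0.30):.2f} years")  # 2.64
print(f"Projected size after 5 years: {projected_size(194e9, 0.30, 5):.3g} nucleotides")
```

At that rate the database roughly doubles every two and a half years.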
How Are Complete Genomes Sequenced?
• Most genome sequencing projects use a whole-genome shotgun
sequencing approach.
• In this process, the genome is broken up into a set of overlapping
fragments that are sequenced, and these sequences are then put in
order.
The Shotgun Sequencing Process
1. Sonication (use of high-frequency sound waves) breaks a
genome into pieces approximately 160 kilobases long.
2. Each piece is inserted into a plasmid called a bacterial artificial
chromosome (BAC). A BAC library is created by inserting
each BAC into a different Escherichia coli cell. Each cell is allowed
to grow into a colony, creating multiple copies of each BAC.
3. Each 160-kb DNA segment is broken into 1-kb segments.
4. Each 1-kb segment is cloned into a plasmid. These plasmids are
then inserted into E. coli cells and replicated, producing shotgun
clones.
5. The fragments from each clone are then sequenced and analyzed
by computer programs.
6. The computer puts the sequences in order, thus reconstructing the
BACs.
7. The ends of the reconstructed BACs are similarly analyzed. The
goal is to arrange each 160-kb segment in its correct position
along the chromosome, based on regions of overlap.
The Shotgun Sequencing Process
• In essence, the shotgun strategy consists of breaking a genome into
tiny fragments, sequencing the fragments, and then putting the
sequence data back into the correct order.
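The "putting the sequence data back into the correct order" step relies on overlaps between fragments. Below is a toy greedy-overlap assembler: it repeatedly joins the two fragments with the longest suffix/prefix overlap. This is a simple textbook technique swapped in for illustration, not the software used by any particular genome project. It assumes error-free reads taken from one strand and a sequence without repeats (repeats are exactly what make real assembly hard). The minimum overlap of 3 bases is an arbitrary choice.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`,
    or 0 if no such overlap is at least `min_len` bases long."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads into contigs by repeatedly joining the best-overlapping pair."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # A read lying entirely inside another read adds no new information.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, j in permutations(range(len(contigs)), 2):
            olen = overlap(contigs[i], contigs[j], min_len)
            if olen > best_len:
                best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing overlaps: leave the remaining contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Usage: three overlapping fragments of the made-up sequence ATGCGTACGTTAGC.
reads = ["CGTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

The same idea applies at both levels of the process described above: ordering 1-kb reads to rebuild a BAC, and ordering BACs along a chromosome.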
The Role of Next-Generation Sequencing Strategies
• Pyrosequencing is a cheaper and faster alternative to traditional
sequencing.
• It starts from a single DNA fragment rather than from multiple copies
of the same fragment cloned in bacteria.
How Are Complete Genomes Sequenced?
• Bioinformatics is the effort to manage, analyze, and interpret
biological information, and is key to managing the vast quantity of
data generated by genome sequencing.
Which Genomes Are Being Sequenced, and Why?
• The first genome of a free-living organism to be sequenced was that
of the bacterium Haemophilus influenzae in 1995; it consists of about
1.8 million base pairs.
• The first eukaryotic genome to be sequenced was that of the yeast
Saccharomyces cerevisiae in 1996.
• To date, complete genomes have been sequenced from over 800
species.
• Most of the organisms that have been sequenced cause disease or
have other interesting biological properties.
Which Sequences Are Genes?
• The most basic task in annotating, or interpreting, a genome is to
identify which bases constitute genes.
• Identifying genes is relatively straightforward in bacteria and
archaea but is much more difficult in eukaryotes, whose genomes
contain many noncoding sequences.
Identifying Genes in Bacterial and Archaeal Genomes
• Computer programs are used to scan both strands of a genome
sequence in order to identify open reading frames (ORFs). ORFs
are possible genes: long stretches of sequence that begin with a start
codon and end with a stop codon, with no stop codon in between.
• The computer programs also look for sequences typical of
promoters, operators, and other regulatory sites.
• Researchers can confirm that an ORF is actually a gene by
analyzing its product or by finding that it is homologous (similar
due to common ancestry) to a known gene.
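The ORF scan described above can be sketched as follows. It reads both strands in all three reading frames, opens an ORF at the first ATG after a stop codon, and closes it at the next in-frame stop codon. The 100-codon default minimum is an arbitrary illustrative cutoff, and real gene-finding programs also weigh codon usage and regulatory signals, which this sketch ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(genome: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Find ATG...stop open reading frames in all six reading frames.

    Returns (strand, start, end) tuples. Coordinates are 0-based and
    end-exclusive on the strand that was scanned ("-" means the reverse
    complement). Length is counted in codons, including the stop codon.
    """
    orfs = []
    for strand, seq in (("+", genome.upper()), ("-", reverse_complement(genome.upper()))):
        for frame in range(3):  # three reading frames per strand
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon after the previous stop
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next start codon
    return orfs


# Usage: a tiny ORF on the forward strand (min_codons lowered for the demo).
print(find_orfs("ATGAAATTTTAA", min_codons=2))  # [('+', 0, 12)]
```

Scanning the reverse complement is what "both strands" means in practice, since a gene can be encoded on either strand.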
Identifying Genes in Eukaryotic Genomes
• In eukaryotic organisms, most genes contain introns, and most of the
genome does not code for a product. As a result, simply scanning
for ORFs is not an effective way to find genes.
• The most effective strategy for identifying genes is to use reverse
transcriptase to produce a cDNA version of each mRNA, and then
sequence a portion of the resulting molecule to produce an
expressed sequence tag, or EST. Because each EST is copied from an
mRNA, ESTs represent protein-coding genes that are actually expressed.
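The base-pairing logic behind an EST can be sketched in a few lines: reverse transcriptase builds DNA complementary to the mRNA (U in RNA pairs with A in DNA), and only part of that cDNA is sequenced. The function names, the 20-base default read length, and the example mRNA are all hypothetical; real ESTs are single-pass reads several hundred bases long.

```python
PAIRS = {"A": "T", "U": "A", "G": "C", "C": "G"}  # RNA base -> complementary DNA base


def reverse_transcribe(mrna: str) -> str:
    """First-strand cDNA: the DNA complement of an mRNA, written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(mrna.upper()))


def make_est(mrna: str, read_length: int = 20) -> str:
    """Sequence only the first `read_length` bases of the cDNA, as an EST does."""
    return reverse_transcribe(mrna)[:read_length]


# Usage with a short made-up mRNA.
mrna = "AUGGCUUCAAAGUAA"
print(reverse_transcribe(mrna))
print(make_est(mrna, read_length=6))
```

Because the cDNA is copied from mature mRNA, it lacks introns, which is why ESTs point to expressed genes rather than to noncoding stretches of the genome.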
Bacterial and Archaeal Genomes
• By sequencing the genomes of various strains of the same
prokaryotic species, researchers can now compare the genomes of
closely related organisms that have different ways of life.
The Natural History of Prokaryotic Genomes
• In bacteria, there is a general correlation between the size of the
genome and the metabolic capabilities of the organism.
• The function of many bacterial genes is still unknown.
• There is tremendous genetic diversity among bacteria and
archaea. About 15 percent of the genes in a prokaryotic genome
are unique to its own species.
• Redundancy among genes is common. Some genes are found
multiple times within a prokaryotic genome.
• Multiple chromosomes and plasmids are more common than
expected.
• In many bacterial and archaeal species, a significant portion of the
genome appears to have been acquired from other, often distantly
related, species.
Lateral Gene Transfer
• The movement of DNA from one species to another species is
called lateral gene transfer.
• Recent evidence suggests that over 50 percent of archaeal species
and 30 to 50 percent of bacterial species have at least one gene
acquired by lateral gene transfer.
Evidence for Lateral Gene Transfer
• Two general criteria support the hypothesis that sequences in
bacterial or archaeal genomes originated in another species:
1. A gene is much more similar to genes in distantly related
species than it is to those in closely related species.
2. The proportion of G-C base pairs to A-T base pairs in a
particular gene or series of genes is markedly different from
the base composition of the rest of the genome.
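The second criterion can be checked with a sliding window. The sketch expresses base composition as the G-C fraction (an equivalent way of stating the G-C to A-T proportion) and flags windows that differ from the genome-wide value. The window size, step, and threshold are arbitrary illustrative choices, not published cutoffs, and a flagged window is only a candidate for lateral transfer.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def flag_unusual_windows(genome: str, window: int = 1000, step: int = 500,
                         threshold: float = 0.10) -> list[tuple[int, float]]:
    """Return (start, gc) for windows whose G-C fraction differs from the
    genome-wide value by more than `threshold`."""
    overall = gc_content(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_content(genome[start:start + window])
        if abs(gc - overall) > threshold:
            flagged.append((start, gc))
    return flagged


# Usage: an A-T-rich made-up "genome" with one G-C-rich insert in the middle.
genome = "AT" * 1000 + "GC" * 250 + "AT" * 1000
print(flag_unusual_windows(genome, window=500, step=500, threshold=0.2))  # [(2000, 1.0)]
```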
How Does Lateral Gene Transfer Occur?
• Lateral gene transfer often occurs because genes are carried on
plasmids, which can be passed from one cell to another.
• Another way lateral gene transfer occurs is through transformation,
in which a cell takes up DNA fragments from its environment.
• Thus, mutation and genetic recombination within species are not
the only sources of genetic variation in bacteria and archaea.
Eukaryotic Genomes
• Many eukaryotic genomes are dominated by repeated DNA
sequences that occur between genes or inside introns and do not
code for products used by the organism.
• Sequencing eukaryotic genomes presents unique challenges:
– Eukaryotic genomes are much larger than the genomes of
bacteria and archaea.
– They contain many noncoding repetitive sequences, which make it
hard to tell where a sequenced fragment belongs when the
fragments are put back in order.
Parasitic and Repeated Sequences
• Protein-coding sequences make up only a small percentage of the
human genome (less than 2 percent), while repetitive sequences make
up more than 50 percent. In contrast, over 90 percent of a typical
prokaryotic genome consists of genes.
• Repeated sequences in the human genome are often the result of
transposable elements—segments of DNA that can move from
one location in a genome to another.
Characteristics of Transposable Elements
• Transposable elements are examples of selfish genes—parasitic
DNA sequences that survive and reproduce but that do not increase
the fitness of the host genome.
• Transposable elements are classified as parasitic because they
decrease their host’s fitness:
– It takes time and resources to copy them along with the rest of
the genome.
– They can disrupt gene function when they insert in a new
location.
Repeated Sequences
• Eukaryotic genomes have several thousand loci called short
tandem repeats (STRs). These are short sequences repeated back to
back (in tandem) at a particular locus. There are two types of STRs.
1. Microsatellites, or simple sequence repeats, are repeating
units of 1 to 5 bases.
2. Minisatellites, or variable number tandem repeats
(VNTRs), are repeating units of 6 to 500 bases.
• The number of repeats at these loci is hypervariable: it varies among
individuals much more than any other type of sequence does.
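The sketch below counts back-to-back copies of a repeat unit at a locus and classifies the unit by length, using the size ranges given above. The allele sequences are made up; they show how two versions of the same locus can differ only in the number of repeats.

```python
def count_tandem_repeats(seq: str, unit: str, start: int = 0) -> int:
    """Count consecutive, back-to-back copies of `unit` beginning at `start`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    count, pos = 0, start
    while seq.startswith(unit, pos):
        count += 1
        pos += len(unit)
    return count


def classify_repeat_unit(unit: str) -> str:
    """Classify a repeat by unit length, using the size ranges on the slide."""
    if 1 <= len(unit) <= 5:
        return "microsatellite"
    if 6 <= len(unit) <= 500:
        return "minisatellite"
    return "outside the STR size ranges"


# Usage: two hypothetical alleles of the same locus differ only in repeat number.
allele_1 = "GGT" + "CA" * 4 + "TTAG"
allele_2 = "GGT" + "CA" * 7 + "TTAG"
print(classify_repeat_unit("CA"),
      count_tandem_repeats(allele_1, "CA", 3),
      count_tandem_repeats(allele_2, "CA", 3))  # microsatellite 4 7
```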
Repeated Sequences and DNA Fingerprinting
• DNA fingerprinting refers to any technique for identifying
individuals on the basis of unique features of their genomes.
• Because microsatellite and minisatellite loci vary so much among
individuals, they are now the markers of choice for DNA
fingerprinting.
DNA Fingerprinting Process
• A sample of DNA is acquired from the individual.
• PCR is performed using primers that flank a region containing an
STR.
• PCR produces many copies of the region.
• The region can be analyzed to determine the number of repeats
present.
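Because primers flank the repeat, the length of a PCR product is roughly the fixed flanking sequence plus the repeat count times the unit length, so the count can be read from the product length. The sketch below uses hypothetical numbers (100 bp of flanking sequence, a 4-bp unit) and assumes exact lengths; each person carries two copies of each locus, one from each parent, so a locus yields two counts.

```python
def repeats_from_product_length(product_bp: int, flank_bp: int, unit_bp: int) -> int:
    """Infer repeat count, assuming product length = flanking DNA + repeats * unit length."""
    repeat_bp = product_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % unit_bp != 0:
        raise ValueError("product length is not a whole number of repeat units")
    return repeat_bp // unit_bp


def genotype(product_lengths: tuple[int, int], flank_bp: int, unit_bp: int) -> tuple[int, int]:
    """Repeat counts for the two copies (alleles) of a locus, smallest first."""
    return tuple(sorted(repeats_from_product_length(n, flank_bp, unit_bp)
                        for n in product_lengths))


# Usage with hypothetical numbers: 100 bp of flanking sequence, 4-bp repeat unit.
print(genotype((160, 148), flank_bp=100, unit_bp=4))  # (12, 15)
```

Comparing such repeat counts across several loci is what makes a fingerprint: two samples match only if the counts agree at every locus tested.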
Gene Families
• In eukaryotes, the major source of new genes is duplication of
existing genes.
• Within a species, genes that are extremely similar to each other in
structure and function are considered to be part of the same gene
family.
• Genes that make up gene families are hypothesized to have arisen
from a common ancestral sequence through gene duplication.
How Do Gene Families Arise?
• When gene duplication occurs, an extra copy of a gene is added to
the genome.
• The most common type of gene duplication results from unequal
crossing over during meiosis.
• The redundancy of duplicated genes may allow one copy to mutate
to create a new gene with different function or regulation, possibly
leading to the evolution of novel traits.
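Unequal crossing over happens when homologous chromosomes pair slightly out of register, so the exchange points fall at different positions on the two chromosomes. The sketch models each chromosome as a list of gene names (hypothetical genes A to D) and shows that one product gains a copy of a gene while the other loses it.

```python
def unequal_crossover(homolog_1: list[str], homolog_2: list[str],
                      break_1: int, break_2: int) -> tuple[list[str], list[str]]:
    """Swap segments between two homologous chromosomes broken at different points.

    homolog_1 is broken after `break_1` genes and homolog_2 after `break_2` genes.
    If the homologs are misaligned (break_1 != break_2), one product gains genes
    and the other loses them.
    """
    product_1 = homolog_1[:break_1] + homolog_2[break_2:]
    product_2 = homolog_2[:break_2] + homolog_1[break_1:]
    return product_1, product_2


# Usage: both homologs carry genes A-D; the misalignment is one gene.
chromosome = ["A", "B", "C", "D"]
with_duplication, with_deletion = unequal_crossover(chromosome, chromosome, 2, 1)
print(with_duplication)  # ['A', 'B', 'B', 'C', 'D']
print(with_deletion)     # ['A', 'C', 'D']
```

The chromosome carrying the extra copy of B is the starting point for a gene family.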
New Genes—New Functions?
• Gene duplication is important because the original gene is still
functional and produces a normal product, so the extra copy is free
to change.
• The duplicated gene may:
1. Retain its original function and provide additional quantities
of the same product.
2. Undergo mutation resulting in a beneficial altered protein,
thus creating an important new gene.
3. Be a nonfunctional pseudogene, a remnant of a functional
copy of the gene that does not produce a working product.
Why Do Humans Have So Few Genes?
• A surprising observation about eukaryotic genomes is that
organisms with complex morphology and behavior do not appear to
have large numbers of genes.
• Before the human genome was sequenced, scientists expected that
humans would have at least 100,000 genes. However, the actual
sequence revealed that we have only about 20,000 genes.
• The alternative-splicing hypothesis proposes that certain
multicellular eukaryotes do not need large numbers of genes
because alternative splicing creates different proteins from the same
gene.
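To see why alternative splicing can multiply the number of products, the sketch enumerates every mRNA a gene could yield if each optional ("cassette") exon is independently included or skipped. The gene layout is hypothetical, and exon skipping is only one form of alternative splicing. Real cells do not necessarily produce every combination, and different mRNAs do not always encode different proteins, so the count is an upper bound for this simple model.

```python
from itertools import product


def splice_variants(exons: list[tuple[str, bool]]) -> list[tuple[str, ...]]:
    """List every mRNA made by independently including or skipping optional exons.

    `exons` is a list of (name, optional) pairs in gene order. Required exons
    are always kept, and exon order never changes.
    """
    optional = [i for i, (_, is_optional) in enumerate(exons) if is_optional]
    variants = []
    for choices in product((True, False), repeat=len(optional)):
        keep = dict(zip(optional, choices))
        variants.append(tuple(name for i, (name, is_optional) in enumerate(exons)
                              if not is_optional or keep[i]))
    return variants


# Usage: a hypothetical gene with two required and three optional exons.
gene = [("E1", False), ("E2", True), ("E3", True), ("E4", True), ("E5", False)]
variants = splice_variants(gene)
print(len(variants))  # 8, i.e. 2 ** 3
```

With n optional exons this model gives 2^n possible mRNAs from a single gene.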
Similarities between Human and Chimp Genomes
• At the level of base sequence, the human and chimpanzee genomes
are 98.8 percent identical.
• This raises the question of how humans and chimps can be so
similar genetically but so different in morphology and behavior.
• One hypothesis proposes that even though many structural genes
(those that code for products) in humans and chimps are identical,
regulatory genes (those that code for regulatory transcription
factors) of the two species might have important differences.
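A rough calculation shows why "98.8 percent identical" still leaves room for many differences. It assumes, loosely, that the figure applies uniformly across a genome of about 3 billion bases (the figure given earlier for humans). The 98.8 percent value is usually quoted for single-base differences in sequence that can be aligned, so treat the result as an order of magnitude only:

```latex
\[
(1 - 0.988) \times \left(3 \times 10^{9}\ \text{bases}\right)
  = 0.012 \times 3 \times 10^{9}
  = 3.6 \times 10^{7}\ \text{single-base differences}
\]
```

Tens of millions of differences is enough that a small fraction falling in regulatory sequences could matter, which is the point of the regulatory-gene hypothesis.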
Functional Genomics and Proteomics
• Whole-genome data can be used to answer fundamental questions
about how organisms work.
• Large-scale analyses of gene expression are called functional
genomics.
• One of the basic tools of functional genomics is a DNA
microarray. Microarrays, used to study gene expression, consist
of a large number of single-stranded DNAs that are permanently
affixed to a glass slide.
How Are DNA Microarrays Used?
• mRNAs produced in two contrasting types of cells are isolated, and
then cDNAs produced from these mRNAs are used to probe the
microarray.
• Researchers can thus identify differences in which genes are
expressed in the two cell types.
• A microarray allows researchers to study the expression of
thousands of genes at a time, and to identify which sets of genes are
expressed together under specific sets of conditions.
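In the common two-color design, the cDNAs from the two cell types are labeled with different fluorescent dyes, and each spot is scored by the ratio of the two signals. The sketch computes a log2 ratio per gene and lists genes that differ at least twofold. The signal values, the pseudocount of 1, and the twofold cutoff are illustrative assumptions; real analyses also normalize the data and apply statistical tests.

```python
import math


def log2_ratios(signals: dict[str, tuple[float, float]],
                pseudocount: float = 1.0) -> dict[str, float]:
    """log2(cell type 2 / cell type 1) for each spot.

    Positive values mean more mRNA from cell type 2; negative values mean more
    from cell type 1. The pseudocount avoids dividing by zero for blank spots.
    """
    return {gene: math.log2((s2 + pseudocount) / (s1 + pseudocount))
            for gene, (s1, s2) in signals.items()}


def differentially_expressed(ratios: dict[str, float], min_fold: float = 2.0) -> list[str]:
    """Genes whose expression differs by at least `min_fold` in either direction."""
    cutoff = math.log2(min_fold)
    return sorted(gene for gene, r in ratios.items() if abs(r) >= cutoff)


# Usage with made-up signal intensities: (cell type 1, cell type 2).
signals = {"gene_a": (99, 399), "gene_b": (499, 509), "gene_c": (799, 99)}
ratios = log2_ratios(signals)
print(differentially_expressed(ratios))  # ['gene_a', 'gene_c']
```

Using log2 makes increases and decreases symmetric: a fourfold increase is +2 and an eightfold decrease is -3.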
What Is Proteomics?
• A transcriptome is the complete set of genes that are transcribed in
a particular cell.
– A proteome is the complete set of proteins that are produced.
• Proteomics is the large-scale study of protein function.
• Instead of studying individual proteins or how two proteins might
interact, proteomics is based on studying all of the proteins present
at once.
Applied Genomics: Understanding Cancer
Researchers are using tools created by advances in genomics to
deepen our understanding of cancer.
– Microarrays allow researchers to compare gene expression in
normal versus cancerous cells.
– The Human Genome Project has revealed common sets of
genes that are mutated in cancerous cells.
– Comparing the complete genome sequences of cancerous and
noncancerous cells from the same person identified over 600
mutations in the cancerous cells.
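At its simplest, the tumor-versus-normal comparison lists the positions where the two sequences from the same person differ. The sketch assumes the sequences are already aligned to the same length and error-free; real pipelines must align millions of reads, filter sequencing errors, and handle insertions and deletions, none of which is shown. The sequences are made up.

```python
def point_differences(normal: str, tumor: str) -> list[tuple[int, str, str]]:
    """(position, normal base, tumor base) wherever two aligned sequences differ.

    Assumes both sequences are already aligned to the same length; insertions,
    deletions, and sequencing errors are not handled.
    """
    if len(normal) != len(tumor):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, n, t) for i, (n, t) in enumerate(zip(normal, tumor)) if n != t]


# Usage with short made-up sequences from the same region.
print(point_differences("ACGTACGT", "ACGAACGC"))  # [(3, 'T', 'A'), (7, 'T', 'C')]
```

Comparing against noncancerous cells from the same person, rather than a reference genome, separates mutations in the tumor from variation the person inherited.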