Download Genome - people.iup.edu

I. Investigating Genomes
• 6.1 Introduction to Genomics
• 6.2 Sequencing Genomes
• 6.3 Bioinformatics and Annotating Genomes
© 2015 Pearson Education, Inc.
6.1 Introduction to Genomics
• Genome
• Entire complement of genetic information
• Includes genes, regulatory sequences, and noncoding
DNA
• Genomics
• Discipline of mapping, sequencing, analyzing, and
comparing genomes
6.1 Introduction to Genomics
• Several thousand prokaryotic genomes now
sequenced and available
• RNA virus MS2
• First genome sequenced (1976)
• 3,569 nucleotides (single-stranded RNA)
• Haemophilus influenzae
• First cellular genome sequenced in 1995
• 1,830,137 bp
6.2 Sequencing Genomes: Methods
• First-generation sequencing:
• Maxam-Gilbert chemical degradation
• Controlled breakdown of DNA chains
• No enzymes or cloning involved
• Hazardous materials
• Could not be widely applied
6.2 Sequencing Genomes: Methods
• First-generation sequencing:
• Sanger dideoxy method
• Invented by Nobel Prize winner Fred Sanger
• Chain termination method
• Relies on extension of primer by DNA polymerase
• Utilized cloned DNA
Figure 6.1: Normal deoxyribonucleotide versus dideoxyribonucleotide. The dideoxyribonucleotide is missing the 3′-OH; once it is added to the growing DNA chain there is no free 3′-OH, so replication stops at that point.
Figure 6.2: Sanger dideoxy sequencing.
• DNA strand to be sequenced: 3′ C G A C T C G A T T C 5′
• Radioactive DNA primer: 5′ G C T G 3′
• Add DNA polymerase and a mixture of all four deoxyribonucleotide triphosphates; separate into four reaction tubes.
• Only one dideoxyribonucleotide triphosphate (ddGTP, ddATP, ddTTP, or ddCTP) is added to each tube and the reaction is allowed to proceed.
• Reaction products (terminating base marked with a hyphen, fragment order in parentheses):
• ddATP: -A (1), AGCT-A (5), AGCTA-A (6)
• ddGTP: A-G (2), AGCTAA-G (7)
• ddCTP: AG-C (3)
• ddTTP: AGC-T (4)
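The four-tube logic of Figure 6.2 can be sketched in a few lines of Python. This is a toy model only: the function names are made up for illustration, and it simply lists every position where a given ddNTP could terminate the chain, then reads the sequence by sorting fragments by length (as a gel would).

```python
# Toy model of the four-tube dideoxy experiment (all names are illustrative).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_fragments(template_3to5, primer_5to3):
    """Return {ddNTP base: [(extension, length), ...]} for the four tubes.

    template_3to5 is written 3'->5' and primer_5to3 is written 5'->3',
    so the primer pairs with the start of the template string.
    """
    # The primer must be complementary to the 3' end of the template.
    for p, t in zip(primer_5to3, template_3to5):
        if COMPLEMENT[t] != p:
            raise ValueError("primer does not anneal to template")
    # Bases the polymerase adds after the primer, in 5'->3' order.
    extension = "".join(COMPLEMENT[b] for b in template_3to5[len(primer_5to3):])
    tubes = {"A": [], "C": [], "G": [], "T": []}
    for i, base in enumerate(extension, start=1):
        # A dideoxy version of this base can stop the chain at position i.
        tubes[base].append((extension[:i], i))
    return tubes

def read_sequence(tubes):
    """Sort all terminated fragments by length and read each terminal base."""
    fragments = sorted(
        (length, frag) for frags in tubes.values() for frag, length in frags
    )
    return "".join(frag[-1] for _, frag in fragments)

tubes = sanger_fragments("CGACTCGATTC", "GCTG")
print(tubes["G"])            # [('AG', 2), ('AGCTAAG', 7)]
print(read_sequence(tubes))  # AGCTAAG
```

With the template and primer from the figure, the fragments in each tube match the numbered reaction products, and the read is the complement of the template downstream of the primer.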
6.2 Sequencing Genomes: Methods
• Second-generation DNA sequencing
• Generates data 100x faster than Sanger method
• Massively parallel methods
• Large number of samples sequenced side by side
• Uses increased computer power, robotics and
miniaturization
6.2 Sequencing Genomes: Methods
• Third-generation DNA sequencing
• Continued technical improvement and miniaturization
• Sequencing of single molecules of DNA
6.2 Sequencing Genomes: Methods
• Fourth-generation DNA sequencing
• Optical detection no longer used
• Detection of tiny ionic changes
6.2 Sequencing Genomes: Methods
• Fourth-generation DNA sequencing
• Oxford Nanopore Technologies system (Figure 6.4b)
• Passes DNA through nanoscale biological pores
• Detector measures change in electric current
• Extremely fast
• Measures long chains of DNA
Figure 6.4b: Nanopore sequencing. Double-stranded DNA is separated and single-stranded DNA passes through a protein nanopore; as the DNA passes through, base-specific changes in the electrical signal are monitored.
6.2 Sequencing Genomes: Strategies
• Two possible approaches
• Map and order fragments first, then sequence (the
“Gold Standard”)
• Shotgun sequencing: sequence first, then map and
order (“Fast and Dirty”)
6.2 Sequencing Genomes
• Virtually all genomic sequencing projects use
shotgun sequencing strategy
• Entire genome is cloned or fragmented randomly and
resultant pieces are sequenced
• Much of the sequencing is redundant
• Generally 7- to 10-fold coverage
• Computer algorithms are used to look for replicate
sequences and organize them
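The "7- to 10-fold coverage" figure is simple arithmetic: total bases sequenced divided by genome size. A minimal sketch follows; the 500-base read length is a hypothetical value, and the estimate of the unsequenced fraction assumes reads land at random (a Poisson model that is not part of the slides).

```python
import math

def coverage(num_reads, read_length, genome_size):
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_size

def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target coverage."""
    return math.ceil(target_coverage * genome_size / read_length)

def fraction_unsequenced(cov):
    """Expected fraction of bases never hit, ASSUMING random (Poisson) sampling."""
    return math.exp(-cov)

genome = 1_830_137   # Haemophilus influenzae genome size from the slides (bp)
read_len = 500       # hypothetical read length, for illustration only

n = reads_needed(7, read_len, genome)
print(n)                                # 25622
print(coverage(n, read_len, genome))    # just over 7
print(fraction_unsequenced(7))          # under 0.1% of bases expected to be missed
```

Under that assumption, the redundancy is what makes shotgun sequencing work: the more often each base is covered, the fewer gaps remain for assembly to close.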
6.2 Sequencing Genomes
• Genome assembly consists of connecting the
DNA fragments in the correct order
• Occasionally the fragments cannot be assembled quickly into one complete sequence
• Closure (filling the remaining gaps) can be pursued using PCR to target areas
of the genome
• Closed vs. draft genome
• A closed genome requires direct human effort to finish
• More expensive
• More information
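To make "connecting the DNA fragments in the correct order" concrete, here is a small greedy overlap sketch. It is a teaching toy, not how production assemblers work: the reads are invented, the minimum overlap of 3 bases is arbitrary, and real data would need error handling and both strands.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact replicate reads, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the result stays a draft with gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

# Hypothetical reads sampled from the made-up sequence ATGGCGTACGTTAGC.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT", "ATGGCGTA"]
print(greedy_assemble(reads))             # ['ATGGCGTACGTTAGC']
print(greedy_assemble(["AAAA", "CCCC"]))  # ['AAAA', 'CCCC'] (two contigs, one gap)
```

The second call shows the draft case: when no overlap joins two contigs, software alone cannot close the gap, which is where targeted PCR comes in.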
6.3 Bioinformatics and Annotating Genomes
• Annotation: converting raw sequence data into a
list of genes present in the genome
• Annotation is the "bottleneck" in genomics
• Bioinformatics
• Science that applies powerful computational tools to
DNA and protein sequences
• For the purpose of analyzing, storing, and accessing the
sequences for comparative purposes
6.3 Bioinformatics and Annotating Genomes
• Functional ORF: an open reading frame that
encodes a protein
• Computer algorithms used to search for ORFs
(Figure 6.6)
• Look for start/stop codons and Shine–Dalgarno
sequences, codon bias
• ORFs can be compared to ORFs in other
genomes
Figure 6.6: Structure of an ORF and computer identification of possible ORFs
Structure of an ORF: ribosomal binding site (RBS), start codon, coding sequence, stop codon
1. Computer finds possible start codons.
2. Computer finds possible stop codons.
3. Computer counts codons between start and stop.
4. Computer finds possible RBS.
5. Computer calculates codon bias in ORF.
6. Computer decides if ORF is likely to be genuine.
7. List of probable ORFs
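Steps 1 to 4 of Figure 6.6 can be sketched as a simple scan. The sketch makes several assumptions that are not in the slides: it searches one strand only, accepts only ATG as a start codon, treats the motif AGGAGG within 20 bases upstream as a Shine–Dalgarno-like RBS, and uses an arbitrary minimum length. Codon bias (step 5) is left out.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=5, sd_motif="AGGAGG", sd_window=20):
    """Scan one strand for ATG...stop ORFs and flag an upstream SD-like motif."""
    dna = dna.upper()
    orfs = []
    for match in re.finditer("(?=ATG)", dna):           # step 1: possible start codons
        start = match.start()
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:         # step 2: first in-frame stop codon
                n_codons = (pos - start) // 3           # step 3: codons, start included, stop excluded
                if n_codons >= min_codons:
                    upstream = dna[max(0, start - sd_window):start]
                    orfs.append({
                        "start": start,
                        "end": pos + 3,
                        "codons": n_codons,
                        "has_rbs": sd_motif in upstream,  # step 4: possible RBS
                    })
                break
    return orfs

# Invented test sequence: SD-like motif, spacer, then a short ORF.
seq = "TTAGGAGGTTTACCATGAAACCCGGGTTTGCATAATT"
print(find_orfs(seq))
# [{'start': 14, 'end': 35, 'codons': 6, 'has_rbs': True}]
```

A real gene finder would also scan the reverse strand, allow alternative start codons, and score codon bias before deciding whether an ORF is likely to be genuine.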
6.3 Bioinformatics and Annotating Genomes
• Number of genes with role that can be clearly
identified in a given genome is 70% or less of total
ORFs detected
• Hypothetical proteins: uncharacterized ORFs;
proteins that likely exist but whose function is
currently unknown
• Likely encoded by nonessential genes
• In E. coli, many predicted to encode regulatory or
backup proteins
6.3 Bioinformatics and Annotating Genomes
• Noncoding RNA: RNA that does not code for
protein
• Lack start codons and have multiple stop codons
• Examples
• Transfer RNA (tRNA)
• Ribosomal RNA (rRNA)
• Noncoding regulatory RNA molecules
II. Microbial Genomes
• 6.4 Genome Size and Content
• 6.5 Genomes of Organelles
• 6.6 Eukaryotic Microbial Genomes
6.4 Genome Size and Content
• Correlation between genome size and ORFs
(Figure 6.7)
• On average, a prokaryotic gene is 1,000 bp long
• ~1,000 genes per megabase
(1 Mbp = 1,000,000 bp)
• As genome size increases, gene content proportionally
increases
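The rule of thumb above (about 1,000 genes per megabase) gives a quick back-of-the-envelope estimate. The sketch below applies it to genome sizes quoted elsewhere in these slides; the outputs are rough estimates from the rule, not measured gene counts.

```python
def estimated_gene_count(genome_size_bp, genes_per_mbp=1000):
    """Rough prokaryotic gene estimate from genome size (rule of thumb only)."""
    return round(genome_size_bp / 1_000_000 * genes_per_mbp)

# Genome sizes taken from the slides.
print(estimated_gene_count(1_830_137))    # Haemophilus influenzae -> 1830
print(estimated_gene_count(12_300_000))   # Sorangium cellulosum   -> 12300
print(estimated_gene_count(160_000))      # Carsonella ruddii      -> 160
```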
Figure 6.7
6.4 Genome Size and Content
• Smallest cellular genomes belong to “parasitic” or
endosymbiotic prokaryotes
• Obligate parasites may be as low as 490 kbp
(Nanoarchaeum equitans) or 525 kbp (Mycoplasma
genitalium)
• Endosymbionts can be smaller (e.g., the 160-kbp genome
of Carsonella ruddii, an endosymbiont of plant lice)
• Estimates suggest the minimum number of genes for a
viable cell is 250–300 genes
6.4 Genome Size and Content
• Largest prokaryotic genomes comparable to those
of some eukaryotes
• Sorangium cellulosum (Bacteria)
• Soil microbe
• Largest prokaryotic genome to date at >12.3 Mbp
• Largest archaeal genomes tend to be smaller
(~5 Mbp)
6.4 Genome Size and Content
• Many genes can be identified by sequence
similarity to genes found in other organisms
(comparative analysis)
• Comparative analyses allow for predictions of
metabolic pathways and transport systems
• Example: Thermotoga maritima (Figure 6.9)
Figure 6.9
6.5 Genomes of Organelles
• Mitochondria and chloroplasts contain a “small”
genome
• Cyanobacteria -> chloroplast (cp or ct)
• The chloroplast is one member of the plastid family
• Rickettsiae (?) -> mitochondrion (mt)
• Small, obligate intracellular parasites
• Human diseases: typhus, Rocky Mountain spotted fever (RMSF)
6.5 Genomes of Organelles
• Known chloroplast genomes
• Circular (usually) DNA molecules (Figure 6.11)
• Typically 120–170 kbp
• Usually contain two inverted repeats of 6–76 kbp (rRNA,
tRNA)
• About 100 genes, many for photosynthesis or protein
synthesis, transport
• Introns common; primarily of self-splicing type
6.5 Genomes of Organelles
• Known mitochondrial genomes
• Diverse structures; some linear and some circular
• Primarily encode proteins for oxidative phosphorylation
• Use simplified genetic codes rather than "universal"
code
• Some contain small plasmids
• Human mitochondrial genome contains 37 genes in
16.5 kbp (Figure 6.12)
6.5 Genomes of Organelles
• Many genes in the nucleus encode proteins
required for organelle function
• Organelle, nucleus must cooperate
• Examples: translational machinery, energy generation
• RubisCO
6.5 Genomes of Organelles
• Genome reduction in organelles
• Cyanobacteria: 1,500 genes; chloroplasts: about 100
• Rickettsiae: 800–900 genes; mitochondria: 37
• Many insects and other invertebrates contain symbiotic
bacteria (Figure 6.15)
• Symbiont no longer capable of existing independently
• Symbiont provides nutrients to host
• Host cannot survive without symbiont
6.6 Eukaryotic Microbial Genomes
• Largest eukaryotic genome belongs to
Trichomonas vaginalis
• Parasite
• Pathogen
• ~60,000 genes estimated
• Count likely to change
6.6 Eukaryotic Microbial Genomes
• Smallest eukaryotic cellular genome belongs to
Encephalitozoon cuniculi
• Intracellular fungal pathogen
• Haploid genome contains 11 chromosomes
• Genome size 2.9 Mbp; ~2,000 genes
• Smallest eukaryotic genomes belong to nucleomorphs
• Degenerate remains of a eukaryotic endosymbiont
• Ranges in size from 0.45 to 0.85 Mbp
6.6 Eukaryotic Microbial Genomes
• The haploid yeast genome is more representative
• Contains 16 chromosomes, ranging in size from 220
kbp to 2,352 kbp
• Entire genome is ~13,400 kbp; encodes ~6,000 ORFs;
~4,000 encode proteins with known function
• About 900 ORFs are essential
• Contains a large amount of repetitive DNA
III. Functional Genomics
• 6.7 Microarrays and the Transcriptome
• 6.8 Proteomics and the Interactome
• 6.9 Metabolomics and Systems Biology
• 6.10 Metagenomics
6.7 Microarrays and the Transcriptome
• Transcriptome
• The entire complement of RNA produced under a given
set of conditions
• Hybridization techniques can be used in
conjunction with genomic sequence data to
measure gene expression
6.7 Microarrays and the Transcriptome
• Hybridization allows identification of
complementary nucleic acid sequences
6.7 Microarrays and the Transcriptome
• Microarrays (DNA chips)
• Small solid-state supports to which genes or portions of
genes are fixed and arrayed spatially in a known pattern
• Complementary matches light up specific spots
• Allows identification of mRNAs that come from specific
genes
Figure 6.17: DNA chip experiment.
• Synthesize short ss oligonucleotides complementary to genes X, Y, and Z.
• Affix DNA to chip at known locations; the gene X spot hybridizes only to its own mRNA.
• Probe chip with labeled mRNA and scan chip.
• Growth condition 1: gene X expressed; genes Y and Z not expressed
• Growth condition 2: gene X not expressed; genes Y and Z expressed
6.7 Microarrays and the Transcriptome
• DNA segments on arrays are hybridized with
mRNA from cells grown under specific conditions
and analyzed to determine patterns of gene
expression
• Arrays are large and dense enough that the
transcription pattern of an entire genome can be
analyzed (Figure 6.18)
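Once the chip is scanned, comparing two growth conditions comes down to comparing spot intensities gene by gene. The sketch below uses a log2 ratio, a common convention that is not stated in the slides; the intensity values, the pseudocount, and the two-fold threshold are all invented for illustration.

```python
import math

def log2_ratios(cond1, cond2, pseudocount=1.0):
    """log2(condition 2 / condition 1) signal for each gene measured in both."""
    return {
        gene: math.log2((cond2[gene] + pseudocount) / (cond1[gene] + pseudocount))
        for gene in cond1
        if gene in cond2
    }

def classify(ratios, threshold=1.0):
    """Label each gene using an (arbitrary) two-fold change threshold."""
    labels = {}
    for gene, ratio in ratios.items():
        if ratio >= threshold:
            labels[gene] = "up in condition 2"
        elif ratio <= -threshold:
            labels[gene] = "down in condition 2"
        else:
            labels[gene] = "unchanged"
    return labels

# Hypothetical spot intensities for three genes under two growth conditions.
condition_1 = {"X": 1023.0, "Y": 15.0, "Z": 31.0}
condition_2 = {"X": 31.0, "Y": 1023.0, "Z": 511.0}

ratios = log2_ratios(condition_1, condition_2)
print(ratios)            # {'X': -5.0, 'Y': 6.0, 'Z': 4.0}
print(classify(ratios))
```

Applied to every spot on a whole-genome array, the same comparison yields the global expression pattern described above.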
Figure 6.18
6.7 Microarrays and the Transcriptome
• What can be learned from microarray
experiments?
• Global gene expression
• Expression of specific groups of genes under different
conditions
• Expression of genes with unknown function; can yield
clues to possible roles
• Comparison of gene content in closely related
organisms
• Identification of specific organisms
6.8 Proteomics and the Interactome
• Proteomics
• Genome-wide study of the structure, function, and
regulation of an organism's proteins
6.8 Proteomics and the Interactome
• Two-dimensional (2-D) polyacrylamide gel
electrophoresis
• Technique for separating, identifying, and measuring all
proteins present in a sample (Figure 6.20)
• In first (horizontal) dimension, proteins are separated by
differences in isoelectric points (charge)
• In second (vertical) dimension, proteins are separated by
size
6.8 Proteomics and the Interactome
• Generates pattern of spots; each spot is an
individual protein
• Individual spots can be cut out and studied
• Mutant and wild-type cells can be compared to
study protein function
• Newer technology: HPLC (high-pressure liquid chromatography)
and mass spectrometry
Figure 6.20
6.8 Proteomics and the Interactome
• Proteomics relies on genomics
• Sequence the genome of the organism
• Compare to genomes of other organisms
• Identify similar genes
• Different DNA sequence may not change protein
sequence
6.8 Proteomics and the Interactome
• Proteins with >50% sequence similarity typically
have similar functions
• Proteins with >70% sequence similarity almost
certainly have similar functions
• Protein domains
• Distinct structural modules within proteins
• Have characteristic functions that can reveal much
about a protein's role, even in the absence of complete
sequence homology
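The 50% and 70% rules of thumb can be applied once two protein sequences are aligned. The sketch below computes percent identity over an existing alignment, which is a simpler stand-in for "similarity" (similarity scores usually also credit conservative substitutions). The sequences are invented and the alignment is assumed to be given.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned, non-gap positions where the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip positions where either sequence has a gap
        compared += 1
        matches += (x == y)
    return 100.0 * matches / compared if compared else 0.0

def interpret(pct):
    """Apply the rules of thumb from the slide above."""
    if pct > 70:
        return "almost certainly similar function"
    if pct > 50:
        return "typically similar function"
    return "no prediction from this rule of thumb"

# Hypothetical pre-aligned protein fragments.
pct = percent_identity("MKTAYIAKQR", "MKTAHIAKQR")
print(pct, interpret(pct))  # 90.0 almost certainly similar function
```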
6.8 Proteomics and the Interactome
• Interactome
• Does not refer to physical things or molecules
• Refers to complete set of interactions among molecules
• Protein-protein, protein-RNA, etc.
• Data expressed in the form of network diagrams
• Simplified version (Figure 6.22)
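Because an interactome is a set of interactions rather than a set of molecules, it maps naturally onto a graph: molecules are nodes and interactions are edges. A minimal sketch, with made-up molecule names and an arbitrary cutoff for what counts as a highly connected node:

```python
from collections import defaultdict

def build_network(interactions):
    """Turn a list of pairwise interactions into an undirected adjacency map."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return dict(network)

def hubs(network, min_partners=3):
    """Molecules with at least min_partners interaction partners."""
    return sorted(
        node for node, partners in network.items() if len(partners) >= min_partners
    )

# Hypothetical protein-protein and protein-RNA interactions.
pairs = [
    ("ProtA", "ProtB"),
    ("ProtA", "ProtC"),
    ("ProtA", "rnaX"),
    ("ProtB", "ProtC"),
]
network = build_network(pairs)
print(sorted(network["ProtA"]))  # ['ProtB', 'ProtC', 'rnaX']
print(hubs(network))             # ['ProtA']
```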
Figure 6.22
6.9 Metabolomics and Systems Biology
• Metabolome
• The complete set of metabolic intermediates and other
small molecules produced in an organism
• Mass spectrometry is one of the primary
techniques for monitoring metabolites
• MALDI-TOF = well adapted for biomolecules
• Can be used to identify unknown microbial species
6.9 Metabolomics and Systems Biology
• Systems biology
• Integration of different fields of research (Figure 6.24)
• Genomics
• Proteomics
• Transcriptomics
• Metabolomics
• Other
Figure 6.24: Systems biology. The top level compares data and builds a computer model of the system being studied.
IV. Evolution of Genomes
• 6.11 Gene Families, Duplications (and
Divergence), Deletions
• 6.12 Horizontal Gene Transfer and Genome
Stability
6.11 Gene Families, Duplications, and Deletions
• Gene duplications thought to be mechanism for
evolution of most new genes (Figure 6.28)
• Gene analysis in the three domains of life
suggests that many genes present in all
organisms have common evolutionary roots
• Mechanisms for duplication:
• Replication errors
• Recombination
“Gene A” is the
ancestral gene
Gene Duplication
Gene Divergence
Two Related Genes
Gene A and Gene A’ are members of a gene family
Figure 6.28: Evolution of RubisCO by gene duplication. After duplication of an ancestral gene, one duplicate accumulates some sequence changes and retains the original function (after transcription and translation, the enzyme retains its original role), while the other accumulates different sequence changes and evolves a new role (the enzyme catalyzes a novel reaction). Labels in the figure: ancestral gene (methionine metabolism); RLP alpha, RLP beta, and RLP gamma; RLP beta and gamma ancestor; unknown function; RubisCO (large subunit) ancestor; RubisCO Forms I, II, and III; purple bacteria, cyanobacteria, methanogens.
• Mutations in a duplicated gene may eliminate function of the
gene or gene product
• “Dead” genes are called pseudogenes
• Eliminated very quickly from prokaryotic genomes
• Due to pressure for genetic economy
• In general, the loss of nonfunctional or unnecessary
genes is called gene deletion
6.11 Gene Families, Duplications, and Deletions
• Homologous: related sequence that implies
common genetic ancestry
• A homolog is a gene related to a second gene by descent from
a common ancestral DNA sequence. The term
homolog may apply to the relationship between
genes separated by the event of speciation (see
ortholog) or to the relationship between genes
separated by the event of gene duplication (see
paralog).
• Gene families: groups of gene homologs
(Figure 6.27)
• Paralogs: genes within an organism whose similarity to one
or more genes in the same organism is the result of gene
duplication
• Paralogs are genes related by duplication within a genome.
• Orthologs retain the same function in the course of
evolution, whereas paralogs evolve new functions, even if
these are related to the original one.
• Paralogs are homologs produced by duplication/divergence
within a species
• Paralogs have homologous origin but heterologous
activities.
• Orthologs: genes found in one organism that are similar to
those in another organism but differ because of speciation
• Orthologs are genes in different species that evolved from
a common ancestral gene by speciation. Normally,
orthologs retain the same function in the course of
evolution. Identification of orthologs is critical for reliable
prediction of gene function in newly sequenced genomes.
• Orthologs are homologs produced by speciation
• Orthologs have homologous origin and homologous
activity.
6.12 Horizontal Gene Transfer and Genome
Stability
• Horizontal gene transfer (Figure 6.29)
• Major force in evolution, distinct from duplication and
divergence
• The transfer of genetic information between organisms,
as opposed to vertical inheritance from parental
organism(s)
• May be extensive in nature
• May cross phylogenetic domain boundaries
Figure 6.29: Vertical versus horizontal gene transfer. In vertical gene transfer, the chromosome is passed on through genome replication and cell division.
Bacterial Transformation
• Direct uptake of DNA from environment
• Followed by recombination (crossover) into host DNA
Bacterial Conjugation
• Tra operon genes control DNA transfer
• Plasmid such as F plasmid may move alone
• Or may carry chromosomal genes as well
Bacterial Transduction
• Bacterial DNA carried from cell to cell by a virus or “phage”
• General or generalized transduction can involve
any gene from the chromosome
• Special or specialized transduction involves only
certain genes
• Why? Because the virus that carries the genes
has inserted itself into the host chromosome at a
specific location
6.12 Horizontal Gene Transfer and Genome
Stability
• Detecting horizontal gene flow
• Presence of genes typically found only in distantly
related species
• Presence of a DNA with GC content or codon bias that
differs significantly from remainder of genome
• “Footprints” of gene transfer in the genome
• Horizontally transferred genes typically do not
encode core metabolic functions
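One of the footprints listed above, atypical GC content, is easy to scan for with a sliding window. The sketch below flags windows whose GC fraction differs from the genome-wide value by more than a cutoff. The window size, step, and cutoff are arbitrary choices, and the "genome" is synthetic.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def atypical_windows(genome, window=1000, step=500, max_diff=0.10):
    """Return (start, end, gc) for windows whose GC differs from the genome average."""
    overall = gc_content(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_content(genome[start:start + window])
        if abs(gc - overall) > max_diff:
            flagged.append((start, start + window, round(gc, 3)))
    return flagged

# Synthetic genome: 50% GC background with a 1,000-bp AT-only insert.
genome = "ATGC" * 500 + "AT" * 500 + "ATGC" * 500
print(round(gc_content(genome), 2))                # 0.4
print(atypical_windows(genome, max_diff=0.2))      # [(2000, 3000, 0.0)]
```

A flagged region is only a candidate: codon bias and comparison with related strains would be needed before calling it horizontally acquired.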
6.12 Horizontal Gene Transfer and Genome
Stability
• Insertion sequences (IS), also known as simple transposons:
pieces of transposable DNA whose genes encode
only transposition
6.12 Horizontal Gene Transfer and Genome
Stability
• Complex transposons (Tn): pieces of transposable
DNA that carry transposition genes
and one or more other genes
Another example: comparison of simple and complex transposons
A mechanism for transposition: results in target site duplication
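Target site duplication can be illustrated with plain string operations. The sketch assumes the usual textbook picture (a staggered cut at the target site, insertion of the element, then gap filling, leaving a copy of the target site on each side). The sequences are invented, and the 5-base duplication length is an arbitrary choice since it varies between elements.

```python
def transpose_into(target_dna, element, position, tsd_length=5):
    """Insert element at position, duplicating the target site on both flanks."""
    if position < 0 or position + tsd_length > len(target_dna):
        raise ValueError("target site does not fit inside the target DNA")
    site = target_dna[position:position + tsd_length]
    # The original single copy of the site ends up as a direct repeat
    # immediately before and after the inserted element.
    return (
        target_dna[:position]
        + site + element + site
        + target_dna[position + tsd_length:]
    )

# Hypothetical target DNA; "<IS>" stands in for the element's sequence.
print(transpose_into("AAAAGCTAGTTTT", "<IS>", 4))
# AAAAGCTAG<IS>GCTAGTTTT
```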
6.12 Horizontal Gene Transfer and Genome
Stability
• Transposons may transfer DNA between different
organisms
• Transposons may also mediate large-scale
chromosomal changes within a single organism
• Presence of multiple insertion sequences (IS)
• Recombination among identical IS can result in
chromosomal rearrangements
• Examples: deletions, inversions, or translocations
6.13 Core Genome versus Pan Genome
• The "pan"/"core" concept: genomes of bacterial
species consist of two components
• Core genome: shared by all strains of the species
(Figure 6.31)
• Pan genome: the core genome plus all the optional extras present in
some but not all strains of the species (Figure 6.31)
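In set terms, the core genome is the intersection of the strains' gene sets and the pan genome is their union. A minimal sketch with hypothetical strain and gene names:

```python
def core_and_pan(strains):
    """Return (core, pan, accessory) gene sets for a dict of strain -> genes."""
    gene_sets = [set(genes) for genes in strains.values()]
    if not gene_sets:
        return set(), set(), set()
    core = set.intersection(*gene_sets)   # genes shared by all strains
    pan = set.union(*gene_sets)           # every gene seen in any strain
    return core, pan, pan - core          # accessory = the optional extras

# Hypothetical gene content for three strains of one species.
strains = {
    "strain_1": ["geneA", "geneB", "geneC", "toxin1"],
    "strain_2": ["geneA", "geneB", "geneC", "degradeX"],
    "strain_3": ["geneA", "geneB", "geneC"],
}
core, pan, accessory = core_and_pan(strains)
print(sorted(core))       # ['geneA', 'geneB', 'geneC']
print(sorted(accessory))  # ['degradeX', 'toxin1']
print(len(pan))           # 5
```

The core can only shrink or stay the same as more strains are added, while the pan genome can only grow or stay the same.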
6.13 Core Genome versus Pan Genome
• Chromosomal islands
• Region of bacterial chromosome of foreign origin that
contains clustered genes for some extra property such
as virulence or symbiosis
• Pathogenicity islands: chromosomal islands containing
genes for virulence (Figure 6.33)
6.13 Core Genome versus Pan Genome
• Chromosomal islands believed to have a "foreign"
origin based on several observations
• Extra regions often flanked by inverted repeats
• Base composition and codon usage in chromosomal
islands often differ from rest of genome
• Often found in some strains of a species but not others
6.13 Core Genome versus Pan Genome
• Chromosomal islands contribute specialized
functions not essential to growth
• Virulence
• Biodegradation of recalcitrant compounds
• For example, hydrocarbons and herbicides
• Symbiosis