Eukaryotic Genomes:
Fungi
Wednesday, October 22, 2003
Introduction to Bioinformatics
ME:440.714
J. Pevsner
Copyright notice
Many of the images in this powerpoint presentation
are from Bioinformatics and Functional Genomics
by J Pevsner (ISBN 0-471-21004-8).
Copyright © 2003 by Wiley.
These images and materials may not be used
without permission from the publisher.
Visit http://www.bioinfbook.org
Announcements
We are in the last third of the course:
Today: Fungi. Exam #2 is due at the start of class.
Next Monday: Functional genomics (Jef Boeke)
Next Wednesday: Pathways (Joel Bader)
Monday Nov. 3: Eukaryotic genomes
Wednesday Nov. 5: Human genome
Monday Nov. 10: Human disease
Wednesday Nov. 12: Final exam (in class)
Outline of today’s lecture
Description and classification of fungi
The Saccharomyces cerevisiae genome
Duplication of the yeast genome
Functional genomics in yeast
Comparative genomics of fungi
Introduction to fungi: phylogeny
Fungi are eukaryotic organisms that can be filamentous
(e.g. molds) or unicellular (e.g. the yeast Saccharomyces
cerevisiae).
Most fungi are aerobic (but S. cerevisiae can grow
anaerobically). Fungi have major roles in the ecosystem
in degrading organic waste. They have important roles
in fermentation, including the manufacture of steroids
and penicillin.
Several hundred fungal species are known to cause
disease in humans.
Eukaryotes
(Baldauf et al., 2000)
Fungi and metazoa are sister groups
Baldauf et al., 2000
Fig. 15.1
Page 504
Classification of fungi
About 70,000 fungal species have been described
(as of 1995), but 1.5 million species may exist.
Four phyla:
Ascomycota: yeasts, truffles, lichens
-- Hemiascomycetae (Génolevures project)
-- Euascomycetae (Neurospora)
-- Loculoascomycetae
-- Laboulbeniomycetae (parasites of insects)
Basidiomycota: rusts, smuts, mushrooms
Chytridiomycota: Allomyces
Zygomycota: feed on decaying vegetation
Box 15-1
Page 505
Introduction to Saccharomyces cerevisiae
First species domesticated by humans
Called baker’s yeast (or brewer’s yeast)
Ferments glucose to ethanol and carbon dioxide
Model organism for studies of biochemistry,
genetics, molecular and cell biology
…rapid growth rate
…easy to modify genetically
…features typical of eukaryotes
…relatively simple (unicellular)
…relatively small genome
Page 505
Sequencing the S. cerevisiae genome
The genome was sequenced by a highly cooperative
consortium in the early 1990s, chromosome by chromosome
(the whole genome shotgun approach was not used).
This involved 600 researchers in > 100 laboratories.
--Physical map created for all 16 chromosomes
--Library of 10 kb inserts constructed in phage
--The inserts were assembled into contigs
The sequence was released in 1996 and published in 1997
(Goffeau et al., 1996; Mewes et al., 1997)
Page 505
Features of the S. cerevisiae genome
Sequenced length: 12,068 kb = 12,068,000 base pairs
Length of repeats: 1,321 kb
Total length: 13,389 kb (~ 13 Mb)
Open reading frames (ORFs): 6,275
Questionable ORFs (qORFs): 390
Hypothetical proteins: 5,885
Introns in ORFs: 220
Introns in UTRs: 15
Intact Ty elements: 52
tRNA genes: 275
snRNA genes: 40
Page 506
Features of the S. cerevisiae genome
A notable feature of the genome is its high gene density
(about one gene every 2 kilobases). Most bacteria have
about one gene per kb, but most eukaryotes have a
much sparser gene density.
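The "one gene every 2 kilobases" figure can be checked against the numbers on the preceding features slide. The following is a minimal sketch; the variable and function names are illustrative only, and whether questionable ORFs should be counted is left as a choice.

```python
# Figures from the "Features of the S. cerevisiae genome" slide.
SEQUENCED_KB = 12_068
REPEATS_KB = 1_321
TOTAL_ORFS = 6_275
QUESTIONABLE_ORFS = 390


def kb_per_gene(length_kb: float, n_genes: int) -> float:
    """Average spacing between genes, in kilobases per gene."""
    return length_kb / n_genes


total_kb = SEQUENCED_KB + REPEATS_KB
spacing_all = kb_per_gene(SEQUENCED_KB, TOTAL_ORFS)
spacing_strict = kb_per_gene(SEQUENCED_KB, TOTAL_ORFS - QUESTIONABLE_ORFS)

print(f"total length: {total_kb} kb")
print(f"kb per ORF (all ORFs): {spacing_all:.2f}")
print(f"kb per ORF (excluding questionable ORFs): {spacing_strict:.2f}")
```

Either way of counting gives a spacing close to 2 kb per gene.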
Also, only 4% of S. cerevisiae genes are interrupted
by introns. By contrast, 40% of Schizosaccharomyces
pombe genes have introns.
What are the most common protein families and protein
domains? You can see the answer at EBI’s website:
http://www.ebi.ac.uk/proteome/
Page 506
Fig. 15.2
Page 508
Page 506
The EBI website offers a variety
of proteome analysis tools, such
as this summary of protein length
distribution in S. cerevisiae.
http://www.ebi.ac.uk/proteome/
Fig. 15.3
Page 509
ORFs in the S. cerevisiae genome
How are ORFs defined? In the initial genome analysis,
an ORF was defined as >100 codons (thus specifying
a protein of ~11 kilodaltons).
390 ORFs were listed as “questionable”, because they
were considered unlikely to be authentic genes.
For example, they were short, or exhibited unlikely
preferences for codon usage.
How many ORFs are there in the yeast genome?
There are 40,000 ORFs > 20 amino acids; how many
of these are authentic?
Page 506-507
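The ">100 codons" definition can be illustrated with a small six-frame scan. This is only a sketch and not the procedure used in the original genome analysis: it assumes an ORF runs from the first ATG after a stop codon to the next in-frame stop, it ignores introns, and the average residue mass of 110 Da is an assumed round value used to reproduce the ~11 kilodalton estimate.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
AVG_RESIDUE_MASS_DA = 110  # assumed rough average mass per amino acid


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, threshold_codons=100):
    """Return (strand, start, end, n_codons) for each ATG...stop ORF
    longer than threshold_codons (stop codon not counted).

    Coordinates are 0-based, end-exclusive, and for the "-" strand they
    refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG since the last stop
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons > threshold_codons:
                        orfs.append((strand, start, i + 3, n_codons))
                    start = None
    return orfs


def approx_mass_kda(n_codons):
    """Rough protein mass in kilodaltons for a given codon count."""
    return n_codons * AVG_RESIDUE_MASS_DA / 1000


toy = "ATG" + "GCT" * 120 + "TAA"  # made-up sequence: 121 codons plus a stop
print(find_orfs(toy))
print(approx_mass_kda(100))
```

Lowering `threshold_codons` to 20 on a real chromosome would return many more candidates, which is the point of the slide: most short ORFs found this way are unlikely to be authentic genes.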
ORFs in the S. cerevisiae genome
Several criteria may be applied to decide if ORFs are
authentic protein-coding genes:
[1] evidence of conservation in other organisms
[2] experimental evidence of gene expression
(microarrays, SAGE, functional genomics)
The groups of Elizabeth Winzeler and Michael Snyder each
recently described hundreds of previously unannotated
genes that are transcribed and translated.
Page 507
ORFs in the S. cerevisiae genome
The MIPS Comprehensive Yeast Genome Database
lists criteria for assigning ORFs, based on FASTA
search scores:
Category                                         Number of proteins
Known protein                                    3400
Strong similarity to known protein               230
Similarity or weak similarity to known protein   825
Similarity to unknown protein                    1007
No similarity                                    516
Questionable ORF                                 472
Total                                            6450
Page 507, 510
Exploring a typical S. cerevisiae chromosome
We will next familiarize ourselves with the S. cerevisiae
genome by exploring a typical chromosome, XII.
This chromosome features
• 38% GC content
• very little repetitive DNA
• few introns
• six Ty elements (transposable elements)
• a high ORF density: 534 ORFs > 100aa, and 72% of the
chromosome has protein-coding genes
Page 508-511
Key S. cerevisiae databases
Web resources include:
NCBI (Entrez → Genome → Eukaryotic genome projects)
EBI
http://www.ebi.ac.uk/proteome/
SGD: Saccharomyces Genome Database
http://genome-www.stanford.edu/Saccharomyces/
MIPS Comprehensive Yeast Genome Database
(MIPS = Munich Information Center for Protein Sequences)
http://mips.gsf.de/proj/yeast/CYGD/db/
Page 508
NCBI: Entrez genomes for yeast resources
Fig. 15.4
Page 510
NCBI: Entrez genomes for yeast resources
~Fig. 15.5
Page 511
MIPS offers a Comprehensive
Yeast Genome Database
http://mips.gsf.de/genre/proj/yeast/index.jsp
Fig. 15.6
Page 512
Saccharomyces Genome Database (SGD)
http://www.yeastgenome.org/
Fig. 15.7
Page 513
S. cerevisiae gene nomenclature
YKL159c
Y = yeast
K = 11th chromosome
L = left (or right) arm
159 = 159th ORF
c = Crick (bottom) or w (Watson, top) strand
RCN1 = wildtype gene
Rcn1p = protein
rcn1 = mutant allele
Box 15-2
Page 514
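The naming rules above are regular enough to decode mechanically. The following sketch assumes the plain seven-character form shown on the slide (such as YKL159c); the function name `parse_orf_name` is hypothetical, and names that carry extra suffixes or do not follow this pattern are simply rejected.

```python
import re

# Y + chromosome letter (A-P for chromosomes 1-16) + arm (L/R)
# + three-digit ORF number + strand (W/C).
_ORF_NAME = re.compile(r"^Y([A-P])([LR])(\d{3})([CW])$", re.IGNORECASE)


def parse_orf_name(name):
    """Decode a systematic yeast ORF name into its components."""
    match = _ORF_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a plain systematic ORF name: {name!r}")
    chrom_letter, arm, number, strand = match.groups()
    return {
        "chromosome": ord(chrom_letter.upper()) - ord("A") + 1,
        "arm": "left" if arm.upper() == "L" else "right",
        "orf_number": int(number),
        "strand": "Crick" if strand.upper() == "C" else "Watson",
    }


print(parse_orf_name("YKL159c"))
```

For YKL159c this yields chromosome 11, left arm, ORF 159, Crick strand, matching the breakdown on the slide.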
Duplication of the S. cerevisiae genome
Analysis of the S. cerevisiae genome revealed that many
regions are duplicated, both intrachromosomally and
interchromosomally (within and between chromosomes).
These duplicated regions include both genes and
nongenic regions.
Such duplications reflect a fundamental aspect of
genome evolution.
What are the mechanisms by which regions of the genome
duplicate?
Page 511
Duplication of the S. cerevisiae genome
Mechanisms of gene duplication:
-- Tandem repeat (slippage during recombination)
-- Gene conversion
-- Segmental duplication
-- Lateral gene transfer
-- Polyploidy (e.g. genome tetraploidy)
Fig. 15.8
Page 514
Duplication of the S. cerevisiae genome
Fate of duplicated genes:
-- Both copies persist
-- One copy is deleted
-- One copy becomes a pseudogene
-- One copy functionally diverges
Fig. 15.8
Page 514
Duplication of the S. cerevisiae genome
In 1970, Susumu Ohno published the book Evolution by
Gene Duplication.
He hypothesized that vertebrate genomes evolved by
two rounds of whole genome duplication. This provided
genomes with the “raw materials” (new genes) with which
to introduce various innovations.
Page 512
Duplication of the S. cerevisiae genome
Ohno (1970):
“Had evolution been entirely dependent upon natural
selection, from a bacterium only numerous forms of
bacteria would have emerged. The creation of metazoans,
vertebrates, and finally mammals from unicellular
organisms would have been quite impossible, for such
big leaps in evolution required the creation of new gene
loci with previously nonexistent function. Only the
cistron that became redundant was able to escape from
the relentless pressure of natural selection. By escaping,
it accumulated formerly forbidden mutations to emerge
as a new gene locus.”
Page 512
Duplication of the S. cerevisiae genome
Wolfe and Shields (1997, Nature) provided support for
Ohno’s paradigm. They hypothesized that the yeast genome
duplicated about 100 million years ago. There was a diploid
yeast genome with about 5,000 genes. It doubled to a
tetraploid number of 10,000 genes. Then there was massive
gene loss and chromosomal rearrangement to yield the
present day 6,000 genes.
Page 515
Wolfe and Shields (1997) performed blastp and found 55 blocks
of duplicated regions. They proposed that the entire
S. cerevisiae genome underwent a duplication.
Matches with scores >200 are shown. These are arranged in
blocks of genes.
[Figure axes: distance along chromosome X (kb) and
distance along chromosome XI (kb)]
Fig. 15.9
Page 515
Duplication of the S. cerevisiae genome
Evidence of genome duplication in yeast
-- Systematic BLAST searches show 55 blocks
of duplicated sequences.
-- There are 376 pairs of homologous genes.
You can see the results of chromosomal comparisons
on Ken Wolfe’s web site and at the SGD web site.
Page 515
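To make the idea of a "block" concrete, here is a toy sketch of how protein matches between two chromosomes might be grouped. It is not Wolfe and Shields' actual procedure: only the score cutoff of 200 comes from the slides, while the gap limit, the minimum number of gene pairs, the greedy chaining strategy, and the example hits are all assumptions made for illustration.

```python
def find_blocks(hits, min_score=200, max_gap_kb=50, min_pairs=3):
    """Group (pos_a_kb, pos_b_kb, score) hits into candidate blocks.

    A hit joins the current block if it lies within max_gap_kb of the
    previous hit on both chromosomes; blocks with fewer than min_pairs
    hits are discarded.
    """
    kept = sorted(h for h in hits if h[2] > min_score)
    blocks, current = [], []
    for hit in kept:
        if current:
            prev = current[-1]
            close_a = hit[0] - prev[0] <= max_gap_kb
            close_b = abs(hit[1] - prev[1]) <= max_gap_kb
            if not (close_a and close_b):
                if len(current) >= min_pairs:
                    blocks.append(current)
                current = []
        current.append(hit)
    if len(current) >= min_pairs:
        blocks.append(current)
    return blocks


# Made-up hits: (position on chromosome A, position on chromosome B, score)
example_hits = [
    (10, 300, 450), (25, 310, 220), (40, 330, 510),  # a block
    (60, 900, 150),                                  # below the cutoff
    (200, 50, 800),                                  # isolated match
    (400, 120, 300), (420, 105, 260), (445, 90, 240),  # inverted block
]
blocks = find_blocks(example_hits)
for block in blocks:
    print(len(block), "gene pairs from", block[0][0], "to", block[-1][0], "kb")
```

The greedy chaining keeps the sketch short but would split interleaved blocks; a real analysis would need a more careful collinearity test.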
The SGD website includes a pairwise chromosome similarity viewer.
Fig. 15.10
Page 516
Kenneth Wolfe offers a website that permits analysis
of yeast duplications:
http://oscar.gen.tcd.ie/~khwolfe/yeast/
Page 516
As an example, note the SSO1 gene on chromosome XVI.
SSO1 (XVI) and SSO2 (XIII) are part of a duplicated block.
Duplication of the S. cerevisiae genome
Two models for the presence of duplication blocks
[1] Whole genome duplication (tetraploidy) followed by
gene loss and rearrangements
[2] Successive, independent duplication events
Page 516
Duplication of the S. cerevisiae genome
Model [1] is favored for several reasons:
-- For 50 of 55 duplicated regions, the orientation of the
entire block is preserved with respect to the centromere.
The orientation is not random.
-- For model [2] we would expect 7 triplicated regions.
We observe only 0 or 1.
-- Gene order is maintained in 14 hemiascomycetes
(the Génolevures project)
Page 516
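The claim that the orientation is not random can be given a rough number. The sketch below assumes that, under independent duplication events, each block would keep or flip its orientation relative to the centromere with equal probability and independently of the others; that null model is an assumption made here, not something stated on the slide.

```python
from math import comb


def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 50 of 55 duplicated regions preserve their orientation.
p_value = binomial_tail(55, 50)
print(f"P(at least 50 of 55 by chance) = {p_value:.2e}")
```

Under this assumed null model the probability is vanishingly small, which is the sense in which the observed orientations argue against independent duplications.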
Duplication of the S. cerevisiae genome
The Génolevures project:
-- Partial sequencing of 13 hemiascomycetes
-- Gene order can be compared in 14 fungi
-- 70% of the S. cerevisiae genome maps to sister regions
with only minimal overlap
-- Proposal that the 16 centromeres form 8 pairs
Phylogenetic analyses place the divergence of S. cerevisiae
and Kluyveromyces lactis prior to the whole genome
duplication (~100 million years ago). Perhaps the genome
duplication enabled S. cerevisiae to acquire new properties
such as the capacity for anaerobic growth.
Page 517
Duplication of the S. cerevisiae genome
What is the fate of duplicated genes?
A duplicated gene (overall in eukaryotes) has a half life
of just several million years (Lynch and Conery, 2000).
50% to 92% of duplicated genes are lost (Wagner, 2001)
Consider four possible fates of a duplicated gene:
[1] Both copies persist (gene dosage effect)
[2] One copy is deleted (a common fate)
[3] One copy accumulates mutations and becomes
a pseudogene (no functional protein product)
[4] One copy (or both) diverges functionally. The
organism can perform a novel function.
Page 517
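A half-life suggests a simple exponential picture of duplicate loss. The sketch below assumes a constant loss rate, which is a simplification, and uses 4 million years purely as a placeholder for "several million years"; neither the model nor that value comes from the slide.

```python
def fraction_retained(t_my, half_life_my):
    """Fraction of duplicates still present after t_my million years,
    assuming simple exponential decay."""
    return 0.5 ** (t_my / half_life_my)


def fraction_lost(t_my, half_life_my):
    """Complement of fraction_retained."""
    return 1.0 - fraction_retained(t_my, half_life_my)


HALF_LIFE_MY = 4.0  # hypothetical placeholder value

for t in (1, 4, 8, 16):
    lost = fraction_lost(t, HALF_LIFE_MY)
    print(f"after {t:>2} million years: {lost:.0%} of duplicates lost")
```

In this picture, losing 50% of duplicates corresponds to exactly one half-life, and each additional half-life removes half of what remains.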
Duplication of the S. cerevisiae genome
Why are duplicated genes commonly lost? It might seem
highly advantageous to have a second copy of a gene,
thus permitting functional divergence.
Ohno suggested two reasons:
[1] After duplication, a deleterious mutation in one of the two
genes might now persist. Without duplication, the individual
would have been selected against by such a mutation.
[2] The presence of a new paralogous sequence could lead to
unequal crossing over of homologous chromosomes
during meiosis.
Page 518
Duplication of the S. cerevisiae genome
To consider the fate of duplicated genes, consider the
example of genes involved in vesicle transport.
Vesicles carry cargo from one destination to another.
Proteins on vesicles (e.g. vesicle-associated membrane
protein, VAMP; Snc1p in yeast) bind to proteins on target
membranes (e.g. syntaxin in mammalian and other
eukaryotic systems, or Sso1p in yeast).
In S. cerevisiae, genome duplication appears to be
responsible for the presence of two syntaxins
(SSO1 and SSO2) and two VAMPs (SNC1 and SNC2).
Page 518
Duplication of the S. cerevisiae genome
[Figure labels: Snc1p, Sso1p, Snc2p, Sso2p]
Fig. 15.11
Page 518
Search for information on SSO1 (or any yeast gene) at the
SGD website.
The SGD record for SSO1 provides information on function.
Fig. 15.12
Page 519
Duplication of the S. cerevisiae genome
The SGD website reveals that the SSO1 gene is nonessential
(i.e. the null mutant is viable), but the double knockout of
SSO1 and SSO2 is lethal. Thus, these paralogs may offer
functional redundancy to the organism.
Also, these proteins could participate in distinct (but
complementary) intracellular trafficking steps.
Page 519
Duplication of the S. cerevisiae genome
Andreas Wagner (2000) considered two ways an organism
can compensate for mutations: via genes with overlapping
functions (e.g. paralogs), or via genes with unrelated
functions that participate in regulatory networks.
He reported that overall, gene duplications did not provide
robustness. Instead, interactions among unrelated genes
provide robustness against mutations.
Page 519
Functional genomics in yeast
Functional genomics refers to the assignment
of function to genes based on genome-wide
screens and analyses.
Next week, Jef Boeke will describe functional genomics
(Monday). Joel Bader will describe proteomics
in yeast (Wednesday).
Page 520
We can consider functional genomics in yeast
in terms of high throughput approaches
at the levels of genes, transcripts, and proteins
Fig. 15.13
Page 520
Functional genomics in yeast (next week)
Protein level:
-- Two-hybrid screens
-- Affinity purification and mass spectrometry
-- Pathways
RNA level:
-- Microarrays
-- SAGE
-- Transposon tagging
Gene level:
-- Genetic footprinting
-- Transposon insertion: random mutagenesis
-- Gene deletion: targeted deletion of all ORFs!!!
Today’s final topic:
comparative analysis of fungal genomes
The fungi offer unprecedented opportunities
for comparative genomic analyses
-- relatively small genome sizes
-- they are eukaryotes
-- they exhibit significant differences in biology
-- opportunities to apply functional genomics approaches
in a comprehensive, genome-wide manner
Page 528
Fungal and metazoan phylogeny
Baldauf et al., 2000
Page 528
A variety of fungal genome sequencing projects
Aspergillus fumigatus
Aspergillus nigrans
Aspergillus parasiticus
Candida albicans
Cryptococcus neoformans
Fusarium sporotrichioides
Magnaporthe grisea
Neurospora crassa
Phanerochaete chrysosporium
Saccharomyces cerevisiae
Schizosaccharomyces pombe
Ustilago maydis
size chromosomes
30 Mb
8
29 Mb
8
16 Mb
21 Mb
8
40 Mb
43 Mb
30 Mb
13 Mb
14 Mb
20 Mb
7
7
10
16
3
An atypical fungus: Encephalitozoon cuniculi
Microsporidia are single-celled eukaryotes that lack
mitochondria and peroxisomes. Consistent with their
roles as parasites, the E. cuniculi genome is severely
reduced in size (2000 proteins, only 2.9 Mb). They were
thought to represent deep-branching protozoans, but
recent phylogenetic studies place them as an outgroup
to fungi.
Page 529
Encephalitozoon cuniculi as a fungal outgroup
Fig. 15.22
Page 529
Orange bread mold: Neurospora crassa
Beadle and Tatum chose N. crassa as a model organism
to study gene-protein relationships. The genome sequence
was reported: 39 Mb, 7 chromosomes, 10,082 ORFs
(Galagan et al., 2003).
N. crassa has only 10% repetitive DNA, and incredibly,
only 8 pairs of duplicated genes that encode proteins
>100 amino acids. This is because Neurospora uses
“repeat-induced point mutation” (RIP), a mechanism by
which the genome is scanned for duplicated (repeated)
sequences, which are then mutated. This appears to serve
as a genomic defense system, inactivating potentially
harmful transposons.
Page 530
Schizosaccharomyces pombe
The S. pombe genome is 13.8 Mb and encodes ~4900
predicted proteins. Some bacterial genomes encode
more proteins (e.g. Mesorhizobium loti with 6752,
and Streptomyces coelicolor with 7825 genes).
Chromosome   Size      Genes   Coding
1            5.6 Mb    2,255   59%
2            4.4 Mb    1,790   58%
3            2.5 Mb    884     55%
Total        12.5 Mb   4,929   58%
See: TIGR www.tigr.org
EBI www.sanger.ac.uk/Projects/S_pombe
Page 530
Schizosaccharomyces pombe
S. pombe diverged from S. cerevisiae about
330 to 420 million years ago.
Many genes are as divergent between these
two fungi as either is from humans.
To see this, try TaxPlot at NCBI.
Page 530
Perspective and pitfalls
The budding yeast S. cerevisiae is one of the most significant
organisms in biology:
• Its genome was the first eukaryotic genome to be sequenced
• Its biology is simple relative to metazoans
• Through yeast genetics, powerful functional genomics
approaches have been applied to study all yeast genes
It is important to note that even for yeast, our knowledge
of basic biological questions is highly incomplete.
We still understand little about how the genotype of an
organism leads to its characteristic phenotype.
Page 531