Lecture 25 Phylogeny
Based on Chapter 23 Molecular Evolution
Copyright © 2010 Pearson Education Inc.
1. Introduction
2. Substitutions in Protein and DNA
sequences
3. Comparing Sequences Using Sequence
Alignments
4. Substitutions and the Jukes-Cantor Model
5. Rates of Nucleotide Substitution
6. Comparative Genomics
1. Genome sequencing provides a map to genes but does not
reveal their function. Comparative genome analysis:
a. Compares genes with low evolutionary rate and high functional
significance.
b. Pseudogenes, which are free to mutate, are used to calculate
expected mutation rates.
c. Regions of high sequence similarity in distantly related species
are likely to contain functional genes.
2. Between mice and humans, for example, pseudogenes show
about five times as many changes as regions that encode
proteins or regulate gene expression.
3. Natural selection evaluates the consequences of an
enormous number of changes, on an evolutionary time scale.
4. Comparative genome analysis can point the way to
meaningful experiments by:
a. Saving the effort of saturation mutagenesis.
b. Allowing use of model organisms (e.g., yeast).
7. Codon Usage Bias
8. Variation in Evolutionary Rates between Genes
9. Molecular Clocks
10. Molecular Phylogeny
11. Phylogenetic Trees
12. Number of Possible Trees
$N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$
$N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$

# sequences (n)    # unrooted trees (N_U)    # rooted trees (N_R)
 2                          1                         1
 3                          1                         3
 4                          3                        15
 5                         15                       105
 6                        105                       945
 7                        945                    10,395
 8                     10,395                   135,135
 9                    135,135                 2,027,025
10                  2,027,025                34,459,425
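The two formulas can be checked against the table with a short script. This is a minimal sketch; the function names `rooted_trees` and `unrooted_trees` are illustrative and do not come from the lecture. The unrooted formula needs n ≥ 3, so n = 2 is handled as a special case to match the table.

```python
from math import factorial


def rooted_trees(n: int) -> int:
    """Number of rooted trees: (2n-3)! / (2^(n-2) * (n-2)!), for n >= 2."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n: int) -> int:
    """Number of unrooted trees: (2n-5)! / (2^(n-3) * (n-3)!), for n >= 3."""
    if n < 2:
        raise ValueError("need at least two sequences")
    if n == 2:
        # The formula is undefined here; two sequences can only be joined one way.
        return 1
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


# Reproduce the table for n = 2..10.
for n in range(2, 11):
    print(f"{n:>2}  {unrooted_trees(n):>12,}  {rooted_trees(n):>12,}")
```

Note that the number of rooted trees for n sequences equals the number of unrooted trees for n + 1 sequences, which is why the two table columns are offset by one row.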
13. Gene versus Species Trees
14. Reconstruction Methods
• Many possibilities exist for the REAL phylogenetic trees,
and it is generally impossible to know which is the true
tree that represents actual events in evolution. Most
phylogenetic trees generated with molecular data are
considered inferred trees.
• Computer algorithms that generate these inferred trees
use three types of approaches:
– Distance matrix methods.
– Parsimony-based methods.
– Maximum likelihood methods.
• We will examine each of these methods briefly.
15. Distance Matrix Phylogenetic Reconstruction
Let $C_i$ and $C_j$ be two disjoint clusters:
$d_{i,j} = \frac{1}{|C_i| \times |C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d_{p,q}$, where $p \in C_i$ and $q \in C_j$
In words: calculate the average over all pairwise inter-cluster
distances for all taxa under consideration.
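The averaging step can be sketched in a few lines. This is only an illustration: the function name `average_cluster_distance`, the taxon labels, and the distance values are all hypothetical, and the distances are assumed to be stored in a nested dictionary.

```python
from itertools import product


def average_cluster_distance(cluster_i, cluster_j, dist):
    """Mean of dist[p][q] over every p in cluster_i and every q in cluster_j."""
    if not cluster_i or not cluster_j:
        raise ValueError("both clusters must be non-empty")
    total = sum(dist[p][q] for p, q in product(cluster_i, cluster_j))
    return total / (len(cluster_i) * len(cluster_j))


# Hypothetical pairwise distances among four taxa (made-up values).
dist = {
    "A": {"A": 0, "B": 2, "C": 6, "D": 8},
    "B": {"A": 2, "B": 0, "C": 6, "D": 8},
    "C": {"A": 6, "B": 6, "C": 0, "D": 4},
    "D": {"A": 8, "B": 8, "C": 4, "D": 0},
}

# Distance between clusters {A, B} and {C, D}: (6 + 8 + 6 + 8) / 4
print(average_cluster_distance(["A", "B"], ["C", "D"], dist))  # 7.0
```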
16. Parsimony-Based Approaches to
Phylogenetic Reconstruction
17. Parsimony-Based Approaches II
18. Maximum Likelihood Approaches to
Phylogenetic Tree Reconstruction
• Maximum likelihood approaches offer a purely statistical
alternative method to reconstruct phylogenies.
– In a set of sequence alignments, probability is considered for
every possible nucleotide substitution.
– Transitions, for example, are three times more likely than
transversions, and so individuals with transition divergences
may be considered more closely related than those with
transversions.
• Complications of the maximum likelihood method include:
– Lack of knowledge about the ancestral sequence.
– The probability that multiple substitutions have occurred.
– The fact that sites are not necessarily independent or
equivalent.
• A vast number of trees are possible with this computation-intensive
method. The one with the highest aggregate
probability is most likely to reflect the true phylogenetic tree.
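To make the transition/transversion distinction concrete, the sketch below classifies the differences between two aligned sequences. It does not compute a likelihood; it only performs the bookkeeping that a substitution model would weight differently. The function name `count_substitutions` and the example sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_substitutions(seq1: str, seq2: str) -> tuple[int, int]:
    """Return (transitions, transversions) between two aligned DNA sequences.

    Transitions are purine<->purine or pyrimidine<->pyrimidine changes;
    transversions are purine<->pyrimidine changes. Identical sites and
    sites with gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Made-up aligned sequences: one A->G transition, two transversions.
print(count_substitutions("ACGTAC", "GCGTCA"))  # (1, 2)
```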
19. Bootstrapping and Tree Reliability
• Large numbers (e.g., >30 species) of long sequences
are difficult to analyze, even with fast computers and
streamlined algorithms.
• Neither distance matrix nor maximum parsimony
methods can guarantee the correct tree; but generally, if
a similar tree results from both of these fundamentally
different methods, it is considered fairly reliable.
• The confidence level for portions of inferred trees can
be determined by bootstrap tests, in which the original
data are resampled with replacement and a new tree is
inferred from each resampled data set.
• Caution is needed in interpreting bootstrap results.
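The resampling step of a bootstrap test might look like the sketch below. It assumes that the resampled units are alignment columns (sites), which is the usual convention but is not stated on the slide. The function name `bootstrap_alignment` and the toy alignment are hypothetical, and tree inference on each replicate is left out.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build one bootstrap replicate by sampling alignment columns with replacement.

    The replicate has the same taxa and the same number of sites as the input;
    some original columns appear more than once and others not at all.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    n_sites = lengths.pop()
    # Pick n_sites column indices, each drawn independently from 0..n_sites-1.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


# Toy alignment with made-up sequences.
alignment = {
    "taxon1": "ACGTACGT",
    "taxon2": "ACGTTCGA",
    "taxon3": "ACCTACGA",
}

rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(replicates[0])
```

In a full analysis a tree would be inferred from each replicate, and the support for a grouping would be the fraction of replicate trees that contain it.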
20. The Tree of Life
21. Multigene Families
22a. Gene Duplication and Gene Conversion
1. Duplication frees a copy of the sequence to undergo
changes, since a functional copy will still exist.
a. Most changes would produce less functional
products, or even nonfunctional pseudogenes.
b. A few changes, however, might alter function and/or
pattern of expression to something more
advantageous for the organism. Selection would
allow these genes to become widespread in the
population.
[Figure: duplication of Gene X, yielding copies labeled Gene X and Gene X′]
22b. Gene Duplication and Gene Conversion
2. Misalignment between a pseudogene and a functional copy
can result in gene conversion through recombination events.
a. The allele on one homolog is copied and replaces the
DNA sequence of the allele on the other homolog; it is not
a reciprocal exchange.
b. Gene conversion gives organisms even more
opportunities to create a gene with a new function.
3. Gene conversion continues to operate in modern humans.
An example is two genes for red-green color vision on the X
chromosome that undergo gene conversion in most of the
known cases of spontaneous deficiencies in green color
vision.
23. Domain (Exon) Shuffling
1. Often, less than an entire gene is duplicated, resulting in
copies of protein domains.
a. Example - human serum albumin.
b. Internal duplication is not a rapid method of producing proteins
with new functions.
2. Most complex proteins arise from assemblages of
several protein domains performing different functions.
a. The beginnings and ends of exons and protein domains often
correspond.
b. Gilbert (1978) - most gene families arose through domain
shuffling involving duplication and rearrangement.
c. Domain shuffling theory proposes that introns were a feature of
early life on Earth, even though they are now missing from
prokaryotes.
d. Numerous examples of complex genes made from segments of
other genes are known, and clearly some novel functions have
been created in this way.