MOLECULAR EVOLUTION

Molecular evolution examines DNA and proteins, addressing two types of questions:
• How do DNA and proteins evolve?
• How are genes and organisms evolutionarily related?
Applications



• Reveal the dynamics of evolutionary processes.
• Indicate the chronology of change.
• Identify phylogenetic relationships.
Alignment of two sequences
Number of aligned positions = 23
Sequence Alignments



• Matching nucleotides are interpreted as unchanged since a common ancestor.
• Substitutions, insertions, and deletions can be identified.
• Gaps inserted to maximize the similarity between aligned sequences indicate the occurrence of insertions and deletions (indels).
Optimal alignment

Many alignments are possible between two sequences; alignment algorithms typically maximize the number of matching amino acids or nucleotides while invoking the smallest possible number of indel events.
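The optimization described above can be sketched with the Needleman-Wunsch dynamic-programming algorithm, one standard method for global alignment. This is a minimal illustration, not necessarily the method used to produce the alignments in these slides: the function name `global_align` and the scores (match = +1, mismatch = 0, gap = -1) are assumptions chosen so that a higher score means more matches and fewer indels. Practical tools usually use substitution matrices and separate gap-opening and gap-extension penalties.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = 0, gap: int = -1):
    """Needleman-Wunsch global alignment (illustrative scoring).

    Returns (score, aligned_a, aligned_b), with '-' marking gaps (indels).
    """
    n, m = len(a), len(b)

    def pair_score(i: int, j: int) -> int:
        # Score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # match or substitution
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = global_align("GATTACA", "GATACA")
print(score)
print(top)
print(bottom)
```

With these scores, aligning GATTACA with GATACA places a single gap in the shorter sequence: six matches and one indel, the fewest indels that can account for the length difference.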
Substitutions

When DNA sequences diverge, they begin to accumulate mutations. The proportion of differing sites (P) found in an alignment is widely used in molecular evolution analysis.
An exemplary alignment
Number of aligned positions = 23
Number of different positions = 8 (P = 8/23)
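Counting P from an aligned pair can be sketched as follows. The exemplary alignment is not reproduced in this transcript, so the sequences below are hypothetical, and skipping columns that contain a gap is an assumption (indels are often counted separately from substitutions). The name `p_distance` is illustrative.

```python
def p_distance(x: str, y: str) -> tuple[int, int, float]:
    """Return (aligned positions, differing positions, P) for two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped (an assumption).
    """
    if len(x) != len(y):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    differences = sum(1 for a, b in pairs if a != b)
    return len(pairs), differences, differences / len(pairs)


# Hypothetical aligned pair with one gap column
print(p_distance("ACGT-ACGTA", "ACCTTACGAA"))
```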
Number of substitutions


• If the alignment shows few substitutions, a simple count is used.
• If many substitutions have occurred, a simple count will likely underestimate the number of substitution events, because some sites will have changed more than once.
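A small simulation makes the underestimation concrete. This sketch assumes the equal-probability substitution scheme of the Jukes-Cantor model described next; `simulate_divergence` is a hypothetical helper, and the seed only makes runs repeatable.

```python
import random


def simulate_divergence(length: int, n_events: int, seed: int = 0) -> tuple[int, int]:
    """Apply n_events random substitutions to a random ancestral sequence.

    Each event picks a site uniformly and replaces its base with one of the
    three other bases, all equally likely (Jukes-Cantor assumption).
    Returns (actual substitution events, observed differing sites).
    """
    rng = random.Random(seed)
    bases = "ACGT"
    ancestor = [rng.choice(bases) for _ in range(length)]
    descendant = ancestor[:]
    for _ in range(n_events):
        site = rng.randrange(length)
        descendant[site] = rng.choice([b for b in bases if b != descendant[site]])
    observed = sum(1 for a, d in zip(ancestor, descendant) if a != d)
    return n_events, observed


for events in (10, 100, 1000):
    print(simulate_divergence(1000, events))
```

When few events have occurred, nearly every one shows up as a difference; as events accumulate, more of them land on sites that have already changed (sometimes restoring the original base), and the observed count falls increasingly short of the true number.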
Jukes and Cantor Model
• Jukes and Cantor assumed that each nucleotide is equally likely to change into any other nucleotide, and created a mathematical model to describe multiple base substitutions.
• What other models could be
developed?
Jukes and Cantor model

$$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}P\right)$$

• P = observed number of differing sites divided by the total number of aligned sites.
• K = distance between sequence x and sequence y, expressed as the number of substitutions per site, corrected for multiple substitutions at the same site.
• The natural log (ln) corrects for the underestimation of substitutions.
• The terms 3/4 and 4/3 reflect that there are four types of nucleotides, and each nucleotide can be substituted by any of the three others.
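The formula translates directly into a short function. This is a sketch; the name `jukes_cantor` is illustrative. The guard follows from the formula itself: the logarithm is defined only while 1 - (4/3)P > 0, so P must stay below 0.75, the expected proportion of differences between unrelated random sequences.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance K (substitutions per site) from the observed P."""
    if not 0 <= p < 0.75:
        # 1 - (4/3)P must be positive for the logarithm to be defined
        raise ValueError("P must satisfy 0 <= P < 0.75")
    return -0.75 * math.log(1 - (4 / 3) * p)


print(round(jukes_cantor(8 / 23), 3))
```

For the 23-site alignment with 8 differences this prints 0.467.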
Calculation of distance (K) between sequences
$P = 8/23 = 0.348$
$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}P\right) = 0.467$
The observed distance P = 0.348 increases to K = 0.467 when the Jukes-Cantor model is used to correct for multiple substitutions.
Correction for multiple substitutions

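• If two sequences are 95% identical, then P = 0.05 and K = 0.0517, so the correction is K - P = 0.0517 - 0.05 = 0.0017.
• If two sequences are only 50% identical, then P = 0.5 and K = 0.824, so the correction is K - P = 0.824 - 0.5 = 0.324.
The correction is negligible for closely related sequences but large for divergent ones.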
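Tabulating K - P over a range of identities shows how quickly the correction grows. This sketch reuses the Jukes-Cantor formula; the identity values are arbitrary illustration points, and `correction_table` is a hypothetical helper.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance K for an observed proportion P of differing sites."""
    return -0.75 * math.log(1 - (4 / 3) * p)


def correction_table(identities):
    """Map percent identity -> (P, K, K - P)."""
    rows = {}
    for identity in identities:
        p = 1 - identity / 100
        k = jukes_cantor(p)
        rows[identity] = (p, k, k - p)
    return rows


for identity, (p, k, extra) in correction_table([95, 90, 75, 50, 40]).items():
    print(f"{identity}% identical: P = {p:.2f}, K = {k:.4f}, K - P = {extra:.4f}")
```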
Rates of nucleotide substitutions



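• Substitutions accumulate independently and simultaneously in different sequences.
• The substitution rate, R, can be calculated by dividing the distance (K) between two homologous sequences by 2T, where T is the divergence time. The factor of 2 accounts for the two lineages, each of which has accumulated substitutions independently for time T since divergence.
$$R = \frac{K}{2T}$$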
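As a sketch, the rate calculation is a one-line function; `substitution_rate` is an illustrative name, and K is assumed to be already corrected (for example with the Jukes-Cantor model).

```python
def substitution_rate(k: float, t_years: float) -> float:
    """R = K / (2T), in substitutions per site per year."""
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    # Both lineages have been accumulating substitutions for T years,
    # so the total distance K covers 2T years of evolution.
    return k / (2 * t_years)


print(substitution_rate(0.1308, 80e6))
```

With K = 0.1308 and T = 80 million years this gives about 8.175 x 10^-10 substitutions per site per year, the value in the preproinsulin example.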
Example



The following sequences represent an optimum
alignment of the first 50 nucleotides of human
and sheep preproinsulin genes, which last
shared a common ancestor 80 million years
ago:
Human: ATGGCCCTGT GGATGCGCCT CCTGCCCCTG CTGGCGCTGC TGGCCCTCTG
Sheep: ATGGCCCTGT GGACACGCCT GGTGCCCCTG CTGGCCCTGC TGGCACTCTG
$P = 6/50 = 0.12$ (observed)
$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}(0.12)\right) = 0.1308$
Estimated number of substitutions $= 50 \times 0.1308 = 6.54$
$R = K/(2T) = 0.1308/(2 \times 80 \times 10^{6}) = 8.175 \times 10^{-10}$ substitutions per site per year
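The whole preproinsulin calculation can be reproduced end to end. This sketch assumes the first block of each sequence is ATGGCCCTGT (ten nucleotides), so that each sequence has the stated 50 nucleotides; `insulin_example` is a hypothetical helper.

```python
import math

# First 50 nt of human and sheep preproinsulin (first block assumed ATGGCCCTGT)
HUMAN = "ATGGCCCTGT GGATGCGCCT CCTGCCCCTG CTGGCGCTGC TGGCCCTCTG".replace(" ", "")
SHEEP = "ATGGCCCTGT GGACACGCCT GGTGCCCCTG CTGGCCCTGC TGGCACTCTG".replace(" ", "")
T_YEARS = 80e6  # time since the common ancestor


def insulin_example(x: str, y: str, t_years: float) -> dict:
    """P, Jukes-Cantor K, estimated substitutions, and R = K/(2T)."""
    n = len(x)
    diffs = sum(1 for a, b in zip(x, y) if a != b)
    p = diffs / n
    k = -0.75 * math.log(1 - (4 / 3) * p)
    return {
        "sites": n,
        "differences": diffs,
        "P": p,
        "K": k,
        "estimated_substitutions": n * k,
        "R": k / (2 * t_years),
    }


print(insulin_example(HUMAN, SHEEP, T_YEARS))
```

Because K is not rounded here, R comes out slightly lower (about $8.17 \times 10^{-10}$) than the hand calculation that uses K = 0.1308; the estimated number of substitutions is about 6.54.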
Degenerate Code




• Codons are degenerate.
• Of the 20 amino acids, 18 are encoded by more than one codon.
• Met (AUG) and Trp (UGG) are the exceptions; all others correspond to a set of two or more codons.
• Codon sets often show a pattern in their sequences; variation at the third position is most common.


• The code has start and stop signals. AUG is the start signal for protein synthesis. Stop codons (UAG, amber; UAA, ochre; UGA, opal) have no corresponding tRNA.
• Wobble occurs in codon-anticodon pairing: the third base of the codon can base-pair less specifically with the anticodon, because it is less constrained three-dimensionally.
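The degeneracy of the code can be checked directly. This sketch builds the standard genetic code from the conventional 64-letter amino-acid string (codons ordered T, C, A, G with the first position varying slowest; DNA letters stand in for RNA, so T replaces U). It counts codons per amino acid and, for every sense codon, how many single-base changes at each codon position leave the amino acid unchanged. The helper names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code; '*' marks stop codons
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def degeneracy() -> Counter:
    """Number of codons per amino acid (stop codons excluded)."""
    return Counter(aa for aa in CODE.values() if aa != "*")


def synonymous_changes_by_position() -> list[int]:
    """Count single-base changes between sense codons that keep the amino acid,
    separately for codon positions 1, 2, and 3."""
    counts = [0, 0, 0]
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in "TCAG":
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    counts[pos] += 1
    return counts


deg = degeneracy()
print(sorted(aa for aa, n in deg.items() if n == 1))
print(synonymous_changes_by_position())
```

Met and Trp come out as the only single-codon amino acids, and synonymous single-base changes are concentrated at the third position: none occur at the second position, and the only first-position cases involve Leu and Arg.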
Patterns and Modes of Substitutions

Patterns of variation within homologous
genes show that some amino acid
substitutions are found more frequently than
others.

Substitutions often involve amino acids with similar chemical characteristics, supporting two evolutionary principles:
• Mutations are rare events.
• The most dramatic changes are removed by natural selection.

Chemically similar amino acids tend to have similar codons, so a substitution between them can result from a single mutation.
• Natural selection acting on this variation produces proteins optimized for their role and environment.
• More substantial alterations of protein structure are likely to be deleterious and are removed from the gene pool.
Synonymous and nonsynonymous sites
• Synonymous changes, which do not alter the amino acids in the protein, are found about five times more often than nonsynonymous changes.
  – Both types of change are equally likely to occur, but nonsynonymous changes are usually detrimental to fitness and are eliminated by natural selection.
• Mutations are changes in nucleotide
sequences due to errors in replication or
repair.
• Substitutions are mutations that have
passed through the filter of selection.
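The synonymous/nonsynonymous distinction can be applied to the human and sheep preproinsulin sequences from the earlier example. This sketch assumes the reading frame starts at the initial ATG and that the first block of each sequence is ATGGCCCTGT (50 nucleotides); the trailing incomplete codon is ignored, and each differing codon is labeled as a whole even when it differs at more than one position. The helper name `classify_codon_differences` is hypothetical.

```python
from itertools import product

# Standard genetic code (DNA letters, codons ordered T, C, A, G; '*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

HUMAN = "ATGGCCCTGT GGATGCGCCT CCTGCCCCTG CTGGCGCTGC TGGCCCTCTG".replace(" ", "")
SHEEP = "ATGGCCCTGT GGACACGCCT GGTGCCCCTG CTGGCCCTGC TGGCACTCTG".replace(" ", "")


def classify_codon_differences(x: str, y: str):
    """Return (codon number, codon in x, codon in y, label) for each differing codon."""
    labels = []
    full_length = len(x) - len(x) % 3  # drop a trailing partial codon
    for i in range(0, full_length, 3):
        cx, cy = x[i:i + 3], y[i:i + 3]
        if cx != cy:
            kind = "synonymous" if CODE[cx] == CODE[cy] else "nonsynonymous"
            labels.append((i // 3 + 1, cx, cy, kind))
    return labels


for codon_number, cx, cy, kind in classify_codon_differences(HUMAN, SHEEP):
    print(codon_number, cx, CODE[cx], cy, CODE[cy], kind)
```

Three of the five differing codons (7, 12, and 15) are synonymous, each changed only at the third position; codons 5 (Met to Thr) and 8 (Leu to Val) are nonsynonymous. A 50-nucleotide stretch is far too short to estimate the overall ratio of the two kinds of change.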
Mutations vs. substitutions



Would the mutation rate be greater than the observed substitution rate (for example, $8.175 \times 10^{-10}$ substitutions per site per year for the preproinsulin gene)?
Yes.
Why?
Variation in evolutionary rates within genes
• Studies show that different regions of
genes evolve at different rates.
• Distinctions are seen between and
within coding and non-coding regions.
Examples of non-coding regions include introns, leaders, non-transcribed flanking regions, and pseudogenes.
Relative rates of evolutionary change in mammals

Sequence                   R ($\times 10^{-9}$ substitutions per site per year)
Functional genes
  5’ flanking region       2.36
  CDS, synonymous          4.65
  CDS, nonsynonymous       0.88
  Intron                   3.70
  3’ flanking region       4.46
Pseudogenes                4.85
Flanking regions and introns



• Changes in 3’ flanking sequences have no known effect on the amino acid sequence, so most substitutions are tolerated.
• Substitution rates are high in introns, but not as high as at synonymous sites in coding sequences (CDS).
• 5’ flanking regions have low rates because they contain regulatory regions for transcription.
Pseudogenes


The highest rate of evolution is found in nonfunctional pseudogenes, which no longer code for proteins.
What advantage do pseudogenes provide for the evolution of multigene families?
Coding sequences with high rates of nonsynonymous substitution

Major histocompatibility complex (MHC) in
mammals




• If there is evolutionary pressure for diversity, nonsynonymous substitutions become advantageous.
• The MHC is involved in immune function, where diversity means fewer individuals are vulnerable to infection by any single virus.
• Viruses combine error-prone replication with diversifying selection.
• Both viruses and the MHC evolve rapidly because natural selection favors diversification.
Ribosomal RNAs


• rRNA regions that base-pair with one another to provide ribosomal function are subject to mutation at the same rate as regions that do not pair.
• However, mutations that disrupt pairing are selected against, because they alter ribosomal function and are detrimental to fitness.
Mitochondrial DNA (mtDNA)


• The mammalian mitochondrial genome is a circular, double-stranded mtDNA about 16,500 bp long (roughly 1/200,000 of the haploid nuclear genome), encoding 2 rRNAs, 22 tRNAs, and 13 proteins.
• The average synonymous substitution rate in mammalian mitochondria is $5.7 \times 10^{-8}$ per site per year, about 10 times higher than the synonymous substitution rate in nuclear genes.

The higher rates of mutation in mtDNA are likely to be due to:
• A higher error rate during mtDNA replication and repair; mitochondria have more limited DNA repair than the nucleus.
• Higher concentrations of mutagens, such as free radicals resulting from metabolic processes.
• Weaker selective pressure: each cell contains many copies of mtDNA, so a change in one copy is less detrimental.
Maternal transmission



• mtDNA is inherited clonally from the mother, because the egg, not the sperm, supplies the zygote's mitochondria. Since mtDNA does not go through meiosis, all offspring of the same mother carry the same mtDNA.
• Matriarchal lineages can therefore be traced, allowing family structure to be examined.
• Example: geographic variation in mtDNA sequences of pocket gophers in the southeastern USA.
Lineage relationships among mtDNA types in pocket gophers
Letters are different mtDNA
types grouped according to
similarity, and are
superimposed on a geographic
map of the collection sites.
The tick marks across connecting lines indicate the number of mutations.
Peter J. Russell, iGenetics: Copyright © Pearson Education, Inc., publishing as Benjamin Cummings.
mtDNA or nuclear DNA

Suppose you are studying human migration patterns.

Would you use mtDNA or nuclear genes to estimate how long ago humans moved from one place to another?
mtDNA or nuclear DNA?

Because the time scale is on the order of tens of thousands of years, and mtDNA accumulates mutations faster than nuclear DNA, mtDNA will provide more information about the differences between geographically separated human populations.

What if you want to study the phylogenetic relationships of mammalian species that diverged 80 million years ago?

(HINT: multiple substitutions)
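One way to explore the hint is to ask how large P is expected to be after 80 million years. Rearranging the Jukes-Cantor formula gives P = (3/4)(1 - e^(-4K/3)), and under a constant rate K = 2RT. This sketch assumes clock-like rates and takes R from figures quoted in this document: $4.65 \times 10^{-9}$ for synonymous sites in nuclear coding sequences and $5.7 \times 10^{-8}$ for synonymous sites in mammalian mtDNA. The name `expected_p` is illustrative.

```python
import math


def expected_p(rate_per_site_per_year: float, t_years: float) -> float:
    """Expected observed proportion of differing sites after divergence time T,
    inverting the Jukes-Cantor formula with K = 2RT."""
    k = 2 * rate_per_site_per_year * t_years
    return 0.75 * (1 - math.exp(-4 * k / 3))


T = 80e6
for label, rate in [("nuclear CDS, synonymous", 4.65e-9), ("mtDNA, synonymous", 5.7e-8)]:
    print(f"{label}: expected P = {expected_p(rate, T):.3f} (saturation at 0.75)")
```

Under these assumptions the mtDNA value sits essentially at the 0.75 saturation limit, so observed differences say almost nothing about how long ago the split occurred, while the nuclear value (about 0.47) is still well below it.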
Molecular Clock

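The molecular clock hypothesis suggests that rates of molecular evolution for loci with similar functional constraints are approximately uniform over the time period after divergence from a common ancestor; the divergence dates used to calibrate the clock come from the fossil record.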
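Used in reverse, a calibrated clock converts an observed distance into a divergence time, T = K/(2R). This is a minimal sketch assuming a constant rate and the Jukes-Cantor correction; `divergence_time` is a hypothetical helper.

```python
import math


def divergence_time(p: float, rate_per_site_per_year: float) -> float:
    """Estimate T (years) from an observed P, assuming a constant rate R.

    Rearranges R = K / (2T) to T = K / (2R), with K from the Jukes-Cantor formula.
    """
    k = -0.75 * math.log(1 - (4 / 3) * p)
    return k / (2 * rate_per_site_per_year)


print(f"{divergence_time(0.12, 8.175e-10) / 1e6:.1f} million years")
```

As a consistency check, P = 0.12 with the preproinsulin rate of $8.175 \times 10^{-10}$ gives roughly 80 million years, the calibration date used in that example.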
The molecular clock for alpha-globin
[Figure: each point represents the number of substitutions separating each animal (cow, chicken, platypus, carp, shark) from humans. x-axis: time to common ancestor (millions of years); y-axis: number of substitutions.]
Rates of amino acid replacement in different proteins

Protein            Rate (mean replacements per site per $10^{9}$ years)
Fibrinopeptides    8.3
Insulin C          2.4
Ribonuclease       2.1
Haemoglobins       1.0
Cytochrome C       0.3
Histone H4         0.01

Fig. 24.3 The molecular clock runs at different rates in different proteins
Peter J. Russell, iGenetics: Copyright © Pearson Education, Inc., publishing as Benjamin Cummings.
Molecular Phylogeny


Organisms that are similar at the molecular level are expected to be more closely related than dissimilar organisms.
Phylogenetic relationships among living
things are inferred from molecular
similarity.
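Inferring relationships from similarity starts with pairwise distances. The sketch below computes Jukes-Cantor distances between every pair of sequences and reports the closest pair; the three sequences are a hypothetical toy alignment, not data from this document, and the helper names are illustrative. Joining the closest pair first is the opening step of simple distance-based tree methods such as UPGMA, which this sketch does not go on to build.

```python
import math
from itertools import combinations


def jc_distance(x: str, y: str) -> float:
    """Jukes-Cantor distance between two aligned, ungapped sequences of equal length."""
    p = sum(1 for a, b in zip(x, y) if a != b) / len(x)
    return -0.75 * math.log(1 - (4 / 3) * p)


def closest_pair(seqs: dict[str, str]) -> tuple[str, str]:
    """Return the names of the two most similar sequences (smallest distance)."""
    return min(
        combinations(sorted(seqs), 2),
        key=lambda pair: jc_distance(seqs[pair[0]], seqs[pair[1]]),
    )


# Hypothetical toy alignment (illustration only)
toy = {
    "taxonA": "ACGTACGTACGTACGTACGT",
    "taxonB": "ACGTACGTACGTACGTACGA",
    "taxonC": "ACGAACGTTCGTACGAACCT",
}
for a, b in combinations(sorted(toy), 2):
    print(a, b, round(jc_distance(toy[a], toy[b]), 4))
print("closest:", closest_pair(toy))
```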