Gene Substitution
Phylogenetic Tree Construction
Outgroup  ATGTCAGGGACTCAGATCGAATGGGATCTAG
Taxon 1   .....G......T..................
Taxon 2   .....G......T........C.........
Taxon 3   .....G...........A.............
Taxon 4   .....G...........A........G....
A→G on the branch from the outgroup to the ancestor of taxa 1-4
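Tree with the substitutions placed on its branches:
Outgroup
A→G: branch leading to the ancestor of taxa 1-4
C→T: branch leading to taxa 1 and 2
T→C: branch leading to Taxon 2
C→A: branch leading to taxa 3 and 4
T→G: branch leading to Taxon 4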
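The placement of substitutions on the tree can be checked by reading the dot matrix directly. The following is a minimal sketch, not the method used in the slides: the function name `substitutions` and the dictionary layout are illustrative assumptions, and a dot is taken to mean "same base as the outgroup".

```python
# Outgroup sequence and dot-matrix rows copied from the alignment slide.
# A '.' is assumed to mean "identical to the outgroup at this site".
outgroup = "ATGTCAGGGACTCAGATCGAATGGGATCTAG"
rows = {
    "Taxon 1": ".....G......T..................",
    "Taxon 2": ".....G......T........C.........",
    "Taxon 3": ".....G...........A.............",
    "Taxon 4": ".....G...........A........G....",
}


def substitutions(reference, dotted_rows):
    """Map (site, 'X->Y') to the taxa carrying that change (sites are 1-based)."""
    changes = {}
    for taxon, dotted in dotted_rows.items():
        for site, (ref, base) in enumerate(zip(reference, dotted), start=1):
            if base != ".":
                changes.setdefault((site, f"{ref}->{base}"), []).append(taxon)
    return changes


changes = substitutions(outgroup, rows)
for (site, change), taxa in sorted(changes.items()):
    print(site, change, ", ".join(taxa))
```

Changes shared by several taxa are candidates for internal branches, and changes found in a single taxon belong on the branch leading to that taxon.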
Rates of Nucleotide Substitution

Basic quantity in studying molecular
evolution
– Among genes
– Within genes
– Among organisms
– Among codon positions or secondary structure
r = rate of nucleotide substitution

It is defined as the number of substitutions
per site per year.
Ancestor splits into Seq1 and Seq2, each evolving for time T
r = K/2T
Calculating the rate of nucleotide substitution (r)
T = years since divergence
K = substitutions per site that occurred since divergence
The ancestral sequence gives rise to Sequence A and Sequence B, each evolving for T years
r = K/2T
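As a worked instance of r = K/2T, here is a small sketch. The function name and the input values are hypothetical and chosen only to make the arithmetic easy to follow. The factor of 2 reflects that substitutions accumulate along both lineages since the split.

```python
def substitution_rate(k, t_years):
    """Rate r = K / (2T).

    k        substitutions per site between the two sequences
    t_years  years since the two sequences diverged
    """
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    return k / (2 * t_years)


# Hypothetical example: K = 0.16 substitutions per site, T = 80 million years.
r = substitution_rate(0.16, 80e6)
print(f"r = {r:.2e} substitutions per site per year")
```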
Different Gene Regions

Coding regions
– Nondegenerate sites
– Twofold degenerate sites
– Fourfold degenerate sites

Noncoding regions
– 5’ & 3’ untranslated regions
– Introns
– Pseudogenes
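The three classes of coding sites can be derived from the standard genetic code by counting, for each codon position, how many of the three possible base changes leave the amino acid unchanged. The sketch below is illustrative: `site_degeneracy` is a made-up name, and it reports the count plus one, so 1 means nondegenerate, 2 twofold, and 4 fourfold. A value of 3 occurs only at the third position of the isoleucine codons, which are often grouped with twofold sites in a three-class scheme.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code in the DNA alphabet; '*' marks a stop codon.
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def site_degeneracy(codon):
    """Return the degeneracy (1, 2, 3, or 4) of each of the three codon positions."""
    codon = codon.upper().replace("U", "T")
    folds = []
    for pos in range(3):
        # Count alternative bases at this position that keep the same amino acid.
        synonymous = sum(
            1
            for base in BASES
            if base != codon[pos]
            and CODE[codon[:pos] + base + codon[pos + 1:]] == CODE[codon]
        )
        folds.append(synonymous + 1)
    return folds


print(site_degeneracy("GGU"))  # glycine codon
print(site_degeneracy("UUU"))  # phenylalanine codon
```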
Table 4.1 Rates of synonymous and nonsynonymous nucleotide substitutions (± standard errors) in various mammalian protein-coding genes^a

Gene                                  Codons compared   Nonsynonymous rate   Synonymous rate
Ribosomal proteins
  S14                                 150               0.02 ± 0.02          2.16 ± 0.42
  S17                                 134               0.06 ± 0.04          2.69 ± 0.53
Contractile system proteins
  Actin α                             376               0.01 ± 0.01          2.92 ± 0.34
  Myosin β heavy chain                1933              0.10 ± 0.01          2.15 ± 0.13
Activators, factors, and receptors
  Glucagon                            29                0.00 ± 0.00          2.36 ± 1.08
  Insulin                             51                0.20 ± 0.10          3.03 ± 1.02
  Interleukin-1                       265               1.50 ± 0.15          3.27 ± 0.46
  Relaxin                             53                2.59 ± 0.51          6.39 ± 3.75
Blood proteins
  Myoglobin                           153               0.57 ± 0.11          4.10 ± 0.85
Immunoglobulins
  Ig κ                                106               2.03 ± 0.30          5.56 ± 1.18
Interferons
  γ                                   136               3.06 ± 0.37          5.50 ± 1.45
Enzymes
  Aldolase A                          363               0.09 ± 0.03          2.78 ± 0.33
  Amylase                             506               0.63 ± 0.06          3.42 ± 0.38
Average^b                                               0.74 (0.67)          3.51 (1.01)
Table 4.2 Rates of transitional and transversional substitutions (per site per 10^9 years) at nondegenerate, twofold degenerate, and fourfold degenerate codon sites^a

Type of substitution   Nondegenerate   Twofold degenerate   Fourfold degenerate
Transition             0.40            1.86                 2.24
Transversion           0.38            0.38                 1.47
Total                  0.78            2.24                 3.71

^a The rates are averages over the genes in Table 4.1.
Noncoding regions
Causes of Rate Variation

Functional constraints
Causes of Rate Variation

Synonymous vs. Nonsynonymous rates
– Should be similar in the absence of selection (Ka/Ks = 1)
– Why not?

Selection
– Advantageous
– Purifying
Causes of Rate Variation
Variation within a gene
Causes of Rate Variation

Variation among genes
– Rate of mutation
– The intensity of selection (1000 fold in Ks)
• Intensity of purifying selection (functional constraint)

Partial loss of function
– Relaxation of selection
Rate variation is explained by:

Mutation input

Random genetic drift of nearly neutral alleles

Purifying selection against deleterious alleles
Positive Selection

Nonsynonymous changes are far more likely than synonymous changes to improve function

Advantageous mutations are fixed more quickly than neutral mutations

Ka should exceed Ks if positive selection plays a major role in the evolution of the protein
Detecting Positive selection

Multiple methods
$t = \frac{K_A - K_S}{\sqrt{V(K_A) + V(K_S)}}$
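One of these methods compares Ka and Ks with a t-like statistic, t = (KA − KS) / sqrt(V(KA) + V(KS)). The sketch below assumes V() is a sampling variance and takes it as the square of a reported standard error. The function name is hypothetical, and the relaxin values from Table 4.1 are used purely to show the arithmetic.

```python
import math


def selection_t(ka, se_ka, ks, se_ks):
    """t = (Ka - Ks) / sqrt(V(Ka) + V(Ks)), with V taken as the squared standard error."""
    return (ka - ks) / math.sqrt(se_ka ** 2 + se_ks ** 2)


# Relaxin row of Table 4.1: nonsynonymous 2.59 ± 0.51, synonymous 6.39 ± 3.75.
t = selection_t(2.59, 0.51, 6.39, 3.75)
print(f"t = {t:.2f}")
```

A clearly positive t would point toward Ka exceeding Ks. Here t is negative, so these numbers give no indication of positive selection.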
Lysozyme and foregut fermentation

Lysozymes: enzymes that catalyse the breakdown of some bacterial cell walls.
Important defence against bacteria.
Differences in gastric lysozymes:
– They are most active at low pH.
– They are unusually resistant to cleavage by pepsin.
Colobine monkeys
(colobus and langurs)
Hoatzin
[Figure: lysozyme tree. Calcium-binding lysozymes: hoatzin, pigeon, horse. Conventional lysozymes: langur, human, chicken, cow. Residues labelled on the tree: hoatzin E14, E21, D75, N87, K126; langur K14, K21, D75, N87, K126; cow K14, K21, D75, N87, E126.]
Pattern of nucleotide substitutions
[Figure. Labels: nucleotides A, G, C, T; nucDNA (cmos); mtDNA (ND4, 12S rRNA, 16S rRNA); substitution types AC, AT, CG, CT, GT.]
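Standard genetic code (rows: first position; columns: second position; right margin: third position)

        U          C          A          G
U   UUU Phe    UCU Ser    UAU Tyr    UGU Cys    U
    UUC Phe    UCC Ser    UAC Tyr    UGC Cys    C
    UUA Leu    UCA Ser    UAA Stop   UGA Stop   A
    UUG Leu    UCG Ser    UAG Stop   UGG Trp    G
C   CUU Leu    CCU Pro    CAU His    CGU Arg    U
    CUC Leu    CCC Pro    CAC His    CGC Arg    C
    CUA Leu    CCA Pro    CAA Gln    CGA Arg    A
    CUG Leu    CCG Pro    CAG Gln    CGG Arg    G
A   AUU Ile    ACU Thr    AAU Asn    AGU Ser    U
    AUC Ile    ACC Thr    AAC Asn    AGC Ser    C
    AUA Ile    ACA Thr    AAA Lys    AGA Arg    A
    AUG Met    ACG Thr    AAG Lys    AGG Arg    G
G   GUU Val    GCU Ala    GAU Asp    GGU Gly    U
    GUC Val    GCC Ala    GAC Asp    GGC Gly    C
    GUA Val    GCA Ala    GAA Glu    GGA Gly    A
    GUG Val    GCG Ala    GAG Glu    GGG Gly    G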
Codon Usage

Nonrandom usage of synonymous codons
– Synonymous codons might be expected to be used equally
– That is not what is found
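Unequal use of synonymous codons can be quantified by comparing each codon's count with the count expected under equal usage. The sketch below uses relative synonymous codon usage (RSCU), a common measure that the slides do not name, so treat it as one possible choice. The function names and the toy sequence are hypothetical.

```python
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code in the DNA alphabet; '*' marks a stop codon.
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_counts(cds):
    """Count in-frame codons of a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(cds):
    """RSCU = observed count / mean count over the codons for the same amino acid."""
    counts = codon_counts(cds)
    result = {}
    for amino_acid in set(CODE.values()):
        family = [codon for codon, aa in CODE.items() if aa == amino_acid]
        total = sum(counts[codon] for codon in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for codon in family:
            result[codon] = counts[codon] * len(family) / total
    return result


# Toy sequence: four leucine codons, three CTG and one CTA.
usage = rscu("CTGCTGCTGCTA")
print(usage["CTG"], usage["CTA"], usage["TTA"])
```

A value of 1 means a codon is used as often as expected under equal usage; values above 1 indicate a preferred codon.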
Molecular Clocks

Zuckerkandl and Pauling (1962, 1965)
– Similar substitution rates among various
lineages of mammals
– Proposed that for any given protein, the
rate of molecular evolution is
approximately constant over time
Molecular Clocks

Use the substitution rate
Paleontological data for a known split
Apply to unknown splits
r = K/2T
Molecular Clock
Rate of substitution = rate of mutation (for neutral mutations)

MOLECULAR CLOCK
# differences ∝ time
Number of changes is proportional to time
Use number of changes to estimate relative divergence of species or genes
Calculating the rate of nucleotide substitution (r)
T = years since divergence
K = substitutions per site that occurred since divergence
The ancestral sequence gives rise to Sequence A and Sequence B, each evolving for T years
r = K/2T
Once the molecular clock is calibrated it can date other events
[Figure: tree of Sequences A, C, and B descending from an ancestral sequence; a second divergence is marked "Can now date this event".]
T = K/2r
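The two steps, calibrating on a split of known age and then dating a split of unknown age, can be sketched as follows. All numbers are hypothetical, the function names are illustrative, and the sketch assumes the same rate applies to both pairs of sequences.

```python
def calibrate_rate(k_known, t_known_years):
    """Step 1: r = K / (2T), using a divergence whose age T is known (e.g. from fossils)."""
    return k_known / (2 * t_known_years)


def divergence_time(k, rate):
    """Step 2: T = K / (2r), dating another divergence with the calibrated rate."""
    return k / (2 * rate)


# Hypothetical calibration: K = 0.10 substitutions per site for a 50-million-year-old split.
rate = calibrate_rate(0.10, 50e6)

# Hypothetical undated split with K = 0.04 substitutions per site.
t_unknown = divergence_time(0.04, rate)
print(f"rate = {rate:.1e} per site per year, T = {t_unknown / 1e6:.0f} million years")
```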
Molecular Clock


Rate = substitutions per bp per year
Hypothesis: the rate of evolution of DNA is constant over time and across lineages
Resolve history of species
– Timing of events
– Relationship of species

Morphological evolution (fossil record) is not constant

Early protein studies showed an approximately constant rate of evolution
Different rates within a gene or genome
Coding sequences evolve more slowly than non-coding sequences
Synonymous substitutions are often more common than non-synonymous substitutions
Some sequences are under functional constraint
Different genes evolve at different rates

Useless concept?
There is no universal molecular clock
Still a very useful concept
Possible to examine both short- and long-term evolutionary processes by choosing an appropriate dataset

Probably more useful than a constant clock
Rates
There is no single rate
How do we relate molecular time to geological time?
Calibrate the clock
– Lineage divergences in the fossil record
– Major geological events causing isolation of populations
• Continental drift (Isthmus of Panama)
• Island or lake formation
Testing the Molecular Clock
Estimate the amount of divergence over time
Are these equal for the lineages of interest?

Problem: fossil dating of divergence times is often inaccurate, and not possible for all lineages
Cannot measure absolute rates

[Figure: three trees for lineages A and B: equal rates, A slower, B slower.]
Molecular distance from A to B is the same in all cases
Molecular Clock-Controversy

Exceptional cases
– Guinea pig
– Human-Ape
– Rodents-primates
– Turtles
– Plants
Relative Rate Test
$K_{AC} = K_{OA} + K_{OC}$
$K_{BC} = K_{OB} + K_{OC}$
$K_{AB} = K_{OA} + K_{OB}$
[Figure: tree in which A, B, and C are joined at node O.]
Relative Rate Test
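$K_{OA} = (K_{AC} + K_{AB} - K_{BC})/2$
$K_{OB} = (K_{AB} + K_{BC} - K_{AC})/2$
$K_{OC} = (K_{AC} + K_{BC} - K_{AB})/2$
[Figure: tree in which A, B, and C are joined at node O.]
Null (H0): $K_{OA} = K_{OB}$
Alt (Ha): $K_{OA} \neq K_{OB}$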
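The branch lengths from node O follow directly from the three pairwise distances. The sketch below only does that arithmetic: the function name and the distances are hypothetical, and deciding whether the difference is significant would also need its variance, which is not covered here.

```python
def branch_lengths(k_ab, k_ac, k_bc):
    """Return (K_OA, K_OB, K_OC) from the pairwise distances K_AB, K_AC, K_BC."""
    k_oa = (k_ac + k_ab - k_bc) / 2
    k_ob = (k_ab + k_bc - k_ac) / 2
    k_oc = (k_ac + k_bc - k_ab) / 2
    return k_oa, k_ob, k_oc


# Hypothetical pairwise distances (substitutions per site); C is the reference lineage.
k_oa, k_ob, k_oc = branch_lengths(k_ab=0.10, k_ac=0.30, k_bc=0.34)
print(f"K_OA = {k_oa:.2f}, K_OB = {k_ob:.2f}, K_OC = {k_oc:.2f}")

# Under the null hypothesis K_OA = K_OB, this difference should be close to zero.
# It also equals K_AC - K_BC, so it can be read from the distances to C alone.
print(f"K_OA - K_OB = {k_oa - k_ob:.2f}")
```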
Local Clocks

Within lineages or similar groups
– Mice and rats

Humans and African Apes
– Slow-down in humans
Causes of rate variation

Replication dependent errors
– Generation time effect (more germline replications per year in rodents than in primates)
– Efficiency of DNA repair enzymes
– Metabolic rates
Causes of rate variation

Replication independent errors
– Metabolic rate
• Lower rates in poikilotherms than in endotherms
– Body size
– Natural history
Organelle DNA substitutions
Mitochondria
Chloroplasts

Mostly uniparentally inherited
– Mostly maternal, although some paternal
Mammalian mtDNA
17 kb, 13 protein-encoding genes
2 rRNA genes and 22 tRNA genes

Substitution rate is generally thought to be higher than that of nuclear genes
Why?
A low fidelity of the DNA replication process
An inefficient repair mechanism
High concentration of mutagens (from metabolic functions)
Reduction in the intensity of selection
Nucleotide Substitution Rates in Eukaryotic Genomes

Genome                           Ks rate   Relative Ks rate   Ka rate
Angiosperm mt                    0.5       1                  0.1
Angiosperm cp, single copy       1.5       3                  0.2
Angiosperm cp, inverted repeat   0.3       0.6                0.1
Angiosperm nuc.                  5.4       12                 0.4
Mammalian nuc.                   2-8       4-16               0.5-1.3
Mammalian mt                     20-50     40-100             2-3

Estimated rate of substitutions/site/10^9 years.
From Palmer, 1991