Codon usage bias
Ref: Chapter 9
Xuhua Xia
http://dambe.bio.uottawa.ca
Objectives
• Understand how codon usage bias affects translation
efficiency and gene expression
• Biomedical relevance
– Protein drugs in pharmaceutical industry
– Transgenic experiments in agriculture
• Factors affecting codon usage bias
• Indices measuring codon usage bias
• Develop bioinformatic skills to study genomic
codon usage.
Codon Usage Bias
• Observation: Strongly biased codon usage in a variety of genomes, including those of
viruses, mitochondria, plastids, prokaryotes and eukaryotes.
• Hypotheses:
– Differential mutation hypothesis, e.g., the transcriptional hypothesis of codon usage (Xia
1996 Genetics 144:1309-1320)
– Differential selection hypothesis, e.g., Xia 1998 Genetics 149:37-44
• Predictions:
– From the mutation hypothesis: Concordance between codon usage and mutation pressure
– From the selection hypothesis:
• Concordance between differential availability of tRNA and differential codon usage.
• The concordance is stronger in highly expressed genes than in lowly expressed genes (CAI is
positively correlated with gene expression).
[Figure: diagram labelled with a polycistronic mRNA (Gene 1, Gene 2, Gene 3), RNA polymerase, ribosome and protein, together with one GCC~tRNA~Gly and three UCC~tRNA~Gly.]
Codon usage of HEGs in yeast
AA(1)  Codon(2)  T(3)  w(4)   F(5)
Arg    AGA       11    1      314
Arg    AGG       1     0.091  1
Asn    AAC       10    1      208
Asn    AAU       0     0      11
Asp    GAC       16    1      202
Asp    GAU       0     0      112
Cys    UGC       4     1      3
Cys    UGU       0     0      39
Gln    CAA       9     1      153
Gln    CAG       1     0.111  1
Glu    GAA       14    1      305
Glu    GAG       2     0.143  5
His    CAC       7     1      102
His    CAU       0     0      25
Leu    UUA       7     0.7    42
Leu    UUG       10    1      359
Lys    AAA       7     0.5    65
Lys    AAG       14    1      483
Phe    UUC       10    1      168
Phe    UUU       0     0      19
Ser    AGC       2     1      6
Ser    AGU       0     0      4
Tyr    UAC       8     1      141
Tyr    UAU       0     0      10
Xia 2007. Bioinformatics and the cell.
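In the yeast table, the w(4) values equal each T(3) value divided by the largest T(3) in the same amino-acid family. The sketch below reproduces that column under this reading, which is an assumption because the slide's column footnotes did not survive; the names ROWS and relative_weights are illustrative only.

```python
from collections import defaultdict

# (amino acid, codon, T) copied from the yeast HEG table.
# Assumption: w is T scaled by the largest T within the amino-acid family.
ROWS = [
    ("Arg", "AGA", 11), ("Arg", "AGG", 1), ("Asn", "AAC", 10), ("Asn", "AAU", 0),
    ("Asp", "GAC", 16), ("Asp", "GAU", 0), ("Cys", "UGC", 4), ("Cys", "UGU", 0),
    ("Gln", "CAA", 9), ("Gln", "CAG", 1), ("Glu", "GAA", 14), ("Glu", "GAG", 2),
    ("His", "CAC", 7), ("His", "CAU", 0), ("Leu", "UUA", 7), ("Leu", "UUG", 10),
    ("Lys", "AAA", 7), ("Lys", "AAG", 14), ("Phe", "UUC", 10), ("Phe", "UUU", 0),
    ("Ser", "AGC", 2), ("Ser", "AGU", 0), ("Tyr", "UAC", 8), ("Tyr", "UAU", 0),
]


def relative_weights(rows):
    """Return {codon: T / max(T) within the codon's amino-acid family}."""
    family_max = defaultdict(int)
    for aa, _codon, t in rows:
        family_max[aa] = max(family_max[aa], t)
    return {codon: t / family_max[aa] for aa, codon, t in rows}


weights = relative_weights(ROWS)
for aa, codon, t in ROWS:
    print(aa, codon, t, round(weights[codon], 3))
```

Comparing the printed weights with the F(5) column shows where the codon with the larger T is also the more frequent codon in HEGs, and where it is not (Cys in this table).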
Calculation of RSCU
\[
\mathrm{RSCU}_{ij} = \frac{\mathrm{CodFreq}_{ij}}{\left( \sum_{j=1}^{\mathrm{NumCodon}_i} \mathrm{CodFreq}_{ij} \right) / \mathrm{NumCodon}_i}
\]
where i is the codon family and j is a codon within that family.
Example for the Ala codon GCU:
\[
\mathrm{RSCU}_{\mathrm{Ala,GCU}} = \frac{52}{(52 + 91 + 103 + 2)/4} = 0.84
\]
Codon  AA   N    RSCU    Codon  AA   N    RSCU    Codon  AA   N    RSCU
GCU    Ala  52   0.84    CCU    Pro  42   0.87    UAA    *    8    3.2
GCC    Ala  91   1.47    CCC    Pro  63   1.31    UAG    *    1    0.4
GCA    Ala  103  1.66    CCA    Pro  85   1.76    AGA    *    1    0.4
GCG    Ala  2    0.03    CCG    Pro  3    0.06    AGG    *    0    0
GAA    Glu  78   1.64    CAA    Gln  79   1.82    AAA    Lys  90   1.78
GAG    Glu  17   0.36    CAG    Gln  8    0.18    AAG    Lys  11   0.22
GGU    Gly  29   0.53    CGU    Arg  7    0.44    ACU    Thr  44   0.57
GGC    Gly  62   1.13    CGC    Arg  11   0.7     ACC    Thr  96   1.25
GGA    Gly  97   1.77    CGA    Arg  42   2.67    ACA    Thr  153  1.99
GGG    Gly  31   0.57    CGG    Arg  3    0.19    ACG    Thr  15   0.19
UUA    Leu  110  1.11    AUA    Met  218  1.66    UGA    Trp  92   1.77
UUG    Leu  16   0.16    AUG    Met  44   0.34    UGG    Trp  12   0.23
CUU    Leu  62   0.62    UCU    Ser  51   1.11    GUU    Val  40   0.84
CUC    Leu  95   0.95    UCC    Ser  65   1.42    GUC    Val  48   1.01
CUA    Leu  285  2.86    UCA    Ser  99   2.16    GUA    Val  87   1.83
CUG    Leu  29   0.29    UCG    Ser  5    0.11    GUG    Val  15   0.32
RSCU and proportion: Different scaling.
RSCU (Sharp et al. 1986) is codon-specific
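The RSCU calculation can be sketched in a few lines of Python. The function name rscu and the dictionary layout are illustrative choices, not part of the original slides; the counts are the Ala row of the table. Returning 0 for a family with no observed codons is an assumption made here to avoid division by zero.

```python
def rscu(family_counts):
    """RSCU for one synonymous codon family.

    family_counts maps codon -> observed count. Each count is divided by
    the mean count of the family, so the values sum to the family size.
    """
    mean = sum(family_counts.values()) / len(family_counts)
    if mean == 0:
        # Assumption: report 0 when the amino acid does not occur at all.
        return {codon: 0.0 for codon in family_counts}
    return {codon: n / mean for codon, n in family_counts.items()}


ala = {"GCU": 52, "GCC": 91, "GCA": 103, "GCG": 2}
ala_rscu = rscu(ala)

total = sum(ala.values())
for codon, n in ala.items():
    proportion = n / total
    # RSCU is the proportion rescaled by the number of codons in the family.
    print(codon, round(ala_rscu[codon], 2), round(proportion, 3))
```

The second printed column illustrates the "different scaling" remark: for a family of k codons, RSCU equals the within-family proportion multiplied by k.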
RSCU (HIV-1 vs Human)
[Figure: scatter plot of RSCU (HIV-1), y-axis, against RSCU (Human), x-axis, both from 0 to 2.5. Legend: A-ending, C-ending, G-ending and U-ending codons. Labelled points: V, R, S, A, I, L, E, K, L, G, P, T, R, Q.]
Fig. 1. Relative synonymous codon usage (RSCU) of HIV-1 compared to RSCU of highly expressed human genes. Data points for codons ending with A, C, G or U are annotated with different combinations of colors and symbols. A-ending codons exhibit strong discordance in their usage between HIV-1 and human and are annotated with their coded amino acids.
van Weringh et al. 2011. MBE.
RSCU (HTLV-1 vs Human)
[Figure: scatter plot of RSCU (HTLV-1), y-axis from 0 to 3, against RSCU (Human), x-axis from 0 to 2.5. Legend: A-ending, C-ending, G-ending and U-ending codons.]
Relative synonymous codon usage (RSCU) of HTLV-1 compared to RSCU of highly expressed human
genes. Data points for codons ending with A, C, G or U are annotated with different combinations of
colors and symbols. A-ending codons exhibit strong discordance in their usage between HIV-1 and
human and are annotated with their coded amino acids.
Calculation of CAI
\[
w_{ij} = \frac{\mathrm{RefCodFreq}_{ij}}{\mathrm{RefCodFreq}_{i.\max}}
\]
\[
\mathrm{CAI} = e^{\frac{\sum_{i=1}^{N_{2,3,4}} \left[ \mathrm{CodFreq}_i \ln(w_i) \right]}{\sum_{i=1}^{N_{2,3,4}} \mathrm{CodFreq}_i}}
\]
N_{2,3,4}: Number of 2-, 3-, 4-fold codon families
Codon  AA  ObsFreq  RefCodFreq  w
UGA    *   0        6           0.375
UAG    *   0        4           0.250
UAA    *   0        16          1.000
GCA    A   1        195         0.606
GCU    A   15       322         1.000
GCG    A   0        81          0.252
GCC    A   8        242         0.752
UGC    C   3        123         1.000
UGU    C   3        112         0.911
GAU    D   9        69          1.000
GAC    D   11       40          0.580
GAG    E   11       289         0.863
GAA    E   14       335         1.000
UUU    F   3        118         0.554
UUC    F   9        213         1.000
…      …   …        …           …
\[
\mathrm{CAI} = e^{\frac{1 \cdot \ln(0.606) + 15 \cdot \ln(1) + 8 \cdot \ln(0.752) + \dots}{1 + 15 + 8 + \dots}}
\]
Compound 6- or 8-fold codon families should be broken into two codon families.
CAI is gene-specific.
0 ≤ CAI ≤ 1
CAI values computed with different reference sets are not comparable.
Problem with computing w as F_i/F_{i.max}:
Suppose an amino acid is rarely used in highly expressed genes. There is then little
selection on its codons, and their usage might be close to even, with w_i ≈ 1. If a
lowly expressed gene happens to be made entirely of this amino acid, its CAI would be
close to 1, which is misleading.
There has been no good alternative.
Further research is needed.
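A minimal sketch of the CAI calculation, assuming the sums run over the sense codons and using only the rows that the table shows before its ellipsis. Because the table is truncated, the number printed here is the contribution of those rows alone and should not be read as the CAI of the gene on the slide. The names ROWS, codon_weights and cai are illustrative.

```python
import math
from collections import defaultdict

# (codon, amino acid, ObsFreq in the gene, RefCodFreq in the reference set),
# limited to the sense codons listed in the table.
ROWS = [
    ("GCA", "A", 1, 195), ("GCU", "A", 15, 322),
    ("GCG", "A", 0, 81), ("GCC", "A", 8, 242),
    ("UGC", "C", 3, 123), ("UGU", "C", 3, 112),
    ("GAU", "D", 9, 69), ("GAC", "D", 11, 40),
    ("GAG", "E", 11, 289), ("GAA", "E", 14, 335),
    ("UUU", "F", 3, 118), ("UUC", "F", 9, 213),
]


def codon_weights(rows):
    """w = RefCodFreq of the codon / largest RefCodFreq in its family."""
    family_max = defaultdict(int)
    for _codon, aa, _obs, ref in rows:
        family_max[aa] = max(family_max[aa], ref)
    return {codon: ref / family_max[aa] for codon, aa, _obs, ref in rows}


def cai(rows, weights):
    """exp of the ObsFreq-weighted mean of ln(w).

    Codons with ObsFreq 0 are skipped; they contribute nothing to either sum.
    A used codon with w = 0 would need separate handling (not needed here).
    """
    used = [(codon, obs) for codon, _aa, obs, _ref in rows if obs > 0]
    log_sum = sum(obs * math.log(weights[codon]) for codon, obs in used)
    total = sum(obs for _codon, obs in used)
    return math.exp(log_sum / total)


w = codon_weights(ROWS)
print({codon: round(value, 3) for codon, value in w.items()})
print("CAI over the listed codons:", round(cai(ROWS, w), 4))
```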
Weak mRNA predictive power
[Figure: scatter plot of protein abundance (y-axis, 0 to 80) against mRNA abundance (x-axis, 0.5 to 4.5), with ENO1 and FRS2 labelled. Fitted line: y = 5.6507x + 4.1367, R² = 0.1936.]
Effect of Codon Usage Bias
[Figure: scatter plot of protein abundance (y-axis, 0 to 80) against codon usage bias (x-axis, 0.05 to 0.85), with ENO1 and FRS2 labelled. Fitted line: y = 70.398x - 11.739, R² = 0.5668.]
Any problem with the mutation hypothesis?
Table 2. Frequency of A residues, length and codon adaptation index (CAI) for the three
HIV-1 early (tat, rev and nef) and five late (gag-pol, vif, vpu, vpr, and env) coding
sequences (CDS).
Gene  CDS (bp)  CAI
tat   261       0.66875
rev   351       0.66211
nef   621       0.67523
gag   1503      0.62784
pol   3012      0.58139
vif   579       0.61941
vpr   291       0.64272
vpu   249       0.49068
env   2571      0.61924
van Weringh et al. 2011. MBE.
Problem with CAI and a new ITE
AA  Codon  CFnon-HEG  CFHEG
A   GCA    20         40
A   GCG    80         60
tRNA = 3 (single value given for the codon family)

AA  Codon  CFnon-HEG  CFHEG  w (CAI)  pHEG  pnon-HEG  s     w (ITE)
A   GCA    20         40     2/3      0.4   0.2       2     1
A   GCG    80         60     1        0.6   0.8       0.75  0.375

AA  Codon  CFnon-HEG  CFHEG  w (CAI)  pHEG  pnon-HEG  s     w (ITE)
A   GCA    50         40     2/3      0.4   0.5       0.8   2/3
A   GCG    50         60     1        0.6   0.5       1.2   1
CAI is a special case of ITE (when there is no background codon usage bias)
Problem with CAI and a new ITE
AA  Codon  CFnon-HEG  CFHEG  w    Gene1  Gene2
A   GCA    20         40     2/3  10     20
A   GCG    80         60     1    40     30
\[
\mathrm{CAI} = e^{\frac{\sum F_i \ln(w_i)}{\sum F_i}}
\]
CAI1 = 0.9221; CAI2 = 0.8503
Wrong conclusions:
1. Excellent codon adaptation in the codon family (high CAI values)
2. Gene 1 has better codon adaptation than Gene 2.

AA  Codon  CFnon-HEG  CFHEG  pHEG  pnon-HEG  s     w      Gene1  Gene2
A   GCA    20         40     0.4   0.2       2     1      10     20
A   GCG    80         60     0.6   0.8       0.75  0.375  40     30
\[
\mathrm{ITE} = e^{\frac{\sum F_i \ln(w_i)}{\sum F_i}}
\]
ITE.1 = 0.4563; ITE.2 = 0.5552
Correct conclusions:
1. Poor codon adaptation in the codon family (low ITE values)
2. Gene 2 has better codon adaptation than Gene 1.
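The two sets of index values can be reproduced with a short Python sketch. It assumes, as the tabulated numbers imply, that the ITE weight is s = pHEG / pnon-HEG rescaled by the largest s in the family, and that both indices share the same exponential form. The function names are illustrative and are not taken from any published software.

```python
import math

CF_HEG = {"GCA": 40, "GCG": 60}       # codon counts in highly expressed genes
CF_NON_HEG = {"GCA": 20, "GCG": 80}   # background codon counts
GENE1 = {"GCA": 10, "GCG": 40}
GENE2 = {"GCA": 20, "GCG": 30}


def proportions(counts):
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def cai_weights(cf_heg):
    """w = CFHEG / max(CFHEG): the background usage is ignored."""
    top = max(cf_heg.values())
    return {codon: n / top for codon, n in cf_heg.items()}


def ite_weights(cf_heg, cf_background):
    """w = s / max(s), with s = pHEG / p_background."""
    p_heg, p_bg = proportions(cf_heg), proportions(cf_background)
    s = {codon: p_heg[codon] / p_bg[codon] for codon in cf_heg}
    top = max(s.values())
    return {codon: value / top for codon, value in s.items()}


def index(gene, weights):
    """exp( sum F_i ln(w_i) / sum F_i ) over the codons of one gene."""
    total = sum(gene.values())
    log_sum = sum(n * math.log(weights[codon]) for codon, n in gene.items())
    return math.exp(log_sum / total)


w_cai = cai_weights(CF_HEG)
w_ite = ite_weights(CF_HEG, CF_NON_HEG)
print("CAI:", round(index(GENE1, w_cai), 4), round(index(GENE2, w_cai), 4))
print("ITE:", round(index(GENE1, w_ite), 4), round(index(GENE2, w_ite), 4))
```

The ranking of the two genes flips between the indices because GCA is the minor codon in the HEGs but is enriched there relative to the background.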
Problem with CAI and a new ITE
AA  Codon  CFOther  CFHEG
A   GCA    25511    1973
A   GCG    43261    2654
tRNA = 3 (single value given for the codon family)

AA  Codon  CFOther  CFHEG  w (CAI)  pHEG    pOther  s       w (ITE)
A   GCA    25511    1973   0.7434   0.4264  0.3710  1.1495  1
A   GCG    43261    2654   1        0.5736  0.6290  0.9118  0.7932

AA  Codon  CFOther  CFHEG  w (CAI)  pHEG    pOther  s       w (ITE)
A   GCA    25511    1973   0.7434   0.4264  0.5     0.8528  0.7434
A   GCG    25511    2654   1        0.5736  0.5     1.1472  1
CAI is a special case of ITE (when there is no background codon usage bias)
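The special-case statement can be checked numerically with the counts from the tables. The sketch below assumes s = pHEG / pOther and w = s / max(s), which is what the tabulated values imply; the function name ite_weights is illustrative only.

```python
def ite_weights(cf_heg, cf_other):
    """Return (s, w) per codon, with s = pHEG / pOther and w = s / max(s)."""
    total_heg = sum(cf_heg.values())
    total_other = sum(cf_other.values())
    s = {
        codon: (cf_heg[codon] / total_heg) / (cf_other[codon] / total_other)
        for codon in cf_heg
    }
    top = max(s.values())
    return s, {codon: value / top for codon, value in s.items()}


CF_HEG = {"GCA": 1973, "GCG": 2654}

# Observed background: GCG is already the major codon outside the HEGs.
s_obs, w_obs = ite_weights(CF_HEG, {"GCA": 25511, "GCG": 43261})
# Even background: no codon usage bias outside the HEGs.
s_even, w_even = ite_weights(CF_HEG, {"GCA": 25511, "GCG": 25511})

# CAI weights use the HEG counts only.
w_cai = {codon: n / max(CF_HEG.values()) for codon, n in CF_HEG.items()}

print({c: round(v, 4) for c, v in s_obs.items()}, {c: round(v, 4) for c, v in w_obs.items()})
print({c: round(v, 4) for c, v in s_even.items()}, {c: round(v, 4) for c, v in w_even.items()})
print({c: round(v, 4) for c, v in w_cai.items()})
```

With the observed background the preferred codon switches from GCG (CAI) to GCA (ITE); with an even background the ITE weights coincide with the CAI weights.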
Contrast between CAI and ITE
Kudla et al. (2009) engineered a synthetic library of 154 genes, all encoding the same
protein but differing in degrees of codon adaptation, to quantify the effect of
differential codon usage on protein production in E. coli. They concluded that
“codon bias did not correlate with gene expression” and that “translation initiation,
not elongation, is rate-limiting for gene expression”.
[Figure: two scatter plots of protein production (y-axis, 0 to 10000). Against the codon adaptation index (CAI, x-axis 0.3 to 0.8): y = 2036.3x + 3020.8, R² = 0.0052, p = 0.3746. Against the index of translation elongation (ITE, x-axis -0.24 to 0.06): y = 10855x + 4255.3, R² = 0.093, p = 0.0001.]
ITE reveals that
1) Protein production is low when ITE is low, regardless of translation
initiation efficiency.
2) If translation initiation is efficient, protein production increases with
ITE.
Hypothesis and Predictions
AA   tRNA         G-ending codon  A-ending codon
Met  tRNAMet/CAU  AUG             AUA
Leu  tRNALeu/UAA  UUG             UUA
Glu  tRNAGlu/UUC  GAG             GAA
Lys  tRNALys/UUU  AAG             AAA
Gln  tRNAGln/UUG  CAG             CAA
Arg  tRNAArg/UCU  AGG             AGA
Trp  tRNATrp/UCA  UGG             UGA
In the Met family, AUA is favoured by mutation, but not by tRNA-mediated selection.
In the other families, A-ending codons are favoured by both mutation and tRNA-mediated selection.
Predictions:
1. Proportion of A-ending codons (or RSCU) should be smaller in the Met codon family
than in other R-ending codon families:
\[
P_{NNA} = N_{NNA} / N_{NNG}
\]
2. Availability of tRNAMet/UAU should increase P_{AUA}.
Xia et al. 2007
Testing prediction 1
                   Met    Leu    Glu    Lys    Gln    Arg    Trp
Species            AUA    UUA    GAA    AAA    CAA    AGA    UGA
A. gossypii        1.473  1.993  1.826  1.852  1.917  2      2
C. glabrata        1.043  1.995  2.000  1.938  1.889  2      2
K. thermotolerans  0.556  1.973  1.910  1.948  1.945  2      1.967
S. cerevisiae      1.140  1.969  1.800  1.883  1.794  1.947  1.908
S. castelli        1.299  1.994  1.891  1.981  1.969  2      1.918
S. servazzii       1.321  1.931  1.702  1.824  1.841  1.959  2
Y. lipolytica      1.440  1.968  1.536  1.859  1.963  1.922  1.882
Carullo, M. and Xia, X. 2008 J Mol Evol 66:484–493.
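Prediction 1 can be checked directly against the tabulated values. The sketch below uses only the first five columns (AUA, UUA, GAA, AAA, CAA), whose row assignment is unambiguous in the extracted table, and treats the values as RSCU of the A-ending codon, an assumption based on their upper limit of 2. The names RSCU_A_ENDING and met_is_lowest are illustrative.

```python
# RSCU of the A-ending codon in each two-codon family, per species.
RSCU_A_ENDING = {
    "A. gossypii":       {"AUA": 1.473, "UUA": 1.993, "GAA": 1.826, "AAA": 1.852, "CAA": 1.917},
    "C. glabrata":       {"AUA": 1.043, "UUA": 1.995, "GAA": 2.000, "AAA": 1.938, "CAA": 1.889},
    "K. thermotolerans": {"AUA": 0.556, "UUA": 1.973, "GAA": 1.910, "AAA": 1.948, "CAA": 1.945},
    "S. cerevisiae":     {"AUA": 1.140, "UUA": 1.969, "GAA": 1.800, "AAA": 1.883, "CAA": 1.794},
    "S. castelli":       {"AUA": 1.299, "UUA": 1.994, "GAA": 1.891, "AAA": 1.981, "CAA": 1.969},
    "S. servazzii":      {"AUA": 1.321, "UUA": 1.931, "GAA": 1.702, "AAA": 1.824, "CAA": 1.841},
    "Y. lipolytica":     {"AUA": 1.440, "UUA": 1.968, "GAA": 1.536, "AAA": 1.859, "CAA": 1.963},
}


def met_is_lowest(row):
    """True if AUA has a lower value than every other A-ending codon in the row."""
    others = [value for codon, value in row.items() if codon != "AUA"]
    return row["AUA"] < min(others)


results = {species: met_is_lowest(row) for species, row in RSCU_A_ENDING.items()}
for species, row in RSCU_A_ENDING.items():
    gap = min(v for c, v in row.items() if c != "AUA") - row["AUA"]
    print(f"{species}: AUA lowest = {results[species]}, gap to next codon = {gap:.3f}")
```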
Testing prediction 2
[Figure: two scatter plots of PAUA (y-axis) against PUUA (x-axis). Panel (a): both axes from 30 to 80. Panel (b): both axes from 0.25 to 0.95.]
Fig. 5. Relationship between PAUA and PUUA, highlighting the observation that PAUA is greater when both a
tRNAMet/CAU and a tRNAMet/UAU are present than when only tRNAMet/CAU is present in the mtDNA, for bivalve
species (a) and chordate species (b). The filled squares are for mtDNA containing both tRNAMet/CAU and
tRNAMet/UAU genes, and the open triangles are for mtDNA without a tRNAMet/UAU gene.
Xia, X. 2012. In: RS Singh et al. Evolution in the fast lane: Rapidly evolving genes and genetic systems. Oxford University Press.