Translation elongation, amino acid
usage, and codon usage indices
Xuhua Xia
http://dambe.bio.uottawa.ca
Objectives
• Understand how amino acid and codon usage biases
affect translation efficiency and gene expression
• Biomedical and biopharmaceutical relevance
– Protein drug production in pharmaceutical industry
– Transgenic experiments in agriculture
• Factors affecting amino acid and codon usage bias
• Indices measuring codon usage bias
• Develop bioinformatic skills to study genomic codon usage
Energetic Cost
                                                 Energetic cost
Amino acid  Code  Precursor metabolites          ~P     H      Total ~P
Ala         A     pyr                            1.0    5.3    11.7
Cys         C     3pg                            7.3    8.7    24.7
Asp         D     oaa                            1.3    5.7    12.7
Glu         E     αkg                            2.7    6.3    15.3
Phe         F     2 pep, eryP                    13.3   19.3   52.0
Gly         G     3pg                            2.3    4.7    11.7
His         H     penP                           20.3   9.0    38.3
Ile         I     pyr, oaa                       4.3    14.0   32.3
Lys         K     oaa, pyr                       4.3    13.0   30.3
Leu         L     2 pyr, acCoA                   2.7    12.3   27.3
Met         M     oaa, Cys, −pyr                 9.7    12.3   34.3
Asn         N     oaa                            3.3    5.7    14.7
Pro         P     αkg                            3.7    8.3    20.3
Gln         Q     αkg                            3.7    6.3    16.3
Arg         R     αkg                            10.7   8.3    27.3
Ser         S     3pg                            2.3    4.7    11.7
Thr         T     oaa                            3.3    7.7    18.7
Val         V     2 pyr                          2.0    10.7   23.3
Trp         W     2 pep, eryP, PRPP, −pyr        27.7   23.3   74.3
Tyr         Y     eryP, 2 pep                    13.3   18.3   50.0
Hiroshi Akashi and Takashi Gojobori 2002, PNAS 99:3695–3700
Numerical Prediction
Prediction: Usage of energetically expensive (and also rare) amino acids should decrease with gene expression. Large ~P/Copy should be associated with small NumCopy, and small ~P/Copy should be associated with large NumCopy.
[Figure: number of copies (y-axis) plotted against energetic cost (x-axis).]
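One way to look at this prediction for a single protein is to compute its average energetic cost per residue from the Total ~P column of the Akashi and Gojobori table. The sketch below is only an illustration: the function name `mean_cost` and the two toy sequences are assumptions made here, not part of the original slides.

```python
from collections import Counter

# Total ~P per amino acid (1-letter code), copied from the energetic cost table.
COST = {
    "A": 11.7, "C": 24.7, "D": 12.7, "E": 15.3, "F": 52.0,
    "G": 11.7, "H": 38.3, "I": 32.3, "K": 30.3, "L": 27.3,
    "M": 34.3, "N": 14.7, "P": 20.3, "Q": 16.3, "R": 27.3,
    "S": 11.7, "T": 18.7, "V": 23.3, "W": 74.3, "Y": 50.0,
}

def mean_cost(protein):
    """Average Total ~P per residue; symbols missing from COST are ignored."""
    counts = Counter(aa for aa in protein.upper() if aa in COST)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no recognised amino acids in sequence")
    return sum(COST[aa] * n for aa, n in counts.items()) / total

# Hypothetical toy sequences: one built from cheap residues, one from costly ones.
cheap = "AGSAGS"
costly = "WFYWFY"
print(round(mean_cost(cheap), 2), round(mean_cost(costly), 2))
```

Under the prediction, highly expressed proteins would tend to sit closer to the cheap end of this scale than lowly expressed ones.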
AA usage and tRNA abundance
Saccharomyces cerevisiae
Salmonella typhimurium
Xia, X. 1998. Genetics 149:37-44
AA usage and tRNA gene copies
[Figure: amino acid frequency in 11 ssDNA coliphages (y-axis) plotted against the number of tRNA genes in E. coli (x-axis), one labelled point per amino acid, with L, R and S each shown as two points (L1/L2, R1/R2, S1/S2); y = 231.88x + 244.93, r = 0.8426, p < 0.0001.]
Chithambaram, S. et al. 2014. Genetics 197:301-315
Summary of AA usage
• Energetic cost: mass-produced proteins should use
cheap amino acids.
• Translation efficiency: mass-produced proteins
should use abundant amino acids
• Much-used amino acids should have more tRNA
(in gene copies and in abundance) to carry them than
little-used amino acids
Codon Usage Bias
• Observation: Strongly biased codon usage in a variety of species, ranging from viruses, mitochondria and plastids to prokaryotes and eukaryotes.
• Hypotheses:
– Differential mutation hypothesis, e.g., transcriptional hypothesis of codon usage (Xia 1996 Genetics 144:1309-1320)
– Differential selection hypothesis, e.g., Xia 1998 Genetics 149:37-44
• Predictions:
– From the mutation hypothesis: concordance between codon usage and mutation pressure
– From the selection hypothesis:
• Concordance between differential availability of tRNA and differential codon usage.
• The concordance is stronger in highly expressed genes than in lowly expressed genes (CAI is positively correlated with gene expression).
[Figure: RNA polymerase transcribing Gene 1, Gene 2 and Gene 3 into a polycistronic mRNA, with a ribosome producing protein; tRNA labels shown: GCC~tRNA~Gly (one) and UCC~tRNA~Gly (three).]
Codon usage of highly expressed genes (HEGs) in yeast
Xia 2007. Bioinformatics and the Cell.
Major and minor codons
• Major codon: the codon in a synonymous codon family
that can be most efficiently translated in a species,
typically with three associated properties:
– it is over-represented in highly expressed genes relative to lowly
expressed genes.
– it corresponds to the most abundant tRNA
– replacing it with another codon leads to reduced translation
efficiency (reduced protein production)
• Minor codon is the opposite
• Their identification is NOT based on the codon
frequencies of all coding sequences in a species
• Different species may have different major and minor
codons in the same synonymous codon family.
Calculation of RSCU
$$RSCU_{ij} = \frac{CodFreq_{ij}}{\left(\sum_{j=1}^{NumCodon_i} CodFreq_{ij}\right) / NumCodon_i}$$
For GCU (Ala): RSCU = 52 / [(52 + 91 + 103 + 2)/4] = 0.84
Codon  AA   N    RSCU    Codon  AA   N    RSCU    Codon  AA   N    RSCU
GCU    Ala  52   0.84    CCU    Pro  42   0.87    UAA    *    8    3.2
GCC    Ala  91   1.47    CCC    Pro  63   1.31    UAG    *    1    0.4
GCA    Ala  103  1.66    CCA    Pro  85   1.76    AGA    *    1    0.4
GCG    Ala  2    0.03    CCG    Pro  3    0.06    AGG    *    0    0
GAA    Glu  78   1.64    CAA    Gln  79   1.82    AAA    Lys  90   1.78
GAG    Glu  17   0.36    CAG    Gln  8    0.18    AAG    Lys  11   0.22
GGU    Gly  29   0.53    CGU    Arg  7    0.44    ACU    Thr  44   0.57
GGC    Gly  62   1.13    CGC    Arg  11   0.7     ACC    Thr  96   1.25
GGA    Gly  97   1.77    CGA    Arg  42   2.67    ACA    Thr  153  1.99
GGG    Gly  31   0.57    CGG    Arg  3    0.19    ACG    Thr  15   0.19
UUA    Leu  110  1.11    AUA    Met  218  1.66    UGA    Trp  92   1.77
UUG    Leu  16   0.16    AUG    Met  44   0.34    UGG    Trp  12   0.23
CUU    Leu  62   0.62    UCU    Ser  51   1.11    GUU    Val  40   0.84
CUC    Leu  95   0.95    UCC    Ser  65   1.42    GUC    Val  48   1.01
CUA    Leu  285  2.86    UCA    Ser  99   2.16    GUA    Val  87   1.83
CUG    Leu  29   0.29    UCG    Ser  5    0.11    GUG    Val  15   0.32
RSCU and proportion: different scaling.
RSCU (Sharp et al. 1986) is codon-specific.
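The RSCU arithmetic for one codon family can be sketched in a few lines. The function name `rscu` is an assumption made for illustration; the Ala counts are the ones tabulated on the slide. Within a family, RSCU equals the codon's proportion multiplied by the number of codons in the family, which is the "different scaling" noted above.

```python
def rscu(family_counts):
    """RSCU for one synonymous codon family: observed count / mean count."""
    mean = sum(family_counts.values()) / len(family_counts)
    if mean == 0:
        # No codon of this family was observed; report zeros rather than divide by 0.
        return {codon: 0.0 for codon in family_counts}
    return {codon: n / mean for codon, n in family_counts.items()}

# Ala counts from the slide's table.
ala = {"GCU": 52, "GCC": 91, "GCA": 103, "GCG": 2}
for codon, value in rscu(ala).items():
    print(codon, round(value, 2))
# GCU 0.84
# GCC 1.47
# GCA 1.66
# GCG 0.03
```

The values in a family always sum to the family size (4 here), so an RSCU of 1 means the codon is used exactly as often as expected under even usage.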
Codon adaptation: E. coli & phage
[Figure: phage TLS RSCU (y-axis) plotted against E. coli RSCU (x-axis); y = 0.4046x + 0.5954, R² = 0.672.]
Calculation of CAI
$$w_{ij} = \frac{RefCodFreq_{ij}}{RefCodFreq_{i.max}}$$
$$CAI = e^{\frac{\sum_{i=1}^{N_{2,3,4}} \left[ CodFreq_i \ln(w_i) \right]}{\sum_{i=1}^{N_{2,3,4}} CodFreq_i}}$$
N2,3,4: Number of 2-, 3-, 4-fold codon families
Codon  AA  ObsFreq  RefCodFreq  w
UGA    *   0        6           0.375
UAG    *   0        4           0.250
UAA    *   0        16          1.000
GCA    A   1        195         0.606
GCU    A   15       322         1.000
GCG    A   0        81          0.252
GCC    A   8        242         0.752
UGC    C   3        123         1.000
UGU    C   3        112         0.911
GAU    D   9        69          1.000
GAC    D   11       40          0.580
GAG    E   11       289         0.863
GAA    E   14       335         1.000
UUU    F   3        118         0.554
UUC    F   9        213         1.000
…      …   …        …           …
$$CAI = e^{\frac{1 \cdot \ln(0.606) + 15 \cdot \ln(1) + 8 \cdot \ln(0.752) + \ldots}{1 + 15 + 8 + \ldots}}$$
Compound 6- or 8-fold codon families should be broken into two codon families.
CAI is gene-specific.
0 ≤ CAI ≤ 1
CAI values computed with different reference sets are not comparable.
Problem with computing w as Fi/Fi.max:
Suppose an amino acid is rarely used in highly expressed genes. There is then little selection on it, and its codon usage might be close to even, with wi ≈ 1. If a lowly expressed gene happens to be made entirely of this amino acid, then the CAI for this lowly expressed gene would be 1, which is misleading.
There has been no good alternative.
Further research is needed.
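The two steps of the CAI calculation (weights from the reference set, then a frequency-weighted geometric mean over the gene's codons) can be sketched as follows. The function names are assumptions made here. Only the twelve sense codons visible in the slide's table are included, and the stop codons are left out, so the printed number illustrates the arithmetic and is not the CAI of the full gene.

```python
import math

# (codon, amino acid, ObsFreq in the gene, RefCodFreq in the reference set),
# copied from the visible rows of the slide's table.
ROWS = [
    ("GCA", "A", 1, 195), ("GCU", "A", 15, 322), ("GCG", "A", 0, 81),
    ("GCC", "A", 8, 242), ("UGC", "C", 3, 123), ("UGU", "C", 3, 112),
    ("GAU", "D", 9, 69), ("GAC", "D", 11, 40), ("GAG", "E", 11, 289),
    ("GAA", "E", 14, 335), ("UUU", "F", 3, 118), ("UUC", "F", 9, 213),
]

def weights(rows):
    """w = RefCodFreq / largest RefCodFreq within the same codon family."""
    best = {}
    for _, aa, _, ref in rows:
        best[aa] = max(best.get(aa, 0), ref)
    return {codon: ref / best[aa] for codon, aa, _, ref in rows}

def cai(rows):
    """exp(sum(ObsFreq * ln w) / sum(ObsFreq)) over the listed codons.

    Assumes every codon that occurs in the gene has a non-zero reference count.
    """
    w = weights(rows)
    num = sum(obs * math.log(w[codon]) for codon, _, obs, _ in rows if obs > 0)
    den = sum(obs for _, _, obs, _ in rows)
    return math.exp(num / den)

print({codon: round(value, 3) for codon, value in weights(ROWS).items()})
print(round(cai(ROWS), 4))
```

A gene made only of codons with w = 1 gives a CAI of exactly 1, which is the situation behind the caveat about amino acids that are rare in highly expressed genes.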
Weak mRNA predictive power
[Figure: protein abundance (y-axis) plotted against mRNA abundance (x-axis), with ENO1 and FRS2 labelled; y = 5.6507x + 4.1367, R² = 0.1936.]
Effect of Codon Usage Bias
[Figure: protein abundance (y-axis) plotted against codon usage bias (x-axis), with ENO1 and FRS2 labelled; y = 70.398x - 11.739, R² = 0.5668.]
Hypothesis and Predictions
Amino acid  tRNA/anticodon  G-ending codon  A-ending codon
Met         tRNAMet/CAU     AUG             AUA
Leu         tRNALeu/UAA     UUG             UUA
Glu         tRNAGlu/UUC     GAG             GAA
Lys         tRNALys/UUU     AAG             AAA
Gln         tRNAGln/UUG     CAG             CAA
Arg         tRNAArg/UCU     AGG             AGA
Trp         tRNATrp/UCA     UGG             UGA
Met family: AUA is favoured by mutation, but not by tRNA-mediated selection.
Other families: A-ending codons are favoured by both mutation and tRNA-mediated selection.
Predictions:
1. The proportion of A-ending codons (P_NNA = N_NNA/N_NNG) or RSCU should be smaller in the Met codon family than in other R-ending codon families.
2. Availability of tRNAMet/UAU should increase P_AUA.
Xia et al. 2007
Testing prediction 1
                   Met    Leu    Glu    Lys    Gln    Arg    Trp
Species            AUA    UUA    GAA    AAA    CAA    AGA    UGA
A. gossypii        1.473  1.993  1.826  1.852  1.917  2      2
C. glabrata        1.043  1.995  2.000  1.938  1.889  2      2
K. thermotolerans  0.556  1.973  1.910  1.948  1.945  2      1.967
S. cerevisiae      1.140  1.969  1.800  1.883  1.794  1.947  1.908
S. castelli        1.299  1.994  1.891  1.981  1.969  2      1.918
S. servazzii       1.321  1.931  1.702  1.824  1.841  1.959  2
Y. lipolytica      1.440  1.968  1.536  1.859  1.963  1.922  1.882
Carullo, M. and Xia, X. 2008 J Mol Evol 66:484–493.
Testing prediction 2
[Figure, panels (a) and (b): P_AUA (y-axis) plotted against P_UUA (x-axis).]
Fig. 5. Relationship between P_AUA and P_UUA, highlighting the observation that P_AUA is greater when both a tRNAMet/CAU and a tRNAMet/UAU are present than when only tRNAMet/CAU is present in the mtDNA, for bivalve species (a) and chordate species (b). The filled squares are for mtDNA containing both tRNAMet/CAU and tRNAMet/UAU genes, and the open triangles are for mtDNA without a tRNAMet/UAU gene.
Why a systems biology perspective?
No aphorism is more frequently repeated in connection
with field trials, than that we must ask Nature few
questions, or ideally, one question at a time. The writer is
convinced that this view is wholly mistaken. Nature, he
suggests, will respond to a logical and carefully thought-out questionnaire; indeed, if we ask her a single question,
she will often refuse to answer until some other topic has
been discussed.
--Ronald A. Fisher (1926). Journal of the
Ministry of Agriculture of Great Britain 33:
503–513
Simpson’s paradox
              Treatment A     Treatment B
Small Stones  93% (81/87)     87% (234/270)
Large Stones  73% (192/263)   69% (55/80)
Pooled        78% (273/350)   83% (289/350)
C. R. Charig et al. 1986. Br Med J (Clin Res Ed) 292 (6524): 879–882
Treatment A: all open procedures
Treatment B: percutaneous nephrolithotomy
Question: which treatment is better?
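The reversal in the kidney-stone data can be checked directly from the counts. The sketch below uses only the numbers quoted from Charig et al. (1986); the names `DATA`, `rate` and `pooled_rate` are assumptions made for illustration.

```python
# (successes, patients) per treatment and stone size, from the counts above.
DATA = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, patients):
    """Success rate within one group."""
    return successes / patients

def pooled_rate(treatment):
    """Success rate after pooling both stone sizes for one treatment."""
    successes = sum(s for s, _ in DATA[treatment].values())
    patients = sum(n for _, n in DATA[treatment].values())
    return successes / patients

for size in ("small", "large"):
    a = rate(*DATA["A"][size])
    b = rate(*DATA["B"][size])
    print(size, round(a, 2), round(b, 2), "A better" if a > b else "B better")
print("pooled", round(pooled_rate("A"), 2), round(pooled_rate("B"), 2))
```

Treatment A has the higher success rate within each stone size, yet Treatment B looks better once the strata are pooled, because A was applied mostly to large stones and B mostly to small ones.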
RSCU (HIV-1 vs Human)
[Figure: RSCU (HIV-1) on the y-axis plotted against RSCU (Human) on the x-axis; legend: A-ending, C-ending, G-ending and U-ending codons.]
Fig. 1. Relative synonymous codon usage (RSCU) of HIV-1 compared to RSCU of highly expressed human genes. Data points for codons ending with A, C, G or U are annotated with different combinations of colors and symbols. A-ending codons exhibit strong discordance in their usage between HIV-1 and human and are annotated with their coded amino acids.
van Weringh et al. 2011. MBE.
RSCU (HTLV-1 vs Human)
[Figure: RSCU (HTLV-1) on the y-axis plotted against RSCU (Human) on the x-axis; legend: A-ending, C-ending, G-ending and U-ending codons.]
Relative synonymous codon usage (RSCU) of HTLV-1 compared to RSCU of highly expressed human genes. Data points for codons ending with A, C, G or U are annotated with different combinations of colors and symbols. A-ending codons exhibit strong discordance in their usage between HIV-1 and human and are annotated with their coded amino acids.
Differential adaptation: early & late genes
Table 2. Frequency of A residues, length and codon adaptation index (CAI) for the three HIV-1 early (tat, rev and nef) and five late (gag-pol, vif, vpu, vpr, and env) coding sequences (CDS).
Gene  CDS (bp)  CAI
tat   261       0.66875
rev   351       0.66211
nef   621       0.67523
gag   1503      0.62784
pol   3012      0.58139
vif   579       0.61941
vpr   291       0.64272
vpu   249       0.49068
env   2571      0.61924
Any problem with the mutation hypothesis?
van Weringh et al. 2011. Molecular Biology and Evolution 28:1827-1834.
CAI values may change depending
on what reference set of highly
expressed genes is used, but the
relative magnitude should be
maintained (unless the reference
set is not of highly expressed
genes)
tRNA
Table 1. Relationship between codon usage measured by RSCU for human and HIV-1 (RSCUHum and RSCUHIV) and packaging of host tRNA by HIV-1. Rank(Icodon) and Rank(ItRNA) are significantly and positively correlated (r = 0.5780, p = 0.0304).
AA(Codon)  RSCUHum  RSCUHIV  Rank(Icodon)  tRNAHIV(1)  tRNAGagVLP(2)  Rank(ItRNA)
Arg(AGA)   0.97     1.44     8             0.0494      0.0277         4
Arg(AGG)   1.03     0.56     4             0.0660      0.0544         2
Ile(AUA)   0.24     1.59     14            1.3397      0.0614         14
Ile(AUY)   2.64     1.41     3             0.2672      0.1371         5
Leu(UUA)   0.68     1.38     11            0.0900      0.0374         8
Leu(UUG)   1.32     0.62     1             0.0450      0.0496         1
Lys(AAA)   0.76     1.27     9             0.6405      0.0340         13
Lys(AAG)   1.24     0.73     5             1.0081      0.0601         12
Gly(GGA)   0.93     2.08     12            0.0708      0.0290         9
Gly(GGB)   3.07     1.92     6             0.2016      0.0486         10
Val(GUA)   0.39     2.08     13            0.0662      0.0282         7
Val(GUB)   3.61     1.92     2             0.0739      0.0485         3
Thr(ACA)   0.97     1.94     10            0.0481      0.0215         6
Thr(ACB)   3.03     2.06     7             0.2522      0.0347         11
(1) tRNAHIV: the relative tRNA abundance of HIV-1 virion versus human HEK293T cells.
(2) tRNAGagVLP: the relative tRNA abundance of Gag viral-like particles (GagVLP) versus human HEK293T cells.
van Weringh et al. 2011. MBE.
$$I_{codon.i} = \log_2\left(\frac{RSCU_{i.HIV\text{-}1}}{RSCU_{i.Human}}\right)$$
$$I_{tRNA.i} = \log_2\left(\frac{tRNA_{i.HIV\text{-}1}}{tRNA_{i.GagVLP}}\right)$$
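The two indices and their ranks can be recomputed from the values in Table 1. The sketch below assumes that the reported r is a Pearson correlation computed on the two rank columns (that is, a Spearman rank correlation); the helper names `ranks` and `pearson` are illustrative, and the data rows are copied from the table.

```python
import math

# (codon, RSCU_Hum, RSCU_HIV, tRNA_HIV, tRNA_GagVLP), copied from Table 1.
TABLE = [
    ("Arg(AGA)", 0.97, 1.44, 0.0494, 0.0277), ("Arg(AGG)", 1.03, 0.56, 0.0660, 0.0544),
    ("Ile(AUA)", 0.24, 1.59, 1.3397, 0.0614), ("Ile(AUY)", 2.64, 1.41, 0.2672, 0.1371),
    ("Leu(UUA)", 0.68, 1.38, 0.0900, 0.0374), ("Leu(UUG)", 1.32, 0.62, 0.0450, 0.0496),
    ("Lys(AAA)", 0.76, 1.27, 0.6405, 0.0340), ("Lys(AAG)", 1.24, 0.73, 1.0081, 0.0601),
    ("Gly(GGA)", 0.93, 2.08, 0.0708, 0.0290), ("Gly(GGB)", 3.07, 1.92, 0.2016, 0.0486),
    ("Val(GUA)", 0.39, 2.08, 0.0662, 0.0282), ("Val(GUB)", 3.61, 1.92, 0.0739, 0.0485),
    ("Thr(ACA)", 0.97, 1.94, 0.0481, 0.0215), ("Thr(ACB)", 3.03, 2.06, 0.2522, 0.0347),
]

def ranks(values):
    """Rank 1 for the smallest value up to n for the largest (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

i_codon = [math.log2(hiv / hum) for _, hum, hiv, _, _ in TABLE]
i_trna = [math.log2(t_hiv / t_vlp) for _, _, _, t_hiv, t_vlp in TABLE]
rank_codon, rank_trna = ranks(i_codon), ranks(i_trna)
print(rank_codon)
print(rank_trna)
print(round(pearson(rank_codon, rank_trna), 4))
```

With these inputs the recomputed ranks agree with the Rank(Icodon) and Rank(ItRNA) columns, and the rank correlation comes out at about 0.578.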
I/A wobble pair is error-prone
[Figure: chemical structures of five base pairs: A/U pair, G/C pair, I/C wobble pair, G/U (or I/U) wobble pair, and I/A wobble pair.]
Anticodon: 3’ CIIIGIIII 5’
5’ AUG ...... G1 C2C3...... UAA 3’
Translation rate & codon adaptation
Kudla et al. (2009, Science) engineered a synthetic library of 154 genes, all encoding the same protein
but differing in degrees of codon adaptation, to quantify the effect of differential codon usage on protein
production in E. coli. They concluded that “codon bias did not correlate with gene expression” and that
“translation initiation, not elongation, is rate-limiting for gene expression”
[Figure: protein abundance (y-axis) plotted against codon adaptation index (CAI, x-axis); y = 2036.3x + 3020.8, R² = 0.0052, p = 0.3746.]
Problem with CAI and a new ITE
AA  Codon  CFnon-HEG  CFHEG
A   GCA    20         40
A   GCG    80         60
tRNA: 3
Identification of major and minor codons
With background codon usage bias in non-HEGs:
                              CAI    ITE
AA  Codon  CFnon-HEG  CFHEG   w      pHEG  pnon-HEG  s     w
A   GCA    20         40      2/3    0.4   0.2       2     1
A   GCG    80         60      1      0.6   0.8       0.75  0.375
Without background codon usage bias in non-HEGs:
                              CAI    ITE
AA  Codon  CFnon-HEG  CFHEG   w      pHEG  pnon-HEG  s     w
A   GCA    50         40      2/3    0.4   0.5       0.8   2/3
A   GCG    50         60      1      0.6   0.5       1.2   1
CAI is a special case of ITE (when there is no background codon usage bias)
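The tabulated numbers are consistent with s being the ratio pHEG/pnon-HEG and the ITE weight being s divided by the largest s in the codon family. That reading is an inference from the table, not a formula stated on the slide, and the function names below are assumptions made for illustration.

```python
def cai_weights(cf_heg):
    """CAI-style weights: HEG codon frequency / largest HEG frequency in the family."""
    top = max(cf_heg.values())
    return {codon: n / top for codon, n in cf_heg.items()}

def ite_weights(cf_heg, cf_non_heg):
    """ITE-style weights as inferred from the table: s = pHEG / pnon-HEG, w = s / max(s).

    Assumes every codon has a non-zero count in the non-HEG set.
    """
    total_heg = sum(cf_heg.values())
    total_non = sum(cf_non_heg.values())
    s = {
        codon: (cf_heg[codon] / total_heg) / (cf_non_heg[codon] / total_non)
        for codon in cf_heg
    }
    top = max(s.values())
    return {codon: value / top for codon, value in s.items()}

heg = {"GCA": 40, "GCG": 60}
biased_background = {"GCA": 20, "GCG": 80}
even_background = {"GCA": 50, "GCG": 50}

print(cai_weights(heg))
print(ite_weights(heg, biased_background))
print(ite_weights(heg, even_background))
```

With the even background the ITE weights coincide with the CAI weights (2/3 and 1). With the biased background GCA becomes the major codon, since it is enriched in HEGs relative to non-HEGs even though GCG is more frequent in absolute terms.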
Problem with CAI and a new ITE
CAI:
AA  Codon  CFnon-HEG  CFHEG  w    Gene1  Gene2
A   GCA    20         40     2/3  10     20
A   GCG    80         60     1    40     30
$$CAI = e^{\frac{\sum F_i \ln(w_i)}{\sum F_i}}$$
CAI1 = 0.9221; CAI2 = 0.8503
Wrong conclusions:
1. Excellent codon adaptation in the codon family (high CAI values)
2. Gene 1 has better codon adaptation than Gene 2.
ITE:
AA  Codon  CFnon-HEG  CFHEG  pHEG  pnon-HEG  s     w      Gene1  Gene2
A   GCA    20         40     0.4   0.2       2     1      10     20
A   GCG    80         60     0.6   0.8       0.75  0.375  40     30
$$ITE = e^{\frac{\sum F_i \ln(w_i)}{\sum F_i}}$$
ITE1 = 0.4563; ITE2 = 0.5552
Correct conclusions:
1. Poor codon adaptation in the codon family (low ITE values)
2. Gene 2 has better codon adaptation than Gene 1.
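Both indices use the same formula, exp(ΣF ln w / ΣF), and differ only in the weights. The sketch below recomputes the four values quoted on the slide from the tabulated weights and codon counts; the function name `elongation_index` is an assumption made for illustration.

```python
import math

def elongation_index(gene_counts, w):
    """exp(sum(F_i * ln(w_i)) / sum(F_i)) over the codons of one gene."""
    num = sum(n * math.log(w[codon]) for codon, n in gene_counts.items() if n > 0)
    return math.exp(num / sum(gene_counts.values()))

# Weights and codon counts as tabulated on the slide.
W_CAI = {"GCA": 2 / 3, "GCG": 1.0}
W_ITE = {"GCA": 1.0, "GCG": 0.375}
GENE1 = {"GCA": 10, "GCG": 40}
GENE2 = {"GCA": 20, "GCG": 30}

for name, w in (("CAI", W_CAI), ("ITE", W_ITE)):
    print(name, round(elongation_index(GENE1, w), 4), round(elongation_index(GENE2, w), 4))
# CAI 0.9221 0.8503
# ITE 0.4563 0.5552
```

The ranking of the two genes flips between the two sets of weights, which is the point of the comparison: Gene 2 uses more GCA, the codon enriched in HEGs relative to the non-HEG background.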
[Figure: ranked protein abundance (rProt, y-axis) plotted against the Index of Translation Elongation (ITE, x-axis), with a separate regression line for each of four MFE ranges.]
MFE1: (-15.3, -11): y = 67.545x - 13.875, R² = 0.0203, p = 0.4213
MFE2: (-10.9, -9): y = 216.6x - 105.89, R² = 0.1509, p = 0.0132
MFE3: (-8.7, -6.2): y = 263.87x - 103.77, R² = 0.1686, p = 0.0069
MFE4: (-6, -3.5): y = 231.34x - 60.847, R² = 0.1814, p = 0.0077