
Study Report and Discussion
Name: Yi-Ming Chou (周逸民)
Student ID: 891604
Study location: Wen-Hsiung Li Laboratory (李文雄實驗室), University of Chicago

Codon Usage Bias Differences and Expression Amount Differences in Yeast Duplicated Genes

Yi-Ming Chou*†, I-Fan Tsai*†, Zhenglong Gu§ and Wen-Hsiung Li§
*Department of Life Science, National Tsing-Hua University, Hsinchu
§Department of Ecology and Evolution, University of Chicago
†These authors contributed equally to this work.

Abstract

We examined the relationship among codon usage bias, expression, and Ks data from the yeast 2-gene family, multiple-gene family, and big gene family. We found that, in gene pairs from the 2-gene and multiple-gene families, if one gene in the pair has a markedly large codon usage bias, then the codon usage bias difference between the two genes tends to be large. This suggests that when one copy of an important duplicated gene is mutated, it is relaxed from its codon usage constraints. We also found that, in the 2-gene and multiple-gene families, when the gene pairs are grouped by Ks value (Ks≤0.75, 0.75<Ks≤1.5, Ks>1.5), the average and median codon usage difference of the grouped pairs tends to increase as Ks increases. In the big gene families, the 2-gene family, and the all-gene family, codon usage bias generally increased with expression amount, as expected. We further examined the 2-gene family using the differences in codon usage bias and expression amount instead of the raw values, and generally found that expression amount differences are proportional to codon usage bias differences. However, this relationship is not strict, since a large expression difference does not always affect codon usage bias.

Introduction

Yeast contains many duplicated genes; a set of duplicated genes is called a gene family. It is known that a higher expression rate generally leads to a higher codon usage bias, and it has been reported that highly expressed genes evolve slowly.
We therefore wanted to see whether there are any other relationships among different kinds of yeast genomic data, namely codon usage bias difference, expression amount difference, and sequence difference within yeast gene families. We hypothesize that if a highly constrained gene is duplicated, the new copy might not need to maintain the same strong codon usage bias, since another copy is still functioning normally. Its constraints might then be released by mutation, and the expression amount of the released copy might drop as well.

Materials & Methods

Data sets

All the data sets come from the yeast S. cerevisiae and were downloaded from Internet databases. The gene expression data were measured with an Affymetrix microarray, and the sequence data come from the NCBI GenBank database. The sequence data were further processed into Ks and Ka values. The codon usage bias indices used here are CAI (codon adaptation index) and ENC (effective number of codons). These indices were likewise calculated from yeast genome statistics available on the Internet. The gene families studied were the yeast 2-gene family, in which each family has only two genes; the all-gene family, with 50% or greater similarity between the gene pairs; and the big gene family, in which each family has more than 5 genes.

Data normalization

The expression rate values range from very low (~10^0) to very high (~10^4). A moderate difference can therefore be a significant change if the original expression rate is very low or zero, but it can also be meaningless, since for a highly expressed gene a moderate difference could be only measurement error or natural fluctuation. We therefore had to normalize the data. Here we use a very simple method: we divide the raw difference by the average expression of the gene pair to obtain a "normalized" difference, which has a maximum of 2 and a minimum of 0 (when all values are taken as absolute values).
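
The normalization described above can be sketched in a few lines. This is only an illustration in Python (the original analysis was done in Matlab); the function name `normalized_difference`, the non-negativity check, and the choice to return 0 when both copies have zero expression are assumptions made here, not details given in the report.

```python
def normalized_difference(expr_a: float, expr_b: float) -> float:
    """Absolute expression difference of a gene pair divided by the pair's mean.

    For non-negative inputs the result lies between 0 (identical expression)
    and 2 (one copy has zero expression).
    """
    if expr_a < 0 or expr_b < 0:
        # Assumption: expression measurements are non-negative.
        raise ValueError("expression values are assumed to be non-negative")
    pair_mean = (expr_a + expr_b) / 2.0
    if pair_mean == 0:
        # Assumption: if neither copy is expressed, treat the difference as 0.
        return 0.0
    return abs(expr_a - expr_b) / pair_mean


if __name__ == "__main__":
    # Hypothetical expression values, chosen only to show the effect of scaling.
    for a, b in [(1.0, 3.0), (1000.0, 1002.0), (0.0, 500.0)]:
        print(a, b, normalized_difference(a, b))
```

With these made-up inputs, the same raw difference of 2 gives a normalized difference of 1.0 for the pair (1, 3) but only about 0.002 for the pair (1000, 1002), which is the distinction between a significant change and a natural fluctuation that the normalization is meant to capture.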

Ks values

When analyzing Ks values, it is common to divide them into three groups: values smaller than 0.75, values between 0.75 and 1.5, and values larger than 1.5. In every test here that involves Ks, the gene pairs are categorized into these three groups, and each group is represented by both its average and its median.

Graphs

We simply plotted the data points as x-y graphs and histograms to see whether there is any relationship among the values.

Significance tests and other statistical data

The p values (significance tests) and all other statistics were obtained with functions in Matlab's built-in statistics toolbox.

Results

Frequency distribution

The frequency distributions of codon usage bias difference and expression difference for the 2-gene family and the all-gene family show that frequency generally decreases as the difference increases. This is unsurprising, since small deviations are more likely to occur than large ones.

Fig.1 The frequency distribution of differences in codon usage bias and expression data of the yeast 2-gene family (left) and all-gene family (right). Generally, the frequency of occurrence decreases as the difference becomes larger.

Relationship between codon usage bias and codon usage bias difference

We can see from the graph that the codon usage bias difference is larger when one of the genes in the pair has a larger codon usage bias. This phenomenon can be observed in both the yeast 2-gene family and the all-gene family. We think this is because, once a highly biased gene (usually an important, highly expressed gene) is duplicated and the duplication is neither advantageous nor deleterious, one copy is very likely to accumulate further mutations, since such mutations would not bring any disadvantage to the organism.
We therefore suppose that if one gene in a pair has a rather high expression amount, that gene pair tends to have a larger difference in codon usage bias.

Fig.2 The positive correlation between codon usage bias and its difference in the yeast 2-gene family (left) and all-gene family (right). The codon usage bias value used here is the larger one in each gene pair.

Relationship between codon usage bias difference and gene distance

We found that, in both the 2-gene family and the all-gene family, the codon usage bias difference generally increases with the Ks value, which represents the distance between the genes in a pair. This is also to be expected: the more two genes differ in sequence, the more their codon usage bias may have changed.

Fig.3 The positive relationship between grouped Ks and grouped codon usage bias difference for the yeast 2-gene family (left) and all-gene family (right).

Relationship between expression amount and codon usage bias

We examined the relationship between expression data and codon bias data in the yeast big gene families and 2-gene families. The result agrees with other observations: the bias increases with the expression amount. We also tested expression amount differences against codon usage bias differences, which likewise show a positive correlation. Within a gene pair, codon usage bias usually varies with expression amount in a positively correlated manner. In the graph of normalized codon usage bias difference against expression difference, highly expressed genes tend to show larger changes in codon usage bias as their expression amount changes than less highly expressed genes do.

Fig.4 The positive correlation between codon usage bias and expression amount in the yeast big gene family (upper), 2-gene family (lower left), and all-gene family (lower right).

Fig.5 The positive correlation between codon usage differences and expression differences in the yeast 2-gene family.
Non-normalized data are on the left and normalized data on the right.

Fig.6 The positive relationship between codon usage differences and expression differences in the yeast all-gene family. Non-normalized data are on the left and normalized data on the right.

Discussion

Our results are generally consistent with our assumptions. The higher the codon usage bias of one gene in a pair, the larger the bias difference within the pair tends to be. A positive correlation also exists between expression difference and codon usage bias difference. The prediction that "highly expressed genes have more constraints" is supported once more. We also found that the codon usage bias difference is positively correlated with Ks, which means that closely related genes have similar codon usage bias.

Acknowledgement

We especially thank Zhenglong Gu, who contributed most to these tests. We would also like to thank Wen-Hsiung Li, who took a close interest in our work and was very supportive during our visit.
