Disentangling factors of gene expression regulation in human tissues and cells by transcriptomics and proteomics approaches

Advisors: Alvis Brazma (EBI), Juan Antonio Vizcaino (EBI), Jyoti Choudhary (Sanger)

The central dogma of molecular biology states that genes encoded in DNA are transcribed to mRNA and then translated to proteins. Thus the abundance of a particular protein species in a cell can be regulated at both the transcription and the translation stage. The relative contribution of each of these steps has been a long-standing subject of research and discussion in the literature. However, the conclusions of different studies have often been contradictory, and the reported correlations between mRNA and protein abundances have varied widely. These studies have suffered from the lack of systematic datasets spanning a sufficient number of different tissues, cell types or conditions: usually either only a rather small number of conditions was considered, or different samples were used in the transcriptomics and proteomics assays.

The situation has changed recently with the ability to generate large-scale datasets through RNA sequencing (RNAseq) and mass spectrometry, many of which are already available in data archives such as PRIDE and ArrayExpress. The applicants have analysed some of the existing datasets and identified sets of genes whose transcript and protein abundances across many samples were either strongly correlated or, on the contrary, anti-correlated, consistently in independent datasets. However, the reasons for these differences are still to be uncovered.

Additionally, in higher eukaryotes the same gene can potentially give rise to a number of different transcripts through alternative splicing.
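
The gene-level comparison described above, in which transcript and protein abundances are correlated across many samples, can be illustrated with a minimal sketch. It is not the applicants' actual analysis: the choice of Spearman rank correlation, the threshold of 0.8, the function names and the toy abundance values are all assumptions made for illustration.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values receive their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def classify_genes(mrna, protein, threshold=0.8):
    """Label each gene present in both tables by its mRNA/protein correlation.

    mrna and protein map a gene name to a list of abundances, one per sample,
    with samples in the same order in both tables. The threshold is an
    illustrative assumption, not a value from the proposal.
    """
    labels = {}
    for gene in mrna.keys() & protein.keys():
        rho = spearman(mrna[gene], protein[gene])
        if rho >= threshold:
            labels[gene] = ("correlated", rho)
        elif rho <= -threshold:
            labels[gene] = ("anti-correlated", rho)
        else:
            labels[gene] = ("uncorrelated", rho)
    return labels


# Toy data for three hypothetical genes measured in five samples.
mrna = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE_C": [1.0, 3.0, 2.0, 5.0, 4.0],
}
protein = {
    "GENE_A": [10.0, 22.0, 28.0, 45.0, 51.0],
    "GENE_B": [2.0, 3.0, 5.0, 8.0, 9.0],
    "GENE_C": [4.0, 1.0, 5.0, 2.0, 3.0],
}

for gene, (label, rho) in sorted(classify_genes(mrna, protein).items()):
    print(gene, label, round(rho, 2))
```

A rank-based measure is used here because transcript and protein abundances are typically on different scales; a real analysis would also need to handle missing values and to test whether the labels replicate in independent datasets.
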
We have recently demonstrated that in human tissues most genes express one transcript at significantly higher levels than the other isoforms of the same gene [Gonzàlez-Porta et al, 2013], and we have preliminary small-scale proteomics evidence that this dominant transcript is also likely to be the one predominantly translated. In some cases the dominant isoforms are tissue specific. Thus there are at least three levels of gene expression regulation: transcription, transcript processing, and translation. We have also looked for expression quantitative trait loci at the RNA and protein level to understand the contribution of sequence variation to the regulation of protein abundance.

The preliminary results have raised many new questions, but have also outlined the way forward. For instance, using a transcript-based search space in proteomics would circumvent the current practice of peptide apportioning or protein grouping and would also enhance quantitative analysis. Also, our finding that for a relatively small number of genes there is a clear switch from one dominant alternative transcript in one tissue to a different one in another tissue will enable the comparison of transcriptome and proteome data at the isoform level.

In the proposed project we will focus on systematically disentangling the factors that determine protein abundance at the gene and isoform level in human cells and tissues, using high-throughput technologies and data analysis. The applicants have a track record of addressing these questions both by generating new data and by analysing existing datasets, as well as through other relevant collaborations (see for instance [Petryszak et al, 2016]). We believe that, given the advances in high-throughput technology and the increasing availability of relevant datasets, this is the right time for a dedicated project to improve our understanding of this fundamental biological question of the relationship between transcript and protein abundance.
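
The notions of a dominant transcript and of a switch between tissues, as used above, can be made concrete with a short sketch. This is an illustration under stated assumptions rather than the published procedure: the two-fold cut-off (`min_fold`), the function names and the toy expression values are hypothetical.

```python
def dominant_transcript(expression, min_fold=2.0):
    """Return the dominant transcript of one gene in one tissue, or None.

    expression maps a transcript name to its abundance. A transcript is
    called dominant here if its abundance is at least min_fold times that
    of the runner-up; min_fold is an illustrative assumption.
    """
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] <= 0:
        return None  # gene not expressed
    if len(ranked) == 1:
        return ranked[0][0]
    top, second = ranked[0], ranked[1]
    if second[1] == 0 or top[1] / second[1] >= min_fold:
        return top[0]
    return None  # no clearly dominant transcript


def find_switches(gene_expression, min_fold=2.0):
    """Find tissue pairs in which one gene has different dominant transcripts.

    gene_expression maps a tissue name to a {transcript: abundance} table.
    Returns (tissue_a, tissue_b, dominant_in_a, dominant_in_b) tuples, with
    tissues in alphabetical order.
    """
    dominant = {
        tissue: dominant_transcript(expr, min_fold)
        for tissue, expr in gene_expression.items()
    }
    tissues = sorted(t for t, d in dominant.items() if d is not None)
    switches = []
    for i, a in enumerate(tissues):
        for b in tissues[i + 1:]:
            if dominant[a] != dominant[b]:
                switches.append((a, b, dominant[a], dominant[b]))
    return switches


# Toy data: one hypothetical gene with two transcripts in three tissues.
example_gene = {
    "liver": {"T1": 50.0, "T2": 5.0},
    "brain": {"T1": 4.0, "T2": 40.0},
    "heart": {"T1": 10.0, "T2": 9.0},  # no clearly dominant transcript
}

print(find_switches(example_gene))
```

Tissues without a clearly dominant transcript are left out of the comparison, so only unambiguous switch events are reported as candidates for comparison at the protein level.
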
We plan to generate new proteomics data and use these in combination with the existing large datasets to address this question. Given our complementary expertise we are perfectly placed to make such a project successful. We will particularly focus on datasets generated by differentiation of human iPS cells, as well as primary cells recovered by FACS sorting, for which the full complement of genomics, transcriptomics and proteomics data is captured.

We will identify the most pronounced switch events between different cell types or conditions and compare these at the transcript and protein level. For the transcripts where a dominant switch-over is identified, we will identify peptides that distinguish between the dominant forms, either individually or in combination. As there are not many cases where a single peptide can distinguish between the isoforms unambiguously, we will use Bayesian methods based on sets of peptides. These will enable the development of targeted proteomics assays. We will develop systematic approaches to measurement and methods for targeted proteomics at the Sanger Proteomics mass spectrometry group. Additionally, we will study the variation in expression at the transcript and protein level across biological specimens, and the effect of personal variation on gene and transcript expression.

The fellow will have an opportunity to pursue both experimental and computational work; the emphasis will be on developing new methods as well as on functional validation. On the one hand, the fellow will develop novel methods for data analysis, devising effective ways to bring together transcriptome and proteome analysis; on the other hand, these methods will be applied to dissect the levels of gene expression regulation. This may lead to an integrated proteomics/transcriptomics/genomics data analysis pipeline, which is likely to have numerous applications, including data analysis in the PRIDE database.
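
The idea of distinguishing isoforms by sets of peptides when no single peptide is unique can be sketched with a simple Bayesian calculation. The project's actual methods are still to be developed; what follows is a naive-Bayes toy model in which the detection probability `p_detect`, the false-identification probability `p_false`, the uniform prior, and the isoform and peptide names are all assumptions.

```python
def unique_peptides(isoform_peptides):
    """For each isoform, return the peptides found in no other isoform."""
    result = {}
    for iso, peptides in isoform_peptides.items():
        others = set().union(
            *(p for other, p in isoform_peptides.items() if other != iso)
        )
        result[iso] = set(peptides) - others
    return result


def isoform_posterior(isoform_peptides, observed,
                      p_detect=0.7, p_false=0.05, prior=None):
    """Posterior probability that each isoform is the one expressed.

    Toy model: exactly one isoform is present; each of its peptides is
    observed independently with probability p_detect, and any other peptide
    is (falsely) observed with probability p_false. Both probabilities are
    illustrative assumptions.
    """
    isoforms = list(isoform_peptides)
    if prior is None:
        prior = {iso: 1.0 / len(isoforms) for iso in isoforms}
    all_peptides = set().union(*isoform_peptides.values())

    unnormalised = {}
    for iso in isoforms:
        likelihood = 1.0
        for peptide in all_peptides:
            p_obs = p_detect if peptide in isoform_peptides[iso] else p_false
            likelihood *= p_obs if peptide in observed else 1.0 - p_obs
        unnormalised[iso] = prior[iso] * likelihood

    total = sum(unnormalised.values())
    return {iso: value / total for iso, value in unnormalised.items()}


# Toy example: three hypothetical isoforms, none with a unique peptide.
isoform_peptides = {
    "ISO1": {"PEP_A", "PEP_B", "PEP_C"},
    "ISO2": {"PEP_A", "PEP_B", "PEP_D"},
    "ISO3": {"PEP_A", "PEP_C", "PEP_D"},
}
observed = {"PEP_B", "PEP_C"}

print(unique_peptides(isoform_peptides))
posterior = isoform_posterior(isoform_peptides, observed)
for iso in sorted(posterior):
    print(iso, round(posterior[iso], 3))
```

In this toy example no isoform has a peptide of its own, yet the combination of the two observed peptides favours ISO1, the only isoform that contains both. A realistic model would also have to allow several isoforms to be present at once and to account for peptide-specific detectability.
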
However, possibly the most interesting outcome of the project will be the cases where transcript and protein levels do not correlate, or where the transcript and protein isoforms differ. We will try to validate these experimentally and to look for mechanisms that explain them, for instance RNA or protein stability, or additional regulatory mechanisms. A particular goal of the project will be to understand the functional implications of isoform switching and its mechanism of regulation.

References

Gonzàlez-Porta M, Frankish A, Rung J, Harrow J, Brazma A. Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene. Genome Biol. 2013 Jul 1;14(7):R70.

Petryszak R, Keays M, Tang YA, Fonseca NA, Barrera E, Burdett T, Füllgrabe A, Fuentes AM, Jupp S, Koskinen S, Mannion O, Huerta L, Megy K, Snow C, Williams E, Barzine M, Hastings E, Weisser H, Wright J, Jaiswal P, Huber W, Choudhary J, Parkinson HE, Brazma A. Expression Atlas update - an integrated database of gene and protein expression in humans, animals and plants. Nucleic Acids Res. 2016 Jan 4;44(D1):D746-52.