Selecting the microarray genes that link specific genes of interest to one another
http://ibb.uab.es/revresearch
Huerta, M., Cedano, J. and Querol, E. (2007) Analysis of non-linear relation between expression profiles by the Principal Curves of Oriented-Points approach. J Bioinform Comput Biol, 6:367-386.

Objectives
Provide powerful tools for studying the nonlinear dependences among gene expressions, focussed on the researcher's genes of interest and taking advantage of the high-throughput capability of microarray technology.

Procedure
Pre-process:
– Obtain the degree of correlation between each pair of genes using the PCOP calculus.
– Build the minimum spanning tree among the microarray genes using the previously calculated pairwise correlations.
Zoom-in operation:
– Select the genes that connect the query genes, using the minimum spanning tree calculated in the pre-process. The query genes are provided by the researcher in each new query.
– Obtain the intra-set behaviour pattern of the subset generated by the selection algorithm, using the PCOP calculus. This inner pattern relates the expression fluctuations of the selected genes and the query genes to one another.

Using the minimum spanning tree to select the genes that link the query ones
The researcher's genes of interest have a certain level of correlation as a set, which the researcher does not wish to lose. The zoom-in operation therefore selects the maximum number of genes that connect the genes of the query set while conserving, in the new set of query plus selected genes, the correlation level of the query set.
When a hierarchical clustering is built using single linkage, we obtain a minimum spanning tree in which each edge represents the relationship used to add each new gene or gene cluster to the tree (Gower and Ross 1969).
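As a minimal sketch of the pre-process and the zoom-in selection, the Python below builds a minimum spanning tree with Kruskal's algorithm and collects the genes lying on the tree paths between the query genes. It is an illustration under stated assumptions, not the authors' implementation: the distance 1 - |Pearson r| stands in for the PCOP f value, the gene names and profiles are invented toy data, and all function names are hypothetical. It does not check that the correlation level of the query set is conserved.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles.

    Assumption: used here only as a stand-in for the PCOP f value.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def minimum_spanning_tree(profiles):
    """Kruskal's algorithm on the distance 1 - |r|; returns an adjacency dict."""
    genes = sorted(profiles)
    edges = sorted(
        (1.0 - abs(pearson(profiles[a], profiles[b])), a, b)
        for a, b in combinations(genes, 2)
    )
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    tree = {g: {} for g in genes}
    for dist, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Joining two clusters by their closest pair mirrors single linkage.
            parent[ra] = rb
            tree[a][b] = dist
            tree[b][a] = dist
    return tree


def tree_path(tree, start, goal):
    """Unique path between two genes in the tree (iterative depth-first search)."""
    stack = [(start, [start])]
    seen = {start}
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in tree[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [nxt]))
    return []


def linking_genes(tree, query):
    """Genes on the tree paths between query genes, excluding the query genes."""
    selected = set()
    for a, b in combinations(query, 2):
        selected.update(tree_path(tree, a, b))
    return sorted(selected - set(query))


# Invented toy expression profiles (five samples per gene), for illustration only.
profiles = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [1, 2, 3, 5, 4],
    "geneC": [1, 3, 2, 5, 4],
    "geneD": [5, 1, 4, 2, 3],
}
tree = minimum_spanning_tree(profiles)
print(linking_genes(tree, ["geneA", "geneC"]))
```

With these toy profiles the tree is the chain geneA, geneB, geneC, geneD, so geneB is the gene selected for the query geneA and geneC. In a tree the path between two nodes is unique, which is why a simple depth-first search suffices for the selection step.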
Because of this correspondence with single-linkage clustering, the clusters, their hierarchy and the properties of the hierarchical clustering carry over to the corresponding minimum spanning tree. Following the minimum spanning path, the genes farthest from the query genes are the most correlated with one another and the most correlated with the genes that are best correlated with the query genes.

Example of microarray analysis
The example data comprise the profiles of 9703 cDNAs, representing ~8000 unique genes, in 60 cell lines, in relation to the activity profiles of 1400 drugs. The authors of the dataset provide a resulting table of 1376 genes and 118 compounds with the most representative substances and genes, normalised for the 60 cell lines (suitable data for knowledge discovery using our tools).
[Figure: Minimum spanning tree among some microarray gene expressions, using the f value provided by the PCOP calculus.]
[Figure: Hierarchical clustering corresponding to the previous minimum spanning tree.]

Example 1: relating cyclin E1 (CCNE1) and TP53 expressions
Selected genes that link the query ones:
– the thioredoxin-related protein endothelial protein disulphide isomerase gene, TXNDC5.
[Figure: Minimum spanning tree among some microarray gene expressions, using the f value provided by the PCOP calculus. Legend: query genes, selected genes.]
[Figure: The non-linear relationship among the expression of the three genes (TXNDC5, TP53, CCNE1).]

Results analysis
Cyclin E1 is a regulatory subunit of the cdc2-related protein kinase CDK2, which is activated shortly before S-phase entry. Lower levels of cyclin E1 imply lower cell-division rates, whereas higher levels of cyclin E1 precede higher rates of cell division (Hinchcliffe et al 1999). High levels of TP53 either induce apoptosis (in the presence of appropriate mutations) or, alternatively, switch on the mechanisms of DNA repair.
At low levels of TP53, less apoptosis is produced and mutations can accumulate more easily. It is known that rapidly dividing cells show a higher mutation rate, whereas slowly dividing cells show lower rates (Bielas and Heddle 2003). It has been reported that constitutive cyclin E1 over-expression, in both immortalized rat embryo fibroblasts and human breast epithelial cells, results in chromosomal instability (Spruck et al 1999; Tissier et al 2004). A slight overproduction of cyclin E1 (just 5% more is enough) has been associated with the malignant phenotype and is strongly correlated with tumour size (Tissier et al 2004). Other authors have also reported the association of the above genes with cell division and apoptosis (Knoblach et al 2003; Sullivan et al 2003).

Example 2: relating cyclin E1 (CCNE1), TNK2 and CDK6 expressions
Selected genes that link the query ones:
[Figure: The non-linear relationship among the expression of the three genes (CCNE1, TNK2, CDK6).]

Results analysis
The CDK6 and cyclin E1 genes show mutually exclusive expression with respect to the TNK2 gene. When TNK2 is expressed above the control level, CDK6 and cyclin E1 levels are fixed around their minimum expression. When CDK6 and cyclin E1 are over-expressed, TNK2 is at its minimum level. The 0 value of the curve parameter (the abscissa axis in the figure) is positioned where all of the genes are at basal expression. The location of this 0 value shows that cyclin E1 and CDK6 over-expression is more usual than TNK2 over-expression.

Conclusions
An approach beyond activation pathways or GO functional annotation:
– For the two genes connected in the example, cyclin E1 and TP53, the functions and biological processes recorded in the GO tool are only indirectly related. In the activation or metabolic pathways they are not related by their expression and, in any case, the dependence between their respective expression levels is not shown.
– The hidden reason for linking TXNDC5 to cyclin E1 and TP53 is the adaptive response of the cell (TXNDC5) in order to survive in a low-oxygen environment caused by the cell growing at a high division rate (cyclin E1). This link reflects neither membership of the same activation pathway nor an interaction between these proteins.

Bibliography
Delicado, P. (2001) Another look at principal curves and surfaces. J. Multivariate Anal., 77, 84-116.
Delicado, P. and Huerta, M. (2003) Principal curves of oriented points: Theoretical and computational improvements. Comput. Stat., 18, 293-315.
Huerta, M., Cedano, J. and Querol, E. (2007) Analysis of non-linear relation between expression profiles by the Principal Curves of Oriented-Points approach. J Bioinform Comput Biol, 6:367-386.