Selecting the microarray genes that
link the specific genes of interest
to one another.
http://ibb.uab.es/revresearch
Huerta, M., Cedano, J. and Querol, E. (2007) Analysis of non-linear relation between
expression profiles by the Principal Curves of Oriented-Points approach. J Bioinform
Comput Biol, 6:367-386.
Objectives


Provide tools for studying the non-linear dependences among gene
expression profiles, focused on the researcher's genes of
interest.
Taking advantage of the high-throughput
capability of microarray technology.
Procedure

Pre-process:
– Obtaining the correlation degree between each pair of genes using
the PCOP calculus.
– Building the minimum-spanning tree among the microarray genes
using the previously calculated pairwise correlations.
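The two pre-process steps can be sketched in a few lines. This is a minimal sketch under stated assumptions and is not the authors' implementation. The PCOP calculus is not reproduced here. As a stand-in, each gene pair is scored with the absolute Pearson correlation, which is converted to a distance 1 - |r|. The tree is then built with Prim's algorithm. The function names and the toy profiles are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_distances(profiles):
    """Distance 1 - |r| for every gene pair, stored in both orientations.

    ASSUMPTION: |Pearson r| stands in for the PCOP correlation degree,
    which is not implemented here.
    """
    dist = {}
    for g, h in combinations(profiles, 2):
        d = 1.0 - abs(pearson(profiles[g], profiles[h]))
        dist[(g, h)] = dist[(h, g)] = d
    return dist


def minimum_spanning_tree(genes, dist):
    """Prim's algorithm on the complete gene graph; returns (gene, gene, distance) edges."""
    genes = list(genes)
    in_tree = {genes[0]}
    edges = []
    while len(in_tree) < len(genes):
        # Cheapest edge leaving the current tree.
        g, h = min(((a, b) for a in in_tree for b in genes if b not in in_tree),
                   key=lambda e: dist[e])
        in_tree.add(h)
        edges.append((g, h, dist[(g, h)]))
    return edges


# Hypothetical toy expression profiles over four conditions.
profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.1, 6.0, 8.2],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [1.0, 3.0, 1.0, 3.0],
}
for g, h, d in minimum_spanning_tree(profiles, pairwise_distances(profiles)):
    print(g, h, round(d, 3))
```

The absolute value lets strongly anti-correlated genes, such as A and C above, sit next to each other in the tree. At each step, this naive Prim loop rescans every edge that crosses the tree boundary. For thousands of genes, a library MST routine over a distance matrix would be the practical choice.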
Zoom-in operation:
– Selecting the genes that connect the query genes, using the
minimum-spanning tree calculated in the pre-process. The query
genes are provided by the researcher in each new query.
– Obtaining the intra-set behaviour pattern of the subset generated
by the selection algorithm, using the PCOP calculus. This inner
pattern relates the expression fluctuations of the selected genes
and the query genes to one another.
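In a tree, the genes that "connect" a set of query genes can be read off as the smallest subtree containing all of them. The sketch below assumes the minimum-spanning tree is given as a list of (gene, gene, distance) edges. It finds that subtree by repeatedly pruning leaves that are not query genes. The function name and the example tree are hypothetical. The correlation-conservation criterion of the zoom-in operation is not applied in this step.

```python
from collections import defaultdict


def connecting_genes(mst_edges, query):
    """Return the non-query genes lying on the tree paths between query genes.

    mst_edges: (gene, gene, distance) tuples forming a tree.
    query: the researcher's query genes.
    """
    query = set(query)
    adj = defaultdict(set)
    for g, h, _ in mst_edges:
        adj[g].add(h)
        adj[h].add(g)
    # Prune non-query leaves until none remain; what is left is the
    # smallest subtree that still contains every query gene.
    leaves = [g for g in adj if len(adj[g]) <= 1 and g not in query]
    while leaves:
        leaf = leaves.pop()
        if leaf not in adj:  # already pruned earlier
            continue
        for nb in adj.pop(leaf):
            adj[nb].discard(leaf)
            if len(adj[nb]) <= 1 and nb not in query:
                leaves.append(nb)
    return set(adj) - query


# Hypothetical tree: A-B-C-D is the path between the two query genes,
# while E and F hang off B on a side branch.
tree = [("A", "B", 0.1), ("B", "C", 0.2), ("C", "D", 0.1),
        ("B", "E", 0.3), ("E", "F", 0.2)]
print(sorted(connecting_genes(tree, {"A", "D"})))  # ['B', 'C']
```

Because a tree has exactly one path between any two nodes, this pruning returns exactly the genes on the paths joining the query genes. The side branch E-F is discarded.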
Using the minimum spanning tree to select
the genes that link the query ones.

Considering that the researcher's genes of interest have a certain level
of correlation as a set (a correlation level that the researcher does not
wish to lose), the zoom-in operation selects the maximum number of
genes that connect the genes of the query set while conserving
the correlation level of the query set in the new query-plus-selected-genes set.
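The selection rule can be illustrated with a greedy sketch. This is an assumption and not the published procedure. The set-level correlation is approximated by the mean absolute pairwise Pearson correlation, because the PCOP measure is not reproduced here. Candidate genes are assumed to arrive in the order they are reached along the tree. Each candidate is kept only if the extended set's score does not fall below the query set's own score. A greedy pass does not guarantee the true maximum number of genes. All names, the `tolerance` parameter and the data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def set_score(genes, profiles):
    """Mean |r| over all gene pairs (ASSUMED stand-in for the PCOP set measure)."""
    pairs = list(combinations(genes, 2))
    return sum(abs(pearson(profiles[g], profiles[h])) for g, h in pairs) / len(pairs)


def expand_query(query, candidates, profiles, tolerance=0.0):
    """Greedily add candidates while the set score stays at or above the
    query set's own score minus `tolerance`. Needs at least two query genes."""
    selected = list(query)
    threshold = set_score(selected, profiles) - tolerance
    for gene in candidates:
        if set_score(selected + [gene], profiles) >= threshold:
            selected.append(gene)
    return selected[len(query):]


profiles = {
    "Q1": [1.0, 2.0, 3.0, 4.0],
    "Q2": [1.0, 2.0, 3.0, 5.0],
    "X": [2.0, 4.0, 6.0, 8.0],   # tracks Q1 closely: kept
    "Y": [1.0, 3.0, 1.0, 3.0],   # weakly related: rejected
}
print(expand_query(["Q1", "Q2"], ["X", "Y"], profiles))  # ['X']
```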
– When building a hierarchical clustering with single linkage, we obtain a
minimum-spanning tree in which each edge represents the
relationship used to add a new gene or gene cluster to the tree (Gower
and Ross 1969). In this way, the clusters, their hierarchy
and the properties of the hierarchical clustering carry over to the corresponding
minimum-spanning tree.
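The single-linkage/minimum-spanning-tree correspondence (Gower and Ross 1969) can be checked numerically on a toy example. The sketch below uses hypothetical one-dimensional points, with absolute difference as the distance. It runs a naive single-linkage agglomeration and Kruskal's MST algorithm, then compares the merge heights with the MST edge weights. This is only an illustration of the cited equivalence, not a proof.

```python
from itertools import combinations


def single_linkage_heights(points, dist):
    """Naive single-linkage agglomeration; returns the height of each merge."""
    clusters = [{p} for p in points]
    heights = []
    while len(clusters) > 1:
        # Single linkage: distance between clusters = closest cross pair.
        i, j, d = min(
            ((i, j, min(dist(a, b) for a in clusters[i] for b in clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda t: t[2])
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
        heights.append(d)
    return heights


def kruskal_mst_weights(points, dist):
    """Kruskal's algorithm with union-find; returns MST edge weights in order."""
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    weights = []
    for a, b in sorted(combinations(points, 2), key=lambda e: dist(*e)):
        ra, rb = find(a), find(b)
        if ra != rb:  # edge joins two components, so it belongs to the MST
            parent[ra] = rb
            weights.append(dist(a, b))
    return weights


def dist(a, b):
    return abs(a - b)


points = [0.0, 1.0, 3.0, 7.0, 7.5]
print(single_linkage_heights(points, dist))  # [0.5, 1.0, 2.0, 4.0]
print(kruskal_mst_weights(points, dist))     # [0.5, 1.0, 2.0, 4.0]
```

Each single-linkage merge uses the shortest edge between two current clusters. This is the same edge that Kruskal's algorithm accepts at that step, so the two lists coincide.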

Following the minimum-spanning path, the genes farther from the query genes are those more correlated among themselves
and with the genes best correlated with the query genes.
Example of microarray analysis.

The profiles of 9703 cDNAs representing
~8000 unique genes across 60 cell lines, in
relation to the activity profiles of 1400 drugs.
They provide a resulting table of 1376 genes
and 118 compounds with the most
representative substances and genes,
normalised for the 60 cell lines (suitable
data for knowledge discovery using our tools).
Minimum spanning tree among some microarray gene expressions using the f value provided by the PCOP
calculus.
Hierarchical cluster corresponding to
the previous minimum spanning tree.
Example 1: relating cyclin E1 (CCNE1)
and TP53 expressions.

Selected genes that link the query ones:
– TXNDC5, the gene for the thioredoxin-related endothelial protein
disulphide isomerase.
Minimum spanning tree among some microarray gene expressions using the f value provided by the PCOP
calculus, with the query genes and the selected genes marked.
The non-linear relationship among the
expression of the three genes (TXNDC5, TP53 and CCNE1).
Results analysis

Cyclin E1 is a regulatory subunit of the cdc2-related protein kinase CDK2,
which is activated shortly before S-phase entry. Lower levels of cyclin E1 imply
lower cell-division rates, whereas higher levels of cyclin E1 precede higher
rates of cell division (Hinchcliffe et al 1999). High levels of TP53 induce either
apoptosis (in the presence of appropriate mutations) or, alternatively, switch on
the mechanisms of DNA repair. At low levels of TP53, less apoptosis is
produced and mutations can accumulate more easily. It is known that rapidly
dividing cells show a higher mutation rate, whereas slowly dividing cells show
lower rates (Bielas and Heddle 2003). It has been reported that constitutive
cyclin E1 over-expression, in both immortalized rat embryo fibroblasts and
human breast epithelial cells, results in chromosomal instability (Spruck et al
1999; Tissier et al 2004). A slight overproduction (just 5% more is enough) of
cyclin E1 has been associated with the malignant phenotype and is strongly
correlated with tumour size (Tissier et al 2004). Other authors have also
reported the association of the above genes with cell division and apoptosis
(Knoblach et al 2003; Sullivan et al 2003).
Example 2: relating cyclin E1 (CCNE1),
TNK2 and CDK6 expressions.

Selected genes that link the query ones:
The non-linear relationship among the
expression of the three genes (CCNE1, TNK2 and CDK6).
Results analysis

The CDK6 and cyclin E1 genes show mutually exclusive expression with
respect to the TNK2 gene. When TNK2 is expressed above the control
level, CDK6 and cyclin E1 levels stay around their minimum
expression. When CDK6 and cyclin E1 are over-expressed, TNK2 is at
its minimum levels. The 0 value of the curve parameter (abscissa axis
in the figure) is located where all of the genes are at basal expression.
The location of this 0 value shows that cyclin E1 and CDK6 over-expression is more usual than TNK2 over-expression.
Conclusions

An approach beyond the activation pathways
or GO functional annotation:
– The connection between two genes, cyclin E1 and TP53, seen in
the example: in the GO tool their functions/biological processes
are only indirectly related. In the activation or metabolic pathways,
they are not related by their expression. In any case, the
dependence of their respective expression levels is not shown.
– The hidden reason for linking TXNDC5 to cyclin E1 and TP53 is
the adaptive response of the cell (TXNDC5) to survive in a
low-oxygen environment caused by the cell growing at a high division
rate (cyclin E1). This link reflects neither membership of the same
activation pathway nor an interaction between these proteins.
Bibliography



Delicado, P. (2001) Another look at principal curves
and surfaces. J. Multivariate Anal., 77, 84-116.
Delicado, P. and Huerta, M. (2003) Principal curves
of oriented points: Theoretical and computational
improvements. Comput. Stat., 18, 293-315.
Huerta, M., Cedano, J. and Querol, E. (2007)
Analysis of non-linear relation between expression
profiles by the Principal Curves of Oriented-Points
approach. J Bioinform Comput Biol, 6:367-386.