Recursive partitioning for tumor
classification with gene
microarray data
Heping Zhang, Chang-Yung Yu,
Burton Singer, Momiao Xiong
What is Recursive Partitioning?
Basic Idea:
Technical description of recursive partitioning
Example:
Algorithm:
• Examine all of the available gene expression levels and all
possible thresholds for each of the expression levels
• Select the combination of gene expression level and threshold
that results in the best separation of cancer and normal tissues
on the basis of the node purity function
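The search described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the choice of midpoints as candidate thresholds, the size-weighted average of daughter-node purities as the split score, and the toy data are all assumptions made for the example.

```python
import math

def purity(p):
    """Node purity P*log(P) + (1-P)*log(1-P); 0 is purest, -log(2) is least pure."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log(p) + (1 - p) * math.log(1 - p)

def node_purity(labels):
    """Purity of a node, where labels are 1 for normal and 0 for cancer."""
    return purity(sum(labels) / len(labels))

def best_split(X, y):
    """Scan every gene and every candidate threshold; return (gene, threshold, score).

    X is a list of samples, each a list of expression levels; y holds the 0/1 labels.
    The score is the size-weighted purity of the two daughter nodes (assumed criterion).
    """
    n = len(y)
    best = None
    for gene in range(len(X[0])):
        values = sorted(set(row[gene] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][gene] <= threshold]
            right = [y[i] for i in range(n) if X[i][gene] > threshold]
            score = (len(left) * node_purity(left) + len(right) * node_purity(right)) / n
            if best is None or score > best[2]:
                best = (gene, threshold, score)
    return best

# Toy data, made up for illustration: 2 genes measured in 6 tissues.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [7.0, 2.0], [8.0, 6.0], [9.0, 3.0]]
y = [1, 1, 1, 0, 0, 0]
print(best_split(X, y))  # gene 0 at threshold 5.0 separates the two classes
```

Applying the same search again to each daughter node is what makes the partitioning recursive.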
Quality of the tree classification:
Error rate based on cross-validation
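A cross-validated error rate can be estimated roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, folds are formed by simple interleaving rather than random assignment, and `fit_stump` is a one-split stand-in for a full tree, used only to keep the example short.

```python
def k_fold_error(X, y, fit, k=5):
    """Estimate the misclassification rate by k-fold cross-validation.

    fit(X_train, y_train) must return a function predict(sample) -> label.
    """
    n = len(y)
    errors = 0
    for fold in range(k):
        # Hold out every k-th sample; train on the rest.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        errors += sum(predict(X[i]) != y[i] for i in test_idx)
    return errors / n

def fit_stump(X_train, y_train, threshold=5.0):
    """Hypothetical stand-in for a tree: one fixed split on gene 0."""
    below = [label for row, label in zip(X_train, y_train) if row[0] <= threshold]
    # Predict the majority training label below the threshold, the other label above.
    low_label = 1 if sum(below) * 2 >= len(below) else 0
    return lambda sample: low_label if sample[0] <= threshold else 1 - low_label

# Toy data, made up for illustration: one gene, 10 tissues, two of them atypical.
X = [[1.0], [2.0], [3.0], [4.0], [6.0], [7.0], [8.0], [9.0], [2.5], [7.5]]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
print(k_fold_error(X, y, fit_stump, k=5))
```

Because each tissue is classified by a model that never saw it during fitting, the resulting rate is a fairer measure of tree quality than the error on the training data.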
Node Purity:
A little bit of math 
One example of a purity function (the negative of the entropy):
P log(P) + (1-P) log(1-P), where
P is the probability of a tissue being normal within the node
Note:
• Maximum purity (= 0)
When all tissues are of the same type within the node (P = 0 or 1)
• Minimum purity (= -log 2)
When normal and cancer tissues are equally represented within the node (P = 0.5)
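The two extremes can be checked numerically. The short sketch below assumes natural logarithms and treats 0 log(0) as 0 (its limiting value); the function name is chosen for illustration.

```python
import math

def purity(p):
    """P*log(P) + (1-P)*log(1-P), with 0*log(0) taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log(p) + (1 - p) * math.log(1 - p)

# Pure nodes score 0; an evenly mixed node scores -log(2).
for p in (0.0, 0.2, 0.5, 1.0):
    print(p, purity(p))
```

Values between the extremes fall between -log 2 and 0, so a split that moves the daughter nodes closer to P = 0 or P = 1 raises the purity.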
Example from the article
Expression profiles of 2,000 genes using an
Affymetrix oligonucleotide array in 22 normal and
40 colon cancer
tissues (www.sph.uth.tmc.edu/hgc)
Results: Using 5-fold cross-validation, the error
rate is between 6% and 8%, which is much lower than
that obtained by existing analyses.
Fig. 1. Classification trees for tissue types using expression data from
three genes (M26383, R15447, M28214)
Correlation among gene expression profiles
Another Tree Based on a Different Set of
Three Genes (Fig. 6)
Correlation Matrix among Genes in Fig. 1 and Fig. 6
Other clustering and classification methods
1. Hierarchical
2. K-means
3. Self-organizing maps
4. Coupled two-way clustering
Advantages of recursive partitioning
classification methods
1. Efficient with a large number of genes
2. Handles more than two types of tissues simultaneously
3. Automatically selects valuable genes as predictors
4. More precise than other classification methods
Conclusion:
1. It is likely that the information contained in a
large number of genes can be captured by a
small number of genes without significant loss
of information.
2. The classification precision of recursive
partitioning is important for clinical applications.