* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Journal Club Meeting Sept 13, 2010 Tejaswini Narayanan Gene expression profiling provides tremendous info to help unravel complexity cancer. Selection of the most informative genes from huge noise for cancer classification has taken importance. Wang and Gotoh [WnG] -> a novel Variable Precision Rough Sets-rooted robust soft computing method [VPRS] (by introducing an α depended degree). It is a simple, efficient and straightforward method for accurate cancer classification using single genes or gene pairs and subsequently inferred the direct gene regulatory network. U -> universe of discourse. R -> equivalence relation. The degree of dependency of a set of attributes Q on another set of attributes P is denoted by γP(Q) is Where = size of the union of the lower approximation of each equivalence class in U/R(Q) on P in U, |U| = size of U (the set of samples). Q = decision attributes D, P = subset of condition attributes, γP(D) is depended degree = degree to which P can discriminate between the distinct classes of D m= classification power. Greater γP(D) => stronger classification ability of P (basis for selecting informative genes) Canonical depended degree -> excessively rigid definition => difficult to detect the discriminative features, high computational expense, uncertainty of predictive performance and non-uniqueness. Method proposed: 1. 2. 3. To filter redundant info and retain the critical information (i.e. signal). Followed by making decision rules based on core information and classifying the whole dataset. To extract hidden meaningful rules, we sometimes need to lose some rigid definitions -> flexible α depended degree under soft computing consideration. This allows some single genes [or gene pairs] to have strong class discriminatory power. Interestingly, this also enables us to infer the networks and modules! All the gene selection, classification and network construction processes in this method correlate well with biologically meaningful decision rules, such as: tumor vs. normal cells, up- vs. down-regulation, and positive vs. negative regulation. Solution: WnG introduced α depended degree, a generalization form of the depended degree sets in their VPRS model The α depended degree, given P and D is: where |*| => size of set * U/R(•) => set of equivalence classes induced by the equivalence relation R(•). For the selection of high class-discrimination genes, lower limit of α = 0.7 Decision Rule: One decision rule was: “A ⇒ B” meaning “if A, then B”, A -> condition attributes and B -> decision attributes. The confidence of a decision rule A ⇒ B is defined as follows: where support (A) -> proportion of samples satisfying A and support (A ∧ B) -> that satisfying A and B simultaneously. T • Confidence indicates reliability of the rule. • For each determined α value, only the genes with γ P(D,α) = 1 were selected • Sufficient reliability was ensured by setting a high threshold for α. to build decision rules. Dataset • • SAGE breast cancer dataset having ~2.7 million tags and 27 samples. Each described as lymph node [LN(+)] and [LN(-)] primary breast tumors. Results Results… WnG have identified 7 highly discriminative (hub) genes. All identified genes have high classification accuracy (under α = 0.8) These seven hub genes are very interesting and informative for their biological relevance. Example: It is well known that the role of the ATF2/AP1 complex and its network is at the hub of tumorigenesis. This has been reflected by a high classification accuracy of 88.89%. Inference of the Gene Regulatory Network 1. 2. It is expected that a few highly class-discriminative hub genes could greatly enhance the authenticity and confidence of computed gene interaction networks. WnG investigated the gene regulatory network by employing the following: o 1 gene [instead of a class] is used as the decision attribute. o If “GENEI” is substituted for “Class label” in a decision table, GENE-I is regarded as the decision attribute with two distinct values: up-regulation (UR) and down-regulation (DR), and a new derivative table can be obtained. o They implement the discretization of this derivative table to obtain another newly derived table. o Decision rule: if GENE-I is DR, then Gene-II is DR; if Gene-I is UR, then Gene-II is UR. o They are not necessarily true in reverse. Therefore, a directed regulatory relation of GENE-I to GENE-II, a positive one, is established. Modularity of networks: 1. They use the Cytoscape plugin MCODE19 to analyze the network constructed. 2. Detected two significant modules, one forms a feed- forward loop. 3. They conclude that the co- regulation of multiple activators could be at least partly responsible for the occurrence of tumors. Some more observations: 1. Colon cancer dataset: they identified 18 discriminative hub genes for cancer. 2. 10 of these (e.g. DES and ACTA2) belong to DR genes in a tumor, while 8 other genes (e.g. IL8, HSPD1, SRPK1) belong to UR genes in a tumor. 3. The UR genes are regulated by more genes than DR ones, while the DR genes regulate more genes than UR ones. 4. Tumor suppressors inhibit tumor activators and activate as many other tumor suppressors as possible. Whereas, tumor activators activate other tumor activators and inhibit as few tumor suppressors as possible. This method is a new option for cancer classification and direct gene regulatory network inference. User-friendly, Simple Biologically interpretable Cost-effective in a clinical setting with single genes or gene pairs. Relatively easy to understand and follow Availability of programming codes with either open access or GNU general public license (GPL).