Subspace Differential Coexpression Analysis for the Discovery of Disease-related Dysregulations
Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar
[email protected]
http://www-users.cs.umn.edu/~kumar/dmbio/
Department of Computer Science and Engineering
RECOMB Systems Biology, 12/05/2009

Differential Expression (DE)
• Differential Expression (DE)
  – Traditional analysis targets changes in expression level
[Figure: matrix of expression values (genes × samples, controls vs. cases) and the expression level of a gene over samples in controls and cases [Kostka & Spang, 2005]]
[Golub et al., 1999], [Pan, 2002], [Cui and Churchill, 2003], etc.

Differential Coexpression (DC)
• Differential Coexpression (DC)
  – Targets changes in the coherence of expression
[Figure: matrix of expression values (genes × samples, controls vs. cases) and expression over samples in controls and cases [Kostka & Spang, 2005]]
• Question: Is this gene interesting, i.e. associated with the phenotype?
• Answer: No, in terms of differential expression (DE). However, what if there are another two genes ……? Yes!
• Biological interpretations of DC: dysregulation of pathways, mutation of transcription factors, etc.
[Silva et al., 1995], [Li, 2002], [Kostka & Spang, 2005], [Rosemary et al., 2008], [Cho et al., 2009], etc.

Differential Coexpression (DC)
• Existing work on differential coexpression
  – Pairs of genes with differential coexpression
    • [Silva et al., 1995], [Li, 2002], [Li et al., 2003], [Lai et al., 2004]
  – Clustering-based differential coexpression analysis
    • [Ihmels et al., 2005], [Watson, 2006]
  – Network-based analysis of differential coexpression
    • [Zhang and Horvath, 2005], [Choi et al., 2005], [Gargalovic et al., 2006], [Oldham et al., 2006], [Fuller et al., 2007], [Xu et al., 2008]
  – Beyond pair-wise (size-k) differential coexpression
    • [Kostka and Spang, 2004], [Prieto et al., 2006]
  – Gene-pathway differential coexpression
    • [Rosemary et al., 2008]
  – Pathway-pathway differential coexpression
    • [Cho et al., 2009]

Existing DC work is "full-space"
• Full-space differential coexpression
  – Full-space measures: e.g. correlation difference
• May have limitations due to the heterogeneity of
  – Causes of a disease (e.g. genetic difference)
  – Populations affected (e.g. demographic difference)
• Motivation: such subspace patterns may be missed by full-space models

Extension to Subspace Differential Coexpression
• Definition of Subspace Differential Coexpression Pattern
  – A set of k genes {g1, g2, …, gk}
  – Fraction of samples in class A on which the k genes are coexpressed
  – Fraction of samples in class B on which the k genes are coexpressed
  – SDC as a measure of subspace differential coexpression
• Given n genes, there are 2^n candidate SDC patterns! How to effectively handle the combinatorial search space?
• Similar motivation and challenge as biclustering, but here differential biclustering!
• Problem: given n genes, find all the subsets of genes s.t. SDC ≥ d
Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]

Direct Mining of Differential Patterns
• Refined SDC measure: "direct"
• A measure M is antimonotonic if ∀ A, B: A ⊆ B ⇒ M(A) ≥ M(B)
Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010], [Fang, Pandey, Gupta, Steinbach and Kumar, TR 09-011, CS@UMN]

An Association-analysis Approach
• Systematic and efficient combinatorial search
• Refined SDC measure
• A measure M is antimonotonic if ∀ A, B: A ⊆ B ⇒ M(A) ≥ M(B)
[Figure: lattice of gene sets over genes A to E; once a set is disqualified, all of its supersets are pruned [Agrawal et al., 1994]]
  null
  A B C D E
  AB AC AD AE BC BD BE CD CE DE
  ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
  ABCD ABCE ABDE ACDE BCDE
  ABCDE
• Advantages: 1) Systematic & direct 2) Completeness 3) Efficiency

Validation
• Three lung cancer datasets
  – [Bhattacharjee et al., 2001], [Stearman et al., 2005], [Su et al., 2007]
  – All are from Affymetrix microarrays (first two: HG-U95A; third: HG-U133A)
  – Lung cancer samples & normal samples
• Combined dataset
  – More samples
  – Proper normalizations before combining (RMA, DWD, XPN)
  – Lung cancer samples (102), normal samples (67)
  – RMA [Irizarry et al., 2003], DWD [Benito et al., 2004], XPN [Shabalin et al., 2008]

Statistical Significance
• Phenotype permutation test (n = 1000)
[Figure: panels A, B, C]

Could Subspace DC patterns have been discovered in full-space?
[Figure: subspace DC measures vs. full-space DC measures for 88 statistically significant size-3 patterns (stars). The phenotype-permutation-based significance cutoff for the full-space measure separates patterns that can NOT be found in full-space from patterns that can also be found in full-space.]
DC (Differential Coexpression)

A 10-gene Subspace DC Pattern
• Enriched with the TNF-α/NFkB signaling pathway (6/10 overlap with the pathway, P-value: 1.4 × 10^-5)
• Suggests that dysregulation of the TNF-α/NFkB pathway may be related to lung cancer
[Figure: annotations ≈ 10% and ≈ 60%; enriched Ingenuity subnetwork, www.ingenuity.com]

Biological Interpretations
• Specific interpretation
  – Enriched cancer-related signaling pathways
    • TNF-α/NFkB
    • WNT
  – Target gene sets of cancer-related microRNA & TFs
    • microRNA:
      – miR-101 ({PIK3C2B, TSC22D1} + AKAP12); miR-101 is shown to be down-regulated in cancer [Friedman et al., 2009]
    • Transcription factor (TF):
      – ATF2 ({ETV4, PTHLH} + CBX5); mutations of ATF2 are shown to be related to cancer [Woo et al., 2002]

Summary & Future Directions
• Summary
  – Proposed the problem definition & a systematic approach for subspace DC
  – Subspace DC analysis can identify many statistically significant & biologically relevant patterns that would have been missed in full-space
• Potential biomedical utility
  – Study the demographic and genetic differences within each class
  – Phenotype classification with subspace DC patterns
    • Combine DE and subspace DC patterns
DE (Differential Expression); DC (Differential Coexpression)

Acknowledgement
• Co-authors at Dept. Computer Science, Univ. of Minnesota
  – Data Mining for Biomedical Informatics Group: Gaurav Pandey, Michael Steinbach, Vipin Kumar
  – Comp. Bio. Group: Rui Kuang
  – Comp. Bio. & Func. Genomic Group: Chad Myers
• Conference organizers
• NSF grants #CRI-0551551, #IIS-0308264, #ITR-0325949
• UMR-IBM-Mayo BICB Fellowship

Thanks!
• Paper
  – Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar, Subspace Differential Coexpression Analysis: Problem Definition and a General Approach, Proceedings of the 15th Pacific Symposium on Biocomputing, 2010
• Source codes: http://vk.cs.umn.edu/SDC
• Questions:
  – Gang Fang: [email protected]
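
Illustration: level-wise search with an antimonotonic measure
The association-analysis slide describes pruning every superset of a disqualified gene set. The Python sketch below shows how such a search could look on toy data; it is not the authors' implementation. Several pieces are assumptions made only for illustration: expression is taken to be binarized (1 means the gene is "on" in a sample), a gene set counts as coexpressed on a sample when all of its genes are 1, and the refined measure is replaced by a stand-in (the fraction in class A minus the largest pairwise fraction in class B), chosen because it is antimonotonic. The names support, refined_sdc and mine_patterns are hypothetical.

```python
from itertools import combinations


def support(rows, genes):
    """Fraction of samples (rows) in which every gene in `genes` equals 1."""
    if not rows:
        return 0.0
    return sum(all(r[g] for g in genes) for r in rows) / len(rows)


def refined_sdc(class_a, class_b, genes):
    """Stand-in antimonotonic measure (an assumption, not taken from the slides):
    support in class A minus the largest support of any gene pair in class B.
    Adding genes can only lower the first term and raise the second."""
    max_pair_b = max(support(class_b, pair) for pair in combinations(genes, 2))
    return support(class_a, genes) - max_pair_b


def mine_patterns(class_a, class_b, n_genes, d):
    """Level-wise search starting from gene pairs. A size-(k+1) candidate is
    scored only if all of its size-k subsets scored >= d; otherwise it is
    pruned without being evaluated."""
    level = [p for p in combinations(range(n_genes), 2)
             if refined_sdc(class_a, class_b, p) >= d]
    found = []
    while level:
        found.extend(level)
        survivors = set(level)
        candidates = set()
        for p in level:
            # Extend each surviving pattern with a larger gene index only,
            # so every candidate is generated from its sorted prefix.
            for g in range(p[-1] + 1, n_genes):
                c = p + (g,)
                if all(s in survivors for s in combinations(c, len(p))):
                    candidates.add(c)
        level = sorted(c for c in candidates
                       if refined_sdc(class_a, class_b, c) >= d)
    return found


# Toy binarized data: rows are samples, columns are genes 0..3.
class_a = [[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 1]]
class_b = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]]

patterns = mine_patterns(class_a, class_b, n_genes=4, d=0.5)
print(patterns)
```

On this toy input, genes 0, 1 and 2 are jointly "on" in three of the four class A samples and never jointly "on" in class B, so the pairs among them and the triple (0, 1, 2) pass the threshold. No set containing gene 3 reaches size 3, because its pairs with genes 0 and 1 are already disqualified.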
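
Illustration: phenotype permutation test
The statistical-significance slide mentions a phenotype permutation test with n = 1000. A minimal sketch of that idea for a single gene set follows. It assumes binarized expression and takes the score to be the absolute difference between the two class fractions; that form is an assumption for illustration, and the names sdc and permutation_p_value are hypothetical. The slides do not say how the permutations were aggregated across patterns, so this covers only the per-pattern case.

```python
import random


def support(rows, genes):
    """Fraction of samples (rows) in which every gene in `genes` equals 1."""
    if not rows:
        return 0.0
    return sum(all(r[g] for g in genes) for r in rows) / len(rows)


def sdc(cases, controls, genes):
    """Assumed score: absolute difference of the two class fractions."""
    return abs(support(cases, genes) - support(controls, genes))


def permutation_p_value(cases, controls, genes, n_perm=1000, seed=0):
    """Shuffle the class labels n_perm times and count how often the
    permuted score is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = sdc(cases, controls, genes)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassigns samples to classes at random
        if sdc(pooled[:n_cases], pooled[n_cases:], genes) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data: two genes, "on" together in every case and in no control.
cases = [[1, 1]] * 10
controls = [[0, 0]] * 10

p_value = permutation_p_value(cases, controls, genes=(0, 1))
print(p_value)
```

The add-one correction is a common convention so that a finite number of permutations never reports a p-value of exactly zero.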