Subspace Differential Coexpression Analysis for the Discovery of Disease-related Dysregulations

Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar
[email protected]
http://www-users.cs.umn.edu/~kumar/dmbio/
Department of Computer Science and Engineering
15th PSB, 01/08/2010

Differential Expression (DE)
• Differential Expression (DE)
  – Traditional analysis targets changes in expression level between cases and controls
[Figure: expression level over samples in controls and cases]
[Golub et al., 1999], [Pan 2002], [Cui and Churchill, 2003] etc.

Differential Coexpression (DC)
• Differential Coexpression (DC)
  – Targets changes in the coherence of expression
[Figure: matrix of expression values, genes by samples, in controls and cases]
• Question: Is this gene interesting, i.e. associated w/ the phenotype?
• Answer: No, in terms of differential expression (DE). However, what if there are another two genes ……? Yes! [Kostka & Spang, 2005]
[Figure: expression over samples in controls and cases]
• Biological interpretations of DC: dysregulation of pathways, mutation of transcription factors, etc.
[Silva et al., 1995], [Li, 2002], [Kostka & Spang, 2005], [Rosemary et al., 2008], [Cho et al. 2009] etc.

Differential Coexpression (DC)
• Existing work on differential coexpression
  – Pairs of genes with differential coexpression
    • [Silva et al., 1995], [Li, 2002], [Li et al., 2003], [Lai et al. 2004]
  – Clustering based differential coexpression analysis
    • [Ihmels et al., 2005], [Watson., 2006]
  – Network based analysis of differential coexpression
    • [Zhang and Horvath, 2005], [Choi et al., 2005], [Gargalovic et al. 2006], [Oldham et al. 2006], [Fuller et al., 2007], [Xu et al., 2008]
  – Beyond pair-wise (size-k) differential coexpression
    • [Kostka and Spang., 2004], [Prieto et al., 2006]
  – Gene-pathway differential coexpression
    • [Rosemary et al., 2008]
  – Pathway-pathway differential coexpression
    • [Cho et al., 2009]

Existing DC work is “full-space”
• Full-space differential coexpression
  – Full-space measures: e.g. correlation difference
• May have limitations due to the heterogeneity of
  – Causes of a disease (e.g. genetic difference)
  – Populations affected (e.g. demographic difference)
• Motivation: coexpression changes that hold on only a subset of the samples (subspace patterns) may be missed by full-space models

Extension to Subspace Differential Coexpression
• Definition of Subspace Differential Coexpression Pattern
  – A set of k genes = {g1, g2, …, gk}
  – Fraction of samples in class A on which the k genes are coexpressed
  – Fraction of samples in class B on which the k genes are coexpressed
  – SDC as a measure of subspace differential coexpression
• Problem: given n genes, find all the subsets of genes, s.t. SDC ≥ d
Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]

Computational Challenge
• Problem: given n genes, find all the subsets of genes, s.t. SDC ≥ d
• Given n genes, there are 2^n candidates of SDC pattern!
• How to effectively handle the combinatorial search space?
[Figure: lattice of all subsets of five genes A, B, C, D, E, from the empty set (null) up to ABCDE]
• Similar motivation and challenge as biclustering, but here differential biclustering!
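
The definition above can be illustrated with a minimal sketch. It rests on assumptions that are not stated on the slides: expression values are taken to be already binarized (1 = gene is "on" in a sample), "coexpressed on a sample" is taken to mean that all k genes are on in that sample, and SDC is taken to be the absolute difference of the two class fractions. The function names and the toy matrices are hypothetical.

```python
def class_support(samples, genes):
    """Fraction of samples in which every gene in `genes` is flagged 1.

    `samples` is a list of rows (one per sample); each row is a list of
    0/1 flags, one per gene. This binarized view is an assumption.
    """
    hits = sum(all(row[g] for g in genes) for row in samples)
    return hits / len(samples)


def sdc(class_a, class_b, genes):
    """Assumed SDC: |fraction in class A - fraction in class B|."""
    return abs(class_support(class_a, genes) - class_support(class_b, genes))


# Toy data (hypothetical): 4 samples per class, 3 genes (columns 0, 1, 2).
class_a = [[1, 1, 0], [1, 1, 1], [1, 1, 0], [0, 0, 1]]
class_b = [[1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 1, 0]]

# Genes 0 and 1 are jointly on in 3 of 4 class-A samples and 1 of 4 class-B samples.
score = sdc(class_a, class_b, [0, 1])

# Size of the search space for the five genes A..E of the lattice: 2^5 subsets.
n_candidates = 2 ** 5
```

Under these assumptions the pair of genes 0 and 1 scores 0.75 - 0.25 = 0.5, even though neither gene alone differs that strongly between the classes. The count of 32 subsets for five genes shows why enumerating every candidate stops being feasible as n grows.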
Direct Mining of Differential Patterns
• Refined SDC measure: “direct”
• A measure M is antimonotonic if ∀ A, B: A ⊆ B ⇒ M(A) ≥ M(B)
Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010], [Fang, Pandey, Gupta, Steinbach and Kumar, TR 09-011, CS@UMN]

An Association-analysis Approach
• Systematic and efficient combinatorial search
• Refined SDC measure
• A measure M is antimonotonic if ∀ A, B: A ⊆ B ⇒ M(A) ≥ M(B)
[Figure: lattice of all subsets of genes A, B, C, D, E; once a gene set is disqualified, all of its supersets are pruned [Agrawal et al. 1994]]
• Advantages: 1) Systematic & direct 2) Completeness 3) Efficiency

Validation
• Three lung cancer datasets
  – [Bhattacharjee et al. 2001], [Stearman et al. 2005], [Su et al. 2007]
• All are from Affymetrix microarrays (first two: HG-U95A, and the third: HG-U133A)
  – Lung cancer samples & normal samples
• Combined dataset
  – More samples
  – Proper normalizations before combining: (RMA, DWD, XPN)
  – Lung cancer samples (102), normal samples (67)
  – RMA [Irizarry et al., 2003], DWD [Benito et al., 2004], XPN [Shabalin et al., 2008]

Statistical Significance
• Phenotype permutation test (n=1000)
[Figure: panels A, B, C]

Could Subspace DC patterns have been discovered in full-space?
[Figure: subspace DC measures plotted against full-space DC measures for the 88 statistically significant size-3 patterns (stars). A phenotype permutation based significance cutoff for the full-space measure separates patterns that can NOT be found in full-space from those that can also be found in full-space.]
DC (Differential Coexpression)

A 10-gene Subspace DC Pattern
• Enriched with the TNF-α/NFkB signaling pathway (6/10 overlap with the pathway, P-value: 1.4*10^-5)
• Suggests that the dysregulation of the TNF-α/NFkB pathway may be related to lung cancer
[Figure: the 10-gene pattern, with panels labeled ≈ 10% and ≈ 60%; enriched Ingenuity subnetwork, www.ingenuity.com]

Biological Interpretations
• Specific interpretation
  – Enriched cancer-related signaling pathways
    • TNF-α/NFkB
    • WNT
  – Target gene sets of cancer-related microRNA & TFs
    • microRNA:
      – miR-101 ({PIK3C2B,TSC22D1} + AKAP12): miR-101 is shown to be down-regulated in cancer [Friedman et al 2009]
    • Transcription factor (TF):
      – ATF2 ({ETV4,PTHLH} + CBX5): mutations of ATF2 are shown to be related to cancer [Woo et al. 2002]

Summary & Future Directions
• Summary
  – Proposed the problem definition & a systematic approach for subspace DC
  – Subspace DC analysis can identify many statistically significant & biologically relevant patterns that would have been missed in full-space
• Potential biomedical utility
  – Study the demographic and genetic differences within each class
  – Phenotype classification with subspace DC patterns
    • Combine DE and Subspace DC patterns
  – Other types of data, e.g. SNP, metabolites, etc.
DE (Differential Expression); DC (Differential Coexpression)

Acknowledgement
• Co-authors at Dept. Computer Science, Univ. of Minnesota
  – Data Mining for Biomedical Informatics Group: Gaurav Pandey, Michael Steinbach, Vipin Kumar
  – Comp. Bio. Group: Rui Kuang
  – Comp. Bio. & Func. Genomic Group: Chad Myers
• Conference organizers
• NLM/NIH travel award
• NSF grants #IIS0916439, #CRI-0551551, #IIS-0308264, #ITR-0325949
• UMR, IBM, Mayo Clinic for BICB Fellowship

Thanks!
• Paper
  – Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar, Subspace Differential Coexpression Analysis: Problem Definition and a General Approach, Proceedings of 15th Pacific Symposium on Biocomputing, 2010
• Source codes: http://vk.cs.umn.edu/SDC
• Questions:
  – Gang Fang: [email protected]
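
Appendix sketch: the pruning idea from the slide "An Association-analysis Approach" can be illustrated with a small level-wise (Apriori-style) search. This is not the refined SDC measure of the paper, whose formula is not given on the slides. As a stand-in, the sketch prunes with the bound max(s_A, s_B), where s_A and s_B are the class fractions. That bound is antimonotonic (adding genes can only shrink each fraction) and it is never smaller than |s_A - s_B|, so pruning on it loses no pattern with SDC ≥ d. The binarized toy data, the "all genes on" notion of coexpression, and all function names are assumptions for illustration.

```python
from itertools import combinations


def support(samples, genes):
    """Fraction of samples in which every gene in `genes` is flagged 1."""
    return sum(all(row[g] for g in genes) for row in samples) / len(samples)


def upper_bound(class_a, class_b, genes):
    """Antimonotonic stand-in bound: max(s_A, s_B) >= |s_A - s_B|."""
    return max(support(class_a, genes), support(class_b, genes))


def mine_sdc(class_a, class_b, n_genes, d):
    """Level-wise search for gene sets (size >= 2) with assumed SDC >= d."""
    patterns = {}
    level = [(g,) for g in range(n_genes)]
    while level:
        survivors = []
        for genes in level:
            if upper_bound(class_a, class_b, genes) < d:
                continue  # disqualified: no superset can reach d either
            survivors.append(genes)
            score = abs(support(class_a, genes) - support(class_b, genes))
            if len(genes) >= 2 and score >= d:
                patterns[genes] = score
        # Join survivors that share a prefix, then keep only candidates
        # whose every sub-set of one size smaller also survived.
        kept = set(survivors)
        next_level = set()
        for a in survivors:
            for b in survivors:
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    cand = a + (b[-1],)
                    if all(s in kept for s in combinations(cand, len(cand) - 1)):
                        next_level.add(cand)
        level = sorted(next_level)
    return patterns


# Toy data (hypothetical): 4 samples per class, 3 genes.
class_a = [[1, 1, 0], [1, 1, 1], [1, 1, 0], [0, 0, 1]]
class_b = [[1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 1, 0]]

patterns = mine_sdc(class_a, class_b, n_genes=3, d=0.5)
```

On this toy input the pairs (0, 2) and (1, 2) fall below the bound at size 2, so the triple (0, 1, 2) is never generated. The only pattern returned is the pair (0, 1) with a score of 0.5.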