Subspace Differential Coexpression Analysis
for the Discovery of Disease-related Dysregulations
Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach,
Chad L. Myers and Vipin Kumar
[email protected]
http://www-users.cs.umn.edu/~kumar/dmbio/
Department of Computer Science and Engineering
15th Pacific Symposium on Biocomputing (PSB), 01/08/2010
Differential Expression (DE)
• Differential Expression (DE)
– Traditional analysis targets changes in the expression level
[Figure: expression level over samples in controls and cases]
[Golub et al., 1999], [Pan, 2002], [Cui and Churchill, 2003], etc.
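As a rough illustration of what a DE score measures, the sketch below computes Welch's t-statistic for a single gene, comparing its mean expression level in cases against controls. Welch's t and the expression values are assumptions chosen for illustration; each of the cited DE methods defines its own score.

```python
import math
from statistics import mean, variance


def welch_t(cases, controls):
    """Welch's t-statistic for one gene: the difference in mean expression
    level between cases and controls, scaled by the per-class variances.
    (Illustrative DE score, an assumption, not a specific cited method.)"""
    se = math.sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    return (mean(cases) - mean(controls)) / se


# Hypothetical expression values of one gene (not data from the slides).
controls = [5.0, 5.2, 4.8, 5.1, 4.9]
cases = [7.0, 7.3, 6.8, 7.1, 6.9]
print(f"DE score (Welch t): {welch_t(cases, controls):.2f}")
```

A large |t| flags a gene whose level shifts between classes. A gene whose level does not change scores near 0, even if its relationship to other genes changes.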
Differential Coexpression (DC)
• Differential Coexpression (DC)
– Targets changes in the coherence of expression
[Figure: matrix of expression values (genes × samples) in controls and cases]
Question: Is this gene interesting, i.e. associated with the phenotype?
Answer: No, in terms of differential expression (DE).
However, what if there are another two genes ……? Yes! [Kostka & Spang, 2005]
[Figure: expression over samples in controls and cases]
Biological interpretations of DC:
dysregulation of pathways, mutation of transcription factors, etc.
[Silva et al., 1995], [Li, 2002], [Kostka & Spang, 2005], [Rosemary et al., 2008], [Cho et al., 2009], etc.
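The point of the question above can be reproduced with a toy pair of genes (values are hypothetical). Neither gene changes its mean level between classes, so a per-gene DE view sees nothing. The correlation between the two genes, however, drops from 1 in controls to 0 in cases. The absolute correlation difference used here is one simple full-space DC score, not the method of any cited work.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical two-gene example: each gene has the same mean (3) in both classes.
controls = {"g1": [1, 2, 3, 4, 5], "g2": [1, 2, 3, 4, 5]}  # g1 and g2 move together
cases = {"g1": [1, 2, 3, 4, 5], "g2": [2, 5, 3, 1, 4]}     # coherence is lost

# DE view: per-gene change in mean expression level between classes.
de = {g: mean(cases[g]) - mean(controls[g]) for g in ("g1", "g2")}
# DC view: change in the correlation between the two genes.
dc = abs(pearson(controls["g1"], controls["g2"]) - pearson(cases["g1"], cases["g2"]))
print(de, dc)
```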
Differential Coexpression (DC)
• Existing work on differential coexpression
– Pairs of genes with differential coexpression
• [Silva et al., 1995], [Li, 2002], [Li et al., 2003], [Lai et al., 2004]
– Clustering-based differential coexpression analysis
• [Ihmels et al., 2005], [Watson, 2006]
– Network-based analysis of differential coexpression
• [Zhang and Horvath, 2005], [Choi et al., 2005], [Gargalovic et al., 2006],
[Oldham et al., 2006], [Fuller et al., 2007], [Xu et al., 2008]
– Beyond pair-wise (size-k) differential coexpression
• [Kostka and Spang, 2004], [Prieto et al., 2006]
– Gene-pathway differential coexpression
• [Rosemary et al., 2008]
– Pathway-pathway differential coexpression
• [Cho et al., 2009]
Existing DC work is “full-space”
• Full-space differential coexpression
– Full-space measures, e.g. correlation difference
• May have limitations due to the heterogeneity of
– Causes of a disease (e.g. genetic differences)
– Populations affected (e.g. demographic differences)
Motivation: such subspace patterns may be missed by full-space models
Extension to Subspace Differential Coexpression
• Definition of a Subspace Differential Coexpression Pattern
– A set of k genes $\{g_1, g_2, \ldots, g_k\}$
– $\pi_A$: fraction of samples in class A on which the k genes are coexpressed
– $\pi_B$: fraction of samples in class B on which the k genes are coexpressed
– $\mathrm{SDC} = |\pi_A - \pi_B|$ as a measure of subspace differential coexpression
Problem: given n genes, find all the subsets of genes s.t. $\mathrm{SDC} \ge \delta$
Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]
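A minimal sketch of computing $\pi_A$, $\pi_B$ and SDC, taken as $|\pi_A - \pi_B|$, for one gene set. The slides do not say how "coexpressed on a sample" is decided, so this is an assumption. The k genes count as coexpressed on a sample when the spread (max minus min) of their already-standardized values on that sample is at most a hypothetical tolerance `eps`. The function names, `delta` and the data are illustrative only.

```python
from typing import Dict, List, Sequence

Expr = Dict[str, List[float]]  # gene -> expression values over the samples of one class


def coexpressed_fraction(expr: Expr, genes: Sequence[str], eps: float) -> float:
    """Fraction of samples on which all `genes` are coexpressed.

    ASSUMPTION: a sample counts as coexpressed when max - min of the genes'
    values on that sample is <= eps (a stand-in criterion, not from the slides).
    """
    n_samples = len(expr[genes[0]])
    hits = 0
    for s in range(n_samples):
        values = [expr[g][s] for g in genes]
        if max(values) - min(values) <= eps:
            hits += 1
    return hits / n_samples


def sdc(expr_a: Expr, expr_b: Expr, genes: Sequence[str], eps: float = 0.5) -> float:
    """SDC = |pi_A - pi_B| for the gene set `genes`."""
    pi_a = coexpressed_fraction(expr_a, genes, eps)
    pi_b = coexpressed_fraction(expr_b, genes, eps)
    return abs(pi_a - pi_b)


# Hypothetical standardized values, 4 samples per class.
class_a = {"g1": [1.0, 0.2, -1.0, 2.0], "g2": [1.1, 0.3, 1.5, -2.0], "g3": [0.9, 0.1, 0.0, 0.5]}
class_b = {"g1": [1.0, -1.0, 0.5, 2.0], "g2": [-1.0, 1.0, 2.0, 0.0], "g3": [0.0, 0.0, -1.0, 1.0]}

genes = ["g1", "g2", "g3"]
delta = 0.4  # hypothetical SDC threshold
score = sdc(class_a, class_b, genes)
print(score, score >= delta)
```

Here the three genes are coexpressed on 2 of 4 samples in class A and on none in class B, so the set qualifies for any threshold up to 0.5.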
Computational Challenge
Problem: given n genes, find all the subsets of genes s.t. $\mathrm{SDC} \ge \delta$
[Figure: lattice of all subsets of five genes A, B, C, D, E, from the empty set (null) through all pairs, triples and quadruples up to ABCDE]
Given n genes, there are $2^n$ candidate SDC patterns!
How can the combinatorial search space be handled effectively?
Similar motivation and challenge as biclustering, but here differential biclustering!
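To make the count concrete: each gene is either in or out of a candidate set, so the number of subsets is a sum of binomial coefficients. For the five genes A to E in the lattice this gives 32 nodes, counting both the empty set (null) and ABCDE.

```latex
\sum_{k=0}^{n} \binom{n}{k} = 2^{n},
\qquad
n = 5:\quad \binom{5}{0} + \binom{5}{1} + \binom{5}{2} + \binom{5}{3} + \binom{5}{4} + \binom{5}{5}
= 1 + 5 + 10 + 10 + 5 + 1 = 32 = 2^{5}
```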
Direct Mining of Differential Patterns
Refined SDC measure: “direct”
A measure M is antimonotonic if $\forall A, B: A \subseteq B \Rightarrow M(A) \ge M(B)$
Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]
[Fang, Pandey, Gupta, Steinbach and Kumar, TR 09-011, CS@UMN]
An Association-analysis Approach
• Systematic and efficient combinatorial search
• Refined SDC measure
A measure M is antimonotonic if $\forall A, B: A \subseteq B \Rightarrow M(A) \ge M(B)$
[Figure: lattice of all subsets of genes A to E; once a set is disqualified, all of its supersets are pruned [Agrawal et al. 1994]]
Advantages:
1) Systematic & direct
2) Completeness
3) Efficiency
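A minimal sketch of a level-wise (Apriori-style) search with superset pruning, in the spirit of [Agrawal et al. 1994]. The refined SDC measure is not spelled out on these slides, so the example plugs in a hypothetical antimonotonic stand-in: the fraction of samples on which the genes stay within a tolerance `eps` of each other. `levelwise_search`, `coexpressed_fraction`, `EXPR` and `eps` are illustrative names, not the released SDC code.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List

GeneSet = FrozenSet[str]


def levelwise_search(genes: Iterable[str], measure: Callable[[GeneSet], float],
                     threshold: float) -> List[GeneSet]:
    """Apriori-style search for all gene sets with measure >= threshold.

    Correct only if `measure` is antimonotonic (A subset of B implies
    M(A) >= M(B)), so a disqualified set lets us prune all its supersets.
    """
    level = [frozenset([g]) for g in sorted(genes) if measure(frozenset([g])) >= threshold]
    result = list(level)
    k = 1
    while level:
        survivors = set(level)
        candidates = set()
        for a in level:
            for b in level:
                union = a | b
                # Join two size-k survivors into a size-(k+1) candidate and keep it
                # only if every size-k subset also survived (otherwise prune it).
                if len(union) == k + 1 and all(
                    frozenset(sub) in survivors for sub in combinations(union, k)
                ):
                    candidates.add(union)
        level = [c for c in candidates if measure(c) >= threshold]
        result.extend(level)
        k += 1
    return result


# Hypothetical stand-in measure (NOT the refined SDC measure of the paper):
# fraction of samples on which the genes' values lie within eps of each other.
# It is antimonotonic, because adding genes can only widen the spread.
EXPR: Dict[str, List[float]] = {
    "A": [0.0, 0.0, 0.0, 0.0],
    "B": [0.1, 0.1, 0.1, 3.0],
    "C": [0.2, 0.2, 2.0, 3.0],
    "D": [5.0, 0.1, 0.0, 0.0],
    "E": [9.0, 9.0, 9.0, 9.0],
}


def coexpressed_fraction(genes: GeneSet, eps: float = 0.5) -> float:
    n = len(next(iter(EXPR.values())))
    hits = sum(
        1 for s in range(n)
        if max(EXPR[g][s] for g in genes) - min(EXPR[g][s] for g in genes) <= eps
    )
    return hits / n


patterns = levelwise_search(EXPR, coexpressed_fraction, threshold=0.5)
print(sorted("".join(sorted(p)) for p in patterns))
```

Because the stand-in measure is antimonotonic, the pruned search returns exactly the sets that brute-force enumeration of all $2^5 - 1 = 31$ non-empty subsets would, while never evaluating candidates such as ACD, BCD or ABCD that contain the disqualified pair CD.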
Validation
• Three lung cancer datasets
– [Bhattacharjee et al. 2001], [Stearman et al. 2005], [Su et al. 2007]
• All are from Affymetrix microarrays (first two: HG-U95A; third: HG-U133A)
– Lung cancer samples & normal samples
• Combined dataset
– More samples
– Proper normalizations before combining (RMA, DWD, XPN)
– Lung cancer samples (102)
– Normal samples (67)
RMA [Irizarry et al., 2003], DWD [Benito et al., 2004], XPN [Shabalin et al., 2008]
Statistical Significance
Phenotype permutation test (n = 1000)
Could subspace DC patterns have been discovered in full-space?
[Figure: subspace DC measures vs. full-space DC measures for the 88 statistically significant size-3 patterns (stars), with points labeled A, B and C. A phenotype-permutation-based significance cutoff for the full-space measure separates patterns that can NOT be found in full-space from those that can also be found in full-space.]
DC (Differential Coexpression)
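A minimal sketch of a phenotype permutation test with n = 1000 shuffles. Several choices here are assumptions, not the paper's exact procedure: the statistic is the full-space correlation difference of one gene pair, the p-value is an add-one estimate, and the data and function names are hypothetical. For example, how cutoffs are derived across many patterns is not reproduced.

```python
import math
import random
from statistics import mean
from typing import Callable, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def corr_difference(g1, g2, labels):
    """Full-space DC score: |r(cases) - r(controls)| for a gene pair."""
    case = [i for i, lab in enumerate(labels) if lab == "case"]
    ctrl = [i for i, lab in enumerate(labels) if lab == "control"]
    r_case = pearson([g1[i] for i in case], [g2[i] for i in case])
    r_ctrl = pearson([g1[i] for i in ctrl], [g2[i] for i in ctrl])
    return abs(r_case - r_ctrl)


def permutation_p_value(statistic: Callable[[List[str]], float], labels: Sequence[str],
                        n_perm: int = 1000, seed: int = 0) -> float:
    """Shuffle the phenotype labels n_perm times and report the (add-one)
    fraction of shuffles whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = statistic(list(labels))
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: 5 controls (genes move together), then 5 cases (no coherence).
G1 = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
G2 = [1, 2, 3, 4, 5, 2, 5, 3, 1, 4]
LABELS = ["control"] * 5 + ["case"] * 5

p = permutation_p_value(lambda labs: corr_difference(G1, G2, labs), LABELS)
print(f"observed = {corr_difference(G1, G2, LABELS):.2f}, p = {p:.3f}")
```

Shuffling labels rather than expression values keeps each sample's gene-gene structure intact, so the null distribution reflects only the loss of the phenotype assignment.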
A 10-gene Subspace DC Pattern
Enriched with the TNF-α/NF-κB signaling pathway
(6/10 overlap with the pathway, P-value: $1.4 \times 10^{-5}$)
[Figure: the 10-gene pattern, annotated ≈ 10% and ≈ 60%]
Suggests that the dysregulation of the TNF-α/NF-κB pathway may be related to lung cancer
www.ingenuity.com: enriched Ingenuity subnetwork
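The slide reports a P-value for the 6/10 overlap but not the test or gene background used. One common choice for this kind of enrichment is the hypergeometric upper tail. The minimal sketch below assumes that test, with a hypothetical background size `N` and pathway size `K`, so it will not reproduce the reported value.

```python
from math import comb


def hypergeom_tail(x: int, N: int, K: int, n: int) -> float:
    """P(X >= x): chance that a random set of n genes, drawn without replacement
    from N background genes of which K are in the pathway, hits >= x pathway genes."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(x, min(K, n) + 1))
    return hits / comb(N, n)


# 6 of the 10 pattern genes are in the pathway; N and K here are hypothetical.
print(hypergeom_tail(6, N=10000, K=50, n=10))
```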
Biological Interpretations
• Specific interpretation
– Enriched cancer-related signaling pathways
• TNF-α/NF-κB
• WNT
– Target gene sets of cancer-related microRNAs & TFs
• microRNA:
– miR-101 ({PIK3C2B, TSC22D1} + AKAP12)
miR-101 has been shown to be down-regulated in cancer [Friedman et al., 2009]
• Transcription factor (TF):
– ATF2 ({ETV4, PTHLH} + CBX5)
Mutations of ATF2 have been shown to be related to cancer [Woo et al., 2002]
Summary & Future Directions
• Summary
– Proposed the problem definition & a systematic approach for subspace DC
– Subspace DC analysis can identify many statistically significant &
biologically relevant patterns that would have been missed in full-space
• Potential Biomedical utility
– Study the demographic and genetic differences within each class
– Phenotype classification with subspace DC patterns
• Combine DE and Subspace DC patterns
– Other types of data, e.g. SNP, metabolites, etc.
DE (Differential Expression);
DC (Differential Coexpression)
Acknowledgement
• Co-authors at Dept. Computer Science, Univ. of Minnesota
– Data Mining for Biomedical Informatics Group; Comp. Bio. Group; Comp. Bio. & Func. Genomic Group
– Gaurav Pandey, Michael Steinbach, Vipin Kumar, Rui Kuang, Chad Myers
• Conference organizers
• NLM/NIH travel award
• NSF grants #IIS0916439, #CRI-0551551, #IIS-0308264, #ITR-0325949
• UMR, IBM, Mayo Clinic for BICB Fellowship
Thanks!
• Paper
– Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach,
Chad L. Myers and Vipin Kumar,
Subspace Differential Coexpression Analysis: Problem Definition
and a General Approach
Proceedings of 15th Pacific Symposium on Biocomputing, 2010
• Source code: http://vk.cs.umn.edu/SDC
• Questions:
– Gang Fang: [email protected]