Implication Networks from Large Gene-expression Datasets
Debashis Sahoo
PhD Candidate, Electrical Engineering, Stanford University
Joint work with David Dill, Andrew Gentles, Rob Tibshirani, Sylvia Plevritis
Integrative Cancer Biology Program (ICBP), Stanford University
Motivation
Current approaches
Clustering
Co-expression
Linear regression
Mutual information
[Scatter plot: CCNB2 vs. BUB1B expression]
Hidden Relationships
[Scatter plot: GABRB1 vs. ACPP expression]
Pearson’s correlation = -0.1
GABRB1 and ACPP are not linearly related.
There is a Boolean relationship:
ACPP high ⇒ GABRB1 low
GABRB1 high ⇒ ACPP low
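To make the point concrete, here is a minimal sketch on synthetic data (not the GABRB1/ACPP measurements from the slide). It assumes expression has already been reduced to 0 = low and 1 = high. The two toy genes are never high together, yet the Pearson correlation stays close to zero.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Synthetic samples (an assumption for illustration): 0 = low, 1 = high.
# Eight samples have both genes low, one has only gene A high,
# one has only gene B high, and none has both high.
gene_a = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
gene_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

r = pearson(gene_a, gene_b)
both_high = sum(1 for a, b in zip(gene_a, gene_b) if a == 1 and b == 1)
print(f"Pearson r = {r:.3f}, samples with both high = {both_high}")
```

The implication "A high ⇒ B low" holds for every sample here, while the correlation coefficient alone would suggest the genes are nearly unrelated.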
Outline
Motivation
Boolean analysis
Boolean implication network
Biological insights
Conserved Boolean network
Conclusion
Boolean Analysis Workflow
Get data: GEO [Edgar et al. 02]
Normalize: RMA [Irizarry et al. 03]
Determine thresholds
Discover Boolean relationships
Biological interpretation
Determine threshold
[Plot: expression of one gene across the sorted arrays, with the threshold and the High, Intermediate, and Low regions marked]
A threshold is determined for each gene.
The arrays are sorted by gene expression.
StepMiner is used to determine the threshold [Sahoo et al. 07].
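The idea of fitting a step to sorted expression values can be sketched as follows. This is a simplified illustration, not the StepMiner implementation: it tries every split of the sorted values, keeps the one with the smallest squared error, and places the threshold midway between the two fitted levels. The `margin` that defines the intermediate band is a hypothetical parameter chosen for the example.

```python
def step_threshold(values):
    """Fit a single step to the sorted values; return the midpoint of its two levels."""
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two expression values")
    best = None  # (squared error, low mean, high mean)
    for k in range(1, len(xs)):
        low, high = xs[:k], xs[k:]
        low_mean = sum(low) / len(low)
        high_mean = sum(high) / len(high)
        sse = (sum((x - low_mean) ** 2 for x in low)
               + sum((x - high_mean) ** 2 for x in high))
        if best is None or sse < best[0]:
            best = (sse, low_mean, high_mean)
    return (best[1] + best[2]) / 2

def level(value, threshold, margin=0.5):
    """Call a value low, intermediate, or high (margin is an assumed band width)."""
    if value < threshold - margin:
        return "low"
    if value > threshold + margin:
        return "high"
    return "intermediate"

# Toy expression values for one gene across six arrays (made up for the example).
expression = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
threshold = step_threshold(expression)
calls = [level(v, threshold) for v in expression]
```

The quadratic scan keeps the sketch short; a production version would use running sums to fit the step in linear time after sorting.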
Discovering Boolean Relationships
[Scatter plot: GABRB1 vs. ACPP expression, divided into four numbered quadrants (1 to 4)]
Analyze pairs of genes.
Analyze the four different quadrants.
Identify sparse quadrants.
Record the Boolean relationships.
ACPP high ⇒ GABRB1 low
GABRB1 high ⇒ ACPP low
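The quadrant analysis can be sketched as below, assuming each sample has already been called "low", "intermediate", or "high" for both genes. The function names and the `max_fraction` cutoff are hypothetical; the fraction rule is only a simple stand-in for a proper statistical test of sparseness.

```python
from collections import Counter

def quadrant_counts(a_levels, b_levels):
    """Count samples per (A level, B level) quadrant, skipping intermediate calls."""
    counts = Counter()
    for a, b in zip(a_levels, b_levels):
        if a in ("low", "high") and b in ("low", "high"):
            counts[(a, b)] += 1
    return counts

def sparse_quadrants(counts, max_fraction=0.05):
    """Quadrants holding at most max_fraction of the counted samples (assumed cutoff)."""
    total = sum(counts.values())
    quadrants = [(a, b) for a in ("low", "high") for b in ("low", "high")]
    return {q for q in quadrants if total and counts[q] / total <= max_fraction}

# Toy calls for two genes (made up): no sample is high for both.
acpp = ["high"] * 5 + ["low"] * 5 + ["low"] * 5 + ["intermediate"]
gabrb1 = ["low"] * 5 + ["high"] * 5 + ["low"] * 5 + ["high"]

counts = quadrant_counts(acpp, gabrb1)
sparse = sparse_quadrants(counts)
```

With the (high, high) quadrant empty, the recorded relationship is "ACPP high ⇒ GABRB1 low".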
Boolean Relationships
There are six possible Boolean relationships:
A low ⇒ B low
A low ⇒ B high
A high ⇒ B low
A high ⇒ B high
Equivalent
Opposite
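Each of the six relationships corresponds to a particular set of sparse quadrants. A minimal sketch of that lookup, assuming quadrants are written as (A level, B level) pairs; the table and function names are hypothetical:

```python
# Keys are the sets of sparse (A level, B level) quadrants.
# "A low => B low" means no samples with A low and B high, and so on.
RELATIONSHIPS = {
    frozenset({("low", "high")}): "A low => B low",
    frozenset({("low", "low")}): "A low => B high",
    frozenset({("high", "high")}): "A high => B low",
    frozenset({("high", "low")}): "A high => B high",
    frozenset({("low", "high"), ("high", "low")}): "Equivalent",
    frozenset({("low", "low"), ("high", "high")}): "Opposite",
}

def relationship(sparse):
    """Return the relationship name for a set of sparse quadrants, or None."""
    return RELATIONSHIPS.get(frozenset(sparse))

asymmetric = relationship({("high", "high")})
symmetric = relationship({("low", "high"), ("high", "low")})
unrelated = relationship(set())
```

One sparse quadrant gives an asymmetric implication; two diagonally opposed sparse quadrants give one of the symmetric relationships.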
Four Asymmetric Boolean Relationships
A low ⇒ B low: PTPRC low ⇒ CD19 low
A low ⇒ B high: FAM60A low ⇒ NUAK1 high
A high ⇒ B low: XIST high ⇒ RPS4Y1 low
A high ⇒ B high: COL3A1 high ⇒ SPARC high
[Four scatter plots: CD19 vs. PTPRC, NUAK1 vs. FAM60A, RPS4Y1 vs. XIST, SPARC vs. COL3A1]
Two Symmetric Boolean Relationships
Equivalent: CCNB2 and BUB1B
Opposite: EED and XTP7
[Two scatter plots: CCNB2 vs. BUB1B, EED vs. XTP7]
Boolean Implication Network
Boolean implications form a directed graph.
Nodes: for each gene A, two nodes, "A high" and "A low"
Edges: A high ⇒ B low is an edge from "A high" to "B low"
[Diagram: example graph with nodes A high, B low, C high]
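A minimal sketch of this graph as an adjacency map, with nodes written as (gene, level) pairs. The function names are hypothetical. Adding the contrapositive edge (A high ⇒ B low also gives B high ⇒ A low) is a logical consequence, but whether the original network stores it explicitly is an assumption, so it sits behind a flag.

```python
from collections import defaultdict

def flip(level):
    """Return the opposite expression level."""
    return "low" if level == "high" else "high"

def build_network(implications, add_contrapositives=True):
    """Build a directed graph from (gene_a, level_a, gene_b, level_b) implications."""
    graph = defaultdict(set)
    for gene_a, level_a, gene_b, level_b in implications:
        graph[(gene_a, level_a)].add((gene_b, level_b))
        if add_contrapositives:
            # Assumed convention: also store "not B => not A".
            graph[(gene_b, flip(level_b))].add((gene_a, flip(level_a)))
    return graph

# Implications taken from the slides.
network = build_network([
    ("ACPP", "high", "GABRB1", "low"),
    ("XIST", "high", "RPS4Y1", "low"),
])
```

Here `network[("ACPP", "high")]` contains `("GABRB1", "low")`, and the contrapositive places `("ACPP", "low")` under `("GABRB1", "high")`.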
Size of the Boolean Networks
Human (208 million)
Mouse (336 million)
Fly (17 million)
[Bar chart: percentage (y-axis, 0 to 70) by relationship type: low ⇒ high, high ⇒ low, low ⇒ low, high ⇒ high, Equivalent, Opposite]
Boolean Networks Are Not Scale Free
[Three plots for the human network (Symmetric, Asymmetric, Total), with axes #probesets and #relationships]
Gender Specific
XIST
X inactivation specific transcript
Expressed in female
RPS4Y1
Y-linked gene
Expressed in male only
Boolean relationship
XIST high ⇒ RPS4Y1 low
[Scatter plot: RPS4Y1 vs. XIST expression] [Day et al. 07]
Tissue Specific
ACPP
Acid phosphatase, prostate
Prostate specific gene
GABRB1
GABA A receptor, beta 1
Brain specific
Boolean relationship
ACPP high ⇒ GABRB1 low
[Scatter plot: GABRB1 vs. ACPP expression]
Development
HOXD3
Homeobox D3
Fruit fly antennapedia homolog
HOXA13
Homeobox A13
Fruit fly ultrabithorax homolog
Boolean relationship
HOXD3 high ⇒ HOXA13 low
[Scatter plot: HOXA13 vs. HOXD3 expression] [Rinn et al. 07]
Differentiation
PTPRC
Protein tyrosine phosphatase, receptor type, C
B220
Expressed in B cell precursors and mature B cells
CD19
Expressed in mature B cells
Boolean relationship
PTPRC low ⇒ CD19 low
[Scatter plot: CD19 vs. PTPRC expression]
Biological Insights
[Four scatter plots: Gender (RPS4Y1 vs. XIST), Tissue (GABRB1 vs. ACPP), Differentiation (CD19 vs. PTPRC), Development (HOXA13 vs. HOXD3)]
Conserved Boolean Networks
[Venn diagram: Boolean relationships in human (208M), mouse (336M), and fly (17M); overlap regions labeled 4M and 41K]
Find orthologs between human, mouse and fly using the EUGene database [Gilbert, 02].
Search for orthologous gene pairs that have the same Boolean relationship.
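The search for conserved relationships can be sketched as a set lookup through an ortholog map. Everything below is illustrative: the relationship sets are toy data, the ortholog dictionary is written by hand rather than read from EUGene, and the function name and relationship labels are hypothetical.

```python
SYMMETRIC = {"equivalent", "opposite"}

def conserved(rels_a, rels_b, orthologs):
    """Relationships in species A whose ortholog pair has the same relationship in B."""
    found = set()
    for gene1, rel, gene2 in rels_a:
        o1, o2 = orthologs.get(gene1), orthologs.get(gene2)
        if o1 is None or o2 is None:
            continue  # no ortholog known for one of the genes
        same_order = (o1, rel, o2) in rels_b
        # Symmetric relationships match in either orientation.
        swapped = rel in SYMMETRIC and (o2, rel, o1) in rels_b
        if same_order or swapped:
            found.add((gene1, rel, gene2))
    return found

# Toy inputs (assumed for the example), using gene names from the slides.
human = {
    ("CCNB2", "equivalent", "BUB1B"),
    ("GABRB1", "high=>low", "BUB1B"),
    ("XIST", "high=>low", "RPS4Y1"),
}
mouse = {
    ("Bub1b", "equivalent", "Ccnb2"),
    ("Gabrb1", "high=>low", "Bub1b"),
}
human_to_mouse = {"CCNB2": "Ccnb2", "BUB1B": "Bub1b", "GABRB1": "Gabrb1"}

shared = conserved(human, mouse, human_to_mouse)
```

Extending the check to a third species amounts to applying the same lookup to the result with a second ortholog map.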
Conserved Boolean Relationships
[Three scatter plots: Human (CCNB2 vs. BUB1B), Mouse (Ccnb2 vs. Bub1b), Fly (CycB vs. Bub1)]
Two largest connected components in the network of equivalent genes
178 genes: highly enriched for cell-cycle and DNA replication
32 genes: highly enriched for synaptic functions
Conserved Asymmetric Boolean Relationships
[Three scatter plots: Human (GABRB1 vs. BUB1B), Mouse (Gabrb1 vs. Bub1b), Fly (Lcch3 vs. Bub1)]
GABRB1 expressing cells have low cell cycle (BUB1B) activity.
Conclusion
Boolean analysis
Boolean relationships are directly visible on the scatter plot.
Enables discovery of asymmetric relationships.
Can reveal known biological processes.
Has potential for new biological discovery.
Boolean network
Is large
Is not scale free
Acknowledgements
Leonore A Herzenberg
James Brooks
Joe Lipsick
Gavin Sherlock
Howard Chang
Stuart Kim
The Felsher Lab:
Natalie Wu
Cathy Shachaf
Dean Felsher
Funding: ICBP Program (NIH grant: 5U56CA112973-02)
Determine threshold
It's hard to determine a threshold for this gene.
StepMiner usually puts the threshold in the middle in this case.
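Statistical Tests
[Figure: scatter plot split into four quadrants with point counts $a_{01}$ and $a_{11}$ (top row), $a_{00}$ and $a_{10}$ (bottom row)]
Compute the expected number of points under the independence model
$$\text{statistic} = \frac{\text{expected} - \text{observed}}{\sqrt{\text{expected}}}$$
Compute maximum likelihood estimate of the error rate
$$\text{error rate} = \frac{1}{2}\left(\frac{a_{00}}{a_{00} + a_{01}} + \frac{a_{00}}{a_{00} + a_{10}}\right)$$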
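A minimal sketch of both quantities for the quadrant counted by a00. Two assumptions are made here that the slide does not spell out: the first index of each count refers to gene A and the second to gene B, and the expected count under independence is the product of the two marginal totals divided by the overall total. The function name is hypothetical.

```python
import math

def implication_stats(a00, a01, a10, a11):
    """Return (statistic, error_rate) for testing whether the a00 quadrant is sparse."""
    total = a00 + a01 + a10 + a11
    n_a_low = a00 + a01  # samples with A low (assumed index convention)
    n_b_low = a00 + a10  # samples with B low (assumed index convention)
    if n_a_low == 0 or n_b_low == 0:
        raise ValueError("both margins of the tested quadrant must be non-empty")
    # Expected count in the quadrant if A and B were independent.
    expected = n_a_low * n_b_low / total
    observed = a00
    statistic = (expected - observed) / math.sqrt(expected)
    error_rate = 0.5 * (a00 / n_a_low + a00 / n_b_low)
    return statistic, error_rate

# Toy counts (made up): the a00 quadrant is nearly empty.
stat, err = implication_stats(a00=2, a01=98, a10=98, a11=2)
```

A large statistic together with a small error rate indicates a sparse quadrant; the cutoffs applied to each are left to the analyst in this sketch.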