Implication Networks from Large Gene-expression Datasets
Debashis Sahoo, PhD Candidate, Electrical Engineering, Stanford University
Joint work with David Dill, Andrew Gentles, Rob Tibshirani, Sylvia Plevritis
ICBP (Integrative Cancer Biology Program), Stanford University

Motivation
Current approaches: clustering, co-expression, linear regression, mutual information.
(Figure: scatter plot of CCNB2 vs. BUB1B.)

Hidden Relationships
GABRB1 and ACPP are not linearly related (Pearson's correlation = -0.1), but there is a Boolean relationship:
ACPP high => GABRB1 low
GABRB1 high => ACPP low
(Figure: scatter plot of GABRB1 vs. ACPP.)

Outline
Motivation
Boolean analysis
Boolean implication network
Biological insights
Conserved Boolean network
Conclusion

Boolean Analysis Workflow
1. Get data: GEO [Edgar et al. 02]
2. Normalize: RMA [Irizarry et al. 03]
3. Determine thresholds
4. Discover Boolean relationships
5. Biological interpretation

Determine threshold
A threshold is determined for each gene. The arrays are sorted by the gene's expression, and StepMiner is used to determine the threshold [Sahoo et al. 07].
(Figure: expression of one gene across the sorted arrays, divided into high, intermediate and low regions around the threshold.)

Discovering Boolean Relationships
Analyze pairs of genes.
Analyze the four quadrants of the scatter plot.
Identify sparse quadrants.
Record the Boolean relationships.
(Figure: scatter plot of GABRB1 vs. ACPP divided into quadrants 1 to 4, labeled ACPP high => GABRB1 low and GABRB1 high => ACPP low.)
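The threshold and quadrant steps above can be sketched in Python with NumPy. This is not StepMiner itself but a minimal stand-in under stated assumptions: a one-step fit picks the split of the sorted values that minimizes the squared error of a two-level fit, the threshold is placed midway between the two fitted levels, and values within a hypothetical margin of 0.5 of the threshold are treated as intermediate and skipped when counting quadrants. The function names are illustrative.

```python
import numpy as np


def step_threshold(values):
    """StepMiner-like one-step fit (a sketch, not the StepMiner tool).

    Tries every split of the sorted values into a low and a high group,
    keeps the split with the smallest sum of squared errors around the two
    group means, and returns the midpoint of the two means (an assumption).
    """
    x = np.sort(np.asarray(values, dtype=float))
    if x.size < 2:
        raise ValueError("need at least two arrays")
    best_sse, best_threshold = np.inf, None
    for k in range(1, x.size):
        low, high = x[:k], x[k:]
        sse = ((low - low.mean()) ** 2).sum() + ((high - high.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best_threshold = (low.mean() + high.mean()) / 2.0
    return best_threshold


def discretize(values, threshold, margin=0.5):
    """Label each array 0 (low), 1 (high) or -1 (intermediate).

    The 0.5 margin around the threshold is an assumed noise band.
    """
    v = np.asarray(values, dtype=float)
    states = np.full(v.shape, -1, dtype=int)
    states[v < threshold - margin] = 0
    states[v > threshold + margin] = 1
    return states


def quadrant_counts(states_a, states_b):
    """counts[i, j] = arrays with gene A at level i and gene B at level j.

    Arrays where either gene is intermediate are skipped.
    """
    keep = (states_a >= 0) & (states_b >= 0)
    counts = np.zeros((2, 2), dtype=int)
    np.add.at(counts, (states_a[keep], states_b[keep]), 1)
    return counts


# Made-up expression values for two genes over six arrays.
gene_a = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
gene_b = [5.0, 4.8, 5.2, 1.0, 1.1, 0.9]
sa = discretize(gene_a, step_threshold(gene_a))
sb = discretize(gene_b, step_threshold(gene_b))
print(quadrant_counts(sa, sb))  # (low, low) and (high, high) quadrants are empty
```

Skipping intermediate arrays keeps noisy values near a threshold from populating a quadrant that would otherwise look sparse.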
Boolean Relationships
There are six possible Boolean relationships between two genes A and B:
four asymmetric ones (A low => B low, A low => B high, A high => B low, A high => B high)
and two symmetric ones (Equivalent, Opposite).

Four Asymmetric Boolean Relationships
A low => B low: PTPRC low => CD19 low
A low => B high: FAM60A low => NUAK1 high
A high => B low: XIST high => RPS4Y1 low
A high => B high: COL3A1 high => SPARC high

Two Symmetric Boolean Relationships
Equivalent: CCNB2 and BUB1B
Opposite: EED and XTP7

Boolean Implication Network
Boolean implications form a directed graph.
Nodes: for each gene A, one node for "A high" and one for "A low".
Edges: the implication A high => B low is an edge from "A high" to "B low".
(Figure: example network with nodes A high, A low, B low and C high.)

Size of the Boolean Networks
Human: 208 million; mouse: 336 million; fly: 17 million.
(Figure: percentage of each relationship type, lo=>hi, hi=>lo, lo=>lo, hi=>hi, Equivalent and Opposite, in each species.)

Boolean Networks Are Not Scale Free
(Figure: human network, #probesets vs. #relationships, shown for symmetric, asymmetric and total relationships.)

Gender Specific
XIST: X-inactive specific transcript; expressed in females.
RPS4Y1: Y-linked gene; expressed in males only.
Boolean relationship: XIST high => RPS4Y1 low [Day et al. 07]

Tissue Specific
ACPP: acid phosphatase, prostate; prostate-specific gene.
GABRB1: GABA A receptor, beta 1; brain-specific.
Boolean relationship: ACPP high => GABRB1 low

Development
HOXD3: homeobox D3; fruit fly antennapedia homolog.
HOXA13: homeobox A13; fruit fly ultrabithorax homolog.
Boolean relationship: HOXD3 high => HOXA13 low [Rinn et al. 07]

Differentiation
PTPRC (B220): protein tyrosine phosphatase, receptor type, C; expressed in B-cell precursors and mature B cells.
CD19: expressed in mature B cells.
Boolean relationship: PTPRC low => CD19 low

Biological Insights
Gender: XIST, RPS4Y1
Tissue: ACPP, GABRB1
Development: HOXD3, HOXA13
Differentiation: PTPRC, CD19

Conserved Boolean Networks
Find orthologs between human, mouse and fly using the EUGene database [Gilbert, 02].
Search for orthologous gene pairs that have the same Boolean relationship.
(Figure: Venn diagram of the fly (17M), human (208M) and mouse (336M) networks, with overlaps of 4M and 41K.)

Conserved Boolean Relationships
(Figure: CCNB2 vs. BUB1B in human, Ccnb2 vs. Bub1b in mouse, CycB vs. Bub1 in fly.)
Two largest connected components in the network of equivalent genes:
178 genes, highly enriched for cell cycle and DNA replication
32 genes, highly enriched for synaptic functions

Conserved Asymmetric Boolean Relationships
(Figure: GABRB1 vs. BUB1B in human, Gabrb1 vs. Bub1b in mouse, Lcch3 vs. Bub1 in fly.)
GABRB1-expressing cells have low cell-cycle (BUB1B) activity.

Conclusion
Boolean analysis:
Boolean relationships are directly visible on the scatter plot.
Enables discovery of asymmetric relationships.
Can reveal known biological processes.
Has potential for new biological discovery.
Boolean network:
Is large.
Is not scale free.

Acknowledgements
Leonore A Herzenberg, James Brooks, Joe Lipsick, Gavin Sherlock, Howard Chang, Stuart Kim
The Felsher Lab: Natalie Wu, Cathy Shachaf, Dean Felsher
Funding: ICBP Program (NIH grant: 5U56CA112973-02)

The End

Determine threshold: a hard case
(Figure: example gene.)
It's hard to determine a threshold for this gene. In such cases StepMiner usually puts the threshold in the middle.

Statistical Tests
For a gene pair, let a00, a01, a10 and a11 be the numbers of arrays in the four quadrants of the scatter plot.
Compute the expected number of points in a quadrant under the independence model, e.g. for quadrant 00:
$\hat{n}_{00} = \frac{(a_{00}+a_{01})(a_{00}+a_{10})}{a_{00}+a_{01}+a_{10}+a_{11}}$
Compare it with the observed count:
$\text{statistic} = \frac{\text{expected} - \text{observed}}{\sqrt{\text{expected}}}$, which for quadrant 00 is $\frac{\hat{n}_{00} - a_{00}}{\sqrt{\hat{n}_{00}}}$
Compute the maximum likelihood estimate of the error rate:
$\text{error rate} = \frac{1}{2}\left(\frac{a_{00}}{a_{00}+a_{01}} + \frac{a_{00}}{a_{00}+a_{10}}\right)$
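The Statistical Tests slide can be written as a short Python function. The slide gives the formulas for quadrant 00 only; this sketch assumes the same form applies to any quadrant (i, j), using that quadrant's row and column totals. The cutoffs for calling a quadrant sparse (statistic > 3, error rate < 0.1) and the function names are hypothetical placeholders, not values from the slides.

```python
import math


def quadrant_stats(a, i, j):
    """Statistic and error rate for quadrant (i, j) of a 2x2 count table a.

    a[i][j] counts arrays with gene A at level i and gene B at level j
    (0 = low, 1 = high). For (i, j) = (0, 0) this is the slide's formula;
    other quadrants use the matching row/column totals (an assumption).
    """
    row = a[i][0] + a[i][1]  # arrays with A at level i
    col = a[0][j] + a[1][j]  # arrays with B at level j
    total = row + a[1 - i][0] + a[1 - i][1]
    observed = a[i][j]
    if row == 0 or col == 0:
        # Degenerate pair: one gene never takes this level; never call it sparse.
        return 0.0, 1.0
    expected = row * col / total  # independence model
    statistic = (expected - observed) / math.sqrt(expected)
    error_rate = 0.5 * (observed / row + observed / col)
    return statistic, error_rate


def sparse_quadrants(a, min_statistic=3.0, max_error=0.1):
    """Quadrants with far fewer points than expected (hypothetical cutoffs)."""
    result = set()
    for i in (0, 1):
        for j in (0, 1):
            statistic, error_rate = quadrant_stats(a, i, j)
            if statistic > min_statistic and error_rate < max_error:
                result.add((i, j))
    return result


print(quadrant_stats([[0, 50], [50, 0]], 0, 0))  # (5.0, 0.0)
print(sparse_quadrants([[40, 40], [40, 0]]))     # {(1, 1)}
```

Both numbers are needed: the statistic says a quadrant is emptier than chance would predict, while the error rate bounds how many points actually violate the implication.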
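The Boolean Relationships, Boolean Implication Network and Conserved Boolean Networks slides can be sketched in plain Python. This is an illustrative sketch with hypothetical function names, assuming that quadrants are indexed (i, j) with i the level of gene A and j the level of gene B (0 = low, 1 = high), that an empty quadrant (i, j) means "A at level i implies B at the other level", and that each implication also contributes its contrapositive edge, as in the ACPP/GABRB1 example. The `orthologs` table is a toy stand-in for an EUGene lookup, not its API, and the gene pairs only reuse names from the slides.

```python
LEVEL = {0: "low", 1: "high"}


def classify_pair(gene_a, gene_b, sparse):
    """Turn the sparse quadrants (i, j) of a gene pair into Boolean relationships.

    An empty quadrant (i, j) means "A at level i implies B at level 1 - j".
    """
    sparse = set(sparse)
    if sparse == {(0, 1), (1, 0)}:
        return [("equivalent", gene_a, gene_b)]
    if sparse == {(0, 0), (1, 1)}:
        return [("opposite", gene_a, gene_b)]
    return [("implies", (gene_a, i), (gene_b, 1 - j)) for i, j in sorted(sparse)]


def implication_edges(relationships):
    """Directed edges between (gene, level) nodes of the implication network."""
    edges = set()
    for kind, x, y in relationships:
        if kind == "implies":
            (ga, la), (gb, lb) = x, y
            edges.add(((ga, la), (gb, lb)))
            # Contrapositive: B at the other level implies A at the other level.
            edges.add(((gb, 1 - lb), (ga, 1 - la)))
        else:
            for level in (0, 1):
                # Equivalent: same level in both directions; opposite: flipped level.
                target = level if kind == "equivalent" else 1 - level
                edges.add(((x, level), (y, target)))
                edges.add(((y, target), (x, level)))
    return edges


def conserved_edges(edges_by_species, orthologs):
    """Keep edges present in every species after renaming genes to common names.

    orthologs[species] maps that species' gene names to common names
    (hypothetical table); edges touching an unmapped gene are dropped.
    """
    renamed = []
    for species, edges in edges_by_species.items():
        name = orthologs[species]
        renamed.append({
            ((name[ga], la), (name[gb], lb))
            for (ga, la), (gb, lb) in edges
            if ga in name and gb in name
        })
    return set.intersection(*renamed) if renamed else set()


def show(edge):
    (ga, la), (gb, lb) = edge
    return f"{ga} {LEVEL[la]} => {gb} {LEVEL[lb]}"


# ACPP/GABRB1: only the (high, high) quadrant is empty.
for edge in sorted(implication_edges(classify_pair("ACPP", "GABRB1", {(1, 1)}))):
    print(show(edge))
# prints "ACPP high => GABRB1 low" then "GABRB1 high => ACPP low"

# Toy conservation check between human and fly equivalences (hypothetical inputs).
human = implication_edges(classify_pair("CCNB2", "BUB1B", {(0, 1), (1, 0)}))
fly = implication_edges(classify_pair("CycB", "Bub1", {(0, 1), (1, 0)}))
orthologs = {"human": {"CCNB2": "CCNB2", "BUB1B": "BUB1B"},
             "fly": {"CycB": "CCNB2", "Bub1": "BUB1B"}}
print(len(conserved_edges({"human": human, "fly": fly}, orthologs)))  # 4
```

Storing each implication together with its contrapositive means both readings shown on the slides, such as ACPP high => GABRB1 low and GABRB1 high => ACPP low, are reachable by following edges out of either gene's node.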