Patching the Puzzle of Genetic Network
Grace S. Shieh
Institute of Statistical Science, Academia Sinica

Outline
What is a genetic network?
Why is the area one of the frontiers?
How do statistical modeling and computational algorithms simplify the complex puzzle?
Applications

Dogma of biology
DNA -> mRNA -> Protein
Proteins: the elements that function in organisms, e.g. yeast and human.

Somatic mutations affect key pathways in lung adenocarcinoma
Nature, Oct. 2008; Science, Sept. 2008

Complex human disease
Digenic effects may underlie:
Type II diabetes
Schizophrenia
Retinitis pigmentosa
Glaucoma
(Tong et al., Science 2004)

Complex human disease
These diseases may have a similar synthetic effect in the yeast genetic interaction map.
Elements of a genetic network derived from a model organism, e.g. yeast, are likely to be conserved.

The topology of the genetic network in the neighborhood of SGS1 (Tong et al., 2004)

Experimental method to reveal genetic interactions
Systematic genetic analysis with ordered arrays of yeast deletion mutants (Tong et al., 2001, Science)
Global mapping of the yeast genetic interaction network (Tong et al., 2004, Science)
The genetic landscape of a cell (Costanzo et al., 2010, Science)

Synthetic sick or lethal (SSL) gene pairs
When both genes are mutated, the organism will die, but neither mutation alone is lethal.
SSL is important for understanding how an organism tolerates genetic mutations (Hartman, Garvik and Hartwell, 2001, Science).

Scenarios resulting in synthetic interaction
[Figure: four scenarios. Partially redundant genes; 2 partially redundant pathways; 3 partially redundant pathways, 2 required; protein complex tolerating 1 but not 2 destabilizing mutations. The labels "< 2%" and "SSL < 4%" appear in the figure.]

A Pattern Recognition Approach to Infer Gene Networks
Grace S. Shieh, joint work with C.-L. Chuang, C.-H. Jen and C.-M. Chen
Bioinformatics 2008

Excerpted from Tong et al.
(2001), Science

Transcriptional compensation (TC) and transcriptional reverse compensation (TRC) interactions
(Lesage et al., 2004; Wong & Roth, 2005, Genetics; Kafri et al., 2005, Nature Genetics)
Among paralogues or SSL gene pairs, when one gene is mutated, its partner gene's expression increases (TC) or decreases (TRC).
Goal: to predict TC and TRC interactions among SSL gene pairs.

Four sets of yeast (Saccharomyces cerevisiae) microarray gene expression data (Spellman et al., 1998) were used.
The red channel R: intensities of yeast synchronized by alpha factor arrest, by arrest of a cdc15 or cdc28 mutant, and by elutriation.
The green channel G: average of non-synchronized yeast.
[Figure: cell cycles of the CLN2 gene]

qRT-PCR experiments
For a given pair of SSL genes A and B:
Experimental group: gene A's expression, with gene B knocked out
Control group: gene A's expression, with gene B wild type
If A's expression in the experimental group >> that in the control group, A and B may be TC.
If A's expression in the experimental group << that in the control group, A and B may be TRC.

[Figure: gene expression of transcriptional compensation (TC) pairs]
[Figure: gene expression of transcriptional reverse compensation (TRC) pairs]

The dependence of patterns and their associated interactions
Assumption for PARE: the dependence of CP (SP) and TC (TD) interactions is significant.
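As a rough illustration of the qRT-PCR decision rule above, the sketch below labels a gene pair from gene A's expression with gene B knocked out versus wild type. The function name `classify_pair` and the two-fold cutoff `min_fold` are assumptions made for this example only; the slides give no numeric threshold for ">>" or "<<".

```python
from math import log2


def classify_pair(expr_knockout, expr_wildtype, min_fold=2.0):
    """Label an SSL pair as TC, TRC or undetermined.

    expr_knockout: gene A's expression with gene B knocked out (experimental group)
    expr_wildtype: gene A's expression with gene B wild type (control group)
    min_fold: hypothetical fold-change cutoff standing in for ">>" and "<<"
    """
    if expr_knockout <= 0 or expr_wildtype <= 0:
        raise ValueError("expression levels must be positive")
    log_fold_change = log2(expr_knockout / expr_wildtype)
    cutoff = log2(min_fold)
    if log_fold_change >= cutoff:
        return "TC"    # partner's expression increases
    if log_fold_change <= -cutoff:
        return "TRC"   # partner's expression decreases
    return "undetermined"


print(classify_pair(8.0, 2.0))   # experimental level is 4x the control
print(classify_pair(1.0, 4.0))   # experimental level is 1/4 of the control
```

Working on the log scale keeps the rule symmetric: a four-fold increase and a four-fold decrease sit at the same distance from zero.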
To test this dependence hypothesis: Fisher's exact test

The proportion of complementary pattern (CP) in TC
Screen genes with significant changes over time by comparing $\max_t G_i(t)$ with $\min_t G_i(t)$ at a threshold of 1.5, which resulted in 35 gene pairs.

         CP    SP    Total
TC       13     9     22
TD        2    11     13
Total    15    20     35

Fisher's exact test: p-value < 0.02, significant at the 5% level.

PARE
The gene expression of the regulating gene is treated as the object contour, and the lagged-1 expression of the target gene as the boundary of interest in an image segmentation algorithm.
Energy terms (the formulas on this slide are only partly recoverable): a first-order energy $E^{D1}_{i,j}$ and a second-order energy $E^{D2}_{i,j}$, defined as sums over $t$ of terms in the first derivatives $\partial G_i(t)/\partial t$, $\partial G_j(t)/\partial t$ and in the second derivatives $\partial^2 G_i(t)/\partial t^2$, $\partial^2 G_j(t)/\partial t^2$, respectively; and an area energy $E^{Area}_{i,j}$, defined as a sum over $t$ involving $\tfrac{1}{2}\, g_i(t)\, g_j(t)$ and the angle between $g_i(t)$ and $g_j(t)$ (compared with $90^\circ$).

Discrete signals
Because gene expression is a discrete signal, the first- and second-order partial differential terms can be modified as follows:
$$\frac{\partial G_i(t)}{\partial t} \approx \frac{G_i(t+1) - G_i(t)}{\Delta t}, \qquad \frac{\partial^2 G_i(t)}{\partial t^2} \approx \frac{G_i(t+2) - 2\,G_i(t+1) + G_i(t)}{(\Delta t)^2}$$
The interaction $S_{i,j}$ can be determined as a weighted sum of the internal and external energies:
$$S_{i,j} = w_{D1}\, E^{D1}_{i,j} + w_{D2}\, E^{D2}_{i,j} + w_{Area}\, E^{Area}_{i,j}$$
where $w_{D1}$, $w_{D2}$ and $w_{Area}$ stand for the weights, whose symbols are not legible in the source.

PARE
In this study, each gene is represented by a node in a graphical model, denoted by $G_i$, where $i = 1, 2, \ldots, N$. The edge $S_{i,j}$ represents the gene-gene interaction between $G_i$ and $G_j$, where the enhancer gene $G_i$ plays a key role in activating or repressing the target gene $G_j$.

Training set vs test set
Leave-one-out cross validation: among n pairs, use n-1 pairs to train PARE, then predict the remaining pair; repeat for each of the n pairs.
3-fold cross validation: among all pairs, use 2/3 of the pairs to train, then predict the remaining 1/3, over all combinations; repeat this N times.

Experimental Results (TC/TRC)
alpha data set (18 time points)
Table 1. The prediction results, checked against the qRT-PCR experiments

                 Training         Test
                 TPR     FPR      TPR     Std    FPR
Lagged Corr.                      46%
EB-GGMs                           52%
PARE, n-fold     76%     20%      73%            23%
PARE, 3-fold     78%*    18%*     71%*    3%     23%*

*Since 3-fold CV was performed 500 times, only averages of the TPRs are reported.
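The Fisher's exact test on the CP/SP by TC/TD counts reported earlier in this section (13, 9, 2, 11) can be checked with a short standard-library sketch. The function name `fisher_exact_two_sided` is made up for this example, and the two-sided rule used here (summing all tables that are no more probable than the observed one) is one common convention; the slides do not say which variant was used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With the margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of all tables that
    are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(k):
        # Probability that the top-left cell equals k under independence.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_observed * (1 + 1e-9))


# Rows: TC, TD; columns: CP, SP (counts from the table above).
p_value = fisher_exact_two_sided(13, 9, 2, 11)
print(f"two-sided p-value: {p_value:.4f}")
```

For these counts the two-sided p-value comes out at roughly 0.016, in line with the reported p-value < 0.02.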
Experimental Results (TC/TRC)
For the alpha dataset, PARE yields a true-positive rate of 71-73%.
Prediction accuracy: 81%
The FPR for predicting TC (TD) interactions was bounded by 12% (10%) genome-wide.

Experimental Results (TC/TRC)
Checking against published literature
These genetic interactions are consistent with the following experimental results:
Sgs1 and Srs2 are known redundant pathways in replication (Ira et al., 1999; Lee et al., 1999).
Ex: Srs2 and Sgs1-Top3 suppress crossovers during double-strand break repair in yeast.
The Sgs1/Top3/Rmi1 and Mus81/Mms4 complexes are both involved in double-strand break repair and homologous recombination (Fabre et al., 2002). This indicates that Sgs1/Top3/Rmi1 and Mus81/Mms4 are alternative pathways to resolve recombination intermediates.

Inferring transcriptional interactions
132 pairs of activator-target (AT) and repressor-target (RT) gene interactions were collected from the published literature (MIPS, Mewes et al., 1999, Nucleic Acids Research; Gancedo, 1998, Microbiology & Molecular Biology; Draper et al., 1994, Molecular & Cellular Biology, etc.).
Test for CP (SP) associated with RT (AT) pairs in the data: Chi-squared test

Experimental Results (AT/RT)
Table 2. The prediction results using the Elu data set, checked against the 132 TIs from the literature.

                 Training         Test
                 TPR     FPR      TPR     Std    FPR
Lagged Corr.                      51%
EB-GGMs                           59%
PARE, n-fold     79%     16%      77%            17%
PARE, 3-fold     81%*    16%*     74%*    3%     19%*

*Average over 500 repeats.
FPRs for genome-wide TI predictions are bounded by 21%.

Conclusions
The proposed PARE learns gene expression patterns; it can then predict similar genetic interactions using microarray data.
TPRs of PARE applied to the alpha (Elu) dataset are about 73% (77%) for inferring TC/TD interactions (TIs), respectively.

Inferring genesis of obesity in human (joint work w.
Karine & Jean-Daniel)

MGED from adipocytes
Adipocytes: cells that primarily compose adipose tissue, specialized in storing energy as fat.
Time-course MGED: 2 human adipocyte-derived cell lines
[Figure: C/EBP alpha time course (MGED in ratio); x-axis: day, y-axis: expression level (log)]

PARE to infer genesis of obesity in human
Training stage:
MGED of human adipocyte-derived cell lines
70 known transcriptional interactions (TIs) from iHOP
Prediction results:
40+ pairs of TIs and some genetic interactions were predicted.
Some are consistent with existing experimental results; some are novel.

Inferring TIs
Data preparation:
Select significantly expressed genes: P-value < 0.01
Significantly expressed in at least 1 time point (5 time points in total) -> 36 genes with a function of interest
Interact with 14 genes of interest (AP2, CCL2, CCL5, LEP, etc.) -> 504 gene pairs

WebPARE: web-computing service of PARE (Chuang+, Wu+, Cheng and Shieh*, 2010, Bioinformatics)
To provide a simple web interface for users to infer GIs/TIs using time-course gene expression data and existing knowledge, e.g. pre-stored validated TIs in yeast, mouse, human, etc. (TRANSFAC)

An example:
A list of genes involved in the cell cycle and a data set (e.g. Elu) were uploaded to WebPARE; the TIs of these pairs were of interest.
Using integrated (pre-stored) pairs of TIs in yeast, PARE correctly predicted 118 out of 176 TIs, mTPR = 67%.
[Figure: the significant predicted network from 66 pairs]

WebPARE
www.stat.sinica.edu.tw/WebPARE

Demo
WebPARE can be accessed at: http://www.stat.sinica.edu.tw/WebPARE

Acknowledgement
Dr. Ting-Fang Wang and Da-Yow Huang, Inst. of Biological Chemistry, Academia Sinica
Drs. Karine Clement and J-D. Zucker, INSERM & IRD, France
Cheng-Long Chuang, Chin-Yuan Guo, Chia-Chang Wang, Dr. Shi-Fong Guo, Yu-Bin Wang, Jia-Hung Wu, Inst. of Statistical Science

Thank you for your attention!
Wanted
Part-time PhD students and research assistants to work at the Shieh lab, Institute of Statistical Science, Academia Sinica.

Parameter estimation
Next, we estimate parameters via the particle swarm optimization (PSO) algorithm (Kennedy and Eberhart, 1995), a stochastic optimization technique that simulates the behavior of a flock of birds.
[Figure: example (finding the largest gradient)]
[Figure: evolutionary process of PSO]

[Figure: gene expression of activator-target (AT) gene pairs]
[Figure: gene expression of repressor-target (RT) gene pairs]
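The PSO step mentioned under Parameter estimation can be sketched generically as follows. This is a minimal textbook-style particle swarm, not the PARE implementation: the function name `pso_minimize`, the inertia and acceleration coefficients, the swarm size and the toy objective are all assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iter=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over a box with a basic particle swarm.

    Each particle remembers its own best position, and the swarm shares a
    global best. Velocities are pulled toward both, with random weights.
    All coefficient values are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and keep it inside the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Toy objective (assumption): a sphere function with its minimum at the origin.
best, value = pso_minimize(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))
print(best, value)
```

To maximize a quantity instead, as in the "finding largest gradient" example, the same routine can be applied to the negated objective.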