ExPlain: Causal Analysis of Gene Expression Data from Promoter Models to Signaling Pathways

Alexander Kel (1), Holger Karas (1), Nico Voss (1), Tagir Waleev (2), Edgar Wingender (1,3)

(1) BIOBASE GmbH, Halchtersche Str. 33, D-38304 Wolfenbüttel, Germany
(2) A.P. Ershov's Institute of Informatics Systems, 6, Lavrentiev ave., 630090 Novosibirsk, Russia
(3) Dept. Bioinformatics, UKG/Univ. Göttingen, Goldschmidstr. 1, 37077 Göttingen, Germany

Keywords: gene expression, microarray analysis, promoter models, transcription factors, binding sites, genetic algorithm

1 Introduction

Cellular signal transduction networks of multicellular organisms are enormously complex, yet very robust in providing a fast and appropriate response to any extracellular signal. This is achieved through combinatorial usage of a rather limited set of signaling molecules and pathways. These combinatorics must be mirrored by the structure of gene promoters as combinations of transcription factor binding sites (composite modules). Different signal transduction pathways leading to the activation of transcription factors converge at key molecules that master the regulation of certain cellular processes. Such crossroads of signaling networks often appear as “Achilles heels”, causing disease when they do not function properly.

Several methods have been developed for the analysis of signal transduction and gene regulatory networks associated with gene expression data. However, these approaches often underestimate the role of molecular processes that occur in the cell at the post- (or pre-) transcriptional level: the “iceberg” of signal transduction that cannot be seen at the level of gene expression changes. In addition, the promoter structure, which is the key component linking gene regulation with the signal transduction network through the multiple interactions of transcription factors with their DNA binding sites, is poorly understood.
Ultimately, all the aforementioned approaches to network analysis operate on the final products of differentially expressed genes and on their effects on the physiology of the cell, but do not focus on the molecular mechanisms that cause the observed changes in gene expression.

2 Method and Results

We developed an integrated computational tool, ExPlain, for causal interpretation of gene expression data. It analyzes microarray data and proposes complexes of transcription factors as well as “upstream” key signaling molecules that master the observed gene expression profile. The method utilizes data from three databases (TRANSFAC® (Matys et al., 2006), TRANSPATH® (Krull et al., 2006) and HumanPSD, http://www.biobase-international.com/) and integrates two programs:

1) Composite Module Analyst (CMA) analyzes 5’-upstream regions of co-expressed genes and applies a genetic algorithm to reveal composite modules (CMs) consisting of co-occurring single TF binding sites and composite elements (Waleev et al., 2006; Kel et al., 2006a);
2) ArrayAnalyzer (Kel et al., 2006b) is a fast network search engine that analyzes the signal transduction networks controlling the activities of the corresponding TFs and seeks key molecules responsible for the observed concerted gene activation.

Figure 1 shows the user interface of the ExPlain system and the results of applying the system to a set of microarray data on a skin disease. A set of 150 promoters of genes differentially expressed in fibroblasts of patients with the skin disease was compared to a set of 300 promoters of genes that did not show any significant change of expression. Site frequency analysis showed that promoters of differentially expressed genes have a significantly higher frequency of sites for transcription factors such as NF-kappaB, IRF-1, EGR-2 and some others (Figure 1a).
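As an illustration of the kind of comparison described above, the following minimal sketch tests whether a binding site occurs in more foreground promoters than expected by chance. The abstract does not state which statistic ExPlain uses; the one-sided hypergeometric test, the function names and the hit counts below are assumptions chosen for illustration only. Only the set sizes (150 and 300) are taken from the text.

```python
from math import comb


def fold_enrichment(hits_fg, n_fg, hits_bg, n_bg):
    """Ratio of the fraction of promoters carrying the site (foreground / background)."""
    return (hits_fg / n_fg) / (hits_bg / n_bg)


def hypergeom_enrichment_p(hits_fg, n_fg, hits_bg, n_bg):
    """One-sided P(X >= hits_fg) under a hypergeometric null.

    Null model (an assumption): the promoters carrying the site are
    distributed at random between the foreground and background sets.
    """
    total = n_fg + n_bg
    total_hits = hits_fg + hits_bg
    upper = min(n_fg, total_hits)
    # Count the ways of drawing k site-carrying promoters into the foreground.
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_fg - k)
        for k in range(hits_fg, upper + 1)
    )
    return tail / comb(total, n_fg)


# Set sizes from the text; the hit counts are hypothetical.
N_FG, N_BG = 150, 300
HITS_FG, HITS_BG = 90, 120

p_value = hypergeom_enrichment_p(HITS_FG, N_FG, HITS_BG, N_BG)
ratio = fold_enrichment(HITS_FG, N_FG, HITS_BG, N_BG)
print(f"fold enrichment = {ratio:.2f}, one-sided p = {p_value:.2e}")
```

Counting promoters with at least one site, rather than the total number of sites, is a simplification. A real analysis would also need to correct for testing many matrices at once.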
Analysis of composite modules revealed a highly significant combination of single matrices and matrix pairs, including matrices for factors such as AML and OCT and the pairs SP-1/ERG-1 and AP-1/OCT (Figure 1b). This composite promoter model was able to discriminate more than 60% of the differentially expressed promoters from the background promoters (Figure 1c). Finally, the analysis of the signal transduction pathways upstream of these transcription factors helps to identify several potential key molecules such as ActR-II, which is an important factor of the Atrophin-1 (DRPLA) pathway (Figure 1d).

Figure 1: UI of ExPlain with the results of analysis of site overrepresentation in promoters of genes differentially regulated in a human skin disease: a) promoter model; b) the histogram of the corresponding promoter composite score for differentially expressed genes (red) versus non-changed genes (blue); and c) the identified key-node molecule, ActR-II, with the corresponding signaling network leading to the regulation of activity of the transcription factors found in the promoter model.

References

[1] Waleev, T., Shtokalo, D., Konovalova, T., Voss, N., Cheremushkin, E., Stegmaier, P., Kel-Margoulis, O., Wingender, E. and Kel, A., Composite module analyst: Identification of transcription factor binding site combinations using genetic algorithm, Nucleic Acids Res., 34:W541–W545, 2006.
[2] Kel, A., Konovalova, T., Waleev, T., Cheremushkin, E., Kel-Margoulis, O., and Wingender, E., Composite module analyst: A fitness-based tool for identification of transcription factor binding site combinations, Bioinformatics, 22:1190–1197, 2006.
[3] Krull, M., Pistor, S., Voss, N., Kel, A., Reuter, I., Kronenberg, D., Michael, H., Schwarzer, K., Potapov, A., Choi, C., Kel-Margoulis, O., Wingender, E., TRANSPATH®: An information resource for storing and visualizing signaling pathways and their pathological aberrations, Nucleic Acids Res., 34:D546–D551, 2006.
[4] Matys, V., Kel-Margoulis, O.V., Fricke, E., Liebich, I., Land, S., Barre-Dirrie, A., Reuter, I., Chekmenev, D., Krull, M., Hornischer, K., Voss, N., Stegmaier, P., Lewicki-Potapov, B., Saxel, H., Kel, A.E., Wingender, E., TRANSFAC® and its module TRANSCompel®: Transcriptional gene regulation in eukaryotes, Nucleic Acids Res., 34:D108–D110, 2006.
[5] Kel, A., Voss, N., Jauregui, R., Kel-Margoulis, O., and Wingender, E., Beyond microarrays: Find key transcription factors controlling signal transduction pathways, BMC Bioinformatics, 7(Suppl 2):S13, 2006.