Modeling Promoter and Untranslated Regions in Yeast

YUJING LIANG, Computer Science and Engineering, University of California, San Diego
BORIS BABENKO, Computer Science and Engineering, University of California, San Diego
AARON STONESTROM, Division of Biology, University of California, San Diego
JAMAL BENHAMIDA, Computer Science and Engineering, University of California, San Diego
ELEAZAR ESKIN, Computer Science and Engineering, University of California, San Diego

Abstract
Transcriptional regulation is the primary form of gene regulation in eukaryotes. Approaches to identifying functional regions based on comparative genomics and microarray expression data have recently been applied to promoter and 3'-untranslated region (UTR) sequences in the yeast genome. Here we combine these approaches to construct a robust set of motifs active in the yeast genome. With this set we consider the combinatorial actions of these motifs and apply a linear model to explain observed expression. A deeper understanding of gene regulation in yeast is the first step toward understanding gene regulation and complex disease in higher organisms.

Poster panels: Comparative Genomics, Expression Analysis, Positional Analysis, Transcription Factor Binding Sites

The YKL182W gene promoter, with highlighted Transcription Factor Binding Sites:
AAGTTATAGGGGAAAACTAAAAATATAAGAAAAAAAAAGGTATTGATTGATAAGGAAAAAGAACCAAGGGAAAAAT
ATAAAAAAGTACATTGGGCCTTTTCATACTTGTTATCACTTACATTACAAAGAAGAACAAACAACTTTTTTAAACG
AATTTTCTTTCTTCCTTTTTCAATTTATTAATTCTTTTTTTCCATACAATTCAAGGTCAAATATATTCTTATATGC
TCTTTGAATATTTCTGAAAAATATATAAAGAAAAGAAACTACAAGAACAT

The comparative genomics method uses aligned sequences from several closely related species to find patterns that are conserved across multiple genomes. A high rate of conservation suggests that the pattern is functional and important.
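The conservation idea can be sketched in a few lines of Python. This is a minimal illustration, not the poster's actual procedure: it assumes a gap-free alignment of equal-length sequences and treats a pattern as conserved only when every species carries the identical k-mer at the same position. The function name `conserved_kmers` and the toy sequences are hypothetical.

```python
def conserved_kmers(aligned, k):
    """Return {kmer: [start positions]} for k-mers identical in every sequence.

    Assumption: `aligned` holds equal-length, gap-free aligned sequences.
    """
    reference = aligned[0]
    hits = {}
    for start in range(len(reference) - k + 1):
        kmer = reference[start:start + k]
        # Conserved here means: same k-mer at the same offset in all species.
        if all(seq[start:start + k] == kmer for seq in aligned):
            hits.setdefault(kmer, []).append(start)
    return hits


# Hypothetical toy alignment, not data from the poster.
toy_alignment = [
    "TTACCACCGG",
    "TTACCACCAG",
    "TTACCACCCG",
]
hits = conserved_kmers(toy_alignment, 8)
print(hits)
```

A stricter or looser criterion (for example, tolerating one mismatch, or requiring conservation in only a subset of species) would change which patterns are reported.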
Species 1: TAATATCAAAATCAATCTCAAAATTACCACCGGTTAGAACTTGG
Species 2: TAATGTCAAAATCAATCTCAAAGTTACCACCGGTTAGAACTTGG
Species 3: TAATATCAAAATCAATCTCAAAATTACCACCAGTTAGAACCTGA
Species 4: TAATATCGAAATCAATCTCAAAATTACCACCGGTTAGAACTTGG
Species 5: TGATGTCAAAATCGATCTCGAAATTACCACCAGTCAGGACTTGG
Species 6: TAATCTCAAAATCAATTTCAAAATTACCACCCGTCATAACTTGA
Species 7: TAATTTCAAAGTCAATTTCAAAGTTACCACCGGTCAAGACTTGA

Purpose
Our goal is to understand how combinations of Transcription Factor Binding Sites (TFBS) on a gene affect its expression in different experimental conditions.

Linear Model
To predict the contributions of motifs to a gene's expression level:
- Each gene contains zero or more motifs.
- Each motif (assumed to be a TFBS) has an "expression factor" score (+/-) for each experiment.
- The expression of a gene is the sum of the scores of the motifs it contains.

Transcription factor binding sites are not distributed uniformly in promoter regions. The motif CGATGAG most frequently occurs between 60 and 100 nucleotides away from the transcription start site (where the code for a protein begins).

Data Set
- 7 yeast species:
  Saccharomyces cerevisiae
  Saccharomyces bayanus
  Saccharomyces castellii
  Saccharomyces kudriavzevii
  Saccharomyces mikatae
  Saccharomyces kluyveri
  Saccharomyces paradoxus
- 5769 promoters analyzed
- 1,730,700 DNA nucleotides analyzed per species
- Expression data come from a heat-shock microarray experiment (Stanford Microarray Database), http://smd.stanford.edu/

Significant Motifs Found
Pattern CGGTGGCAA appeared 15 times, and was conserved 15 times, MCS: 100.0
Pattern is conserved on: [YJL001W, YNL155W, YOR052C, YOR259C, YOR260W, YBL022C, YCL043C, YCL042W, YCR092C, YCR093W, YDL148C, YDL147W, YDL070W, YDR427W, YER012W]
Pattern AGCTCATCGC appeared 29 times, and was conserved 27 times, MCS: 93.10344827586206
Pattern is conserved on: [YJL109C, YKL191W, YKR024C, YKR025W, YKR081C, YKR082W, YLR014C, YLR015W, YLR106C, YLR107W, YLR336C, YMR049C, YNL248C, YNL247W, YOL125W, YPL094C,
YPL093W, YCR057C, YCR072C, YCR087C-A, YDR449C, YHR052W, YHR147C, YHR148W, YHR170W, YIL127C, YIL126W]

Grouping Motifs
Some of the discovered motifs are minor variants or exact reverse complements of each other. Thus, the motifs were grouped, and each group was assigned a unique id:
M0 : CGGTGGCAA, GGTGGCAAG, CGTGGC
M1 : AGCTCATCGC, AGCTCATAGC
M2 : GCTCATCG, CGATGAGC
M3 : AGCTCATCG
…

Annotating the Genes
We can now annotate the genes with the Motif Groups that were discovered:
Gene Name : Motif Groups
YPR111W : M248, M319, M74
YPR148C : M12, M153, M25
YPR194C : M127, M202, M41
YAL044W-A : M255, M27, M270, M49

Assumption
We assume that the motifs act independently of one another, and that the same motif is always bound by the same transcription factor and has the same effect on expression.

Limitations
The method finds only transcription factors that are activated or deactivated in an experimental condition relative to the control.

Calculating the Expression Factor
Gene   Expression Level   Motifs
Y01    0.456              M1 + M2 + M3
Y02    0.745              M2 + M4 + M16
Y03    0.834              M1 + M3 + M10
…
Treating each row as one linear equation, we can solve for the unknowns (M1, M2…) using any linear regression technique, such as least squares.

Results
331 motifs were found. Using linear regression on the heat-shock expression data, 22 of them were found to be significantly active. Some motifs and their scores:
M66 : CCCCTT(AAGGGG), 1.2460824979780836
M218 : CAGGGG, 1.209783124842816
M259 : CCCTTAA(TTAAGGG), 1.1325379612649848
M264 : TAGGGG(CCCCTA), 0.8571825629506061
…
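The least-squares step in "Calculating the Expression Factor" can be illustrated with a short, dependency-free Python sketch. It is not the poster's implementation: the function name `fit_motif_scores`, the toy genes, and their expression values are all hypothetical, and the fit solves the normal equations by Gauss-Jordan elimination, which is only one of several ways to do least squares. It also assumes there are enough genes to determine every motif score uniquely; the three example rows in the table above would not be enough on their own, since they involve six motifs.

```python
def fit_motif_scores(gene_motifs, expression):
    """Least-squares fit of one score per motif via the normal equations."""
    motifs = sorted({m for ms in gene_motifs.values() for m in ms})
    genes = sorted(gene_motifs)
    # Design matrix: A[i][j] = 1 if gene i contains motif j, else 0.
    A = [[1.0 if m in gene_motifs[g] else 0.0 for m in motifs] for g in genes]
    y = [expression[g] for g in genes]
    n = len(motifs)
    rows = range(len(genes))
    # Augmented matrix for the normal equations (A^T A) x = A^T y.
    aug = [
        [sum(A[i][r] * A[i][c] for i in rows) for c in range(n)]
        + [sum(A[i][r] * y[i] for i in rows)]
        for r in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("motif scores are not uniquely determined")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return {m: aug[j][n] / aug[j][j] for j, m in enumerate(motifs)}


# Hypothetical toy data: expression is the exact sum of motif scores
# M1 = 0.5, M2 = -0.2, M3 = 1.0, so the fit should recover those values.
toy_motifs = {
    "geneA": {"M1", "M2"},
    "geneB": {"M2", "M3"},
    "geneC": {"M1", "M3"},
    "geneD": {"M1", "M2", "M3"},
}
toy_expr = {"geneA": 0.3, "geneB": 0.8, "geneC": 1.5, "geneD": 1.3}
scores = fit_motif_scores(toy_motifs, toy_expr)
print(scores)
```

With real, noisy expression data the equations would not be satisfied exactly, and the fitted scores would be the values that minimize the squared error over all genes.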