Identification of Transcription Factor Binding Sites
Presenting: Mira & Tali
March 03

Goal
  Regulatory regions: AGCCA AGCCA AGCCA AGCCA AGCCA
  Motif: AGCCA
  Binding site???

Why Bother?
  UNDERSTAND
  Gene expression regulation
  Co-regulation

Difficulties
  Multiple factors for a single gene
  Variability in binding sites
    The nature of variability is NOT well understood
    Usually transitions
    Insertions and deletions are uncommon
  Location, location, location…

Experimental methods
  EMSA: electrophoretic mobility shift assay
  Nuclease protection assay
  NOT ENOUGH!!!!!

So, what can we do?
  Find conserved sequences in regulatory regions
  1. Define what you want to find
  2. Define what is a good result
  3. Decide how to find it…

Principal Methods: Global optimum
  Enumerative methods
    Going over ALL possibilities
    Taking the best one
  Advantage: Certainty
  Disadvantage: Limited to small search spaces

Principal Methods: Local optimum
  Gibbs sampling, AlignACE
    Start somewhere (arbitrary)
    Next step direction: proportional to what we “gain” from it
    We can get anywhere with some probability
  Advantage: Basically good results, faster
  Disadvantage: You can never know…

Articles Overview
  Identifying motifs
    Expression patterns
    Phylogenetic footprinting
  Identifying networks
    Common motifs in expression clusters
    Combinatorial analysis

Discovery of novel transcription factor binding sites by statistical overrepresentation
S. Sinha, M. Tompa
  Goal: Identify binding sites in yeast
  Use sets of coregulated genes
  Enumeration: YMF algorithm
  Identify overrepresented upstream sequences

What constitutes a motif? (tailored for S. cerevisiae)
  In S. cerevisiae, typically 6-10 conserved bases: the motif
  Spacers varying in length (1-11 bp)
  Usually located in the middle
  ACCNNNNNNGTT
  Taken from SCPD, the S. cerevisiae promoter database

How do we measure motifs?
  Z-score: motif over-representation
  Pmax(X): probability of Z-score >= X

YMF algorithm (Yeast Motif Finder)
  INPUT:
    A set of promoter regions
    Transition matrix
    Motif length, l (modest values): 6
    Maximum number of spacers allowed, w: 11

YMF algorithm
  Post processing:
    FindExplanators: artificial overrepresentation
      TCACGCT (motif)
      CACGCTA (artifact)
    Co-expression score: W-score

Experiments
  Validate YMF results
    Running YMF on regulons with known binding sites (SCPD)
  Run YMF on MIPS catalogs (MIPS: Munich Information center for Protein Sequences)
    Functional
    Mutant phenotype

Validation
  New binding sites or false positives?
  A novel site candidate

Further research
  Validation of novel binding sites and transcription factors
  Modification of the algorithm to be applicable for other organisms

Systematic determination of genetic network architecture
Saeed Tavazoie, Jason D. Hughes, Michael J. Campbell, Raymond J. Cho, George M. Church
  Goal: Identify co-regulated networks of genes in yeast
  Cluster by expression patterns
  AlignACE (Aligns Nucleic Acid Conserved Elements)
  Identify upstream sequence patterns

Clusters
  Cluster: a group of genes with a similar expression pattern
  Cluster’s members
    Tend to participate in common processes
    Tend to be co-regulated

Clusters 10-54

Identifying motifs
  Using AlignACE
  18 motifs from 12 clusters were found.
  7 of the found motifs were identified experimentally
  And what about the others????

Scanning for more binding sites
  Once a significant motif was found, the whole genome was scanned for it
  Most motifs were cluster specific

Why so few motifs?
  Too stringent rules for defining a “significant” motif
  Post transcriptional regulation (mRNA stability)
  Some clusters represent “noise”

“Tightness”
  “Tightness” of a cluster: how close are the cluster members of a particular cluster to its mean
  A strong correlation between the presence of significant motifs and the “tightness” of a cluster

Things to remember
  Discovering regulons and motifs using expression based clustering
  Minimal biases
  Validation as a methodology for new organisms
  Identifying expected cis-regulatory motif EACH TIME!!

Identifying regulatory networks by combinatorial analysis of promoter elements
by Yitzhak Pilpel, Priya Sudarsanam & George M. Church
  Goals:
    Identify motif combinations affecting expression patterns in yeast
    Understand transcriptional network

Basic definitions
  Expression coherence score -
  Synergistic motifs: EC(a&b) > EC(a\b), EC(b\a)

Methods:
  A database of motifs
  Gene sets
  Calculating EC score
  Significant synergistic combinations
  Understanding the effect of individual and combination of motifs
  Visualizing the transcriptional network

GMC
  GMC: Gene Motif Combination
  Motif numbers: (m1, m2, m3, m4, m5) = (1,0,1,1,0)
  Synergistic motif combination: EC(n motifs) > max(EC(n-1 motifs))

GMC: what is it good for?
  Combinograms
  Clustering GMCs

Combinograms: what is it good for?
  They help visualizing the “single motif - specific expression pattern” connection
  They also show which motif is more critical in determining expression pattern.
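
A minimal Python sketch of the synergy test from the "Basic definitions" slide, EC(a&b) > EC(a\b), EC(b\a). The slide does not spell out the expression coherence score, so this sketch assumes EC is the fraction of gene pairs whose normalized expression profiles lie within a Euclidean distance threshold of each other. The function names, the data layout and the threshold value are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def normalize(profile):
    """Scale a profile to mean 0 and standard deviation 1."""
    n = len(profile)
    mean = sum(profile) / n
    sd = sqrt(sum((x - mean) ** 2 for x in profile) / n)
    if sd == 0:
        return [0.0] * n  # flat profile: no shape to compare
    return [(x - mean) / sd for x in profile]


def expression_coherence(profiles, threshold):
    """Assumed EC: fraction of gene pairs closer than `threshold`."""
    if len(profiles) < 2:
        return 0.0  # no pairs to compare
    normed = [normalize(p) for p in profiles]
    pairs = list(combinations(normed, 2))
    close = sum(
        1
        for a, b in pairs
        if sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) < threshold
    )
    return close / len(pairs)


def is_synergistic(expr, genes_with_a, genes_with_b, threshold):
    """Slide criterion: EC(a&b) exceeds both EC(a\\b) and EC(b\\a).

    expr maps gene name -> expression profile (hypothetical layout).
    """
    both = [expr[g] for g in expr if g in genes_with_a and g in genes_with_b]
    only_a = [expr[g] for g in expr if g in genes_with_a and g not in genes_with_b]
    only_b = [expr[g] for g in expr if g in genes_with_b and g not in genes_with_a]
    ec_both = expression_coherence(both, threshold)
    return ec_both > max(
        expression_coherence(only_a, threshold),
        expression_coherence(only_b, threshold),
    )
```

The comparison here is a plain inequality, as on the slide; a real analysis would also need a significance test before calling a combination synergistic.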
Motif synergy map
  Visualizing transcription networks

Conclusion
  The combinogram importance
  The motif synergy map importance

Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes
Lee Ann McCue, William Thompson, C. Steven Carmack, Michael P. Ryan, Jun S. Liu, Victoria Derbyshire and Charles E. Lawrence
  Goals:
    Identifying novel TF binding sites in E. coli
    Finding orthologs
    Describing transcription regulatory network
    Identify upstream sequence patterns
    Local optimum: Gibbs sampling algorithm

Methods:
  One E. coli gene and orthologs
  Data set
  Gibbs sampling algorithm
  MAP score: a measure of overrepresentation of motif
  Motif

Applying the method on a small scale: Validation
  Choosing 190 E. coli genes.
  Creating 184 data sets.
  Running Gibbs sampling algorithm.
  More than 67% success in the prediction for the most probable motif.

Motif Model

Identification of the YijC binding sites
  A strongly predicted site was upstream of the fabA, fabB and yqfA genes.
  Chromatography: identifying the factor.

Identifying the YijC binding sites and predicting gene function
  Mass spectrometry identification: YijC
  Predicting a function for yqfA.

Applying the method genome wide
  Choosing 2113 E. coli ORFs.
  For 2097, a TF-binding site was predicted.

MAP scores: ortholog distribution
  Study set
  Full set

Adding binding sites for known TFs
  Building a TF binding site model for known TFs.
  Scanning E. coli upstream regions.
  187 new probable sites.

Building a regulatory network
  Required steps:
    Identifying motif models
    Clustering the models
  Problem: Specificity

Conclusion
  What have we gained so far?
  A better prediction of gene function.
  New possibilities for identification of TF binding sites and the TFs which bind them!!!
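
The "Adding binding sites for known TFs" step (build a site model, then scan upstream regions) can be sketched in Python as a log-odds position weight matrix scan. This is an illustrative stand-in, not the model used in the paper: the uniform background, the pseudocount, the score cutoff and the function names are all assumptions, and the AGCCA sites are borrowed from the opening slides as toy data.

```python
from math import log2

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds matrix from equal-length known sites.

    Assumes a uniform background and an additive pseudocount.
    """
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append(
            {
                base: log2(((column.count(base) + pseudocount) / total) / background)
                for base in BASES
            }
        )
    return pwm


def scan(pwm, sequence, min_score):
    """Return (start, window, score) for every window scoring >= min_score.

    Assumes the sequence contains only A, C, G and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= min_score:
            hits.append((start, window, score))
    return hits


# Toy usage: four known sites, one hypothetical upstream region.
known_sites = ["AGCCA", "AGCCA", "AGCCT", "AGCCA"]
model = build_pwm(known_sites)
hits = scan(model, "TTTAGCCATTT", min_score=5.0)
```

The cutoff controls the specificity problem noted on the "Building a regulatory network" slide: a lower `min_score` reports more candidate sites but also more false positives.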