Transcription – Gene regulation

The machine that transcribes a gene is composed of perhaps 50 proteins, including RNA polymerase, the enzyme that converts DNA code into RNA code. A crew of transcription factors grabs hold of the DNA just above the gene at a site called the core promoter, while associated activators bind to enhancer regions farther upstream of the gene to rev up transcription. Working as a tightly knit machine, these proteins transcribe a single gene into messenger RNA. The messenger RNA winds its way out of the nucleus to the factories that produce proteins, where it serves as a blueprint for production of a specific protein.
http://www.berkeley.edu/news/features/1999/12/09_nogales.html

3. Lecture WS 2004/05 Bioinformatics III

Transcription in E. coli and in eukaryotes

Prokaryotes | Eukaryotes
Genes are grouped into operons. | Genes are not grouped into operons.
An mRNA may contain the transcript of several genes (polycistronic). | Each mRNA contains the transcript of a single gene only (monocistronic).
Transcription and translation are coupled: the transcript is already translated during transcription. | Transcription and translation are NOT coupled: transcription takes place in the nucleus, translation in the cytosol.
Gene regulation takes place by modification of the transcription rate. | Gene regulation takes place via the transcription rate AND via RNA processing, RNA stability etc.

Promoter prediction in E. coli

To analyze E. coli promoters, one may align a set of promoter sequences at the position that marks the known transcription start site (TSS) and search for conserved regions in the sequences. E. coli promoters are found to contain 3 conserved sequence features:
- a region approximately 6 bp long with consensus TATAAT at position -10
- a region approximately 6 bp long with consensus TTGACA at position -35
- a relatively constant distance of ca. 17 bp between these 2 regions
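The three promoter features listed above can be turned into a very simple scanner. The following is a minimal sketch, not a published promoter-prediction method: the function names, the tolerance of one mismatch per box and the spacer range of 16 to 18 bp are assumptions chosen for illustration; only the two consensus sequences and the spacing of about 17 bp come from the slide.

```python
from typing import Iterator, Tuple

MINUS_35 = "TTGACA"  # consensus of the -35 region (from the slide)
MINUS_10 = "TATAAT"  # consensus of the -10 region (from the slide)


def mismatches(site: str, consensus: str) -> int:
    """Number of positions at which `site` differs from `consensus`."""
    return sum(a != b for a, b in zip(site, consensus))


def scan_promoters(
    seq: str,
    max_mismatch: int = 1,           # assumed tolerance per box
    spacers: range = range(16, 19),  # assumed reading of "ca. 17 bp"
) -> Iterator[Tuple[int, int, int]]:
    """Yield (start of -35 box, spacer length, total mismatches)."""
    seq = seq.upper()
    k = len(MINUS_35)
    for i in range(len(seq) - k + 1):
        m35 = mismatches(seq[i:i + k], MINUS_35)
        if m35 > max_mismatch:
            continue
        for gap in spacers:
            j = i + k + gap
            box10 = seq[j:j + k]
            if len(box10) < k:  # ran off the end; longer gaps will too
                break
            m10 = mismatches(box10, MINUS_10)
            if m10 <= max_mismatch:
                yield i, gap, m35 + m10


if __name__ == "__main__":
    toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"  # invented toy sequence
    print(list(scan_promoters(toy)))  # [(2, 17, 0)]
```

Because both boxes are only 6 bp long, a scanner of this kind reports many chance hits on real genomic sequence, which is the difficulty discussed under "Feasibility of computational motif search?".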
Gene regulatory promoter network

In E. coli, 240 transcription factors have been verified that regulate 3000 genes. Binding site matrices are available for more than 55 E. coli TFs (Robison et al. 1998).
In S. cerevisiae, genome-wide binding analysis of 106 transcription factors indicates that more than one-third of the promoter regions that were bound by regulators were bound by 2 or more regulators.
The result is a highly connected network of transcriptional regulators.

Feasibility of computational motif search?

Computational identification of transcription factor binding sites is difficult because they consist of short, degenerate sequences that occur frequently by chance.
The problem is not easy to define (and is therefore "complex") because
- the motif is of unknown size
- the motif might not be well conserved between promoters
- the sequences used to search for the motif do not necessarily represent the complete promoter
- the genes whose promoters are to be analyzed are in many cases grouped together by a clustering algorithm, which has its own limitations.

Strategy 1

Starting point: the arrival of microarray gene-expression data.
For a group of genes with a similar expression profile (e.g. those that are activated at the same time in the cell cycle), one may assume that this profile is, at least partly, caused by and reflected in a similar structure of the regions involved in transcription regulation.
Search for common motifs in the < 1000 base upstream regions.
So far used: detection of single motifs (representing transcription-factor binding sites) common to the promoter sequences of putatively co-regulated genes.
Better: search for the simultaneous occurrence of 2 or more sites within a given distance interval. The search becomes more sensitive.

Motif identification

A flowchart to illustrate the two different approaches for motif identification.
We analyzed 800 bp upstream from the translation start sites of the five genes from the yeast gene family PHO by the publicly available systems MEME (alignment) and RSA (exhaustive search). MEME was run on both strands, in one-occurrence-per-sequence mode, and found the known motif ranked as second best. RSA Tools was run with oligo size 6 and noncoding regions as background, as set by the demo mode of the system. The well-conserved heptamer of the motifs used by MEME to build the weight matrix is printed in bold.
Ohler, Niemann, Trends Gen 17, 2 (2001)

Strategy 2: Exhaustive motif search in upstream regions

Exploit the finding that relevant motifs are often repeated many times, possibly with small variations, in the upstream region for the regulatory action to be effective.
Search the upstream region for overrepresented motifs, then:
(1) Group genes based on the overrepresented motifs.
(2) Analyze sets of genes that share motifs for coregulation in microarray experiments.
(3) Consider overrepresented motifs that label sets of co-regulated genes as candidate binding sites.
Cora et al., BMC Bioinformatics 5, 57 (2004)

Exhaustive motif search in upstream regions (figures)
Cora et al., BMC Bioinformatics 5, 57 (2004)

Recently published tools for promoter finding
Ohler, Niemann, Trends Gen 17, 2 (2001)

Position-specific weight matrix

A popular approach when a list of genes that share a TF binding motif is available, together with a good multiple sequence alignment.
Alignment matrix: lists the number of occurrences of each letter at each position of an alignment.
Hertz, Stormo (1999) Bioinformatics 15, 563

Position-specific weight matrix

Examples of matrices used by YRSA
http://forkhead.cgb.ki.se/YRSA/matrixlist.html

Experimental identification of TF binding sites: DNase I footprinting

A protein bound to a specific DNA sequence will interfere with the digestion of that region by DNase I.
An end-labelled DNA probe is incubated with a protein extract or a purified DNA-binding factor.
The unprotected DNA is then partially digested with DNase I such that on average every DNA molecule is cut once. Digestion products are then resolved by electrophoresis (denaturing PAGE).
Comparison of the DNase I digestion pattern in the presence and absence of protein allows the identification of a footprint (protected region).

Gel retardation assays

Also known as gel shift, band shift or Electrophoretic Mobility Shift Assay (EMSA).
A purified protein, or a complex mixture of proteins (e.g. nuclear or cell extract), is incubated with a 32P end-labelled DNA fragment containing the putative protein binding site (from the promoter region). Reaction products are then analysed on a nondenaturing polyacrylamide gel.
The specificity of the DNA-binding protein for the putative binding site is established by competition experiments using DNA fragments or oligonucleotides containing a binding site for the protein of interest, or other unrelated DNA sequences.
Figure labels: no protein; add protein; non-denaturing PAGE; retarded mobility due to protein binding; free DNA probe.

3D structures of transcription factors

1A02.pdb, 1AU7.pdb, 1AM9.pdb, 1CIT.pdb, 1GD2.pdb, 1H88.pdb
TFs bind with very different binding modes. Some are sensitive to DNA conformation. In one structure, 2 TFs are bound.
http://www.rcsb.org
DNA conformation

Canonical and mechanically distorted forms of helical DNA (from left to right: A-DNA, B-DNA, overstretched S-DNA, overtwisted P-DNA).
Conformational fluctuations of a B-DNA oligomer with an alternating GA sequence. The snapshots (100 ps intervals) from a simulation at 300 K using explicit solvent and counterions show axis and backbone fluctuations.
E. Giudice, R. Lavery (2002) Acc. Chem. Res. 35, 350-357.

DNA conformation

Induced base opening within B-DNA. Images show the conformational changes associated with moving thymine (bold) into the major groove of an oligomer with an alternating GA sequence.
E. Giudice, R. Lavery (2002) Acc. Chem. Res. 35, 350-357.

EM low-resolution structure of TF machinery

Single-particle images; 3D reconstruction of TFIID.
Nogales et al. Science (1999)

Identification of individual components

Position of IIB and IIA on the TFIID structure and mapping of the TBP. The blue mesh corresponds to the holo-TFIID, with the A, B, and C lobes indicated.
(A) The green mesh corresponds to the density difference between the holo-TFIID and the TFIID-IIB complex.
(B) The magenta and green meshes show the density difference between the holo-TFIID and the trimeric complex TFIID-IIA-IIB. The density depicted in light green can be attributed to TFIIB by comparison with (A), and the magenta density therefore corresponds to IIA.
(C) The yellow mesh shows the density difference between the holo-TFIID and TFIID that is bound to the TBP antibody.
Nogales et al. Science (1999)
Database for eukaryotic transcription factors: TRANSFAC

BIOBase / TU Braunschweig / GBF
Relational database, distributed as 6 flat files:
- FACTOR: interaction of TFs
- SITE: their DNA binding sites
- GENE: the target genes they regulate through these sites
- CELL: factor source
- MATRIX: TF nucleotide weight matrices
- CLASS: classification scheme of TFs
Wingender et al. (1998) J Mol Biol 284, 241

Database for eukaryotic transcription factors: TRANSFAC (figure)
BIOBase / TU Braunschweig / GBF
Matys et al. (2003) Nucl Acid Res 31, 374

Match(TM)

Searches for putative TF binding sites in DNA sequences based on weight matrices.
Two values are used to score putative hits:
- Matrix similarity score: quality of a match between the sequence and the whole matrix, in the range [0,1]
- Core similarity score: quality of a match between the sequence and the core sequence of a matrix, which consists of the five most conserved consecutive positions in the matrix, in the range [0,1]
Profile: a set of matrices and their cut-offs designed for function-driven searches.
Special profiles are available for immune cells, muscle cells, liver cells, and for the cell cycle.
Matys et al. (2003) Nucl Acid Res 31, 374
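The idea of scoring a sequence window against a weight matrix with a whole-matrix score and a core score, both in [0,1], can be illustrated as follows. This is a simplified sketch and not the Match implementation: the scaling between the worst and best possible word, the choice of the core as the five consecutive columns with the largest summed top counts, the cut-off values and all function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"
Matrix = List[Dict[str, int]]


def count_matrix(sites: Sequence[str]) -> Matrix:
    """Alignment matrix: per-position base counts of equal-length sites."""
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        matrix.append({b: counts[b] for b in BASES})
    return matrix


def similarity(matrix: Matrix, window: str) -> float:
    """Score of `window`, scaled to [0, 1] between worst and best word."""
    score = sum(col.get(base, 0) for col, base in zip(matrix, window))
    lo = sum(min(col.values()) for col in matrix)
    hi = sum(max(col.values()) for col in matrix)
    return (score - lo) / (hi - lo) if hi > lo else 0.0


def core_start(matrix: Matrix, width: int = 5) -> int:
    """Start of the `width` consecutive columns with the highest top counts
    (an assumed proxy for 'most conserved')."""
    peaks = [max(col.values()) for col in matrix]
    return max(range(len(matrix) - width + 1),
               key=lambda s: sum(peaks[s:s + width]))


def scan(matrix: Matrix, seq: str, core_cut: float = 0.9,
         matrix_cut: float = 0.8, width: int = 5) -> List[Tuple[int, float, float]]:
    """Return (position, matrix score, core score) for windows passing both cut-offs."""
    n, c = len(matrix), core_start(matrix, width)
    hits = []
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n].upper()
        css = similarity(matrix[c:c + width], window[c:c + width])
        if css < core_cut:  # cheap core test first
            continue
        mss = similarity(matrix, window)
        if mss >= matrix_cut:
            hits.append((i, round(mss, 3), round(css, 3)))
    return hits


if __name__ == "__main__":
    toy_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]  # invented examples
    print(scan(count_matrix(toy_sites), "CCTATAATCC"))    # [(2, 1.0, 1.0)]
```

Testing the core first means the full matrix is evaluated only for windows that already match the most conserved positions. A set of matrices with cut-offs such as `core_cut` and `matrix_cut` plays the role of what the slide calls a profile.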
TRANSFAC classification

1 Superclass: basic domains
  1.1 Leucine zipper factors (bZIP)
  1.2 Helix-loop-helix factors (bHLH)
  1.3 bHLH-bZIP
  1.4 NF-1
  1.5 RF-X
  1.6 bHSH
2 Superclass: Zinc-coordinating DNA-binding domains
  2.1 Cys4 zinc finger of nuclear receptor type
  2.2 diverse Cys4 zinc fingers
  2.3 Cys2His2 zinc finger domains
  2.4 Cys6 cysteine-zinc cluster
  2.5 Zinc fingers of alternating composition
3 Superclass: Helix-turn-helix
4 Superclass: beta-Scaffold Factors with Minor Groove Contacts
5 Superclass: others
http://www.gene-regulation.com/pub/databases/transfac/cl.html

TRANSFAC classification

Entry for 1.1 Leucine zippers (figure)
http://www.gene-regulation.com

Summary

Large databases (e.g. TRANSFAC) are available with information about promoter sites. The information is verified experimentally.
Microarray data allow searching for common motifs of coregulated genes. Also possible: common GO annotation etc.
TF binding motifs are frequently overrepresented in the 1000 bp upstream region. The function of this overrepresentation is not clearly understood. (The same holds for proline-rich recognition sequences.)
Relatively few TFs regulate a large number of genes. This gives a complex regulatory network, the topic of the Thursday lecture.
http://www.gene-regulation.com
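As a closing illustration of the overrepresentation idea mentioned in the summary and in Strategy 2, the sketch below counts hexamers in a set of upstream regions and compares their frequency with a background set. It is a rough sketch under stated assumptions: the simple frequency ratio with a pseudocount is a stand-in for the statistics used by RSA Tools or by Cora et al., and the function names and toy sequences are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def kmer_counts(seqs: Iterable[str], k: int = 6) -> Counter:
    """Count every overlapping k-mer in a collection of sequences."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        counts.update(s[i:i + k] for i in range(len(s) - k + 1))
    return counts


def enrichment(upstream: Iterable[str], background: Iterable[str],
               k: int = 6, pseudo: float = 1.0) -> Dict[str, float]:
    """Ratio of k-mer frequency in `upstream` to frequency in `background`.

    The pseudocount (an assumption) avoids division by zero for words
    that never occur in the background.
    """
    fg = kmer_counts(upstream, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_words = 4 ** k
    scores = {}
    for word, n in fg.items():
        fg_freq = n / fg_total
        bg_freq = (bg[word] + pseudo) / (bg_total + pseudo * n_words)
        scores[word] = fg_freq / bg_freq
    return scores


if __name__ == "__main__":
    # Invented toy data: three "upstream regions" sharing one hexamer.
    upstream = ["AACACGTGAA", "TTCACGTGTT", "GGCACGTGCC"]
    background = ["ATATATATATATATATATAT", "GCGCGCGCGCGCGCGCGCGC"]
    scores = enrichment(upstream, background)
    print(max(scores, key=scores.get))  # CACGTG
```

Grouping genes by the hexamers that score highly in their upstream regions, and then checking those groups for coregulation in expression data, corresponds to steps (1) to (3) of Strategy 2.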