How do proteins recognize DNA

Modes:
* Single-stranded Binding Proteins
* Double-stranded Binding Proteins
* Sequence-Independent Binding Proteins
* Sequence-dependent Binding Proteins
* Lesion-specific (mismatch, 8-oxoG, thymidine dimers, etc.) Binding Proteins

Information Content: Major Groove versus Minor Groove

DNA Packaging in Bacteria: Prokaryotic HU protein

HU protein (bacteria) is a small (9 kDa), basic (cationic), abundant (60,000 copies per cell) protein with two subunits, HUα and HUβ. HU protein condenses and packages DNA and regulates DNA replication. See Hu Pymol Script

DNA Packaging in Eukaryotes: Histones and Nucleosomes

[Image from Wikipedia]

The nucleosome core particle is around 147 base pairs of DNA wrapped around a histone octamer (8 proteins) with 2 copies each of histones H2A, H2B, H3, and H4. Nucleosome core particles are connected by "linker DNA", which is up to around 80 bp in length. See nucleosome Pymol Script

Models for the 30 nm fiber

The genetic switch of the lambdoid bacteriophages (lambda, 434, P22)

The regulation of phage λ is reasonably well understood at a mechanistic and structural level. The phage in an infected bacterium decides between two alternative lifestyles, the lytic and the lysogenic pathways.

Lysogenic: dormant, integrated into the host bacterium's chromosome; favored under conditions of low temperature, starvation, and high multiplicity of infection.
Lytic: lyse the host bacterium; induced by the SOS response.

The genes that express the proteins Repressor and Cro are on opposite sides of a common operator region. Repressor protein stimulates transcription of Repressor, blocks transcription of Cro, and maintains the lysogenic state. Cro protein does the opposite: it stimulates transcription of Cro, blocks transcription of Repressor, and turns the switch to the lytic state. The operator region contains three operators (OR1, OR2, OR3). Operators are DNA sequences. The Repressor and Cro genes are transcribed in opposite directions.
The two promoters (RNA polymerase initiation binding sites) overlap in the central operator, OR2.

The helix-turn-helix (HTH) DNA binding motif contains two α helices (helices 3 and 4 on this figure) joined by a short linker. The HTH motif is seen in Cro, CAP, and λ repressor. Recognition and binding take place in the major groove. Helix 3 (figure) contributes most to DNA recognition and is called the "recognition helix". The recognition helix binds through a combination of hydrogen bonds and van der Waals interactions with the edges of the bases. The other α helix locks the recognition helix into position. See P22 and lambda repressor pymol script

Transcription Factors

A transcription factor is a protein that binds to specific DNA sequences, thereby controlling transcription. Transcription factors act alone or with other proteins, by promoting (activator) or blocking (repressor) the binding of RNA polymerase to specific genes. A transcription factor contains one or more DNA-binding domains (DBDs), which bind to specific sequences of DNA adjacent to the genes that they regulate.
* constitutively active
* conditionally active

Systematic DNA-Binding Domain Classification of Transcription Factors. Stegmaier P, Kel AE, Wingender E. Genome Inform 15:276-286 (2004).
1 Superclass: Basic Domains (Basic-helix-loop-helix)
  1.1 Class: Leucine zipper factors (bZIP)
    1.1.1 Family: AP-1(-like) components; includes (c-Fos/c-Jun)
    1.1.2 Family: CREB
    1.1.3 Family: C/EBP-like factors
    1.1.4 Family: bZIP / PAR
    1.1.5 Family: Plant G-box binding factors
    1.1.6 Family: ZIP only
  1.2 Class: Helix-loop-helix factors (bHLH)
    1.2.1 Family: Ubiquitous (class A) factors
    1.2.2 Family: Myogenic transcription factors (MyoD)
    1.2.3 Family: Achaete-Scute
    1.2.4 Family: Tal/Twist/Atonal/Hen
  1.3 Class: Helix-loop-helix / leucine zipper factors (bHLH-ZIP)
    1.3.1 Family: Ubiquitous bHLH-ZIP factors; includes USF (USF1, USF2); SREBP (SREBP)
    1.3.2 Family: Cell-cycle controlling factors; includes c-Myc
  1.4 Class: NF-1
    1.4.1 Family: NF-1 (A, B, C, X)
  1.5 Class: RF-X
    1.5.1 Family: RF-X (1, 2, 3, 4, 5, ANK)
  1.6 Class: bHSH
2 Superclass: Zinc-coordinating DNA-binding domains
  2.1 Class: Cys4 zinc finger of nuclear receptor type
    2.1.1 Family: Steroid hormone receptors
    2.1.2 Family: Thyroid hormone receptor-like factors
  2.2 Class: diverse Cys4 zinc fingers
    2.2.1 Family: GATA-Factors
  2.3 Class: Cys2His2 zinc finger domain
    2.3.1 Family: Ubiquitous factors, includes TFIIIA, Sp1
    2.3.2 Family: Developmental / cell cycle regulators; includes Krüppel
    2.3.4 Family: Large factors with NF-6B-like binding properties
  2.4 Class: Cys6 cysteine-zinc cluster
  2.5 Class: Zinc fingers of alternating composition
3 Superclass: Helix-turn-helix
  3.1 Class: Homeo domain
    3.1.1 Family: Homeo domain only; includes Ubx
    3.1.2 Family: POU domain factors; includes Oct
    3.1.3 Family: Homeo domain with LIM region
    3.1.4 Family: homeo domain plus zinc finger motifs
  3.2 Class: Paired box
    3.2.1 Family: Paired plus homeo domain
    3.2.2 Family: Paired domain only
  3.3 Class: Fork head / winged helix
    3.3.1 Family: Developmental regulators; includes forkhead
    3.3.2 Family: Tissue-specific regulators
    3.3.3 Family: Cell-cycle controlling factors
    3.3.0 Family: Other regulators
  3.4 Class: Heat Shock Factors
    3.4.1 Family: HSF
  3.5 Class: Tryptophan clusters
    3.5.1 Family: Myb
    3.5.2 Family: Ets-type
    3.5.3 Family: Interferon regulatory factors
  3.6 Class: TEA (transcriptional enhancer factor) domain
    3.6.1 Family: TEA (TEAD1, TEAD2, TEAD3, TEAD4)
4 Superclass: beta-Scaffold Factors with Minor Groove Contacts
  4.1 Class: RHR (Rel homology region)
    4.1.1 Family: Rel/ankyrin; NF-kappaB
    4.1.2 Family: ankyrin only
    4.1.3 Family: NFAT (Nuclear Factor of Activated T-cells) (NFATC1, NFATC2, NFATC3)
  4.2 Class: STAT
    4.2.1 Family: STAT
  4.3 Class: p53
    4.3.1 Family: p53
  4.4 Class: MADS box
    4.4.1 Family: Regulators of differentiation; includes (Mef2)
    4.4.2 Family: Responders to external signals, SRF (serum response factor) (SRF)
  4.5 Class: beta-Barrel alpha-helix transcription factors
  4.6 Class: TATA binding proteins
    4.6.1 Family: TBP
  4.7 Class: HMG
    4.7.1 Family: SOX genes, SRY
    4.7.2 Family: TCF-1 (TCF1)
    4.7.3 Family: HMG2-related, SSRP1
    4.7.5 Family: MATA
  4.8 Class: Heteromeric CCAAT factors
    4.8.1 Family: Heteromeric CCAAT factors
  4.9 Class: Grainyhead
    4.9.1 Family: Grainyhead
  4.10 Class: Cold-shock domain factors
    4.10.1 Family: csd
  4.11 Class: Runt
    4.11.1 Family: Runt
0 Superclass: Other Transcription Factors
  0.1 Class: Copper fist proteins
  0.2 Class: HMGI(Y) (HMGA1)
    0.2.1 Family: HMGI(Y)
  0.3 Class: Pocket domain
  0.4 Class: E1A-like factors
  0.5 Class: AP2/EREBP-related factors
    0.5.1 Family: AP2
    0.5.2 Family: EREBP
    0.5.3 Superfamily: AP2/B3
      0.5.3.1 Family: ARF
      0.5.3.2 Family: ABI
      0.5.3.3 Family: RAV

The Basic Leucine Zipper Domain (bZIP domain)

The bZIP domain contains a basic region that mediates DNA binding and a leucine zipper region that mediates dimerization.

[Figures: GCN4, Fos, Max, Jun; GCN4 AA sequence, b-zip protein; Methyl GCN4; 1YSA shown in VMD]

Basic Helix-Loop-Helix

Two α-helical regions connected by a loop. The smaller helix is the dimerization region. The larger helix contains basic amino acid residues that interact with DNA. bHLH proteins generally bind to a consensus sequence called an E-box, CANNTG.
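The E-box consensus CANNTG quoted above can be located in a sequence with a simple pattern match. The following is a minimal sketch, not a published tool: the function name `find_eboxes` and the example sequence are made up for illustration, and N is assumed to mean any of A, C, G, or T.

```python
import re

# E-box consensus CANNTG, with N taken to be any of the four DNA bases.
# The lookahead lets overlapping sites be reported as well.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_eboxes(dna):
    """Return (0-based position, hexamer) for every E-box found in dna."""
    dna = dna.upper()
    return [(m.start(), m.group(1)) for m in EBOX.finditer(dna)]

# Hypothetical sequence, invented purely to exercise the function.
example = "TTCACGTGAACAGCTGTT"
print(find_eboxes(example))
```

Because the reverse complement of CANNTG is again of the form CANNTG, scanning one strand is enough to find sites on both.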
[Figure: homodimer of Max.] Max recognizes the same consensus sequence as MyoD (i.e. CAC), but with different amino acids.

Transcription

During transcription, an RNA complement (a transcript) of a DNA sequence is synthesized. If the DNA template (antisense) sequence is 5' ...GGGCATT... 3', then the RNA transcript has the sequence 5' ...AAUGCCC... 3'.
http://biology.unm.edu/ccouncil/Biology_124/Summaries/T&T.html

Transcription

A DNA "transcription unit" can contain
(1) the sequence that will eventually be directly translated into the protein (the coding sequence),
(2) introns, which will be removed by splicing,
(3) regulatory sequences, which direct and regulate the synthesis of the protein.
The regulatory sequence before (upstream from) the coding sequence is called the 5' UTR, and the sequence following (downstream from) the coding sequence is called the 3' UTR. UTRs can contain riboswitches, etc.

All RNA is made by transcription. There are many types of RNA produced by transcription.

1) Messenger RNAs (mRNA) are coding RNAs. mRNAs carry information contained within DNA to the ribosome, where they direct the sequence of amino acids during protein synthesis, according to the mRNA sequence and the "genetic code". The sequence of codons (nucleotide triplets) in an mRNA determines the amino acid sequence in a protein. Some mRNAs contain cis regulatory elements, such as riboswitches, in the untranslated regions (either 3' or 5' UTRs).

2) Ribosomal RNAs (rRNA) are structural and catalytic components of the ribosome, the large RNA-protein assembly where protein is synthesized in all living systems. In the ribosome, amino acids are transferred from tRNAs to a nascent (growing) polypeptide chain, with the amino acid sequence controlled by the mRNA. The peptidyl transferase center, which is the catalytic site of the ribosome, is all rRNA. So technically the ribosome is a ribozyme, not a protein enzyme.

3) Transfer RNAs (tRNA) are RNAs that become covalently linked to amino acids (activating the amino acids).
tRNAs contain anticodons that interact with codons on mRNAs. tRNAs transfer amino acids to a nascent polypeptide chain in the ribosome. The covalent linkage between a given amino acid and the correct (cognate) tRNA is catalyzed by a specific aminoacyl-tRNA synthetase (one for each amino acid). The aminoacyl-tRNA synthetases establish and enforce the genetic code.

4) Regulatory RNAs
* MicroRNAs (miRNAs) are ~22 nucleotides in length. They down-regulate and silence gene expression (mRNA degradation and sequestering, and translational suppression).
* CRISPR RNAs work in the prokaryotic immune system.

RNA polymerase (RNAP) is an enzyme that produces RNA using DNA as a template. RNAP is essential to modern life and is found in all living systems. RNAP is a nucleotidyl transferase that adds a ribonucleotide to the 3' hydroxyl group of an RNA molecule. The reaction is driven by release of PPi and hydrolysis of PPi to Pi + Pi. RNAP can initiate without a primer.

[Figure: Bacterial RNA polymerase, Taq RNAP core enzyme. α: yellow and green; β: cyan; β': pink; ω: gray]

The RNAP is large (~450 kDa). The core enzyme is α2ββ'ω (5 subunits).
* α: two α subunits help with assembly and bind with regulatory elements.
* β: the polymerase activity (catalyzes the synthesis of RNA).
* β': binds to DNA (nonspecifically).
* ω: promotes assembly

Only one strand of DNA is transcribed (unlike replication). The sense strand has the same sequence as the transcribed RNA. The antisense strand is the DNA template.

Bacterial genes are found in operons. The transcription of many genes with related functions can be controlled by a single control element. An operon is a cluster of genes under the control of a single regulatory signal or promoter. Eukaryotic genes are controlled independently (generally).
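The relationship between the sense strand, the antisense (template) strand, and the transcript can be checked with a short sketch. The function names below are illustrative only; the one worked input is the GGGCATT template from the transcription example earlier in these notes.

```python
# Base-pairing rules for copying a DNA template into RNA.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def transcribe_template(template_5to3):
    """RNA (written 5'->3') made from a template strand written 5'->3'.

    The template is read 3'->5', so we walk it in reverse and pair each base.
    """
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(template_5to3.upper()))

def sense_from_template(template_5to3):
    """Sense (coding) strand, written 5'->3', for a given template strand."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(template_5to3.upper()))

def transcribe_sense(sense_5to3):
    """The transcript equals the sense strand with T replaced by U."""
    return sense_5to3.upper().replace("T", "U")

template = "GGGCATT"
print(transcribe_template(template))                    # transcript from the template
print(transcribe_sense(sense_from_template(template)))  # same transcript via the sense strand
```

Both routes give the same RNA, which is the point of the sense/antisense terminology: the transcript is complementary to the template and identical (apart from U for T) to the sense strand.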
Steps in transcription (1) Initiation

Transcription starts at the +1 position, usually a purine. RNAP does not require a primer. Initiation involves a DNA promoter, transcription factors, DNA helicases, RNAP, activators, and repressors. RNAP binds very tightly to the promoter (KD = 10^-14 M). A promoter is a DNA sequence from -1 to around -40 (i.e., on the 5' side of the sense strand of the gene). During initiation RNAP binds to the Pribnow box, which is the first region where base pairs are disrupted. The Pribnow box (TATAAT) is the most conserved part of the promoter. Expression of various genes (or operons) is controlled by various σ factors that recognize the -10 to -35 region, where the -35 element has the consensus TTGACA. Different σ factors recognize different sequences. The α subunit recognizes an upstream element (-40 to -70 base pairs) of the DNA.

Steps in transcription (2) Promoter clearance

After initiation the RNAP has a tendency to release truncated RNA transcripts (abortive initiation). Abortive initiation continues to occur until the σ factor rearranges and is released.

Steps in transcription (3) Elongation

RNA polymerase traverses the template (antisense) strand and, following the rules of Watson-Crick complementarity with the antisense strand, creates an RNA copy of the sense (coding) strand. Polymerization is processive (without dissociation). Transcripts can be thousands or even millions of nucleotides long. The rate of polymerization is around 50 nucleotides/second, slower than replication. The error rate of transcription is around 1 in 4000.

RNA polymerase traverses the template strand from 3' → 5'. Polymerization occurs in the 5' → 3' direction. The resulting RNA transcript is a copy of the sense (coding, non-template) strand, except that thymines are replaced with uracils and deoxyriboses are replaced by riboses. A second RNAP can quickly reinitiate from the same site.

Topology issues during elongation.
[Figure 26-7: labels topoisomerase I, gyrase, re-initiation]

Steps in transcription (4) Termination

Transcription terminates at specific sites. Bacteria use two strategies for termination. In Rho-independent termination, the newly synthesized RNA molecule forms a GC-rich stem-loop followed by a run of U's. It seems the stem-loop causes the RNAP to pause and ultimately to dissociate. In Rho-dependent termination, a protein, Rho, destabilizes the interaction between the template and the mRNA, thus releasing the newly synthesized mRNA from the elongation complex.

[Figure 26-9a: Rho-independent termination at a GC-rich palindrome]
[Figure 26-10: rho]

Eukaryotic Transcription: Multiple RNAPs for different kinds of RNAs

Eukaryotes have more complicated polymerases and control mechanisms. RNAPs + control proteins = ~100 proteins for recognition and initiation.
* RNAP I: rRNAs (except 5S rRNA); requires promoters and upstream promoters (between -107 and -187).
* RNAP II: mRNAs; long, diverse promoters. Around ¼ of human genes are regulated by a TATA box (position -27). TATA is observed in both archaea and eukarya and is thought to be evolutionarily ancient. Enhancers: regulatory sequences that are remote from the start site.
* RNAP III: 5S rRNA, tRNA, small RNAs

Core promoter sequences. A given promoter can contain all, some, or none of these.
[Figure 26-14]

Transcription Factors

[Figure 26-15: TBP (part of TFIID)]
[Figure 26-17b]

Transcription Factors

[Figure 26-17a]

yeast RNAP II

[Figure: yeast RNAP II, colored by bacterial homolog. Homolog of α: yellow and green; Rpb2 (forms clamp, wall), homolog of β: cyan; Rpb1 (catalysis), homolog of β': pink; homolog of ω: gray]

yeast RNAP II Rpb1 C-terminal domain: heptapeptide repeats -(Pro-Thr-Ser-Pro-Ser-Tyr-Ser)n-, with n = 26 (yeast) to 52 (mammal)
* unphosphorylated: initiation
* phosphorylated: elongation

[Figure 26-11: Trapped elongation (minus rU), RNAP II]

Zinc Fingers
- small (~25 aa), independently folding motifs that coordinate zinc ions with some combination of cysteine and histidine residues,
- bind to DNA, RNA, proteins, or small molecules,
- at least 1000 mammalian zinc finger proteins.

Type by zinc coordination: Cys2His2, Cys4, Cys6, etc.
Type by protein fold: classic zinc finger, treble clef, and zinc ribbon.

Cys2His2 (most frequent): β-strand – turn – β-strand – turn – α-helix
(F/Y)-X-C-X2-C-X3-(F/Y)-X5-y-X2-H-X3-H, where X represents any amino acid and y is a hydrophobic residue.

Pavletich NP, Pabo CO (1991) Zinc Finger-DNA Recognition: Crystal Structure of a Zif268-DNA Complex at 2.1 Å. Science 252:809-817.

[Figure: Amino acid sequence of zinc finger domains of Zif268]
[Figure: Basic structure predicted by Klug group from sequence and zinc requirement.]

The classic zinc finger motif
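The Cys2His2 consensus given above, (F/Y)-X-C-X2-C-X3-(F/Y)-X5-y-X2-H-X3-H, can be written as a regular expression and used to scan a protein sequence. This is only a sketch under stated assumptions: the spacings are taken exactly as written in these notes (real fingers vary in spacing), the set of residues counted as hydrophobic (A, V, I, L, M, F, W, Y) is an assumption, and the test sequence is synthetic, built to fit the pattern rather than taken from any real protein.

```python
import re

AA = "[ACDEFGHIKLMNPQRSTVWY]"   # X: any of the 20 standard amino acids
HYDROPHOBIC = "[AVILMFWY]"      # y: assumed hydrophobic set (an assumption)

# (F/Y)-X-C-X2-C-X3-(F/Y)-X5-y-X2-H-X3-H, spacings exactly as in the notes.
C2H2 = re.compile(
    "[FY]" + AA + "C" + AA + "{2}" + "C" + AA + "{3}" + "[FY]"
    + AA + "{5}" + HYDROPHOBIC + AA + "{2}" + "H" + AA + "{3}" + "H"
)

def find_c2h2(protein):
    """Return (0-based start, matched segment) for each non-overlapping hit."""
    return [(m.start(), m.group(0)) for m in C2H2.finditer(protein.upper())]

# Synthetic sequence constructed to fit the pattern; not a real protein.
synthetic_finger = "YACAACAAAFAAAAALAAHAAAH"
print(find_c2h2("GS" + synthetic_finger + "GS"))
```

Relaxing the fixed spacings (for example allowing a range of residues between the two cysteines) would be a natural extension, but the ranges to use are not given in these notes.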