Transcription Factors (from Wray et al Mol Biol Evol 20:1377)
Transcription

The Gene
• Complex collection of sequences that
  o Controls a phenotype individually OR in combination with the action of other genes
• Size varies
• Structural features vary
• Encodes a protein (or proteins) translated from an mRNA
• Expression
  o Requires many associated factors

Transcription: the synthesis of RNA from a DNA template

Transcription, whether prokaryotic or eukaryotic, has three main events.
1. Initiation
  • Binding of RNA polymerase to double-stranded DNA
  • This step involves a transition to single-strandedness in the region of binding
  • RNA polymerase binds at a sequence of DNA called the promoter
  Initiation is the most important step in gene expression!!!
2. Elongation
  • The covalent addition of nucleotides to the 3' end of the growing polynucleotide chain
  • Involves the development of a short stretch of DNA that is transiently single-stranded
3. Termination
  • Recognition of the transcription termination sequence
  • Release of RNA polymerase

Eukaryotic RNA Polymerase
1. Three types exist
   Type of polymerase     Product    Location
   RNA Polymerase I       rRNA       nucleolus
   RNA Polymerase II      hnRNA      nucleoplasm
   RNA Polymerase III     tRNA       nucleoplasm
2. The protein is greater than 500 kDa in size
3. Multiple components
  • Two large subunits
  • <10 small subunits
4. Many non-polymerase factors are required for binding of the enzyme to DNA

Product of Transcription
Transcription Unit
• Extends from the promoter to the termination sequences
Upstream Sequences
• Sequences before the start site
Downstream Sequences
• Sequences after the start site
Primary Transcript
• Immediate transcription product

Steps in Model Eukaryotic Transcription
Adenovirus late promoter
• Requires four accessory factors and RNA Polymerase II, added in a defined order
   Order  Factor               Length of promoter covered (bp)
   1.     TFIID                -42 to -17 (binds TATA box)
   2.     TFIIA                -80 to -17
   3.     TFIIB                -80 to -17 and -10 to +10
   4.     RNA Polymerase II    -80 to +15
   5.     TFIIE                -80 to +30

The Transcription Product
Heterogeneous nuclear RNA
• hnRNA
• Complexity of hnRNA is 4x that of the mRNA pool
  o Why?? Splicing of introns from the primary transcript
  o Average size 8000 - 10,000 nucleotides
  o Range 2000 - 14,000 nucleotides

Splicing
• Removes the introns from the hnRNA
• Alternative splicing
  o An intron is skipped; or
  o Splice signals other than the GT/AG signals associated with introns are used
  o Result: multiple transcripts and proteins from a single gene sequence

Finishing the mRNA
A. 5' Capping
• Protects the transcript
  o Occurs so quickly that we rarely see the uncapped 5' end of the message
• Unique nucleotide
  o 7-methylguanosine (m7G)
• Added immediately after the start of transcription
• Sequence linkage
  o 7-methylguanosine is joined by a 5'-5' linkage, not the typical 5'-3' linkage
• Enzyme
  o Guanylyl transferase
B. 3' Polyadenylation
• Adds a poly-A tail (many adenines) to the end of the transcript
• Found on nearly all eukaryotic mRNAs (animal replication-dependent histone mRNAs are an exception)
• Sequence signal for adding the poly-A tail
  o 5'-AAUAAA-3'
• Location
  o About 10-30 nucleotides upstream of the site where the poly-A tail is added
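As a rough illustration of the 3' processing signal described above, the sketch below scans an RNA sequence for the canonical 5'-AAUAAA-3' hexamer and reports the window 10-30 nucleotides downstream of it where the poly-A tail would be expected to be added. It is a minimal sketch assuming only the canonical hexamer; the function name `find_polya_signals` and the example sequence are hypothetical, and real 3'-end processing also involves variant hexamers and downstream elements that these notes do not cover.

```python
def find_polya_signals(rna, hexamer="AAUAAA", min_dist=10, max_dist=30):
    """Return (signal_start, window_first, window_last) for each hexamer hit.

    Positions are 0-based. The window spans offsets min_dist..max_dist
    downstream of the hexamer's 3' end (offset 0 = first nucleotide after
    the hexamer); this spacing is an assumption taken from the 10-30 nt rule.
    """
    rna = rna.upper().replace("T", "U")  # also accept DNA-style input
    hits = []
    start = rna.find(hexamer)
    while start != -1:
        end = start + len(hexamer)  # first position after the hexamer
        hits.append((start, end + min_dist, end + max_dist))
        start = rna.find(hexamer, start + 1)  # step by 1 to catch overlapping hits
    return hits


pre_mrna = "GCAUGCAAUAAA" + "G" * 40  # made-up 3' end of a transcript
print(find_polya_signals(pre_mrna))  # [(6, 22, 42)]
```

Stepping the search by one position rather than by the hexamer length means overlapping signals such as `AAUAAAUAAA` are both reported.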
Transcription Factors

General Terms and Concepts
Promoter
• Difficult to define
• General definition
  o All the DNA sequences containing binding sites for RNA polymerase and the transcription factors necessary for normal transcription
Transcription Factor
• Any protein other than RNA polymerase that is required for transcription
Functions of Transcription Factors
• Bind to RNA polymerase
• Bind another transcription factor
• Bind to cis-acting DNA sequences
Basal Transcription Apparatus
• RNA polymerase + general transcription factors
• Both are needed to initiate transcription
Upstream Transcription Factors
• Ubiquitous factors that increase the efficiency of transcription initiation
• Set of factors unique to each promoter
Inducible Transcription Factors
• Act in the same manner as upstream factors
  o BUT their synthesis is regulated in a temporal or spatial manner

Transcription Factors (from Wray et al Mol Biol Evol 20:1377)

Phenotype is affected by mutations in:
1. Structural region of a gene
  • Function of a protein is modified (structure/function relationship)
2. Regulatory region of a gene
  • When the protein is expressed (gene regulation)

Considerations of gene regulation
1. Changing the regulation pattern of a gene can change phenotype
2. One transcription factor (TF) can affect multiple genes in a pathway
3. TF orthologs may regulate their target genes differently in different organisms
4. Promoters contain modules that can be changed to affect expression

Approaches to studying gene regulation
1. Mutants
  • Do induced mutants represent natural variation?
2. Expression patterns
  • Expression patterns of orthologs can differ among species
3. Expression levels
  • Phenotypic differences can result from changes in the amount of a protein

Effect of varying expression level
1. Spatial effects
  • Varying the amount of expression in a tissue can change phenotype
2. Cis-effects
  • Variation in expression level is often related to changes in cis-elements
3. Inducibility
  • Alleles can be induced differentially

Levels of expression can vary at the:
1. mRNA level
2. Protein level

What amount of the gene expression variation is the result of "controlling region" variation???
1. Natural variation exists in promoters
  • Associated with phenotypic changes
2. Artificial selection of promoter sequences can change expression
  • The maize tb locus is an example
3. Promoter "elements" are conserved among species
  • Specific sequences important for gene expression
4. Variation in promoter sequence is related to human disease susceptibility
  • Susceptibility to specific pathotypes is related to promoter sequences

Transcription patterns are variable
1. Transcription initiation is the most important step in phenotypic expression
2. Regulation is at the gene level, not the gene family level
  • Paralogs are independently regulated
3. Transcription is dynamic
  • Expression levels vary
  • Expression can fluctuate rapidly
  • Expression in neighboring cells can differ
4. Expression profiles vary among genes
  • Regulatory gene expression profile is inducible and highly variable
  • Housekeeping gene expression is generally constitutive but varies in response to stimuli and by cell type

Role of Controlling Regions (=Promoters) in Gene Expression
1. Promoters
  • Contain sequence motifs that bind factors that modulate gene expression
2. Constitutive (housekeeping) promoters
  • On by default
  • Turned off in response to stimuli
3. Inducible promoters
  • Off by default
  • Turned on in response to stimuli
4. TFs determine whether genes are turned on or off

Promoters
1. Universal conserved features are not found
2. Common sequence motifs are not found

Basal Gene Expression
1. Basal promoter
  • RNA polymerase complex binding site
  • Contains a TATA box or initiator element
  • Null promoters exist
    o Lack a TATA box or initiator element
  • Multiple basal promoters exist for some genes
2. TATA-box binding protein (TBP)
  • First protein to bind the basal promoter
  • Other proteins guide TBP to the binding site
3. RNA polymerase holoenzyme complex
  • Complex interactions of proteins build the transcription complex
4. Transcription start site
  • Begins about 30 bp downstream of the site where the transcription complex binds
5. Translation start site
  • Begins about 10 – 10,000 bp from the transcription start site
6. Basal promoters provide for minimal, low-level expression
  • Expression is mediated by constitutively expressed general transcription factors

Modifying Basal Gene Expression Levels
1. TF binding to controlling regions is required for full gene expression
  • TFs are specific to cell types and stimulus conditions
  • Interaction of controlling regions and TFs controls gene expression

Controlling Region TF Binding Sites
1. Binding sites are isolated in the controlling region
  • Binding sites are embedded in regions to which no TFs bind
2. Binding site numbers
  • 10 – 50 binding sites for 5 – 15 TFs
3. Role of other sequences
  • Local, sequence-specific conformational changes can affect TF binding
    o AT-rich regions
    o Z-DNA
4. Spacing of binding sites
  • Ranges from partial overlap to 10s of kilobases

Features of TF Binding Sites
1. Size
  • Footprint (sequence covered by the TF) is 10-20 bp
  • Direct binding site is 5-8 bp
  • Essential sequence is 4-6 bp
2. Site definition
  • Consensus sequence (although not all consensus sequences bind TFs)
  • Biochemical activity (required to define a functional sequence)
3. Binding sites can overlap
  • The TF pool determines which site is bound
  • Binding sites compete for a limited TF pool
4. Location
  • 100 base pairs to 100 kilobases from the transcription start site
5. Functional location
  • >30 kb 5' of the basal promoter
  • Within a few kb of the basal promoter
  • In the 5' UTR
  • Introns
  • >30 kb 3' of the basal promoter
  • Exons
  • Other side of an adjacent gene

Features of TF Binding Sites (cont.)
6. Location constraints
  • Some sites are constrained to specific positions relative to the transcription start site
7. Isolating binding site effects
  • Insulator sequences limit TF interactions to specific basal promoters
  • TATA or TATA-less TF interaction specificity
  • Specific recruitment of a TF at a specific sequence to interact with the basal promoter
8. Multiple control
  • One set of binding sites controls paralogs on opposite strands in opposite orientations
  • Cross-regulated sites share common binding sites

Abundance of Transcription Factors
1. TFs are members of small to large multi-gene families
  • Arabidopsis
  • (CCAAT-DR1 Family) to 164 (C3H Family) paralog families
3. Result from gene duplication events
4. 12-15 unique DNA binding domains
  • Evolutionary conservation

Modular Nature of Transcription Factors
1. DNA binding domain
  • Localized
    o MADS-box or homeodomains
  • Dispersed
    o Zn-finger or leucine zipper domains
2. Protein-protein interaction domain
  • Binding to other proteins necessary for activation
3. Intracellular trafficking domains
  • Nuclear localization signal
4. Ligand binding domain
  • Steroid- or hormone-binding domains
5. Evolutionary domain shuffling has occurred
  • Protein-protein interaction domain lost but DNA binding domain maintained

Transcription Factor DNA Binding Domain
1. Most bind the major groove of DNA
2. Domain sequence is highly conserved
  • Single amino acid mutations can significantly alter TF binding
3. TF binding specificity ranges from 3-5 bp
4. Specificity may be increased by
  • Multiple binding domains
  • Domains that bind the minor groove
  • Dimerization of two proteins, either homomeric or heteromeric
5. Binding is strong and highly specific
  • 5000 – 20,000 copies of a TF needed for high binding specificity
6. Cofactor interactions increase specificity
  • Phosphorylation
7. Paralogs may have unique binding specificities

Transcription Factor Protein-Protein Interactions Modulate Gene Expression
1. Increase (or decrease) the frequency with which the transcription apparatus is built
  • Can recruit (or prevent recruitment of) apparatus components
2. Specific interactions are necessary for effects to be realized
  • As homodimers
  • As heterodimers
  • As solo proteins
3. Neighboring effects
  • A TF at one site can prevent a cofactor from interacting with a neighboring site
4. Altering chromatin structure
  • Recruit other complexes that
    o Acetylate, deacetylate, methylate, or demethylate histones
    o Methylate or demethylate DNA
5. Create physical bends
  • Facilitate binding of other TFs
6. Cofactors can bring TFs and the transcriptional apparatus together

Transcription Factor Activation or Repression of Transcription
1. Activation or repression domains exist in TFs
2. Action can be mediated through direct (or indirect, via TAFs) interaction with TBP

Transcription Factor Activation Depends Upon Specific Modifications and Interactions
1. Post-translational modifications such as phosphorylation are necessary
2. Activation and repression domains may reside in the same protein
  • Specific functional activity depends upon cofactor involvement
3. A TF can act as a repressor if it blocks the binding site of a TF activator
  • This interaction can have a downstream effect on other expression steps

Cooperative Binding and Interaction of Transcription Factors
1. Precise spacing is required for some interactions involving TFs
  • Nucleosome (40 bp multiples) or decondensed DNA (10 bp multiples) distances
  • Interactions with chromatin remodeling complexes may have a moderate distance requirement
2. Bending and looping support interactions
  • Removes the distance specificity requirement

Role of Functional Modules
1. Functional modules can have several functions
  • Initiate transcription
  • Boost transcription rate
  • Mediate extracellular signals
  • Repress transcription
  • Insulate one module from another (insulator function)
  • Bring other modules into contact with the basal promoter
  • Integrate the status of other modules into a global expression pattern

Additive and Epistatic Interactions of Transcription Factors
1. Modifying one TF and its module interaction can additively reduce the phenotype
2. Modifying insulator, tethering, or integrator TF functions is epistatic
3. Proper expression, recruitment, and modular association of TFs are necessary for full phenotypic expression

A Transcription Factor Has Multiple Target Genes
1. Because of the limited number of TFs, a single TF may interact with 10s to 100s of genes
2. Drosophila eve and ftz regulate the majority of genes in the genome
3. The function of TF networks may genes
4. Mutations can be modulated by the effects of other downstream genes

The Genome Is Significantly Involved in Gene Regulation
1. The number of promoter sequences is equal to the number of protein-coding sequences
2. Transcription regulation is a major function of the genome

Distribution of Transcription Factors Among Dicot Genomes
(family assignment rules from: http://planttfdb.cbi.pku.edu.cn/)

Family       Grape          Papaya         Arabidopsis    Tomato         Soybean
             (3x)           (3x)           (3x + 2x)      (3x + 3x)      (3x + 2x + 2x)
AP2          19             17             30             27             76
ARF          17             10             37             22             85
ARR-B        12             12             21             21             42
B3           29             34             77             73             112
BBR-BPC      5              3              17             6              22
BES1         6              6              14             9              19
C2H2         64             76             116            99             267
C3H          43             28             66             48             136
CAMTA        4              4              10             7              23
CO-like      6              9              22             13             32
CPP          6              4              9              4              19
DBB          7              6              14             10             36
Dof          22             20             47             33             93
E2F/DP       7              6              16             8              28
EIL          2              4              6              9              12
ERF          80             77             139            137            330
FAR1         18             19             26             28             103
G2-like      40             51             64             59             164
GATA         19             23             41             30             70
GRAS         43             42             37             54             139
GRF          8              7              9              13             31
GeBP         1              4              23             11             11
HB-PHD       2              1              3              2              11
HB-other     7              8              11             16             31
HD-ZIP       33             29             58             58             140
HRT-like     1              2              2              1              1
HSF          19             18             25             26             61
LBD          44             35             50             47             111
LFY          1              1              1              1              2
LSD          3              2              12             3              17
M-type       18             225            70             67             88
MIKC         36             20             76             32             160
MYB          138            98             168            140            369
MYB_related  57             51             97             79             265
NAC          71             82             138            101            247
NF-X1        3              1              2              2              8
NF-YA        7              5              21             10             57
NF-YB        17             11             27             29             46
NF-YC        8              4              21             20             35
NZZ/SPL      1              1              1              1              0
Nin-like     8              6              17             10             45
RAV          1              2              7              3              5
S1Fa-like    2              1              4              1              4
SAP          1              2              1              3              2
SBP          19             11             30             17             73
SRS          5              4              16             9              33
STAT         1              1              4              1              1
TALE         21             11             33             21             101
TCP          15             22             33             36             71
Trihelix     26             29             34             31             93
VOZ          2              2              3              2              20
WOX          11             11             18             10             42
WRKY         59             49             90             81             233
Whirly       2              2              4              2              13
YABBY        7              9              8              9              34
ZF-HD        10             10             18             22             54
bHLH         115            105            225            161            480
bZIP         47             46             127            70             266
Total        1276           1379           2296           1845           5069

Distribution of Transcription Factors Among Monocot Genomes
(family assignment rules from: http://planttfdb.cbi.pku.edu.cn/)

Family       Japonica rice  Brachypodium   Sorghum        Corn           Arabidopsis
             (2x)           (2x)           (2x)           (2x + 2x)      (3x + 2x)
AP2          22             29             32             54             30
ARF          48             36             33             62             37
ARR-B        11             9              13             13             21
B3           65             45             86             77             77
BBR-BPC      7              4              6              9              17
BES1         6              7              9              16             14
C2H2         135            93             122            179            116
C3H          74             53             55             111            66
CAMTA        7              10             10             10             10
CO-like      21             14             14             18             22
CPP          20             11             12             17             9
DBB          13             11             11             20             14
Dof          37             27             35             51             47
E2F/DP       10             7              13             24             16
EIL          11             6              10             9              6
ERF          163            120            165            205            139
FAR1         133            69             62             25             26
G2-like      62             61             56             89             64
GATA         32             30             34             54             41
GRAS         69             48             86             104            37
GRF          19             14             11             32             9
GeBP         13             15             15             29             23
HB-PHD       1              5              3              4              3
HB-other     17             12             8              28             11
HD-ZIP       61             43             47             97             58
HRT-like     1              1              1              0              2
HSF          38             26             25             49             25
LBD          39             24             36             60             50
LFY          2              1              1              4              1
LSD          12             7              6              20             12
M-type       35             24             46             47             70
MIKC         61             51             47             90             76
MYB          130            98             132            203            168
MYB_related  106            77             116            169            97
NAC          170            109            141            190            138
NF-X1        2              1              3              4              2
NF-YA        25             12             16             36             21
NF-YB        16             17             16             28             27
NF-YC        19             15             18             25             21
NZZ/SPL      0              0              0              0              1
Nin-like     15             15             16             23             17
RAV          4              4              4              3              7
S1Fa-like    2              2              2              5              4
SAP          0              0              0              0              1
SBP          29             18             22             55             30
SRS          6              5              6              11             16
STAT         1              1              1              2              4
TALE         45             30             28             52             33
TCP          23             21             21             52             33
Trihelix     40             32             36             59             34
VOZ          2              2              2              10             3
WOX          17             9              12             30             18
WRKY         128            87             110            163            90
Whirly       2              2              2              6              4
YABBY        15             13             10             31             8
ZF-HD        15             15             18             26             18
bHLH         211            158            233            308            225
bZIP         140            95             123            218            127
Total        2408           1751           2198           3316           2296

Defining Transcription Factors – Based on Conserved Pfam Sequence Motifs (mostly)
(Pfam: accepted motif sequence definitions; http://pfam.sanger.ac.uk/)

Distribution of Transcription Factor Families between Common Bean (Phaseolus vulgaris) and Soybean (Glycine max). Soybean has undergone a genome duplication since its split from common bean.
(family assignment rules from: http://plntfdb.bio.uni-potsdam.de/)

TF family        Pv count  Gm count  Ratio (Gm/Pv)
ABI3VP1          41        90        2.2
Alfin-like       24        38        1.6
AP2-EREBP        179       363       2.0
ARF              27        60        2.2
ARID             12        26        2.2
ARR-B            15        31        2.1
AUX/IAA          30        66        2.2
BBR/BPC          5         18        3.6
BES1             7         16        2.3
bHLH             155       359       2.3
BSD              10        24        2.4
bZIP             78        204       2.6
C2C2-CO-like     8         26        3.3
C2C2-Dof         42        81        1.9
C2C2-GATA        32        64        2.0
C2C2-YABBY       8         18        2.3
C2H2             10        62        6.2
C3H              44        153       3.5
CAMTA            8         15        1.9
CCAAT            55        253       4.6
Coactivator p15  3         9         3.0
CPP              6         20        3.3
CSD              5         8         1.6
DBP              2         4         2.0
DDT              11        20        1.8
E2F-DP           7         16        2.3
EIL              7         12        1.7
FAR1             25        80        3.2
FHA              19        39        2.1
G2-like          49        131       2.7
GeBP             5         19        3.8
GNAT             38        58        1.5
GRAS             55        119       2.2
GRF              10        24        2.4
HB               119       203       1.7
HMG              9         24        2.7
HRT              1         1         1.0
HSF              30        52        1.7
IWS1             10        22        2.2
Jumonji          21        40        1.9
LFY              1         8         8.0
LIM              9         20        2.2
LOB              49        95        1.9
LUG              5         12        2.4
MADS             78        180       2.3
MBF1             3         4         1.3
MED6             1         1         1.0
MED7             1         3         3.0
mTERF            34        58        1.7
MYB              141       291       2.1
MYB-related      68        314       4.6
NAC              90        186       2.1
NOZZLE           5         6         1.2
OFP              20        47        2.4
PBF-2-like       3         7         2.3
PHD              32        270       8.4
PLATZ            14        34        2.4
Pseudo ARR-B     6         12        2.0
RB               1         3         3.0
Rcd1-like        2         8         4.0
RWP-RK           12        28        2.3
S1Fa-like        3         12        4.0
SAP              1         2         2.0
SBP              23        47        2.0
SET              44        82        1.9
Sigma70-like     9         13        1.4
SNF2             37        64        1.7
SOH1             1         2         2.0
SRS              10        22        2.2
SWI/SNF-BAF60b   18        31        1.7
SWI/SNF-SWI3     5         9         1.8
TAZ              4         5         1.3
TCP              27        56        2.1
Tify             13        33        2.5
TIG              5         1         0.2
TRAF             22        56        2.5
Trihelix         41        73        1.8
TUB              10        24        2.4
ULT              1         11        11.0
VARL             3         6         2.0
VOZ              5         8         1.6
WRKY             90        186       2.1
zf-HD            19        57        3.0
Zn-clus          0         0         -
Total            2188      5225      -

Structural variation in Arabidopsis thaliana basic helix-loop-helix transcription factors (The Plant Cell (2003) 15:1749)
Figure 3. Intron Distribution within the bHLH Domains of the AtbHLH Proteins. Scheme of the intron distribution patterns (color coded and designated A to I) within the bHLH domains of the AtbHLH proteins. Introns are indicated by triangles and numbered (1 to 3) based on those present in the bHLH region of PIF3, which is shown at top.
When the position of the intron coincides with that found in PIF3, the intron number is given above the triangle. For patterns F, G, and H, no intron number above the triangle indicates that the location of the intron within the bHLH domain is different from that found in PIF3. The percentage of proteins with each pattern is given at right. The correlation of intron distribution patterns and phylogenetic subfamilies is provided in Figure 2 (central column, color coded), and the chromosomal distribution of intron patterns is provided in Figure 4 (colored ovals adjacent to each entry number).

Distribution of the basic helix-loop-helix transcription factor family in Arabidopsis thaliana (The Plant Cell (2003) 15:1749)
Figure 4. Chromosomal Locations, Intron Distribution Patterns, and Duplication Events for AtbHLH Genes. Deduced chromosomal positions of the AtbHLH genes are indicated by EN (assigned in Figure 1). Segmentally duplicated regions in the chromosomes (Chr I to V) are indicated by boxes of the same color (adapted from TIGR). The total number of bHLH genes per chromosome is indicated at the top of each chromosome in parentheses. The scale is in megabases (Mb) and is adapted from the scale available on the TIGR database (see Methods). The small colored ovals at left of the ENs indicate the intron distribution patterns within each gene. The color code corresponds to the intron patterns shown in Figure 3. Connecting lines (blue and pink) mark the specific cases in which there is a strong correlation between duplicated genomic regions and the presence of bHLH genes with both closely related predicted amino acid sequence (close ENs) and the same intron pattern. The blue lines link cases associated with apparent intrachromosomal duplications (see supplemental Figure 7B online), and the pink lines link cases associated with apparent interchromosomal duplications (for more details, see supplemental Figure 7C online).
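Because soybean has undergone a genome duplication since its split from common bean, a Gm/Pv ratio near 2 is the naive expectation for a transcription factor family that simply doubled along with the genome. The sketch below recomputes the ratio for a handful of families from the common bean and soybean counts tabulated earlier and labels each one. The family selection, the expected value of 2.0, the tolerance of 0.5, and the function name `classify_ratios` are illustrative assumptions, not part of the source.

```python
# Pv (common bean) and Gm (soybean) counts for a few families, copied from the
# bean/soybean table; which families to include is an illustrative choice.
COUNTS = {
    "AP2-EREBP": (179, 363),
    "bHLH": (155, 359),
    "C2H2": (10, 62),
    "CCAAT": (55, 253),
    "MYB-related": (68, 314),
    "PHD": (32, 270),
    "TIG": (5, 1),
    "WRKY": (90, 186),
}


def classify_ratios(counts, expected=2.0, tolerance=0.5):
    """Label each family by how its Gm/Pv ratio compares with `expected`.

    `expected` (one whole-genome duplication) and `tolerance` are assumptions
    made for this example, not values given in the source table.
    """
    result = {}
    for family, (pv, gm) in counts.items():
        if pv == 0:
            # No common bean members (e.g. Zn-clus): a ratio is undefined.
            result[family] = (None, "no common bean members")
            continue
        ratio = round(gm / pv, 1)  # one decimal place, as in the table
        if ratio > expected + tolerance:
            label = "expanded beyond duplication"
        elif ratio < expected - tolerance:
            label = "fewer copies than expected"
        else:
            label = "consistent with duplication"
        result[family] = (ratio, label)
    return result


for family, (ratio, label) in sorted(classify_ratios(COUNTS).items()):
    print(f"{family:12s} {str(ratio):>5} {label}")
```

With these assumptions, families such as PHD (8.4) and CCAAT (4.6) stand out as expanded well beyond a simple doubling, while WRKY (2.1) and AP2-EREBP (2.0) track the genome duplication.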
Multiple members of a gene family (rbcS: small subunit of RUBISCO) have different cis-acting elements (The Plant Cell (1991) 3:1305)
[Figure: conserved cis-acting element; family member-specific cis-acting elements]

Transcription factors bind to different domains of a promoter in different tissues. (The Plant Cell (1991) 3:1305)
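The idea that members of a gene family share a conserved cis-acting element but differ in member-specific elements can be illustrated with a simple consensus scan. The sketch below searches both strands of each promoter for IUPAC consensus sequences and separates motifs found in every member from those found in only some. The promoter sequences and the names `member-1` to `member-3` are made up; the G-box core (CACGTG) and the I-box (GATAAG) are used only as example motifs; and plain consensus matching is a simplification, since, as noted above, not every consensus match is actually bound by a TF.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_regex(consensus):
    # Zero-width lookahead so that overlapping sites are all reported.
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?=({body}))")


def scan(promoter, consensus):
    """Return (plus_starts, minus_starts) as 0-based positions on the given strand."""
    seq = promoter.upper()
    k = len(consensus)
    pattern = consensus_regex(consensus)
    plus = [m.start() for m in pattern.finditer(seq)]
    rc = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    # Map hits on the reverse complement back to given-strand coordinates.
    minus = sorted(len(seq) - m.start() - k for m in pattern.finditer(rc))
    return plus, minus


def classify(promoters, motifs):
    """Split motifs into those in every promoter and those specific to some members."""
    present = {
        name: {motif for motif, cons in motifs.items() if any(scan(seq, cons))}
        for name, seq in promoters.items()
    }
    conserved = set.intersection(*present.values())
    specific = {name: found - conserved for name, found in present.items()}
    return conserved, specific


MOTIFS = {"G-box": "CACGTG", "I-box": "GATAAG"}  # example motifs only
PROMOTERS = {  # hypothetical promoter fragments for three family members
    "member-1": "TTACACGTGGCATGATAAGCTT",
    "member-2": "GGCACGTGTTAGCTAGCTTTT",
    "member-3": "AACTTATCCCACGTGAAA",  # I-box on the opposite strand
}

conserved, specific = classify(PROMOTERS, MOTIFS)
print(conserved)  # {'G-box'}
print(specific)   # {'member-1': {'I-box'}, 'member-2': set(), 'member-3': {'I-box'}}
```

Scanning the reverse complement with the same pattern, rather than building a reverse-complement consensus, keeps the IUPAC handling in one place; `member-3` shows why both strands matter, since its I-box is only present as CTTATC on the given strand.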