Gene Expression

• Gene expression has 2 basic steps: transcription and translation.
– Transcription: making an RNA copy of a region of a chromosome (a gene).
– Translation: using the information encoded in messenger RNA (mRNA) to produce a polypeptide.
• Between transcription and translation: the primary RNA transcript of the gene must be converted to mRNA and then transported out of the nucleus to a ribosome in the cytoplasm.
• RNA-only genes:
– For example, ribosomal RNA or snRNA (used for intron splicing). More types are described frequently.
– These RNAs need to be processed to become functional.
• After translation: polypeptides need to be converted to proteins.
– activation/inactivation by phosphorylation, etc.
– protein degradation

Levels of Regulation

• Gene expression is regulated at several different levels.
• Transcriptional control: control over whether RNA polymerase will transcribe the gene or not.
– Caused by the binding of proteins to control regions adjacent to the gene; these proteins allow (or prevent) RNA polymerase from transcribing the gene.
– The most important level of control (probably).
– Each gene is controlled separately.
• Post-transcriptional regulation: control of RNA after it has been transcribed, control of translation, and control of the protein itself.
– RNA: splicing out of introns, transport of mRNA to specific parts of the cell, RNA stability and destruction.
– Translation: can ribosomes translate the mRNA molecule into protein or not?
– Protein: processing of polypeptides into functional proteins, protein stability.
• Regulation of chromatin conformation: changes in chromatin structure that allow or prevent access to groups of genes. These changes are inherited between cells within an individual's lifetime, but not between generations.
• Epigenetic mechanisms: control that is inherited between generations (from parent to child) but which doesn't involve altering the DNA base sequence.
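The two basic steps above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: it assumes the input is the 5'-to-3' coding (sense) strand, ignores RNA processing and transport entirely, and uses a deliberately tiny codon table (CODON_TABLE is a hypothetical subset chosen for the demo, not the full genetic code).

```python
# Minimal sketch of transcription followed by translation.
# Assumption: `coding_strand` is the 5'->3' coding (sense) strand, so the
# mRNA has the same sequence with U in place of T.

# Hypothetical, deliberately incomplete codon table (enough for the demo).
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "GGC": "G", "AAA": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence corresponding to a DNA coding strand."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the RNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no polypeptide
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # "X" = not in table
        if amino_acid == "*":  # stop codon
            break
        peptide.append(amino_acid)
    return "".join(peptide)


mrna = transcribe("ccATGTTTGGCAAATAAgg")
polypeptide = translate(mrna)
print(mrna, polypeptide)
```

Every regulatory level listed above acts somewhere around these two functions: whether transcribe is called at all, what happens to its output, whether translate is called, and how long the resulting polypeptide survives.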

Transcriptional Control

• Proteins that bind to DNA regulatory sequences and affect transcription are called transcription factors.
– They act in trans: they can affect any gene on any chromosome in the same nucleus that has a matching binding site.
– The proteins are translated in the cytoplasm and migrate back into the nucleus to function.
• DNA regulatory sequences that are adjacent to the gene are said to act in cis: they only affect the gene they are attached to (and not other copies of the gene in the cell).
• Classifying transcription factors:
– general transcription factors: involved in all transcription complexes,
– tissue-specific transcription factors: only used in certain tissues or with certain external stimuli.

[Figure: Cis vs. trans]

Transcription in General

• RNA polymerase is an enzyme that transcribes DNA into RNA: it polymerizes RNA out of nucleotides (NTPs). Most RNA polymerases are composed of several different polypeptide subunits.
• There are 3 RNA polymerases in the nucleus (plus another for the mitochondria).
– RNA polymerase 1 (pol1): ribosomal RNA (tandem arrays of 18S, 28S, 5.8S)
– RNA polymerase 2 (pol2): protein-coding genes, snoRNA (small nucleolar RNA), miRNA (micro RNA)
– RNA polymerase 3 (pol3): 5S ribosomal RNA, transfer RNA, a few others
• We will mostly discuss pol2 transcription.
• Basic concept: 3 steps.
– Initiation: RNA polymerase binds to a promoter sequence on the DNA, opens up the DNA double helix, and starts making RNA.
– Elongation: RNA polymerase moves down the DNA strand, creating an RNA copy of the gene one nucleotide at a time.
– Termination: RNA polymerase stops transcription and falls off the DNA.

Cis-Acting DNA Sequences

• The most important DNA regulatory sequence is the promoter, the place where RNA polymerase binds and starts transcription.
• There is no one single defined promoter sequence. Each gene has a different promoter sequence, with various conserved elements.
• Five short sequences are conserved in eukaryotic promoters, but not all are found with all genes. All are close to the transcription start point, with some upstream and some downstream of it.
• The best known is the TATA box, located about 25 bp upstream from the transcription initiation point. Like all these elements, the TATA box is a consensus sequence, and it is not present in all genes. (One count shows only 32% of human genes have a TATA box.)

Initiation and Elongation

• Initiation: The first step in transcription initiation is the binding of the general transcription factor TFIID to the promoter region.
– After that, several other transcription factors bind, as does RNA polymerase 2, forming the initiation complex.
– At this point, the polymerase transcribes a very short RNA but doesn't move away from the promoter.
– The transcription initiation complex is stalled.
• Elongation: The transcription complex switches to the elongation phase when the helicase subunit (TFIIH) unwinds the DNA and then activates the RNA polymerase by phosphorylating it.
– The activated polymerase then moves down the template strand, making an RNA copy.
– The original transcription factors stay at the promoter, and new ones bind to the polymerase during elongation.
– Rate: about 20 nucleotides per second.

Termination

• Pol2 genes end at a polyadenylation signal, a short sequence that causes an enzyme to cut the RNA and add ~100 adenines (the poly A tail) to the 3' end.
– Consensus sequence similar to AAUAAA.
• However, RNA polymerase keeps on transcribing the DNA.
• An exonuclease starts chewing up the excess RNA. It's faster than RNA polymerase: when the exonuclease catches up to the RNA polymerase, transcription stops.
• Possible function of this excess RNA?

Tissue-specific Transcription Factors

• Tissue-specific transcription factors activate transcription in specific cell types, or in response to specific signals.
• They bind to short DNA sequences that are near the promoter.
– It used to be thought that these binding sites were always upstream from the promoter, but it is now known they can be either upstream or downstream from the promoter (but near it).
– They consist of short consensus sequences: 4-8 bp long. There are lots of potential sites, but most aren't used.
• There are lots of protein interactions between transcription factors and the initiation complex.
• A few picky details:
– Some transcription factors bind at places distant from the promoter (to enhancers and silencers).
– Co-activators and co-repressors bind to other proteins and not to the DNA (a somewhat artificial distinction).

[Figure: Position of 3 transcription factor binding sites relative to the transcription start.]

Transcription Factors

• Transcription factors generally have two functional sections (domains):
– DNA-binding domain: attaches to the specific DNA sequence,
– Activation domain: works by binding to other proteins to create the transcription complex.
• The DNA-binding domains fall into several general types, and proteins that have one of these domains are usually assumed to be transcription factors.
– Leucine zipper motif: an alpha helix that has a leucine every 7 amino acids, so all the leucines are on the same side of the molecule. This allows the protein to form a dimer by hydrophobic interactions. This dimer grips the DNA double helix.
– Zinc finger motif: binds a Zn2+ ion between two cysteines and two histidines (C2H2 proteins) or between four cysteines (C4 proteins). Sometimes a zinc finger protein will have more than one zinc finger motif.
– Helix-turn-helix (HTH) motif: consists of two alpha helices connected by a short region of other amino acids. The two helices bind the DNA major groove. This is a common motif in homeobox gene regulation.
– Helix-loop-helix (HLH) motif: different from the HTH motif. HLH has a much longer connecting loop that allows more flexibility in the molecule.
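
The "leucine every 7 amino acids" description of the leucine zipper lends itself to a simple sequence scan. The sketch below is only an illustration of that spacing rule, not a real motif predictor: the function name leucine_heptad_runs and the default of 4 consecutive leucines are assumptions made for this example, and the test sequence is invented.

```python
def leucine_heptad_runs(protein: str, min_repeats: int = 4) -> list[int]:
    """Return 0-based start positions of runs in which a leucine (L)
    recurs exactly every 7 residues at least `min_repeats` times.

    Assumption: `min_repeats` = 4 is an arbitrary illustrative cutoff.
    """
    seq = protein.upper()
    starts = []
    for i, residue in enumerate(seq):
        if residue != "L":
            continue
        # Skip leucines that merely continue a run begun 7 residues earlier.
        if i >= 7 and seq[i - 7] == "L":
            continue
        count = 0
        j = i
        while j < len(seq) and seq[j] == "L":
            count += 1
            j += 7  # step to the same face of the alpha helix
        if count >= min_repeats:
            starts.append(i)
    return starts


# Invented toy sequence: leucines at positions 0, 7, 14 and 21.
toy_zipper = "LAAAAAA" * 4 + "GG"
print(leucine_heptad_runs(toy_zipper))
```

A hit from a scan like this only flags a candidate; as the slide notes, proteins with such a domain are "usually assumed" to be transcription factors, which is an inference rather than proof.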

[Figure: Leucine zipper; zinc finger; helix-loop-helix]

Yeast Two-Hybrid System

• The yeast two-hybrid system is a way to detect interactions between proteins. It is often used to find proteins that interact with the protein you are studying.
• It is based on the two domains of a transcription factor. The yeast transcription factor GAL4 (which regulates genes involved in galactose utilization) was split into separate DNA-binding and activation domains.
• The "bait" protein (the protein you are studying) is fused to the binding domain.
• A large number of other protein-coding genes are fused to activation domains: a library of "prey" sequences.
• Each individual prey sequence is co-transformed into yeast along with the bait.
• If the bait and prey proteins interact in the cell, the attached DNA-binding and activation domains will be brought together at a GAL4-regulated reporter gene, causing it to be transcribed. This event can be detected using a chromogenic (color-generating) substrate.

Enhancers and Silencers

• Enhancers and silencers are tissue-specific cis-acting DNA sequences that increase or decrease transcription regardless of their position (within limits, but they can be several Mbases away) or orientation: they can be either 5' or 3' to the gene itself.
– "Locus control regions" are groups of enhancers; roughly, this is a different name for the same type of element.
• Transcription factors bind to these elements.
• Enhancers and silencers work because the DNA can bend and allow the transcription factors to interact with the promoter.
• Often discovered by chromosome breaks that separate the enhancer from its target gene (see next slide).

Acheiropodia

• Chromosome breakpoints used to locate enhancers. (A) Translocation breakpoints (vertical dashed arrows) downstream from PAX6 inactivate the gene in aniridia (absence of the iris in the eye). Possible enhancers are the red boxes, some in the introns of another gene (ELP4). (B)
Various chromosome changes affecting the sonic hedgehog (SHH) gene, which controls limb development.

Post-transcriptional Regulation

• At least half of all human genes are expressed in different ways in different tissues.
• Different transcriptional start sites, different intron splicing patterns, and different poly A addition sites can give quite a few different proteins from the same gene.
• Different proteins from the same gene are called isoforms. Isoforms are produced in different tissues, at different times in development, in different subcellular locations (soluble vs. membrane-bound, for instance), etc.
• Dystrophin, the Duchenne muscular dystrophy protein, has at least 7 different transcription start sites, used in different tissues. (B, brain; M, muscle; P, Purkinje; R, retina; B,K, brain and kidney; S, Schwann cells; G, general)
• A good example of alternative splicing patterns in different tissues is tropomyosin, which has 5 optional exons. Tropomyosin is a protein in striated muscle that binds to actin and prevents it from interacting with myosin: thus it regulates muscle movements.

Control of Alternative Splicing

• RNA splicing is performed by snRNPs, small nuclear ribonucleoprotein complexes, which are RNA/protein hybrids.
• Variations in snRNPs (as well as other proteins) occur in different cells and recognize slightly different splicing signals.
• Some of the splicing proteins also assist in transporting mRNA out of the nucleus.

Messenger RNA Stability and Translatability

• Micro RNAs (miRNAs) are a major cause of messenger RNA decay in the cell. They can also prevent mRNA from being translated by the ribosome.
• miRNAs are produced from RNA-only genes. The RNA forms a stem-loop structure.
• The Dicer enzyme processes the double-stranded region, incorporating one strand of the RNA into the RISC complex.
• The miRNA in the RISC complex is complementary to (antisense to) the 3' region of a specific messenger RNA.
• The RISC complex binds to the messenger RNA and degrades it, usually when the miRNA is a perfect match to the mRNA.
• Alternatively, the RISC complex can inhibit translation of the messenger RNA, especially if the match between miRNA and mRNA isn't perfect.
• An important finding: large-scale studies have shown that the presence or absence of any given miRNA changes the amount of protein by 2-fold or less in most cases.

Translational Control

• Regulation of whether the messenger RNA is translated or not.
• The best studied example is ferritin, a protein that stores up to 4500 iron atoms (as iron hydroxyphosphate) in its center.
• The ferritin mRNA contains an iron-response element (IRE) in the 5' UTR. The IRE folds up into a hairpin loop, which can bind to the IRE-binding protein (IRE-BP).
• When iron levels are low, IRE-BP binds and prevents translation of the mRNA. This allows the ferritin mRNA to remain intact while preventing any further sequestration of iron atoms.
• Transferrin is the major iron-carrying protein in the blood serum. The transferrin mRNA contains 3 IREs in the 3' UTR. RNA degradation is prevented by IRE-BP binding.

Control of Protein Degradation

• To react quickly to the environment, a cell must be able to remove outdated signals quickly.
• Many proteins, especially regulatory signaling proteins, are degraded by ubiquitin-mediated proteolysis. Ubiquitin is a small protein that is highly conserved in evolution.
• In this system, multiple copies of ubiquitin are covalently attached to the target protein in long chains. The complex is then transported to the proteasome, a large multi-subunit barrel-shaped structure. The proteasome degrades the target protein to amino acids and recycles the ubiquitin.
• Target specificity is provided by the enzyme that attaches ubiquitin to the target proteins: there are hundreds of different E3 ubiquitin ligases.
– One target is hydrophobic amino acids that are normally buried in the protein's interior or within membranes.
– N-end rule: on average, a protein's half-life correlates with its N-terminal residue.
   • Proteins with N-terminal Met, Ser, Ala, Thr, Val, or Gly have half-lives greater than 20 hours.
   • Proteins with N-terminal Phe, Leu, Asp, Lys, or Arg have half-lives of 3 min or less.
• The proteasome also re-folds misfolded proteins if the proteins are protected from degradation by chaperone proteins. Misfolding is a common result of heat shock.
• Ubiquitin plays a number of other roles in the cell, including cell signaling and X chromosome inactivation.

Chromatin Conformation

• Recall that chromosomal DNA is wrapped up in nucleosomes: 8 histone proteins with about 150 bp of DNA wrapped around them. Higher level packaging also exists. All of this structure makes it difficult for RNA polymerase and transcription factors to reach the target DNA.
• When it is tightly packed, chromatin is said to be "closed" and unavailable for transcription. We see this as heterochromatin.
• Euchromatin is in an open conformation, accessible to transcription factors and capable of being transcribed.
• Facultative heterochromatin is DNA that is euchromatin in some tissues but heterochromatin in others. This is very typical of genes needed for the functioning of specific cell types.
• We will look at several mechanisms that affect chromatin structure: chromatin remodeling, histone modification, and DNA methylation. There are also some alternate histone proteins that can affect chromatin structure. Also, the position of the gene in the interphase nucleus affects gene activity.

Histone Acetylation

• Histones are basic proteins: lysines have a + charge that is attracted to the – charges on DNA phosphates.
• Histone acetylases add an acetyl group (CH3CO) to the amino group at the end of the lysine side chain. This removes the + charge, and in consequence the histones are less tightly bound to the DNA.
• Genes in the region of acetylated histones are active; non-acetylated histones are associated with inactive genes.
• The chromatin in areas of acetylated histones is less condensed.
• Histone acetylases and deacetylases can be part of transcriptional complexes, helping to activate specific genes.

The Histone Code

• Histones bind tightly to DNA because the negative charges on the DNA phosphates form ionic bonds with the positively charged lysines and arginines in the histones.
• Histone proteins have exposed N and C termini. The lysines in these tails are frequently modified, which changes how tightly histones bind to DNA and to the many other proteins found in chromatin.
• The histone code is a theoretical concept that proposes that specific sets of histone modifications define the chromatin conformation and the activity of the DNA. Probably things aren't as clearly defined as this concept implies: there are many factors involved in chromatin conformation.

More Histone Code

[Figure legend: H3 and H4 are histones; K stands for lysine; the number following the K is the position on the protein.]

Chromatin Remodeling

• Moving nucleosomes around to allow transcription factors to reach the cis-acting regulatory sites is accomplished by large protein structures called chromatin remodeling complexes.
• Remodeling slides nucleosomes along the DNA, away from the region of the promoter. The process requires energy, so it uses ATP.
• The DNA exposed by moving histones away is more accessible to restriction enzymes and DNase in the lab: DNase hypersensitive sites are a sign of active genes.
• Remodeling often occurs during development, as cells differentiate.

DNA Methylation

• DNA methylation is the addition of methyl groups to cytosine, creating 5-methylcytosine. In mammalian DNA this almost always occurs when the C is followed by a G: CpG.
• DNA methylation is associated with inactive genes, especially when it is near the promoter. Specific proteins recognize and bind to it, which alters the chromatin configuration.
• The methylation state of DNA is maintained through mitosis: daughter cells are methylated in the same way as the parent cell. Methylation changes are thus epigenetic changes: heritable changes that don't alter the DNA base sequence.
• When DNA replicates, an enzyme called maintenance methylase recognizes methylated cytosines on the old strand (in a CpG dinucleotide) and methylates the corresponding C on the new strand.

DNA Methylation in Development

• DNA from sperm and egg are both heavily methylated, but at different sites.
• Almost all methylation is removed in the early embryo (morula and early blastocyst).
• As early development proceeds, new methylation patterns are imposed on different cell lineages. These patterns permanently inactivate some of the genes (at least for the life of the individual).

CpG Islands

• In human DNA, the dinucleotide CpG is quite rare.
– Note that this is a C followed by a G on the same DNA strand, not C paired with G on the opposite strand. The "p" stands for a phosphodiester bond.
• The rarity of CpG is tied up with DNA methylation.
• Areas where there are many CpG dinucleotides are often associated with the promoter regions of genes. These areas are CpG islands.

Why CpG is rare

• Cytosine spontaneously loses its amino group, which converts it to uracil (deoxyuridine, in DNA).
• DNA repair enzymes notice this and repair it back to cytosine.
• However, when 5-methylcytosine is deaminated, it is converted to thymine.
• Since T is a legitimate base in DNA, this change is not corrected.
• In human DNA, most CpGs are methylated. So over evolutionary time scales, most CpGs have been converted to TpG.
• However, CpGs near promoters are usually not methylated, so deaminations are corrected back to CpG, and thus CpG is more common near promoters than elsewhere in the genome.

Epigenetics and Imprinting

• Epigenetics is the study of differences that are inherited between generations but don't involve changes in the DNA sequence.
– Sometimes epigenetics is used for changes that persist between cell generations (mitosis), but I will use the term more strictly here, to mean changes that are transmitted from parent to child.
• The concept predates our knowledge of DNA, but these days most epigenetic changes involve DNA methylation.
• Imprinting refers to epigenetic changes where the activity of a gene depends on whether it came from the father or the mother.
• Imprinting seems to be the major reason why uniparental diploid (UPD) embryos do not produce viable offspring: some genes require an active, unmethylated gene from the father while others need an active gene from the mother.
– UPD from the father is a hydatidiform mole: extra-embryonic membranes but no embryo. UPD from the mother is an ovarian teratoma: a mass of disorganized tissue that usually includes hair, teeth and bones.

Non-genic structural inheritance

• Structures on the surface of ciliates can be altered and inherited asexually for many generations. This is a form of epigenetic inheritance: DNA mutations are not involved. These changes were created artificially.
• A. Inversion of a row of cilia (BB = basal body). B. Siamese-twin paramecium, with two contractile vacuole pores (CVP). C. Mirror-image oral apparatuses (OA).
• Frankel, J. 2008. Eukaryotic Cell 7(10):1617-1639

Methylation and Imprinting

• Prader-Willi syndrome (PWS) and Angelman syndrome (AS) are both caused by deletions or uniparental disomy of 15q.
• Most are caused by unequal crossing over between two repeated sequences that are 4.2 Mbp apart.
• Prader-Willi results when only the maternal gene is active, and Angelman when only the paternal gene is active.
– PWS is characterized by obesity due to an insatiable appetite, small hands and feet, short stature, and hypogonadism. In addition, there is a common behavioral phenotype, including temper tantrums, stubbornness, and controlling and manipulative behavior.
– AS is characterized by severe mental retardation, severe speech impairment, and unsteady gait and/or tremulousness of the limbs. In addition, individuals with AS present with inappropriate laughter and excitability.

Olfactory Receptor Genes

• Some genes have only one allele expressed, but the choice of allele does not depend on which parent it came from.
– An important case: immunoglobulins. We will discuss them later.
• We have about 900 olfactory receptor genes: it is the largest gene family in humans. They are found in clusters on many chromosomes.
• In each receptor cell, only one copy of 1 gene is active.
• This works by having a single copy of a necessary enhancer (the copy on the other chromosome is inactivated by methylation).
• The enhancer randomly associates with the promoter of one receptor gene, allowing it to be transcribed.
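
Returning to the CpG Islands and "Why CpG is rare" slides earlier in this deck: the depletion of CpG can be put into numbers by comparing how many CpG dinucleotides a sequence contains with how many would be expected from its C and G content alone. The sketch below uses the commonly used observed/expected ratio, (number of CpG × length) / (number of C × number of G). It is an illustration added here, not something from the slides; the function name cpg_obs_exp and both test sequences are invented for the example.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio for one DNA strand.

    Expected CpG count is estimated from base composition alone:
    (number of C) * (number of G) / length.
    Returns 0.0 if the sequence has no C or no G.
    """
    s = seq.upper()
    n_c = s.count("C")
    n_g = s.count("G")
    n_cpg = s.count("CG")  # C followed by G on the same strand
    if n_c == 0 or n_g == 0:
        return 0.0
    return n_cpg * len(s) / (n_c * n_g)


# Invented toy sequences: one CpG-rich, one with C and G but no CpG.
island_like = "CGCGCGCG"
depleted = "CATGCATG"
print(cpg_obs_exp(island_like), cpg_obs_exp(depleted))

# Deamination of methylated CpG to TpG, as described on the slide,
# removes the CpG dinucleotides entirely in this toy case.
print(island_like.replace("CG", "TG"))
```

A ratio near or above 1 means CpG occurs about as often as base composition predicts, as in promoter-associated islands; a ratio well below 1 reflects the loss of methylated CpGs to TpG over evolutionary time.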