Genomics and Gene Recognition

Transcription and the Regulation of Gene Expression

• 1958 – Francis Crick enunciated the “central dogma of molecular biology”
• this scheme outlined the residue-by-residue transfer of biological information as encoded in the primary structure of the informational biopolymers, nucleic acids and proteins
• DNA → RNA → protein: the scheme postulated that RNA was the information carrier between DNA and proteins
• 1961 – Francois Jacob and Jacques Monod extended this hypothesis to predict that the RNA intermediate, which they called messenger RNA (mRNA), would have the following properties:
  o its base composition would reflect the base composition of DNA (so genes remain the protein-coding units)
  o it would be very heterogeneous with respect to molecular mass
  o it would be able to associate with ribosomes, because ribosomes are the site of protein synthesis
  o it would have a high rate of turnover, that is, mRNA would be rapidly degraded
• all RNAs (mRNA, rRNA, tRNA) participate in protein synthesis and are synthesized from DNA templates by DNA-dependent RNA polymerases in the process known as transcription (not all genes encode proteins; some encode rRNAs and tRNAs)
• protein synthesis occurs via the process of translation
• transcription is highly regulated in all cells
  o in prokaryotes, only about 3% of the genes are undergoing transcription at any given time
  o in eukaryotes, only 0.01% of the genes are transcribed at any given time (in differentiated cells, that is, cells that perform their specific function)
  o which genes are transcribed is determined by the growth status of the cell, its metabolic condition, etc.

Figure: “Central dogma of molecular biology”

Transcription in Prokaryotes

• all RNA is synthesized by a single species of DNA-dependent RNA polymerase (with one exception: the short RNA primers formed by primase during DNA replication)
• the RNA polymerase of E.
coli, the so-called RNA polymerase holoenzyme, is a complex multimeric protein large enough to be visible in the electron microscope
  o it consists of four kinds of subunits, α2ββ’σ, where β’ is the largest subunit
  o σ can be any of several related proteins (σ factors); it recognizes promoters, which identify the location of the transcription start site (the site where transcription starts)
  o β and β’ contribute to the formation of the catalytic site
  o the two α subunits are essential for assembly of the enzyme

Terminology in DNA transcription

• the strand of duplex DNA that is read by RNA polymerase is termed the template strand (the other one is called the nontemplate strand)
• RNA polymerase moves along the template strand in the 3’ to 5’ direction; the RNA product, the so-called transcript, grows in the 5’ to 3’ direction
• the RNA transcript will eventually be translated into the amino acid sequence of a protein by a process in which successive triplets of bases (termed codons), read 5’ to 3’, each specify a particular amino acid
• by convention, when the order of nucleotides in DNA is specified, it is the 5’ to 3’ sequence of nucleotides in the nontemplate strand that is presented. Consequently, if the convention is followed, DNA sequences are rendered in terms that correspond directly to mRNA sequences, which correspond in turn to the amino acid sequences as read beginning with the N-terminus

Figure: Transcription and translation

The steps of transcription in Prokaryotes

Transcription can be divided into four stages:
1. binding of RNA polymerase holoenzyme at promoter sites
2. initiation of polymerization
3. chain elongation
4. chain termination

Figure: Sequences of events in the initiation and elongation phases of transcription in prokaryotes

Promoter Identification

Promoters are identified in vitro by a technique called DNA footprinting:
• RNA polymerase holoenzyme is bound to a putative promoter sequence in a DNA duplex
• the DNA:protein complex is treated with DNase I
• DNase I cleaves the DNA at sites not protected by bound protein
• the set of DNA fragments left after DNase I digestion reveals the promoter (by definition, the promoter is the RNA polymerase holoenzyme binding site)

Figure: DNA footprinting

Prokaryotic Promoters

• the +1 site is defined as the transcription start site (that base is the first base in the RNA transcript)
• the +2 base is the second base in the RNA transcript
• bases upstream from the initiation site are −1, −2, etc. There is no zero!
• RNA polymerase binding typically spans −40 to +20
• the transcription start site on the template strand is almost always a pyrimidine, so the transcript almost always begins with a purine
• promoters vary in size, from 20 to 200 bp, but typically consist of a 40-bp region located on the 5’ side of the transcription start site
• within a promoter are two consensus sequence elements
  o a consensus is defined as the bases that appear with the highest frequency at each position when a series of sequences believed to have a common function are compared
  o the two sequences are the Pribnow box (after David Pribnow) near −10, whose consensus sequence is TATAAT, and the sequence in the −35 region, whose consensus is TTGACA
  o the two elements are separated by about 17 bp of nonconserved sequence
  o the more closely the −35 region resembles the consensus, the greater the efficiency of transcription
  o rRNA promoters have a third upstream element at about −55 (recognized by the α subunit)

Figure: Nucleotide sequences of representative E. coli promoters (the numbers below the consensus sequence show the percent
of occurrence of the consensus base)

Elongation and Termination

Elongation:
• elongation of the RNA transcript is catalyzed by the core polymerase
• elongation does not proceed at a constant rate but varies between 20 and 50 nucleotides per second
• RNA polymerase slows down and almost stops at G:C-rich regions because G:C base pairs are harder to unwind
• as the RNA polymerase moves along the template, the DNA double helix is unwound ahead of it and recloses after the polymerase has passed by

Termination:
• there are two types of termination in bacteria
  o one that is dependent on a specific protein termination factor called ρ (occurs rarely)
  o one that is not dependent on this protein (the transcription stop is determined by specific stop sites called termination sites)

Figure: Termination of transcription

Prokaryotic Genomes

• prokaryotes are the simplest free-living organisms
• therefore, studying prokaryotes can give us a sense of the minimum number of genes needed for survival
• genome sequencing offers great opportunities for such studies
• it seems that about 300 genes is the minimum (these must include genes for replication and genes to obtain and store energy)
• currently, about 120 genomes have been finished
• the basic methodology of DNA sequencing typically singles out sequences of about 1,000 nucleotides
• these sequences overlap, and the contiguous parts that can be inferred are called contigs
• thus, many more 1,000-nucleotide pieces are needed than a single pass over the genome would suggest; the calculation below is for a genome of 4,600,000 nucleotides
• the probability that a given nucleotide is covered by one clone is $\frac{1{,}000}{4{,}600{,}000}$
• the probability that a given nucleotide is NOT covered by one clone is

\[ 1 - \frac{1{,}000}{4{,}600{,}000} = \frac{4{,}599{,}000}{4{,}600{,}000} \approx 0.9998 \]

• if N clones are made, the probability that a nucleotide is missed by all of them is

\[ \left( \frac{4{,}599{,}000}{4{,}600{,}000} \right)^{N} \approx 0.9998^{N} \]

• to set the probability of missing a nucleotide to 5%, we solve for N:

\[ \left( \frac{4{,}599{,}000}{4{,}600{,}000} \right)^{N} = 0.05 \]

\[ N \log\left( \frac{4{,}599{,}000}{4{,}600{,}000} \right) = \log 0.05 \]

\[ N = \frac{\log 0.05}{\log\left( \frac{4{,}599{,}000}{4{,}600{,}000} \right)} \approx 13{,}779 \]

This means that about 13,779 subclones will have to be made. At 1,000 nucleotides each, that is 13,779,000 nucleotides of sequence, or about 3 genome equivalents.

(Figures/Tables from the textbook)

Prokaryotic gene structure

• a gene consists of both promoters and coding parts

What is gene expression?

• the process by which a gene’s information is converted into its product (say, a protein)
• consists of transcription and translation followed by
  o folding
  o post-translational modification
  o targeting
• the amount of protein that is “expressed” depends on the tissue, the developmental stage of the cell, and the metabolic and physiologic state of the cell

Promoter Elements

• E. coli has 7 different σ factors
• some other bacteria have as many as 10
• each σ factor recognizes different promoters

What is an operon?

• in bacteria, genes encoding the enzymes of a particular metabolic pathway are often grouped adjacent to one another in a cluster on the chromosome
• such clusters, together with their regulatory sequences, are called operons
• through this mechanism, a group of genes is expressed at the same time using one mRNA
• these genes have an operator, a regulatory element located next to the promoter
• interaction of a regulatory protein with the operator controls transcription of the operon by governing the access of RNA polymerase to the promoter
• still, many prokaryotic genes do not have operators

What is an open reading frame (ORF)?
• ribosomes translate triplets of nucleotides into amino acids
• the start codon is usually AUG (which codes for methionine)
• the stop codons are UAA, UAG and UGA
• it is important to determine from which nucleotide to start reading, since this sets the reading frame
• an open reading frame is any nucleotide sequence that contains a string of codons uninterrupted by a stop codon in the same reading frame

Example (at the DNA level):

5' atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa 3'

1 atg ccc aag ctg aat agc gta gag ggg ttt tca tca ttt gag gac gat gta taa
   M   P   K   L   N   S   V   E   G   F   S   S   F   E   D   D   V   *
2  tgc cca agc tga ata gcg tag agg ggt ttt cat cat ttg agg acg atg tat
    C   P   S   *   I   A   *   R   G   F   H   H   L   R   T   M   Y
3   gcc caa gct gaa tag cgt aga ggg gtt ttc atc att tga gga cga tgt ata
     A   Q   A   E   *   R   R   G   V   F   I   I   *   G   R   C   I

Sources:
• Biochemistry by Reginald H. Garrett and Charles M. Grisham
• Fundamental Concepts of Bioinformatics by Dan E. Krane and Michael L. Raymer
• http://bioweb.uwlax.edu/GenWeb/Molecular/Seq_Anal/Translation/translation.html
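The three-frame reading of the ORF example can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the names `translate_frame`, `open_stretches` and `EXAMPLE` are made up for this sketch, and it assumes the standard genetic code, a sequence given 5' to 3' on the nontemplate strand, and only the three forward frames.

```python
from itertools import product

# Standard genetic code, built from the usual T, C, A, G codon ordering.
# '*' marks a stop codon (TAA, TAG, TGA at the DNA level).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(dna: str, frame: int) -> str:
    """Translate the nontemplate strand starting at offset `frame` (0, 1 or 2).

    Trailing bases that do not fill a complete codon are ignored.
    """
    seq = dna.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def open_stretches(protein: str) -> list[str]:
    """Split a translated frame at stop codons into stop-free stretches."""
    return [stretch for stretch in protein.split("*") if stretch]


# The example sequence from the notes (hypothetical variable name).
EXAMPLE = "atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa"

if __name__ == "__main__":
    for frame in range(3):
        protein = translate_frame(EXAMPLE, frame)
        print(frame + 1, protein, open_stretches(protein))
```

Run on the example sequence, frame 1 is read through without interruption up to its final stop codon, while frames 2 and 3 are each broken by two internal stop codons, which is why frame 1 is the plausible open reading frame here.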