* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download The Young Scholars Program - 1996
RNA silencing wikipedia , lookup
Gene desert wikipedia , lookup
RNA interference wikipedia , lookup
Epitranscriptome wikipedia , lookup
Non-coding DNA wikipedia , lookup
Non-coding RNA wikipedia , lookup
Genetic code wikipedia , lookup
Transcriptional regulation wikipedia , lookup
Gene regulatory network wikipedia , lookup
Gene expression wikipedia , lookup
Community fingerprinting wikipedia , lookup
Genomic imprinting wikipedia , lookup
Ridge (biology) wikipedia , lookup
Silencer (genetics) wikipedia , lookup
Endogenous retrovirus wikipedia , lookup
Molecular evolution wikipedia , lookup
Promoter (genetics) wikipedia , lookup
Gene expression profiling wikipedia , lookup
A PROKARYOTIC GENE Escherichia coli contains in its genome about 4000 protein-coding genes and 100 RNA genes. To be exact, there are 4289 ORFs (open reading frames), 86 tRNA genes, 22 rRNA genes and seven small molecular weight-RNA genes. This makes a grand total of 4404 genes in E. coli. Of the more than 4000 protein-coding genes, about 60% have known function. Before the genome was sequenced there were 1853 characterized genes, and since the sequence has been completed another 750 ORFs have been assigned a function based on the comparison of the ORF sequence to already known genes from other genomes or to other genes in E. coli’s genome. Within E. coli there are gene families of “paralogous” genes, which are not identical, but which have related sequence and function. For example there are 80 ABC transporter genes (genes involved in group translocations, i.e., PEP:PTS). The RNA genes code for a variety of products, most of which have known functions. Examples are the three ribosomal RNA genes which code for the 16S, 23S and 5S rRNAs found in all bacterial ribosomes, and the 50 or more different transfer RNA (tRNA) genes that are transcribed into the tRNAs that function as the adapter molecules in protein synthesis. Since the genome of E. coli has been completely sequenced, all of these genes are known. For example, there are 86 tRNA genes coding for 47 different tRNAs. Another of the RNA genes commonly found is the M1 RNA gene, rnpB, which codes for the enzymatic portion of Ribonuclease P, the prototypical ribozyme. Below (page 2) is a diagram of a typical protein-coding gene of E. coli (not unlike all bacteria). This single gene has a typical promoter, operator region, the ORF or CDS (coding sequence) with typical start and stop codons and a rho-independent terminator. All sequences are consensus sequences including the Shine-Delgarno sequence and the promoter regions at -35 and -10. References for the E. coli Genetic Map: 1. 2. 3. 4. Berlyn, M. K. B., 1998. Linkage map of Escherichia coli K-12, edition 10: The traditional map. Microbiol. Molec. Biol. Rev. 62:814-984 Rudd, K. E., 1998. Linkage map of Escherichia coli K-12, Edition 10: The physical map. Microbiol. Molec. Biol. Rev. 62:985-1019. CGSC: E. coli Genetic Stock Center, maintained by Mary Berlyn. http://cgsc.biology.yale.edu/ Blattner, F. R., et al. 1997. The complete genome sequence of Escherichia coli K-12. Science 277: 1453-1462. A PROKARYOTIC GENE (cont.) The nucleotide sequence shown represents the “sense” strand, which is complimentary to and in the opposite direction of the template strand. In other words the given sequence of this DNA is the same as the mRNA sequence. The sequence is given in GenBank format; it is presented in lines of 60 nucleotides, separated in groups of ten and numbered on the left for easy identification. 1 ggtacagtcc aatatctgct attactacct ttccatcccg ggactactga ccatgactaa 61 gactaccatc atatactacg ccatatgcag tactgcaaag gtactgatcg ccatgctagg 121 gcacttgaca ataccctacc gggactagct ataatcagtc tcgttctaga tctagaacga 181 ggatcacagg ttaagcgttt tacttcaagg aggctggtca tgcgccatcg taagagtggt 241 cgtcaactga accgcaacag cagccatcgc caggctatgt tccgcaatat ggcaggttca 301 ctggttcgtc atgaaatcat caagacgact ctgcctaaag cgaaagagct gcgccgcgta 361 gttgagccgc tgattactct tgccaagact gatagcgttg ctaatcgtcg tctggcattc 421 gcccgtactc gtgataacga gatcgtggca aaactgttta acgaactggg cccgcgtttc 481 gcgagccgtg ccggtggtta cactcgtatt ctgaagtgtg gcttccgtgc aggcgacaac 541 gcgccgatgg cttacatcga gctggttgat cgttcagaga aagcagaagc tgctgcagag -35 -10 +1 S D 10 Operator start ORF 20 30 40 50 60 70 80 90 100 110 120 Stop Terminator 601 taactactta ttactacgac tgacgtagta cccgtacccg ggtactattt tttttagact 661 ctgagactac atacggtttt actactaccc atatggggca tttactacct taccctgata promoter: 125-163 [-35 box to +1] operator: 160-181 [a 22-basepair inverted repeat overlapping the promoter] Shine-Delgarno: 207-213 [the consensus S-D sequence: aaggagg-6n-atg] Start codon: 220-222 [atg AUG] ORF or CDS: 220-600 [This is the rplQ gene of E. coli, GenBank Accession Number J01685, 127 amino acids] Stop codon: 601-603 [taa UAA, ochre, most commonly used stop codon] Terminator: 626-655 [inverted repeat followed by t's]