Download The Young Scholars Program - 1996

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

RNA silencing wikipedia , lookup

Lac operon wikipedia , lookup

Gene desert wikipedia , lookup

RNA interference wikipedia , lookup

Epitranscriptome wikipedia , lookup

Non-coding DNA wikipedia , lookup

Non-coding RNA wikipedia , lookup

Genetic code wikipedia , lookup

Transcriptional regulation wikipedia , lookup

Gene regulatory network wikipedia , lookup

Gene expression wikipedia , lookup

Community fingerprinting wikipedia , lookup

Genomic imprinting wikipedia , lookup

Ridge (biology) wikipedia , lookup

Silencer (genetics) wikipedia , lookup

Endogenous retrovirus wikipedia , lookup

Molecular evolution wikipedia , lookup

Promoter (genetics) wikipedia , lookup

Gene expression profiling wikipedia , lookup

RNA-Seq wikipedia , lookup

Genome evolution wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Transcript
A PROKARYOTIC GENE
Escherichia coli contains in its genome about 4000 protein-coding
genes and 100 RNA genes. To be exact, there are 4289 ORFs (open
reading frames), 86 tRNA genes, 22 rRNA genes and seven small
molecular weight-RNA genes. This makes a grand total of 4404 genes in
E. coli.
Of the more than 4000 protein-coding genes, about 60% have known
function. Before the genome was sequenced there were 1853
characterized genes, and since the sequence has been completed another
750 ORFs have been assigned a function based on the comparison of the
ORF sequence to already known genes from other genomes or to other
genes in E. coli’s genome. Within E. coli there are gene families of
“paralogous” genes, which are not identical, but which have related
sequence and function. For example there are 80 ABC transporter genes
(genes involved in group translocations, i.e., PEP:PTS).
The RNA genes code for a variety of products, most of which have
known functions. Examples are the three ribosomal RNA genes which
code for the 16S, 23S and 5S rRNAs found in all bacterial ribosomes,
and the 50 or more different transfer RNA (tRNA) genes that are
transcribed into the tRNAs that function as the adapter molecules in
protein synthesis. Since the genome of E. coli has been completely
sequenced, all of these genes are known. For example, there are 86
tRNA genes coding for 47 different tRNAs. Another of the RNA genes
commonly found is the M1 RNA gene, rnpB, which codes for the enzymatic
portion of Ribonuclease P, the prototypical ribozyme.
Below (page 2) is a diagram of a typical protein-coding gene of
E. coli (not unlike all bacteria). This single gene has a typical
promoter, operator region, the ORF or CDS (coding sequence) with
typical start and stop codons and a rho-independent terminator. All
sequences are consensus sequences including the Shine-Delgarno
sequence and the promoter regions at -35 and -10.
References for the E. coli Genetic Map:
1.
2.
3.
4.
Berlyn, M. K. B., 1998. Linkage map of Escherichia coli K-12,
edition 10: The traditional map. Microbiol. Molec. Biol. Rev.
62:814-984
Rudd, K. E., 1998. Linkage map of Escherichia coli K-12, Edition
10: The physical map. Microbiol. Molec. Biol. Rev. 62:985-1019.
CGSC: E. coli Genetic Stock Center, maintained by Mary Berlyn.
http://cgsc.biology.yale.edu/
Blattner, F. R., et al. 1997. The complete genome sequence of
Escherichia coli K-12. Science 277: 1453-1462.
A PROKARYOTIC GENE (cont.)
The nucleotide sequence shown represents the “sense” strand,
which is complimentary to and in the opposite direction of the
template strand. In other words the given sequence of this DNA is the
same as the mRNA sequence.
The sequence is given in GenBank format;
it is presented in lines of 60 nucleotides, separated in groups of ten
and numbered on the left for easy identification.
1
ggtacagtcc aatatctgct attactacct ttccatcccg ggactactga ccatgactaa
61
gactaccatc atatactacg ccatatgcag tactgcaaag gtactgatcg ccatgctagg
121
gcacttgaca ataccctacc gggactagct ataatcagtc tcgttctaga tctagaacga
181
ggatcacagg ttaagcgttt tacttcaagg aggctggtca tgcgccatcg taagagtggt
241
cgtcaactga accgcaacag cagccatcgc caggctatgt tccgcaatat ggcaggttca
301
ctggttcgtc atgaaatcat caagacgact ctgcctaaag cgaaagagct gcgccgcgta
361
gttgagccgc tgattactct tgccaagact gatagcgttg ctaatcgtcg tctggcattc
421
gcccgtactc gtgataacga gatcgtggca aaactgttta acgaactggg cccgcgtttc
481
gcgagccgtg ccggtggtta cactcgtatt ctgaagtgtg gcttccgtgc aggcgacaac
541
gcgccgatgg cttacatcga gctggttgat cgttcagaga aagcagaagc tgctgcagag
-35
-10
+1
S D
10
Operator
start ORF
20
30
40
50
60
70
80
90
100
110
120
Stop
Terminator
601
taactactta ttactacgac tgacgtagta cccgtacccg ggtactattt tttttagact
661
ctgagactac atacggtttt actactaccc atatggggca tttactacct taccctgata
promoter:
125-163
[-35 box to +1]
operator:
160-181
[a 22-basepair inverted repeat
overlapping the promoter]
Shine-Delgarno:
207-213
[the consensus S-D sequence:
aaggagg-6n-atg]
Start codon:
220-222
[atg  AUG]
ORF or CDS:
220-600
[This is the rplQ gene of E. coli,
GenBank Accession Number J01685,
127 amino acids]
Stop codon:
601-603
[taa  UAA, ochre, most commonly
used stop codon]
Terminator:
626-655
[inverted repeat followed by t's]