Bronze Gene Prediction Instructions and Worksheet
Save this worksheet to your desktop and complete it on the computer!
Complete this worksheet in MS Word on your computer. If you have this document in print,
open it online at http://www.dnai.org/media/bioinformatics/genefinding/bzgeneprediction_ws.doc.
If you opened this document in an Internet browser, click File, click Save as, and save it to a
directory on your C- or A-drive. Then close the browser, open the document in MS Word, and
follow the instructions to answer the questions. In doing so, you will discover where in the
sequence the bz gene is located, its structure and location in the maize genome, as well as the
3D structure of the bz protein product. Along the way you will become familiar with
bioinformatics routines such as locating and extracting information and sequences for
genes, genomes, and proteins from databases.
Try to find the gene in the DNA by determining the Open Reading Frames (ORFs) it contains
Assuming the bronze gene could be an ORF gene, try to find it by identifying and
analyzing the ORFs in the DNA sequence.
o Open this worksheet on your computer, save it, and open it in MS Word.
o Go to http://www.bioservers.org.
o Find SEQUENCE SERVER, click ENTER.
o Click MANAGE GROUPS.
o Find Sequence sources, click Classes, then Public.
o Find Jumping Genes Across Kingdoms, check the box to the left, click OK.
o Click the title for the first entry and set it to corn, purple endosperm; wt.
o Click Open, highlight and copy the entire sequence. Click Done.
o Open Gene Boy at http://www.dnai.org/geneboy.
o In the Sequences panel click Your Sequence.
o Paste the sequence into the central window.
o Optional: replace the header Your Sequence with a name of your choosing (e.g.,
corn bronze gene).
o Click Save Sequence.
o How long is the sequence? _____________ bp
o In the Operations panel click Find Genes, then ORFs.
o Click Reverse.
o Record the ORFs indicated by Gene Boy in the table below and determine the
length of the amino acid sequence each could potentially encode.
ORF      RF (reading frame)    From – To    Length [bp]    Protein length [aa]
ORF 1    1                     247-834      588 bp         195 aa
ORF 2    _                     _            _              _
ORF 3    _                     _            _              _
ORF 4    _                     _            _              _
ORF 5    _                     _            _              _
_        _                     _            _              _
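The ORF search and the length arithmetic in the table can be approximated in a few lines of Python. This is only a sketch under stated assumptions: an ORF is taken to run from an ATG to the next in-frame stop codon on the strand as given, which may differ from the rules Gene Boy applies. The names `find_orfs`, `orf_size`, and the `min_bp` cutoff are illustrative and do not come from Gene Boy.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_bp=100):
    """Return (frame, first_nt, last_nt, length_bp, protein_aa) tuples.

    Coordinates are 1-based and inclusive. Each ORF runs from an ATG
    through its in-frame stop codon, on the strand as given.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # 0-based index of the first ATG since the last stop
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if length >= min_bp:
                    # The stop codon is counted in the length but encodes no amino acid.
                    orfs.append((frame + 1, start + 1, i + 3, length, length // 3 - 1))
                start = None
    return orfs


def orf_size(first_nt, last_nt):
    """Length in bp and encoded protein length for an ORF that includes its stop codon."""
    length = last_nt - first_nt + 1
    return length, length // 3 - 1


print(orf_size(247, 834))                      # the ORF 1 row of the table
print(find_orfs("CCATGAAATTTTAGGG", min_bp=1))  # toy sequence, not the bz gene
```

For the ORF 1 row, 834 - 247 + 1 = 588 bp, which is 196 codons; the last codon is the stop, leaving 195 amino acids.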
The protein sequencing lab provides you with the amino acid sequence for the protein product of
the bronze gene (see Attachment 1).
o How many amino acids long is it? _____________________aa_
o How many nucleotides are needed to encode a protein of this length? _______nt_
o Could this protein be encoded by any of the ORFs determined above? _ yes/no _
o What do you think might be going on? At what point may we have made a wrong
assumption?
__________________________________________________________________
__________________________________________________________________
__________________________________________________________________
__________________________________________________________________
__________________________________________________________________
Confirm the potential of the DNA sequence to encode the BZ protein by using the DNA to
search DNA databases for similar sequences
(This search can be conducted by using Gene Boy, Sequence Server, or any Internet site that
provides access to a Blast search.)
Go back to Gene Boy, click Clear, click your sequence.
Under Operations, click WWW Tools, click ORF.
Find Redraw, change the number next to it from 100 to 300, click Redraw.
Compare the ORFs indicated with the results you recorded in the table above.
Click on an ORF and submit the deduced amino acid sequence to a blastp search by
clicking blast.
Record the Request-id: ____________________________________________
Click Format.
The E Value is the most meaningful indicator for the quality of a hit; the lower the E
Value, the better the hit. Usually, E Values of less than 0.1 indicate meaningful hits. (For
further explanations click the link to Blast FAQ in the upper part of the NCBI Blast result
page.)
Read the titles listed for acceptable search hits and determine the nature of the gene.
Record the gi-number for an entry you wish to examine in more detail: ______________
Click the gi-link.
What protein does the GenBank entry contain? _________________________________
How long is it? __________________________________________________________
Do any of the ORFs listed in the table above encode a protein of this length? yes/no
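The E Value rule of thumb given above (values below 0.1 usually indicate meaningful hits) can also be applied to a list of hits in code. The following is a minimal sketch, assuming each hit is stored as a dictionary with hypothetical `title` and `e_value` keys. The hits shown are made up for illustration and are not real BLAST output.

```python
def meaningful_hits(hits, max_e_value=0.1):
    """Keep hits whose E value is below the cutoff, best (lowest) first."""
    kept = [hit for hit in hits if hit["e_value"] < max_e_value]
    return sorted(kept, key=lambda hit: hit["e_value"])


# Made-up hits; real titles and E values come from the BLAST result page.
example_hits = [
    {"title": "hypothetical hit B", "e_value": 3.5},
    {"title": "hypothetical hit C", "e_value": 0.004},
    {"title": "hypothetical hit A", "e_value": 2e-80},
]

for hit in meaningful_hits(example_hits):
    print(hit["title"], hit["e_value"])
```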
Determine the model for the gene using protein evidence
The BZ protein has been sequenced (Attachment 1) and so has the DNA sequence (Sequence
Server, Attachment 2). Attachment 2 also provides a translation of this DNA sequence (deduced
amino acid sequence generated using the electronic DNA sequence translation tool at
http://www.dnalc.org/bioinformatics/2003/2003_dnalc_nucleotide_analyzer.htm#translator; see
Attachment 2). Within the deduced amino acid sequences in Attachment 2, locate the amino acid
sequence of the bz protein product provided in Attachment 1: find the stretches of the translated
sequences that are contained in the protein sequence, highlight them, and use them to determine
the coding portion of the DNA.
In order to identify the bz gene in the DNA sequence, highlight the nucleotide stretches
that correspond to the highlighted amino acid stretches. If necessary, consult the genetic
code table in Attachment 3.
Discuss the structure of the gene:
o What is the structure of the bronze gene? ________________________________
o Describe the gene model for the bz gene:
_________________________________________________________________
_________________________________________________________________
_________________________________________________________________
o Concatenate the coding sequences. How long is the resulting sequence? Would it
be able to encode a protein of the right length? ___________________________
Use the Internet sites at http://wwwmgs.bionet.nsc.ru/mgs/programs/bdna/tata_bdna.html
and http://rulai.cshl.org/tools/polyadq/polyadq_form.html for the prediction of TATA boxes and
polyA signals, respectively.
_________________________________________________________________
_________________________________________________________________
_________________________________________________________________
Finally, run the sequence through the two gene prediction programs listed in Gene Boy
under WWW Tools Gene Prediction.
_________________________________________________________________
_________________________________________________________________
_________________________________________________________________
Discuss the results by comparing them with the annotation for the gene at:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=22361
_________________________________________________________________
_________________________________________________________________
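Matching stretches of the Attachment 1 protein against the +1/+2/+3 rows of Attachment 2, and converting each match back to nucleotide positions, follows a simple rule: residue number p (counting from 0) in frame f starts at nucleotide (f - 1) + 3p + 1. The sketch below illustrates this; `locate_peptide` is a hypothetical helper, and the example uses only the first ten residues of each frame from the first line of Attachment 2.

```python
def locate_peptide(frame_translations, peptide):
    """Map each occurrence of `peptide` to nucleotide coordinates.

    `frame_translations` maps a forward frame number (1, 2, or 3) to that
    frame's amino acid string (spaces removed, '*' for stop codons).
    Returns (frame, first_nt, last_nt) tuples, 1-based and inclusive.
    """
    found = []
    for frame, protein in sorted(frame_translations.items()):
        pos = protein.find(peptide)
        while pos != -1:
            first_nt = (frame - 1) + 3 * pos + 1
            found.append((frame, first_nt, first_nt + 3 * len(peptide) - 1))
            pos = protein.find(peptide, pos + 1)
    return found


# First ten residues of each frame, taken from the first line of Attachment 2.
frames = {
    1: "GPQTPRHQQL",
    2: "VPKLHGTNS*",
    3: "SPNSTAPTAK",
}
print(locate_peptide(frames, "KLHG"))
```

Here KLHG is found in frame +2 and maps to nucleotides 8-19 (AAA CTC CAC GGC). When a protein stretch stops matching in one frame and resumes in another further downstream, the unmatched nucleotides in between are worth a closer look.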
Discuss characteristics of spliced genes
… by deleting from the table below all wrong answers:
                                     Exons             Introns
Begin with start codon               _True / False_    _True / False_
End with stop codon                  _True / False_    _True / False_
Nucleotide number is multiple of 3   _True / False_    _True / False_
Contain coding sequence (CDS)        _True / False_    _True / False_
Contain stop codons                  _True / False_    _True / False_
CDS can change reading frame         _True / False_    _True / False_
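The concatenation step from the previous section (joining the coding stretches and checking the result) can be tried on a toy gene. The sketch below is illustrative only: the sequence and exon coordinates are invented and are not the bz gene, and `splice` and `check_cds` are hypothetical helper names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(dna, exons):
    """Join exon segments given as 1-based, inclusive (first, last) pairs."""
    return "".join(dna[first - 1:last] for first, last in exons)


def check_cds(cds):
    """Report basic properties of a concatenated coding sequence."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return {
        "starts_with_ATG": cds.startswith("ATG"),
        "ends_with_stop": len(cds) % 3 == 0 and cds[-3:] in STOP_CODONS,
        "multiple_of_3": len(cds) % 3 == 0,
        "internal_stops": sum(codon in STOP_CODONS for codon in codons[:-1]),
    }


# Toy gene (invented): exon 1 = ATGGA, intron = GTAAGTTTCAG, exon 2 = ATTCTAA.
toy_gene = "ATGGA" + "GTAAGTTTCAG" + "ATTCTAA"
toy_exons = [(1, 5), (17, 23)]

cds = splice(toy_gene, toy_exons)
print(cds, len(cds), check_cds(cds))
```

Running the same checks on each exon and on the intron separately gives a concrete basis for filling in the table above.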
Determine the location of the gene in the maize genome
Click Map Viewer.
Click Zea mays.
Click Blast search plant genome.
Enter the sequence into the search window, click Blast.
Record the Request Id: _______________________________
Click Format.
Click Genome View.
How many chromosomes does maize have? ____ What chromosome is the gene on? ___
To view the gene in its environment click the number underneath the chromosome.
Zoom into the chromosome until the gene model for this gene becomes discernable.
Attachment 1:
Zea mays bronze gene product; 471 amino acids
---------+---------+---------+---------+---------+---------+
MAPADGESSPPPHVAVVAFPFSSHAAVLLSIARALAAAAAPSGATLSFLSTASSLAQLRK 60
---------+---------+---------+---------+---------+---------+
ASSASAGHGLPGNLRFVEVPDGAPAAEETVPVPRQMQLFMEAAEAGGVKAWLEAARAAAG 120
---------+---------+---------+---------+---------+---------+
GARVTCVVGDAFVWPAADAAASAGAPWVPVWTAASCALLAHIRTDALREDVGDQAANRVD 180
---------+---------+---------+---------+---------+---------+
GLLISHPGLASYRVRDLPDGVVSGDFNYVINLLVHRMGQCLPRSAAAVALNTFPGLDPPD 240
---------+---------+---------+---------+---------+---------+
VTAALAEILPNCVPFGPYHLLLAEDDADTAAPADPHGCLAWLGRQPARGVAYVSFGTVAC 300
---------+---------+---------+---------+---------+---------+
PRPDELRELAAGLEDSGAPFLWSLREDSWPHLPPGFLDRAAGTGSGLVVPWAPQVAVLRH 360
---------+---------+---------+---------+---------+---------+
PSVGAFVTHAGWASVLEGLSSGVPMACRPFFGDQRMNARSVAHVWGFGAAFEGAMTSAGV 420
---------+---------+---------+---------+---------+
ATAVEELLRGEEGARMRARAKELQALVAEAFGPGGECRKNFDRFVEIVCRA 471
Attachment 2: bronze gene, Zea mays, 2221 nucleotides
     1--------+---------+---------+---------+---------+---------+---------+---------+---------+---------+-
DNA: GGTCCCCAAACTCCACGGCACCAACAGCTAAGCCCGATGCGCTGCGTGCGCGGCGATCCAACCGCCGGCTCACCTAAAAATTTCGGCACGTCTAACTGCGAC
+1: G P Q T P R H Q Q L S P M R C V R G D P T A G S P K N F G T S N C D
+2: V P K L H G T N S * A R C A A C A A I Q P P A H L K I S A R L T A T
+3: S P N S T A P T A K P D A L R A R R S N R R L T * K F R H V * L R L
102
-----------------------------------------------------------------------------------------------------------
     103----+---------+---------+---------+---------+---------+---------+---------+---------+---------+---
DNA: TGGCAGGTGCGCACGCGTGGTCGCGCGGAATAAAGCGGACACGTTGCGCCCCCAGCGAAGCCCGCACGCATCGCATTCGCATCGCATCGCAGGTCGCATCCG
+1: W Q V R T R G R A E * S G H V A P P A K P A R I A F A S H R R S H P
+2: G R C A R V V A R N K A D T L R P Q R S P H A S H S H R I A G R I R
+3: A G A H A W S R G I K R T R C A P S E A R T H R I R I A S Q V A S D
204
-----------------------------------------------------------------------------------------------------------
     205--+---------+---------+---------+---------+---------+---------+---------+---------+---------+-----
DNA: ACGCTAGCGGCTAGCCTAGCCGAACAGCCTGAGCGCGCGAAGATGGCGCCCGCCGACGGCGAGTCCTCCCCGCCGCCGCACGTGGCCGTGGTCGCCTTCCCG
+1: T L A A S L A E Q P E R A K M A P A D G E S S P P P H V A V V A F P
+2: R * R L A * P N S L S A R R W R P P T A S P P R R R T W P W S P S R
+3: A S G * P S R T A * A R E D G A R R R R V L P A A A R G R G R L P V
306
-----------------------------------------------------------------------------------------------------------
     307+---------+---------+---------+---------+---------+---------+---------+---------+---------+-------
DNA: TTCAGCTCCCACGCGGCGGTGCTGCTCTCCATCGCGCGCGCCCTGGCTGCCGCCGCGGCGCCGTCCGGGGCCACGCTCTCGTTCCTCTCCACCGCGTCCTCC
+1: F S S H A A V L L S I A R A L A A A A A P S G A T L S F L S T A S S
+2: S A P T R R C C S P S R A P W L P P R R R P G P R S R S S P P R P P
+3: Q L P R G G A A L H R A R P G C R R G A V R G H A L V P L H R V L P
408
-----------------------------------------------------------------------------------------------------------
     409--------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
DNA: CTCGCGCAGCTCCGCAAGGCCAGCAGCGCCTCCGCCGGGCACGGGCTCCCGGGGAACCTGCGCTTCGTCGAGGTACCGGACGGCGCGCCCGCGGCCGAGGAG
+1: L A Q L R K A S S A S A G H G L P G N L R F V E V P D G A P A A E E
+2: S R S S A R P A A P P P G T G S R G T C A S S R Y R T A R P R P R R
+3: R A A P Q G Q Q R L R R A R A P G E P A L R R G T G R R A R G R G D
510
-----------------------------------------------------------------------------------------------------------
     511------+---------+---------+---------+---------+---------+---------+---------+---------+---------+-
DNA: ACCGTGCCGGTGCCGCGGCAGATGCAGCTGTTCATGGAGGCCGCGGAGGCCGGCGGGGTGAAGGCCTGGCTGGAGGCGGCCCGCGCCGCGGCGGGCGGCGCC
+1: T V P V P R Q M Q L F M E A A E A G G V K A W L E A A R A A A G G A
+2: P C R C R G R C S C S W R P R R P A G * R P G W R R P A P R R A A P
+3: R A G A A A D A A V H G G R G G R R G E G L A G G G P R R G G R R Q
612
-----------------------------------------------------------------------------------------------------------
     613----+---------+---------+---------+---------+---------+---------+---------+---------+---------+---
DNA: AGGGTGACCTGCGTGGTGGGCGACGCGTTCGTGTGGCCGGCGGCGGACGCGGCCGCCTCCGCGGGGGCGCCGTGGGTGCCGGTGTGGACGGCCGCGTCGTGC
+1: R V T C V V G D A F V W P A A D A A A S A G A P W V P V W T A A S C
+2: G * P A W W A T R S C G R R R T R P P P R G R R G C R C G R P R R A
+3: G D L R G G R R V R V A G G G R G R L R G G A V G A G V D G R V V R
714
-----------------------------------------------------------------------------------------------------------
     715--+---------+---------+---------+---------+---------+---------+---------+---------+---------+-----
DNA: GCGCTCCTGGCGCACATCCGCACCGACGCGCTCCGGGAGGACGTTGGCGACCAGGGTGCGTTGGATTCTACTACTACTACTTCTCTCCCTTCCTTGTCCCTT
+1: A L L A H I R T D A L R E D V G D Q G A L D S T T T T S L P S L S L
+2: R S W R T S A P T R S G R T L A T R V R W I L L L L L L S L P C P F
+3: A P G A H P H R R A P G G R W R P G C V G F Y Y Y Y F S P F L V P S
816
-----------------------------------------------------------------------------------------------------------
     817+---------+---------+---------+---------+---------+---------+---------+---------+---------+-------
DNA: CATTGCGCGCGGGTTTGATGATCGAATGGCTGTTGCATTTCCATCGTTCGCAGCAGCAAACAGGGTGGACGGGCTACTGATCTCCCACCCGGGCCTCGCCAG
+1: H C A R V * * S N G C C I S I V R S S K Q G G R A T D L P P G P R Q
+2: I A R G F D D R M A V A F P S F A A A N R V D G L L I S H P G L A S
+3: L R A G L M I E W L L H F H R S Q Q Q T G W T G Y * S P T R A S P A
918
-----------------------------------------------------------------------------------------------------------
     919--------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
DNA: CTACCGCGTCCGTGACCTCCCAGACGGCGTCGTCTCCGGCGACTTCAACTACGTCATCAACCTCCTCGTCCACCGCATGGGGCAGTGCCTCCCGCGCTCTGC
+1: L P R P * P P R R R R L R R L Q L R H Q P P R P P H G A V P P A L C
+2: Y R V R D L P D G V V S G D F N Y V I N L L V H R M G Q C L P R S A
+3: T A S V T S Q T A S S P A T S T T S S T S S S T A W G S A S R A L P
1020
-----------------------------------------------------------------------------------------------------------
     1021-----+---------+---------+---------+---------+---------+---------+---------+---------+---------+-
DNA: CGCCGCCGTGGCACTCAACACGTTCCCAGGCCTGGACCCGCCCGACGTCACCGCGGCGCTCGCGGAGATCCTGCCCAACTGCGTCCCGTTCGGCCCCTACCA
+1: R R R G T Q H V P R P G P A R R H R G A R G D P A Q L R P V R P L P
+2: A A V A L N T F P G L D P P D V T A A L A E I L P N C V P F G P Y H
+3: P P W H S T R S Q A W T R P T S P R R S R R S C P T A S R S A P T T
1122
-----------------------------------------------------------------------------------------------------------
     1123---+---------+---------+---------+---------+---------+---------+---------+---------+---------+---
DNA: CCTCCTCCTCGCCGAGGACGACGCCGACACCGCCGCACCAGCCGACCCGCACGGCTGCCTCGCCTGGCTGGGCCGCCAACCCGCGCGCGGCGTCGCGTACGT
+1: P P P R R G R R R H R R T S R P A R L P R L A G P P T R A R R R V R
+2: L L L A E D D A D T A A P A D P H G C L A W L G R Q P A R G V A Y V
+3: S S S P R T T P T P P H Q P T R T A A S P G W A A N P R A A S R T S
1224
-----------------------------------------------------------------------------------------------------------
     1225-+---------+---------+---------+---------+---------+---------+---------+---------+---------+-----
DNA: CAGCTTCGGCACGGTGGCGTGCCCGCGGCCCGACGAGCTCCGCGAGCTGGCGGCCGGGCTGGAGGACTCGGGCGCGCCGTTCCTGTGGTCGCTGCGCGAGGA
+1: Q L R H G G V P A A R R A P R A G G R A G G L G R A V P V V A A R G
+2: S F G T V A C P R P D E L R E L A A G L E D S G A P F L W S L R E D
+3: A S A R W R A R G P T S S A S W R P G W R T R A R R S C G R C A R T
1326
-----------------------------------------------------------------------------------------------------------
     1327---------+---------+---------+---------+---------+---------+---------+---------+---------+-------
DNA: CTCGTGGCCGCACCTCCCGCCGGGTTTCCTGGACCGCGCCGCGGGCACCGGGTCCGGGCTCGTGGTGCCCTGGGCGCCGCAGGTGGCCGTGCTGCGCCACCC
+1: L V A A P P A G F P G P R R G H R V R A R G A L G A A G G R A A P P
+2: S W P H L P P G F L D R A A G T G S G L V V P W A P Q V A V L R H P
+3: R G R T S R R V S W T A P R A P G P G S W C P G R R R W P C C A T L
1428
-----------------------------------------------------------------------------------------------------------
     1429-------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
DNA: TTCCGTGGGCGCGTTCGTGACGCACGCCGGGTGGGCGTCGGTGCTGGAGGGCTTGTCCAGCGGGGTGCCCATGGCGTGCCGCCCCTTCTTCGGCGACCAGCG
+1: F R G R V R D A R R V G V G A G G L V Q R G A H G V P P L L R R P A
+2: S V G A F V T H A G W A S V L E G L S S G V P M A C R P F F G D Q R
+3: P W A R S * R T P G G R R C W R A C P A G C P W R A A P S S A T S G
1530
-----------------------------------------------------------------------------------------------------------
     1531-----+---------+---------+---------+---------+---------+---------+---------+---------+---------+-
DNA: GATGAACGCGCGGTCCGTGGCGCACGTGTGGGGGTTCGGCGCCGCGTTCGAGGGCGCTATGACGAGCGCCGGAGTGGCCACGGCCGTGGAGGAGCTGCTGCG
+1: D E R A V R G A R V G V R R R V R G R Y D E R R S G H G R G G A A A
+2: M N A R S V A H V W G F G A A F E G A M T S A G V A T A V E E L L R
+3: * T R G P W R T C G G S A P R S R A L * R A P E W P R P W R S C C A
1632
-----------------------------------------------------------------------------------------------------------
     1633---+---------+---------+---------+---------+---------+---------+---------+---------+---------+---
DNA: CGGGGAGGAAGGGGCGCGGATGAGGGCAAGGGCCAAGGAGCTGCAGGCCTTGGTGGCCGAGGCGTTCGGGCCAGGCGGTGAGTGCAGGAAGAACTTCGACAG
+1: R G G R G A D E G K G Q G A A G L G G R G V R A R R * V Q E E L R Q
+2: G E E G A R M R A R A K E L Q A L V A E A F G P G G E C R K N F D R
+3: G R K G R G * G Q G P R S C R P W W P R R S G Q A V S A G R T S T G
1734
-----------------------------------------------------------------------------------------------------------
     1735-+---------+---------+---------+---------+---------+---------+---------+---------+---------+-----
DNA: GTTCGTCGAGATAGTCTGTCGCGCGTGAAAGGTCGTCTTGCTGTTCAGAGGTTTTACCAACAGAAGAACATAATGAATTGGATGGCATGCTACGTCGTATTC
+1: V R R D S L S R V K G R L A V Q R F Y Q Q K N I M N W M A C Y V V F
+2: F V E I V C R A * K V V L L F R G F T N R R T * * I G W H A T S Y S
+3: S S R * S V A R E R S S C C S E V L P T E E H N E L D G M L R R I L
1836
-----------------------------------------------------------------------------------------------------------
     1837---------+---------+---------+---------+---------+---------+---------+---------+---------+-------
DNA: TCTTTTTTTGTTGATCCCTGAGTTGATACATTTTGTACTTGATACATGAGTTGCAGCAGCAGCAGCAACAGCCTTCTGTACCTTGGCTTTGGATCTGTATTC
+1: S F F V D P * V D T F C T * Y M S C S S S S N S L L Y L G F G S V F
+2: L F L L I P E L I H F V L D T * V A A A A A T A F C T L A L D L Y S
+3: F F C * S L S * Y I L Y L I H E L Q Q Q Q Q Q P S V P W L W I C I L
1938
-----------------------------------------------------------------------------------------------------------
     1939-------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
DNA: TTGTCACCAGTTATCTGAAAGCATCAATAACCTTCTGTCTTCTAGCAGTTGCCTCTCCAGATTGCCAAAATAGCATTTATTATAAGGTCTTATGCAATGTTT
+1: L S P V I * K H Q * P S V F * Q L P L Q I A K I A F I I R S Y A M F
+2: C H Q L S E S I N N L L S S S S C L S R L P K * H L L * G L M Q C F
+3: V T S Y L K A S I T F C L L A V A S P D C Q N S I Y Y K V L C N V F
2040
-----------------------------------------------------------------------------------------------------------
     2041-----+---------+---------+---------+---------+---------+---------+---------+---------+---------+-
DNA: TCAGATTGTTCCGATTAAATCTACGATTAGCATTTTAGCCCAGCAGTCCAGCCCATTGAAGGCTTATTCAGTTATTTTTAATCCATATAAATCAAAAAAGAT
+1: S D C S D * I Y D * H F S P A V Q P I E G L F S Y F * S I * I K K D
+2: Q I V P I K S T I S I L A Q Q S S P L K A Y S V I F N P Y K S K K I
+3: R L F R L N L R L A F * P S S P A H * R L I Q L F L I H I N Q K R L
2142
-----------------------------------------------------------------------------------------------------------
     2143---+---------+---------+---------+---------+---------+---------+---------+
DNA: TGATATAGATTAGAAAATATTTTAGTTTACTAGGAATTAAAACCCCTCAATTTTTCTTAATCCATATAAATTGTGGCAG
+1: * Y R L E N I L V Y * E L K P L N F S * S I * I V A
+2: D I D * K I F * F T R N * N P S I F L N P Y K L W Q
+3: I * I R K Y F S L L G I K T P Q F F L I H I N C G
2221
-------------------------------------------------------------------------------------------------------------------------------
Attachment 3: Genetic Code (from http://psyche.uthct.edu/shaun/SBlack/geneticd.html)
                              Second Position of Codon
First                                                                   Third
Position   T              C              A              G              Position
T          TTT Phe [F]    TCT Ser [S]    TAT Tyr [Y]    TGT Cys [C]    T
           TTC Phe [F]    TCC Ser [S]    TAC Tyr [Y]    TGC Cys [C]    C
           TTA Leu [L]    TCA Ser [S]    TAA Ter [end]  TGA Ter [end]  A
           TTG Leu [L]    TCG Ser [S]    TAG Ter [end]  TGG Trp [W]    G
C          CTT Leu [L]    CCT Pro [P]    CAT His [H]    CGT Arg [R]    T
           CTC Leu [L]    CCC Pro [P]    CAC His [H]    CGC Arg [R]    C
           CTA Leu [L]    CCA Pro [P]    CAA Gln [Q]    CGA Arg [R]    A
           CTG Leu [L]    CCG Pro [P]    CAG Gln [Q]    CGG Arg [R]    G
A          ATT Ile [I]    ACT Thr [T]    AAT Asn [N]    AGT Ser [S]    T
           ATC Ile [I]    ACC Thr [T]    AAC Asn [N]    AGC Ser [S]    C
           ATA Ile [I]    ACA Thr [T]    AAA Lys [K]    AGA Arg [R]    A
           ATG Met [M]    ACG Thr [T]    AAG Lys [K]    AGG Arg [R]    G
G          GTT Val [V]    GCT Ala [A]    GAT Asp [D]    GGT Gly [G]    T
           GTC Val [V]    GCC Ala [A]    GAC Asp [D]    GGC Gly [G]    C
           GTA Val [V]    GCA Ala [A]    GAA Glu [E]    GGA Gly [G]    A
           GTG Val [V]    GCG Ala [A]    GAG Glu [E]    GGG Gly [G]    G
An explanation of the Genetic Code: DNA is a two-stranded molecule. Each strand is a polynucleotide
composed of A (adenosine), T (thymidine), C (cytidine), and G (guanosine) residues polymerized by
"dehydration" synthesis in linear chains with specific sequences. Each strand has polarity, such that the
5'-hydroxyl (or 5'-phospho) group of the first nucleotide begins the strand and the 3'-hydroxyl group of the final
nucleotide ends the strand; accordingly, we say that this strand runs 5' to 3' ("Five prime to three prime") . It is
also essential to know that the two strands of DNA run antiparallel such that one strand runs 5' -> 3' while the
other one runs 3' -> 5'. At each nucleotide residue along the double-stranded DNA molecule, the nucleotides
are complementary. That is, A forms two hydrogen-bonds with T; C forms three hydrogen bonds with G. In
most cases the two-stranded, antiparallel, complementary DNA molecule folds to form a helical structure
which resembles a spiral staircase. This is the reason why DNA has been referred to as the "Double Helix".
One strand of DNA holds the information that codes for various genes; this strand is often called the template
strand or antisense strand (containing anticodons). The other, and complementary, strand is called the coding
strand or sense strand (containing codons). Since mRNA is made from the template strand, it has the same
information as the coding strand. The table above refers to triplet nucleotide codons along the sequence of the
coding or sense strand of DNA as it runs 5' -> 3'; the code for the mRNA would be identical but for the fact
that RNA contains U (uridine) rather than T.
An example of two complementary strands of DNA would be:
(5' -> 3') ATGGAATTCTCGCTC (Coding, sense strand)
(3' <- 5') TACCTTAAGAGCGAG (Template, antisense strand)
(5' -> 3') AUGGAAUUCUCGCUC (mRNA made from Template strand)
Since amino acid residues of proteins are specified as triplet codons, the protein sequence made from the
above example would be Met-Glu-Phe-Ser-Leu... (MEFSL...).
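The example above can be reproduced in a few lines of Python. This is a minimal sketch: the small `CODONS` dictionary is an excerpt of the table in Attachment 3 covering only the five codons in the example, and the variable names are illustrative.

```python
# Excerpt of the genetic code table (Attachment 3): only the codons used here.
CODONS = {"ATG": "M", "GAA": "E", "TTC": "F", "TCG": "S", "CTC": "L"}

# Base-pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

coding = "ATGGAATTCTCGCTC"               # coding (sense) strand, 5' -> 3'
template = coding.translate(COMPLEMENT)  # template strand, written 3' -> 5'
mrna = coding.replace("T", "U")          # same as the coding strand, with U for T
protein = "".join(CODONS[coding[i:i + 3]] for i in range(0, len(coding), 3))

print(template)
print(mrna)
print(protein)
```

The mRNA is built here directly from the coding strand because, as described above, it carries the same information; in the cell it is synthesized by pairing against the template strand.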
Practically, codons are "decoded" by transfer RNAs (tRNA) which interact with a ribosome-bound messenger
RNA (mRNA) containing the coding sequence. There are 64 possible codons. 61 of these specify amino acids
and are recognized by the anticodon loop of a tRNA carrying the matching amino acyl residue (because one
tRNA can often read more than one codon, a cell needs fewer than 61 different tRNAs); the appropriate
"charged" tRNA binds to the respective next codon in the mRNA and the ribosome catalyzes the transfer of
the amino acid from the tRNA to the growing (nascent) protein/polypeptide chain. The remaining 3 codons are
used for "punctuation"; that is, they signal the termination (the end) of the growing polypeptide chain and
are read by protein release factors rather than by tRNAs.
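The codon counts can be checked by building the table in Attachment 3 programmatically. The sketch below assumes the conventional compact encoding of the standard code, in which the bases are ordered T, C, A, G and one letter per codon is listed with the third position varying fastest; `*` marks the codons shown as Ter [end] in the table.

```python
from itertools import product

BASES = "TCAG"
# One letter per codon, in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
sense_codons = len(CODON_TABLE) - len(stop_codons)

print(len(CODON_TABLE), sense_codons, stop_codons)
```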
Lastly, the Genetic Code in the table above has also been called "The Universal Genetic Code". It is known as
"universal" because nearly all known organisms use it to read the information in DNA and mRNA. The
universality of the genetic code encompasses animals (including humans), plants, fungi, archaea, bacteria, and
viruses. However, all rules have their exceptions, and such is the case with the Genetic Code; small variations
in the code exist in mitochondria and certain microbes. Nonetheless, it should be emphasized that these
variances represent only a small fraction of known cases, and that the Genetic Code applies quite broadly,
including to the nuclear genes of the vast majority of organisms.