Worksheet: Identify the bz gene in DNA

Assuming the bz gene could be a simple ORF gene, try to identify it by detecting and analyzing the ORFs in the sequence.

o Go to http://www.bioservers.org
o Find SEQUENCE SERVER and click ENTER
o Click MANAGE GROUPS
o Find Sequence sources, click Classes, then Public.
o Find Jumping Genes Across Kingdoms
o Check the box to the left, click OK
o Click the title for the first entry and set it to corn, purple endosperm; wt
o Click View
o Highlight and copy the entire sequence.
o Open http://www.dnai.org/geneboy
o In the Sequences panel click Your Sequence
o Paste the sequence into the central window.
o Replace the header Your Sequence with a name of your choosing (e.g. corn bz gene region).
o Click Save Sequence
o How long is the sequence? __2221__ bp
o In the Operations panel click Find Genes, then ORFs
o Click Reverse.

Record in the table below the ORFs indicated by Gene Boy.

ORF      RF    From - To      Length [bp]   Protein length [aa]
ORF 1    +1    247 - 834      588           195
ORF 2    +2    842 - 1762     921           306
ORF 3    -1    220 - 819      600           199
ORF 4    -1    1117 - 1500    384           127
ORF 5    -3    890 - 1867     978           325

The protein sequencing lab provides you with the amino acid sequence for the protein product of the bz gene (see Attachment 1).

o How many amino acids long is it? __471 aa__
o How many nucleotides would be required to encode a protein of this length? __1413__ (471 x 3)
o Could it be encoded by any of the ORFs determined above? __No.__ Even the longest ORF (978 bp) encodes only 325 amino acids.
o What do you think might be going on? At what point may we have made a wrong assumption?
  The wrong assumption was that the gene is a single-ORF gene. Instead, it may be a spliced gene.
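The ORF comparison above can be checked with a few lines of Python. This is a minimal sketch, not part of the original worksheet: the coordinates are copied from the ORF table, the helper names (`orf_length_bp`, `protein_length_aa`) are made up for illustration, and it assumes that each reported ORF span includes its terminating stop codon.

```python
# ORF coordinates recorded in the worksheet table (1-based, inclusive).
ORFS = {
    "ORF 1": (247, 834),
    "ORF 2": (842, 1762),
    "ORF 3": (220, 819),
    "ORF 4": (1117, 1500),
    "ORF 5": (890, 1867),
}

# Length of the sequenced bz protein (Attachment 1).
BZ_PROTEIN_LENGTH_AA = 471


def orf_length_bp(start: int, end: int) -> int:
    """Length of an ORF given inclusive 1-based start and end positions."""
    return end - start + 1


def protein_length_aa(length_bp: int) -> int:
    """Amino acids encoded, assuming the span ends with one stop codon."""
    return length_bp // 3 - 1


# Map each ORF name to (length in bp, protein length in aa).
summary = {
    name: (orf_length_bp(s, e), protein_length_aa(orf_length_bp(s, e)))
    for name, (s, e) in ORFS.items()
}

# Nucleotides needed to encode the protein (stop codon not counted).
required_bp = 3 * BZ_PROTEIN_LENGTH_AA

# ORFs long enough to encode the full protein on their own.
candidates = [
    name for name, (_, aa) in summary.items() if aa >= BZ_PROTEIN_LENGTH_AA
]

for name, (bp, aa) in summary.items():
    print(f"{name}: {bp} bp, {aa} aa")
print(f"Required: {required_bp} bp; ORFs long enough: {candidates}")
```

The empty candidate list is what motivates dropping the single-ORF assumption.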
Using the DNA sequence from Sequence Server and the translation tool at http://www.dnalc.org/bioinformatics/2003/2003_dnalc_nucleotide_analyzer.htm#translator, the Bioinformatics Department has provided you with a translation of the sequence in all three forward reading frames (see Attachment 2).

Locate the amino acid sequence of the bz protein product (Attachment 1) in these three deduced amino acid sequences. Highlight in the translated sequences the amino acid stretches that are contained in the BZ protein sequence. To identify the bz gene in the DNA sequence, highlight the nucleotide stretches that correspond to the highlighted amino acid stretches. If necessary, consult the genetic code table in Attachment 3.

Discuss the structure of the gene:

o What is the structure of the bz gene?
  __It consists of two exons and one intron.__
o At what position are the start and stop codons located?
  __Start: 247 - 249__   __Stop: 1760 - 1762__
o How many substructures does the coding region of the gene consist of? How long are these substructures? Are they divisible by three?
  __Two exons; CDS 1: 523 bp, CDS 2: 890 bp.__ Neither length is divisible by three.
o Concatenate the coding substructures. How long is this sequence? Is it a multiple of three? Would it be able to encode a protein of the length of the BZ protein?
  __Total length of the CDS: 1413 bp (523 + 890), a multiple of three (3 x 471); it encodes 471 amino acids.__

Use the Internet sites at http://wwwmgs.bionet.nsc.ru/mgs/programs/bdna/tata_bdna.html and http://rulai.cshl.org/tools/polyadq/polyadq_form.html for the prediction of TATA boxes and the PolyA signal, respectively.
__see annotation in the sequence below__

Finally, run the sequence through the two gene prediction programs listed in Gene Boy under WWW Tools Gene Prediction.
_________________________________________________________________
_________________________________________________________________

Discuss the results by comparing them with the annotation for the gene at http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=22361
_________________________________________________________________
_________________________________________________________________

Attachment 1: Zea mays bz gene product; 471 amino acids

---------+---------+---------+---------+---------+---------+
MAPADGESSPPPHVAVVAFPFSSHAAVLLSIARALAAAAAPSGATLSFLSTASSLAQLRK 60
---------+---------+---------+---------+---------+---------+
ASSASAGHGLPGNLRFVEVPDGAPAAEETVPVPRQMQLFMEAAEAGGVKAWLEAARAAAG 120
---------+---------+---------+---------+---------+---------+
GARVTCVVGDAFVWPAADAAASAGAPWVPVWTAASCALLAHIRTDALREDVGDQAANRVD 180
---------+---------+---------+---------+---------+---------+
GLLISHPGLASYRVRDLPDGVVSGDFNYVINLLVHRMGQCLPRSAAAVALNTFPGLDPPD 240
---------+---------+---------+---------+---------+---------+
VTAALAEILPNCVPFGPYHLLLAEDDADTAAPADPHGCLAWLGRQPARGVAYVSFGTVAC 300
---------+---------+---------+---------+---------+---------+
PRPDELRELAAGLEDSGAPFLWSLREDSWPHLPPGFLDRAAGTGSGLVVPWAPQVAVLRH 360
---------+---------+---------+---------+---------+---------+
PSVGAFVTHAGWASVLEGLSSGVPMACRPFFGDQRMNARSVAHVWGFGAAFEGAMTSAGV 420
---------+---------+---------+---------+---------+
ATAVEELLRGEEGARMRARAKELQALVAEAFGPGGECRKNFDRFVEIVCRA 471

Attachment 2: bz gene, Zea mays, 2221 nucleotides

Each block lists the nucleotide positions it covers, the DNA sequence, and the translation in the three forward reading frames (+1, +2, +3).

1 - 102
DNA: GGTCCCCAAACTCCACGGCACCAACAGCTAAGCCCGATGCGCTGCGTGCGCGGCGATCCAACCGCCGGCTCACCTAAAAATTTCGGCACGTCTAACTGCGAC
+1: G P Q T P R H Q Q L S P M R C V R G D P T A G S P K N F G T S N C D
+2: V P K L H G T N S * A R C A A C A A I Q P P A H L K I S A R L T A T
+3: S P N S T A P T A K P D A L R A R R S N R R L T * K F R H V * L R L

103 - 204
DNA: TGGCAGGTGCGCACGCGTGGTCGCGCGGAATAAAGCGGACACGTTGCGCCCCCAGCGAAGCCCGCACGCATCGCATTCGCATCGCATCGCAGGTCGCATCCG
+1: W Q V R T R G R A E * S G H V A P P A K P A R I A F A S H R R S H P
+2: G R C A R V V A R N K A D T L R P Q R S P H A S H S H R I A G R I R
+3: A G A H A W S R G I K R T R C A P S E A R T H R I R I A S Q V A S D

205 - 306
DNA: ACGCTAGCGGCTAGCCTAGCCGAACAGCCTGAGCGCGCGAAGATGGCGCCCGCCGACGGCGAGTCCTCCCCGCCGCCGCACGTGGCCGTGGTCGCCTTCCCG
+1: T L A A S L A E Q P E R A K M A P A D G E S S P P P H V A V V A F P
+2: R * R L A * P N S L S A R R W R P P T A S P P R R R T W P W S P S R
+3: A S G * P S R T A * A R E D G A R R R R V L P A A A R G R G R L P V

307 - 408
DNA: TTCAGCTCCCACGCGGCGGTGCTGCTCTCCATCGCGCGCGCCCTGGCTGCCGCCGCGGCGCCGTCCGGGGCCACGCTCTCGTTCCTCTCCACCGCGTCCTCC
+1: F S S H A A V L L S I A R A L A A A A A P S G A T L S F L S T A S S
+2: S A P T R R C C S P S R A P W L P P R R R P G P R S R S S P P R P P
+3: Q L P R G G A A L H R A R P G C R R G A V R G H A L V P L H R V L P

409 - 510
DNA: CTCGCGCAGCTCCGCAAGGCCAGCAGCGCCTCCGCCGGGCACGGGCTCCCGGGGAACCTGCGCTTCGTCGAGGTACCGGACGGCGCGCCCGCGGCCGAGGAG
+1: L A Q L R K A S S A S A G H G L P G N L R F V E V P D G A P A A E E
+2: S R S S A R P A A P P P G T G S R G T C A S S R Y R T A R P R P R R
+3: R A A P Q G Q Q R L R R A R A P G E P A L R R G T G R R A R G R G D

511 - 612
DNA: ACCGTGCCGGTGCCGCGGCAGATGCAGCTGTTCATGGAGGCCGCGGAGGCCGGCGGGGTGAAGGCCTGGCTGGAGGCGGCCCGCGCCGCGGCGGGCGGCGCC
+1: T V P V P R Q M Q L F M E A A E A G G V K A W L E A A R A A A G G A
+2: P C R C R G R C S C S W R P R R P A G * R P G W R R P A P R R A A P
+3: R A G A A A D A A V H G G R G G R R G E G L A G G G P R R G G R R Q

613 - 714
DNA: AGGGTGACCTGCGTGGTGGGCGACGCGTTCGTGTGGCCGGCGGCGGACGCGGCCGCCTCCGCGGGGGCGCCGTGGGTGCCGGTGTGGACGGCCGCGTCGTGC
+1: R V T C V V G D A F V W P A A D A A A S A G A P W V P V W T A A S C
+2: G * P A W W A T R S C G R R R T R P P P R G R R G C R C G R P R R A
+3: G D L R G G R R V R V A G G G R G R L R G G A V G A G V D G R V V R

715 - 816
DNA: GCGCTCCTGGCGCACATCCGCACCGACGCGCTCCGGGAGGACGTTGGCGACCAGGGTGCGTTGGATTCTACTACTACTACTTCTCTCCCTTCCTTGTCCCTT
+1: A L L A H I R T D A L R E D V G D Q G A L D S T T T T S L P S L S L
+2: R S W R T S A P T R S G R T L A T R V R W I L L L L L L S L P C P F
+3: A P G A H P H R R A P G G R W R P G C V G F Y Y Y Y F S P F L V P S

817 - 918
DNA: CATTGCGCGCGGGTTTGATGATCGAATGGCTGTTGCATTTCCATCGTTCGCAGCAGCAAACAGGGTGGACGGGCTACTGATCTCCCACCCGGGCCTCGCCAG
+1: H C A R V * * S N G C C I S I V R S S K Q G G R A T D L P P G P R Q
+2: I A R G F D D R M A V A F P S F A A A N R V D G L L I S H P G L A S
+3: L R A G L M I E W L L H F H R S Q Q Q T G W T G Y * S P T R A S P A

919 - 1020
DNA: CTACCGCGTCCGTGACCTCCCAGACGGCGTCGTCTCCGGCGACTTCAACTACGTCATCAACCTCCTCGTCCACCGCATGGGGCAGTGCCTCCCGCGCTCTGC
+1: L P R P * P P R R R R L R R L Q L R H Q P P R P P H G A V P P A L C
+2: Y R V R D L P D G V V S G D F N Y V I N L L V H R M G Q C L P R S A
+3: T A S V T S Q T A S S P A T S T T S S T S S S T A W G S A S R A L P

1021 - 1122
DNA: CGCCGCCGTGGCACTCAACACGTTCCCAGGCCTGGACCCGCCCGACGTCACCGCGGCGCTCGCGGAGATCCTGCCCAACTGCGTCCCGTTCGGCCCCTACCA
+1: R R R G T Q H V P R P G P A R R H R G A R G D P A Q L R P V R P L P
+2: A A V A L N T F P G L D P P D V T A A L A E I L P N C V P F G P Y H
+3: P P W H S T R S Q A W T R P T S P R R S R R S C P T A S R S A P T T

1123 - 1224
DNA: CCTCCTCCTCGCCGAGGACGACGCCGACACCGCCGCACCAGCCGACCCGCACGGCTGCCTCGCCTGGCTGGGCCGCCAACCCGCGCGCGGCGTCGCGTACGT
+1: P P P R R G R R R H R R T S R P A R L P R L A G P P T R A R R R V R
+2: L L L A E D D A D T A A P A D P H G C L A W L G R Q P A R G V A Y V
+3: S S S P R T T P T P P H Q P T R T A A S P G W A A N P R A A S R T S

1225 - 1326
DNA: CAGCTTCGGCACGGTGGCGTGCCCGCGGCCCGACGAGCTCCGCGAGCTGGCGGCCGGGCTGGAGGACTCGGGCGCGCCGTTCCTGTGGTCGCTGCGCGAGGA
+1: Q L R H G G V P A A R R A P R A G G R A G G L G R A V P V V A A R G
+2: S F G T V A C P R P D E L R E L A A G L E D S G A P F L W S L R E D
+3: A S A R W R A R G P T S S A S W R P G W R T R A R R S C G R C A R T

1327 - 1428
DNA: CTCGTGGCCGCACCTCCCGCCGGGTTTCCTGGACCGCGCCGCGGGCACCGGGTCCGGGCTCGTGGTGCCCTGGGCGCCGCAGGTGGCCGTGCTGCGCCACCC
+1: L V A A P P A G F P G P R R G H R V R A R G A L G A A G G R A A P P
+2: S W P H L P P G F L D R A A G T G S G L V V P W A P Q V A V L R H P
+3: R G R T S R R V S W T A P R A P G P G S W C P G R R R W P C C A T L

1429 - 1530
DNA: TTCCGTGGGCGCGTTCGTGACGCACGCCGGGTGGGCGTCGGTGCTGGAGGGCTTGTCCAGCGGGGTGCCCATGGCGTGCCGCCCCTTCTTCGGCGACCAGCG
+1: F R G R V R D A R R V G V G A G G L V Q R G A H G V P P L L R R P A
+2: S V G A F V T H A G W A S V L E G L S S G V P M A C R P F F G D Q R
+3: P W A R S * R T P G G R R C W R A C P A G C P W R A A P S S A T S G

1531 - 1632
DNA: GATGAACGCGCGGTCCGTGGCGCACGTGTGGGGGTTCGGCGCCGCGTTCGAGGGCGCTATGACGAGCGCCGGAGTGGCCACGGCCGTGGAGGAGCTGCTGCG
+1: D E R A V R G A R V G V R R R V R G R Y D E R R S G H G R G G A A A
+2: M N A R S V A H V W G F G A A F E G A M T S A G V A T A V E E L L R
+3: * T R G P W R T C G G S A P R S R A L * R A P E W P R P W R S C C A

1633 - 1734
DNA: CGGGGAGGAAGGGGCGCGGATGAGGGCAAGGGCCAAGGAGCTGCAGGCCTTGGTGGCCGAGGCGTTCGGGCCAGGCGGTGAGTGCAGGAAGAACTTCGACAG
+1: R G G R G A D E G K G Q G A A G L G G R G V R A R R * V Q E E L R Q
+2: G E E G A R M R A R A K E L Q A L V A E A F G P G G E C R K N F D R
+3: G R K G R G * G Q G P R S C R P W W P R R S G Q A V S A G R T S T G

1735 - 1836
DNA: GTTCGTCGAGATAGTCTGTCGCGCGTGAAAGGTCGTCTTGCTGTTCAGAGGTTTTACCAACAGAAGAACATAATGAATTGGATGGCATGCTACGTCGTATTC
+1: V R R D S L S R V K G R L A V Q R F Y Q Q K N I M N W M A C Y V V F
+2: F V E I V C R A * K V V L L F R G F T N R R T * * I G W H A T S Y S
+3: S S R * S V A R E R S S C C S E V L P T E E H N E L D G M L R R I L

1837 - 1938
DNA: TCTTTTTTTGTTGATCCCTGAGTTGATACATTTTGTACTTGATACATGAGTTGCAGCAGCAGCAGCAACAGCCTTCTGTACCTTGGCTTTGGATCTGTATTC
+1: S F F V D P * V D T F C T * Y M S C S S S S N S L L Y L G F G S V F
+2: L F L L I P E L I H F V L D T * V A A A A A T A F C T L A L D L Y S
+3: F F C * S L S * Y I L Y L I H E L Q Q Q Q Q Q P S V P W L W I C I L

1939 - 2040
DNA: TTGTCACCAGTTATCTGAAAGCATCAATAACCTTCTGTCTTCTAGCAGTTGCCTCTCCAGATTGCCAAAATAGCATTTATTATAAGGTCTTATGCAATGTTT
+1: L S P V I * K H Q * P S V F * Q L P L Q I A K I A F I I R S Y A M F
+2: C H Q L S E S I N N L L S S S S C L S R L P K * H L L * G L M Q C F
+3: V T S Y L K A S I T F C L L A V A S P D C Q N S I Y Y K V L C N V F

2041 - 2142
DNA: TCAGATTGTTCCGATTAAATCTACGATTAGCATTTTAGCCCAGCAGTCCAGCCCATTGAAGGCTTATTCAGTTATTTTTAATCCATATAAATCAAAAAAGAT
+1: S D C S D * I Y D * H F S P A V Q P I E G L F S Y F * S I * I K K D
+2: Q I V P I K S T I S I L A Q Q S S P L K A Y S V I F N P Y K S K K I
+3: R L F R L N L R L A F * P S S P A H * R L I Q L F L I H I N Q K R L

2143 - 2221
DNA: TGATATAGATTAGAAAATATTTTAGTTTACTAGGAATTAAAACCCCTCAATTTTTCTTAATCCATATAAATTGTGGCAG
+1: * Y R L E N I L V Y * E L K P L N F S * S I * I V A
+2: D I D * K I F * F T R N * N P S I F L N P Y K L W Q
+3: I * I R K Y F S L L G I K T P Q F F L I H I N C G

Attachment 3: Genetic Code (http://psyche.uthct.edu/shaun/SBlack/geneticd.html)

                         Second Position of Codon
First                                                               Third
Position       T              C              A              G       Position

   T      TTT Phe [F]    TCT Ser [S]    TAT Tyr [Y]    TGT Cys [C]     T
          TTC Phe [F]    TCC Ser [S]    TAC Tyr [Y]    TGC Cys [C]     C
          TTA Leu [L]    TCA Ser [S]    TAA Ter [end]  TGA Ter [end]   A
          TTG Leu [L]    TCG Ser [S]    TAG Ter [end]  TGG Trp [W]     G

   C      CTT Leu [L]    CCT Pro [P]    CAT His [H]    CGT Arg [R]     T
          CTC Leu [L]    CCC Pro [P]    CAC His [H]    CGC Arg [R]     C
          CTA Leu [L]    CCA Pro [P]    CAA Gln [Q]    CGA Arg [R]     A
          CTG Leu [L]    CCG Pro [P]    CAG Gln [Q]    CGG Arg [R]     G

   A      ATT Ile [I]    ACT Thr [T]    AAT Asn [N]    AGT Ser [S]     T
          ATC Ile [I]    ACC Thr [T]    AAC Asn [N]    AGC Ser [S]     C
          ATA Ile [I]    ACA Thr [T]    AAA Lys [K]    AGA Arg [R]     A
          ATG Met [M]    ACG Thr [T]    AAG Lys [K]    AGG Arg [R]     G

   G      GTT Val [V]    GCT Ala [A]    GAT Asp [D]    GGT Gly [G]     T
          GTC Val [V]    GCC Ala [A]    GAC Asp [D]    GGC Gly [G]     C
          GTA Val [V]    GCA Ala [A]    GAA Glu [E]    GGA Gly [G]     A
          GTG Val [V]    GCG Ala [A]    GAG Glu [E]    GGG Gly [G]     G

An explanation of the Genetic Code:

DNA is a two-stranded molecule. Each strand is a polynucleotide composed of A (adenosine), T (thymidine), C (cytidine), and G (guanosine) residues polymerized by "dehydration" synthesis in linear chains with specific sequences. Each strand has polarity, such that the 5'-hydroxyl (or 5'-phospho) group of the first nucleotide begins the strand and the 3'-hydroxyl group of the final nucleotide ends the strand; accordingly, we say that this strand runs 5' to 3' ("five prime to three prime"). It is also essential to know that the two strands of DNA run antiparallel, such that one strand runs 5' -> 3' while the other one runs 3' -> 5'. At each nucleotide residue along the double-stranded DNA molecule, the nucleotides are complementary. That is, A forms two hydrogen bonds with T; C forms three hydrogen bonds with G. In most cases the two-stranded, antiparallel, complementary DNA molecule folds to form a helical structure which resembles a spiral staircase. This is the reason why DNA has been referred to as the "Double Helix".
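The base-pairing and antiparallel rules described above can be sketched in Python. This is an illustrative sketch only: the function names are hypothetical, the short 15-nucleotide sequence is used purely as an example, and the hydrogen-bond count simply applies the two-bonds-per-A:T and three-bonds-per-C:G rule stated in the text.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Base-by-base complement, read in the same left-to-right order.

    If the input is written 5' -> 3', the result is the partner strand
    written 3' -> 5'.
    """
    return strand.translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Partner strand rewritten in the conventional 5' -> 3' direction."""
    return complement(strand)[::-1]


def hydrogen_bonds(strand: str) -> int:
    """Hydrogen bonds in the duplex: 2 per A:T pair, 3 per C:G pair."""
    return sum(2 if base in "AT" else 3 for base in strand)


coding_5_to_3 = "ATGGAATTCTCGCTC"                   # example strand, 5' -> 3'
partner_3_to_5 = complement(coding_5_to_3)          # same strand pair, 3' -> 5'
partner_5_to_3 = reverse_complement(coding_5_to_3)  # partner read 5' -> 3'

print(coding_5_to_3, partner_3_to_5, partner_5_to_3)
print("hydrogen bonds:", hydrogen_bonds(coding_5_to_3))
```

Taking the reverse complement twice returns the original strand, which is a quick sanity check on any implementation of this rule.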
For a given gene, one strand of DNA serves as the template for transcription; this strand is often called the template strand or antisense strand. The other, complementary strand is called the coding strand or sense strand (containing codons). Since mRNA is made from the template strand, it carries the same information as the coding strand. The table above refers to triplet nucleotide codons along the sequence of the coding or sense strand of DNA as it runs 5' -> 3'; the code for the mRNA would be identical but for the fact that RNA contains U (uridine) rather than T. An example of two complementary strands of DNA would be:

(5' -> 3') ATGGAATTCTCGCTC (Coding, sense strand)
(3' <- 5') TACCTTAAGAGCGAG (Template, antisense strand)
(5' -> 3') AUGGAAUUCUCGCUC (mRNA made from Template strand)

Since amino acid residues of proteins are specified as triplet codons, the protein sequence made from the above example would be Met-Glu-Phe-Ser-Leu... (MEFSL...).

Practically, codons are "decoded" by transfer RNAs (tRNA) which interact with a ribosome-bound messenger RNA (mRNA) containing the coding sequence. Each tRNA has an anticodon loop (used to recognize codons in the mRNA). Of the 64 possible codons, 61 specify amino acids: the appropriate "charged" tRNA, which carries a bound aminoacyl residue, binds to the respective next codon in the mRNA, and the ribosome catalyzes the transfer of the amino acid from the tRNA to the growing (nascent) protein/polypeptide chain. The remaining 3 codons are used for "punctuation"; that is, they signal the termination (the end) of the growing polypeptide chain.

Lastly, the Genetic Code in the table above has also been called "The Universal Genetic Code". It is known as "universal" because it is used by nearly all known organisms to decode the information carried in DNA and mRNA. The universality of the genetic code encompasses animals (including humans), plants, fungi, archaea, bacteria, and viruses. However, all rules have their exceptions, and such is the case with the Genetic Code; small variations in the code exist in mitochondria and certain microbes. Nonetheless, it should be emphasized that these variances represent only a small fraction of known cases, and that the Genetic Code applies quite broadly, to nearly all known nuclear genes.

Suggestions for additions or changes can be sent to Dr. Shaun D. Black
Last update August 25, 1998
Shaun D. Black, University of Texas Health Center at Tyler
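The decoding described in the explanation above can be sketched in Python. This is a minimal illustration under stated assumptions, not a tool used in the worksheet: the table is the standard genetic code from Attachment 3 built in T, C, A, G order, the names `transcribe` and `translate` are hypothetical helpers, and `translate` reads the coding strand in the first reading frame only, stopping at the first stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position
# (the same order as the table in Attachment 3); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(coding_strand: str) -> str:
    """Translate frame +1 of a coding strand until the first stop codon."""
    protein = []
    for i in range(0, len(coding_strand) - 2, 3):
        aa = CODON_TABLE[coding_strand[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


example = "ATGGAATTCTCGCTC"      # coding strand from the explanation
print(transcribe(example))       # mRNA sequence
print(translate(example))        # one-letter protein sequence

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(len(CODON_TABLE), "codons,", len(stop_codons), "stop codons")
```

The same function can be applied to the first nucleotides following the start codon at position 247 in Attachment 2, for example `translate("ATGGCGCCCGCCGACGGC")`, to compare against the beginning of the protein in Attachment 1.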