INBRE Bioinformatics Workshop for Undergrads
Exercise 5 – Aug 9, 2005

Dinosaurs, frogs & secret messages

Background

In 1990, Michael Crichton published the book Jurassic Park, about the re-creation of dinosaurs using DNA extracted from dinosaur blood preserved in the stomachs of insects that had become encased in tree sap, which later hardened into amber. At one point in the book, Dr. Henry Wu is asked to explain some of the DNA techniques used in reconstructing the extinct dinosaur genomes. Dr. Wu describes the use of restriction enzymes and how the fragmented pieces of dinosaur DNA can be spliced together with these enzymes. He also alludes to the fact that they don't have the entire genome, but they “fill in the gaps” with modern-day frog DNA. Later, he points to a computer screen and remarks, “Here you see the actual structure of a small fragment of dinosaur DNA.”

Dr. Mark Boguski, at the NIH’s National Center for Biotechnology Information (NCBI), having read Jurassic Park, entered this sequence into a text editor and searched it against all of the DNA sequences known at the time. This collection of sequences makes up a database referred to as GenBank. The sequence was garbage, of course; there was no DNA derived from dinosaurs.

For the sequel, The Lost World, Mr. Crichton used Mark as a consultant. Mark chose a DNA sequence from a living organism that is closely related to the dinosaurs. Mark also mixed in some frog (Xenopus) DNA, just as Dr. Wu described, to fill in the holes in their dino-genomes. However, Mark played a little trick on Mr. Crichton by embedding a message in the protein translation of the DNA sequence that he submitted for use in the book.

The Problem

This exercise will take you through the steps required to examine the sequences Mark used in “The Lost World” to show how he imagined dinosaur DNA sequences might look, and then to decode the message he embedded in the sequences. The exercise uses a toolkit called The Next Generation Biology Workbench (www.ngbw.org). It is assumed here that you can find the site, create an account, and log in. The NGBW is organized around folders for data and tasks; once you log in, you will see them. The NGBW also has Flash help files to assist you with the analyses described below.

The Approach

1) First, upload the dinosaur DNA sequence to the data area of your active NGBW folder. To do this, open the Data folder, and when the Data Management pane appears, click the “Upload Data” button. When the Data Upload pane appears, open a second browser window and go to the DinoDNA sequence address listed under Resources. Copy the sequence, paste it into the appropriate box, and fill in the rest of the form by providing a label (required) and by using the dropdown menus to specify that it is a Nucleic Acid Sequence in Fasta format. Now click the “Save” button to save it into your data area.

2) Next, run a BLAST search of this nucleic acid sequence against all of the GenBank DNA sequences. The purpose of this step is to compare the DinoDNA sequence to all of the DNA sequences in GenBank (which holds nearly all of the known sequences). The more similar a given DNA sequence is to a known sequence, the more likely it is that the two sequences are close relatives in structure, function, and evolutionary history. Comparisons between sequences are conducted using BLAST (Basic Local Alignment Search Tool). To run a BLAST search on a nucleic acid sequence, one uses the BlastN tool. To do this in the NGBW, log in and click on the Tasks folder. When the Task Management pane opens, click the “Create a New Task” button.
When the Task Creation pane opens, enter some text in the “Description” box, and click the “Set Description” button. Now click the “Select Tool” button. In the toolkit pane, find and click on the BlastN tool; it is under the “Nucleic Acid Tools” tab. This will return you to the Task Creation pane.

The most important part of creating a BLAST job is specifying the database you will be searching. To do this, click on the “Set Parameters” button, and when the Parameters pane opens, find the “nucleotide db” dropdown and confirm that it is set to search “All GenBank”.

Now click on the “Select Input Data” button. Find the DinoDNA data file, select it by checking the box to the left of the sequence, and then click the “Select Data” button at the bottom of the page. This will return you to the Task Creation pane. Click on the “Save and Run” button. This will deploy your job and return you to the Task Management pane, where you can watch your job progress (it will take a few minutes to complete). While you are waiting, you can begin creating the next job, or just click the “Refresh Tasks” button until the text in the right-most column changes from “Check Status” to “Check Results”.

3) To decode the hidden message, repeat the process you followed in step 2, except this time use a translated BLAST search. A translated BLAST search translates the probe (query) sequence into six different protein sequences, one for each of the six reading frames: three reading frames on the strand you have provided, and three on its complementary strand, read in the opposite direction. These six protein sequences are then searched against a protein database.
As you repeat the process from step 2, select the BlastX tool, which translates the DNA sequence into its six reading frames. Then open the parameters page and select the SwissProt protein database as the target for the query.

4) Now return to the first task, the nucleotide-nucleotide BLAST search (BlastN) of the DinoDNA sequence against the GenBank nucleotide database. Click the “View Results” button to display all the results produced by your search. Click on the link to blast2_1.png to view a graphical representation of the regions that matched particular parts of the sequence. After carefully examining this picture, click on Return to Task Window, and select the blast2.txt link. This will display the list of sequences with strong similarity to the Dino sequence. Scroll through the output to identify the genes and species that are related to the DinoDNA sequence. Please consider the following questions: What are the top hits? Why did Mark Boguski use that organism to resemble dinosaur DNA?

5) Similarly, return to the second task, the translated BLAST search (BlastX) of the DinoDNA sequence against the SwissProt database. Click the “View Results” button to display all the results produced by your search. Click on the link to blast2.txt to display the list of sequences with strong similarity to the Dino sequence. Are the top hits the same for BlastX and BlastN? What is the hidden message?

Resources

1) DinoDNA sequence: http://people.ibest.uidaho.edu/~celesteb/Dino_DNA.txt
2) The Next Generation Biology Workbench: http://www.ngbw.org
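
The six-frame translation that a translated BLAST search performs in step 3 can be sketched in a few lines of Python. This is only an illustration of the idea, not the code that BLAST or the NGBW actually runs. The function names are hypothetical, the standard genetic code is assumed, and the toy sequence is made up for the example (it is not taken from the DinoDNA file).

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G
# (first base changes slowest, third base fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, read in the opposite direction."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons (e.g. with N) become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return the six translations: frames +1..+3 on the given strand,
    frames -1..-3 on the reverse complement."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    toy = "CATGAACTGCCG"  # made-up sequence, not the DinoDNA data
    for name, protein in six_frame_translation(toy).items():
        print(name, protein)
```

Only one of the six frames of the toy sequence spells a readable word in one-letter amino acid codes. In the same way, a message embedded in a DNA sequence will show up in only one reading frame, which is why the translated search examines all six.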
