INBRE Bioinformatics Workshop for Undergrads
Exercise 5 – Aug 9, 2005
Dinosaurs, frogs & secret messages
Background
In 1990, Michael Crichton published the novel Jurassic Park, about the re-creation of dinosaurs using
DNA extracted from dinosaur blood preserved in the stomachs of insects that had become encased in
tree resin, which later fossilized into amber. At one point in the book, Dr. Henry Wu is asked to
explain some of the DNA techniques used in reconstructing the extinct dinosaur genomes. Dr. Wu describes
the use of restriction enzymes and how the fragmented pieces of dinosaur DNA can be spliced together
with these enzymes. He also alludes to the fact that they don't have the entire genome but they “fill in the
gaps” with modern day frog DNA. Later, he points to a computer screen and remarks, “Here you see the
actual structure of a small fragment of dinosaur DNA.”
Dr. Mark Boguski of the NIH’s National Center for Biotechnology Information (NCBI), having read
Jurassic Park, typed this sequence into a text editor and searched it against all of the DNA sequences
known at the time. This collection of sequences makes up a database referred to as GenBank. The
sequence was garbage, of course; there was no DNA derived from dinosaurs.
For the sequel, The Lost World, Mr. Crichton used Mark as a consultant. Mark chose a DNA sequence
from a living organism that is closely related to the dinosaurs. Mark also mixed in some frog
(Xenopus) DNA, just as Dr. Wu described, to fill in the holes in their dino-genomes. However, Mark
played a little trick on Mr. Crichton by embedding a message in the protein translation of the DNA
sequence that he submitted for use in the book.
The Problem
This exercise will take you through the steps required to examine the sequences Mark used in “The Lost
World” to show how he imagined dinosaur DNA sequences might look, and then to decode the message
he embedded in the sequences. The exercise will use a toolkit called The Next Generation Biology
Workbench (www.ngbw.org). It is assumed here that you can find the site, create an account, and log in.
The NGBW is organized around folders for data and tasks. Once you log in, you will see those folders.
The NGBW also has Flash help files to assist you in undertaking the analyses described below.
The Approach
1) First upload the dinosaur DNA sequences to the data area of your active NGBW folder. To do this,
open the Data folder, and when the Data Management pane appears, click the “Upload Data” button.
When the Data Upload pane appears, open a second browser window and go to the DinoDNA website
listed under Resources below. Once you find it, copy and paste the sequence into the appropriate
box, and fill in the rest of the form by providing a label (required), and by using the dropdown
menus to specify that it is a Nucleic Acid Sequence, in FASTA format. Now click the “Save” button to
save it into your data area.
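For readers unfamiliar with FASTA format, the following minimal Python sketch shows what a FASTA record looks like: a header line beginning with ">" and carrying a label, followed by the sequence wrapped onto short lines. The function name `to_fasta`, the 60-column line width, and the set of accepted letters are assumptions made for illustration; they are not part of the NGBW.

```python
# Letters accepted as nucleotides in this sketch (N = unknown base).
# This set is an illustrative assumption, not an NGBW requirement.
VALID_NUCLEOTIDES = set("ACGTUN")


def to_fasta(label, sequence, width=60):
    """Return a FASTA record: a '>' header line plus the wrapped sequence."""
    label = label.strip()
    if not label:
        raise ValueError("a label is required")

    # Drop spaces and line breaks picked up while copying, and normalise case.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("the sequence is empty")

    invalid = sorted(set(cleaned) - VALID_NUCLEOTIDES)
    if invalid:
        raise ValueError("not a nucleic acid sequence, found: " + "".join(invalid))

    # Wrap the sequence onto fixed-width lines.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + label + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_fasta("toy_example", "acgt acgt\nnn", width=4))
```

Running the sketch on a toy sequence prints a header line followed by the sequence in four-letter lines, which is the same shape the upload form expects when "Fasta" is chosen.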
2) Now you will want to run a BLAST search of this Nucleic Acid sequence against all of the GenBank
DNA sequences. The purpose of this activity is to compare the DinoDNA sequence to all of the DNA
sequences in GenBank (which holds nearly all of the publicly known sequences). The more similar the
query sequence is to a known sequence, the more likely it is that the two sequences are close
relatives in structure, function, and evolutionary history.
Comparisons between sequences are conducted using BLAST (Basic Local Alignment Search Tool).
To run a BLAST search on a nucleic acid sequence, one uses the BlastN tool. To do this in the
NGBW, log in, and click on the Tasks folder. When the Task Management pane opens, click the
“Create a New Task” button. When the Task Creation pane opens, enter some text in the
“Description” box, and click the “Set Description” button.
Now click the “Select Tool” button. Under the toolkit pane, find and click on the BlastN tool. It is
under the “Nucleic Acid Tools” tab. This will return you to the Task Creation pane. The most
important part of creating a BLAST job is to specify the Database you will be searching. To do this,
click on the “Set Parameters” button, and when the Parameters pane opens, find the “nucleotide db”
dropdown, and satisfy yourself that it is set to search “All GenBank”.
Now click on the “Select Input Data” button. Find the DinoDNA data file, and select it by checking
the box on the left of the sequence, and then clicking the “Select Data” button at the bottom of the
page. This will return you again to the Task Creation pane.
When this happens, click on the “Save and Run” button. This will deploy your job, and return you to
the Task Management pane. On this pane you can watch your job progress (it will take a few minutes
to complete this job). While you are waiting, you can begin creating the next job, or just click the
“Refresh Tasks” button until you see the text on the right-most column change from “Check Status”
to “Check Results”.
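To make the idea of "local alignment" in step 2 concrete, here is a small Python sketch that scores the best locally matching region between two sequences. It uses the exhaustive Smith-Waterman dynamic programming method, not BLAST's own algorithm: BLAST uses a much faster heuristic so that it can search an entire database. The function name and the scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions, not the parameters used by BlastN.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b."""
    # prev holds the previous row of the scoring matrix; row 0 is all zeros.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or a mismatch.
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A score never drops below zero: a local alignment may
            # restart anywhere instead of carrying a penalty forward.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The two toy sequences share only the short region "ACG".
    print(local_alignment_score("TTACGTTT", "GGACGGG"))
```

A higher score means a longer or cleaner shared region. BLAST reports related statistics (a score and an expectation value) for each database hit, and those are what you will read in the results later in this exercise.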
3) To decode the hidden message, repeat the process you followed in step 2, but this time use a
translated BLAST search. A translated BLAST search translates the query sequence into six different
protein sequences, one for each of the six reading frames: three reading frames on the strand you
have provided, and three reading frames on its reverse complement. These six protein sequences are
then searched against a protein database. As you repeat the process from step 2, select the BlastX
tool, which performs this six-frame translation, then open the Parameters pane and select the
SwissProt protein database as the target for the query.
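The six-frame translation described in step 3 can be sketched in a few lines of Python. This is a minimal illustration using the standard genetic code, not the code BlastX itself runs; the function names and the frame labels ("+1" to "+3" for the given strand, "-1" to "-3" for the reverse complement) are assumptions chosen for readability. Stop codons are shown as "*" and any codon containing an unrecognised letter as "X".

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; ignore leftovers."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return a dict mapping frame label to protein string for all six frames."""
    dna = "".join(dna.split()).upper()
    frames = {}
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[strand + str(offset + 1)] = translate(sequence[offset:])
    return frames


if __name__ == "__main__":
    # A toy sequence, not the DinoDNA sequence.
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Only one of the six frames usually yields a long, uninterrupted protein for a real coding region, which is why BlastX searches all six rather than guessing the correct frame.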
4) Now return to the first task, the nucleotide-nucleotide BLAST search (BlastN) using the given DinoDNA
sequence against the nr nucleotide database. Click the “View Results” button, and this will expose all
the results produced by your search. Click on the link to blast2_1.png to view a graphical
representation of the regions that had matches to particular parts of the sequence. After carefully
examining this picture, click on “Return to Task Window”, and select the blast2.txt link. This will
expose the list of sequences with strong similarity to the Dino sequences. Scroll through the output
to identify the genes and species that are related to the Dino DNA sequence.
Please consider the following questions:
What are the top hits?
Why did Mark Boguski use that organism to resemble Dinosaur DNA?
5) Similarly, return to the second task, the translated BLAST search (BlastX) using the given DinoDNA
sequence against the SwissProt database. Click the “View Results” button, and this will expose all
the results produced by your search. Click on the link to blast2.txt, and this will expose the list of
sequences with strong similarity to the Dino sequences.
Are the top hits the same for BlastX and BlastN?
What is the hidden message?
Resources
1) DinoDNA sequence: http://people.ibest.uidaho.edu/~celesteb/Dino_DNA.txt
2) The Next Generation Biology Workbench: http://www.ngbw.org