Bioinformatics
Exploration
Technology, Engineering and Math-Science Academy for Advanced High School Students
Exercise 1: Molecular Phylogeny of Humans, Primates, and … the Yeti?
(Thanks to Dr. Brian Bettencourt, UML, for the following)
Our research team recently received a hair sample, believed to be of the legendary and
mysterious Yeti (“Bigfoot”, “Sasquatch”, etc.), from a group of Nepalese monks living high in
the Himalayas.* Although the hair was apparently old and not preserved in any fashion, we
successfully extracted mitochondrial DNA. To determine whether and how the purported
Bigfoot was related to modern humans and other primates, we amplified the “hypervariable
region 1” of the mitochondrial D-loop noncoding sequence. This region has been heavily used
in studies of human and primate genetic diversity. Search GenBank** for “D-loop
Hypervariable Region 1” and you will find almost 2000 entries.
The D-loop, and especially its hypervariable regions 1 and 2, is noncoding, so it tends to vary
quite a bit between individuals, populations, and species. D-loop sequences thus work well to
determine close relations (and not well at all for deep divergence). Plus, since the
mitochondrial genome is so small, it can often persist for a very long time in particular
tissues – mitochondrial genes have been successfully amplified from a variety of ancient
sources, including Neanderthal remains and prehistoric modern humans.
The question to be addressed is where the putative Bigfoot sequence will be placed on a
phylogeny (i.e. “family” tree) of related sequences. A hardworking instructor has done much of
the legwork for this task by searching GenBank to find D-loop Hypervariable Region 1
(henceforth “HVR”) sequences from several primates, Neanderthals, ancient humans, and
modern humans, and assembling them into a multi-FASTA file (one text file holding many
named sequences). So, you’re ready to begin!
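If you would like to peek inside a multi-FASTA file with a script rather than a text editor, here is a minimal sketch in Python. It assumes the usual FASTA layout (a ">" header line followed by one or more sequence lines). The sequence names and bases in `example` are made up for illustration and are not taken from HVR1.txt.

```python
def parse_fasta(text):
    """Return a dict mapping each sequence name to its full sequence string."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A header line starts a new record; keep the first word as the name.
            words = line[1:].split()
            name = words[0] if words else "unnamed"
            records[name] = []
        elif name is not None:
            # Sequence lines are appended to the most recent record.
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


# Made-up example data (hypothetical names, NOT real HVR1 sequences).
example = """>pongo_demo some description
ACGTAC
GTTA
>bigfoot_demo
acgtacgtca
"""

sequences = parse_fasta(example)
for seq_name, seq in sequences.items():
    print(seq_name, len(seq), seq)
```

To try it on the real file, you could read HVR1.txt into a string with `open(...).read()` and pass that to `parse_fasta`.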
• See Instructions on Reverse •
* Fictional statement. Can you guess how the “Bigfoot” sequence was generated?
**GenBank link: http://www.ncbi.nlm.nih.gov/Genbank/
Exercise 1 Procedure
1. On your computer’s desktop is a file named: “HVR1.txt”. Double click on the icon to open the text
file and examine the sequences. Here’s how to interpret the names of the sequences in the HVR1.txt
file. If you visit the “Exercise 1” page on the TEAMS Wiki, you can see photos of each “critter.”
Non-Homo sapiens sapiens sequences are prefixed as follows:
a. pongo     Pongo pygmaeus (orangutan)
b. pan       Pan troglodytes troglodytes (common chimpanzee)
c. panv      Pan troglodytes verus (West African chimpanzee)
d. bono      Pan paniscus (bonobo or pygmy chimpanzee)
e. bigfoot   Bigfoot! (or Sasquatch or the Yeti)
f. nea       Homo sapiens neanderthalensis (Neanderthal)
The rest of the DNA sequences come from Homo sapiens sapiens of one type or another.
Ancient human sequences are named like “ancientAUS” (an ancient Australian human). Modern
human sequences are named by country of origin. Here’s a bit of information about the Homo
sapiens DNA sequences in the HVR1.txt file:
g. luke        A sequence obtained from a body reported to be that of the Christian Saint
               Luke! (2000 years old?) We’re not making this up! Look in GenBank!
h. Vietnam     (add brief descriptions + age for the rest of these; you can look up the name
               in GenBank – see link on wiki)
i. AncientAUS  60,000-year-old Mungo Man skeleton unearthed in New South Wales in 1974
j. Syria       (add brief descriptions)
2. Open the ClustalX program on your desktop, then open the HVR1 sequences (browse for the
HVR1.txt file in your My Documents folder).
3. Under Alignment -> Output Format Options, change “Output Order” to “Input” then press the
CLOSE button.
4. Under Alignment, select “Do complete alignment”. This process will take a few moments.
5. Next, under Trees, select “Bootstrap N-J Tree”. Make sure 1000 trials will be run, then click on
Run/OK. After a few moments, this will produce a “bootstrapped NJ tree file,” suffixed .phb (we’ll
discuss bootstrapping and NJ in lecture).
6. Open the HVR1.phb bootstrapped tree using TreeView (double-click on the icon on the desktop,
then File -> Open and browse in My Documents).
• Click on the buttons for the various tree structures (Radial, Slanted Cladogram, Phylogram, etc.).
• What does each graph tell you about the relationships between the DNA sequences?
7. Open the HVR1.phb bootstrapped tree using NJPlot (double-click on the icon on the desktop,
then File -> Open and browse in My Documents).
• Toggle display of Bootstrap values. What do the numbers on the clades mean?
• How well supported is each clade based on the bootstrap values?
8. Discussion Questions:
a. How are modern and ancient human sequences related to one another? (Hint- there’s a surprise)
b. What is the placement of Bigfoot relative to humans and other primates? Who are our most
recent ancestors?
c. How many diagnostic substitutions differentiate humans from Neanderthals? What about
human/Neanderthal shared polymorphisms?
d. Based on the level of variability in the dataset, do you think adding more sequences would
increase or decrease the likelihood of supporting a model whereby humans and Neanderthals
interbred?
e. Finally, can you briefly suggest a model of human evolution based on this dataset?
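For question (c), it can help to count differences by script instead of by eye. The sketch below is one possible approach, not the method used by ClustalX. It assumes the sequences are already aligned (same length, "-" for gaps) and it treats a "diagnostic" site as an alignment column where each group has a single base and the two groups differ. That working definition, the function names, and the toy sequences are all assumptions made for illustration.

```python
def diagnostic_sites(group_a, group_b):
    """Return 0-based alignment columns fixed within each group but different between groups."""
    length = len(group_a[0])
    if any(len(s) != length for s in group_a + group_b):
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for col in range(length):
        bases_a = {s[col] for s in group_a}
        bases_b = {s[col] for s in group_b}
        # Skip columns containing gaps so indels are not counted as substitutions.
        if "-" in bases_a or "-" in bases_b:
            continue
        if len(bases_a) == 1 and len(bases_b) == 1 and bases_a != bases_b:
            sites.append(col)
    return sites


def p_distance(seq1, seq2):
    """Fraction of gap-free aligned positions at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


# Toy aligned sequences (made up, NOT real HVR1 data).
humans = ["ACGTACGT", "ACGTACGA"]
neanderthals = ["ATGTACCT", "ATGTAC-T"]

print("diagnostic columns:", diagnostic_sites(humans, neanderthals))
print("p-distance:", p_distance(humans[0], neanderthals[0]))
```

In the toy data only column 1 is diagnostic: column 6 contains a gap and column 7 varies among the "humans", so neither is counted.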
Exercise 2: 3D Modeling of a Complex Molecular Structure
In this lab, you’ll explore protein structures and how they can be represented graphically.
Proteins are not just linear polymers of amino acids; proteins have very specific 3-dimensional
structures. Protein structure is divided into four categories:
1. Primary structure = the actual sequence of amino acids in the polypeptide chain.
2. Secondary structure = the local three-dimensional “folded” shape that the polypeptide
chain (backbone) assumes, the most common being α (alpha) helices and β (beta)
pleated sheets.
3. Tertiary structure = Overall three-dimensional shape of a polypeptide made by
interactions of different secondary structures within the protein.
4. Quaternary structure = more than one polypeptide chain interacting to form a
multi-subunit structure (e.g., dimers, trimers, etc.).
a. “homo”–multimers contain more than one copy of the same polypeptide chain.
b. “hetero”–multimers contain two or more different polypeptide chains.
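To make the homo-/hetero-multimer distinction concrete, here is a small Python sketch that names a complex from the primary sequences of its chains. The function name and the short one-letter amino acid sequences are hypothetical, and comparing chains by exact sequence identity is a simplifying assumption.

```python
def describe_quaternary(chains):
    """Name a complex from a list of chain sequences (one string per polypeptide chain)."""
    if not chains:
        raise ValueError("need at least one polypeptide chain")
    count = len(chains)
    if count == 1:
        return "monomer"  # a single chain has no quaternary structure
    sizes = {2: "dimer", 3: "trimer", 4: "tetramer"}
    size = sizes.get(count, "multimer")
    # Identical chains give a homo-multimer; any difference gives a hetero-multimer.
    kind = "homo" if len(set(chains)) == 1 else "hetero"
    return kind + size


# Made-up chain sequences for illustration only.
print(describe_quaternary(["MKVLA"]))                    # monomer
print(describe_quaternary(["MKVLA", "MKVLA"]))           # homodimer
print(describe_quaternary(["MKVLA", "MALWT", "MKVLA"]))  # heterotrimer
```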
Since proteins are large, complex molecules, determining the actual 3-D structure of any given
protein can be a very arduous task that can require years of work. However, once the structure of
a particular protein is known, it still is difficult for a scientist to visualize these complex
molecules. With the advent of computer graphics technology, programs have been developed
which use the structural data obtained for a complex protein to create a 3-D image that can be
manipulated in silico. In this exercise, we will be using a very new program called Cn3D,
available (free) at:
http://www.ncbi.nlm.nih.gov/Structure
THE OBJECTIVES:
1. Understand basic molecular structure.
2. Become familiar with computer visualization of molecular structures.
3. Learn how to derive information from molecular models.
Protein Structure Scavenger Hunt
You can use NCBI’s MMDB protein-structure database and accompanying Cn3D tool to
hunt down, explore, and illustrate several types of protein structures. To illustrate, follow along
with this first example – hunting for an alpha helix.
1. Go to MMDB at http://www.ncbi.nlm.nih.gov/Structure/
2. In the “Search Entrez Structure/MMDB” box (empty white box near top), enter “protein alpha
helix” and click Go.
3. The last time we looked (10/22/07) there were 686 listings of different protein structures. You
can pick one of the structures to explore by clicking on its accession number (blue, underlined) –
for example, the top entry in the list, 2JUW – “Nmr Solution Structure of Homodimer Protein
So_2176 From Shewanella oneidensis”.
4. Clicking on an accession number will take you to a MMDB Structure Summary Page. It will tell
you some information about the protein structure, by whom it was submitted, and so forth.
5. Now click “View 3D Structure” or click directly on the image.
6. This should pop up the helper application Cn3D. The structure you picked (2JUW in this
example, or whichever one you chose) will be displayed in a black window, and the primary
sequence(s) of the polypeptide chains will be aligned in a white window. Note that some
structures are of only one sequence, some are of more than one (depending on whether a single
protein or a complex had its structure solved).
7. The default rendering style (how the graphics are displayed) should be set as “Worms”, and the
default coloring style should be set as “Secondary Structure”. You can check those (and/or try
other styles) by using the “Style” menu, “Edit Global Style” option in the top menu bar of the
structure window. This example will use the default settings.
8. In Worms view with Secondary Structure coloring, strands that are parts of alpha helices are
green, loops are blue, and beta sheets are gold. Arrows always point in the N → C terminal
direction.
9. The task at hand is to illustrate an alpha helix. A good way to do this is by exploiting the
sequence window (lower). You’ll notice that the amino acid residues are colored the same way
as the cartoon view: So, for example, residues that are part of alpha helices are green!
10. In the sequence window, use the mouse to draw a box around the first set of aligned green
residues. They will get highlighted as shown:
11. Next, pull the slider bar to the right to see the rest of the sequence.
12. Holding down the shift key, select the other two chunks of alpha-helical sequence:
13. Now, up in the graphic window, select “Show/Hide → Show Selected Domains”. The cartoon
should now change coloring: The residues wrapped around the green cylinders are now yellow!
In addition, all the non-selected features should disappear.
14. NOW your task is to repeat this process for each of the following structures!
• parallel beta sheet
• antiparallel beta sheet
• helix-loop-helix
• beta barrel
• leucine zipper – HIGHLIGHT THE LEUCINES
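For the leucine zipper, it may help to know which leucines to look for before you start clicking in the sequence window. Leucine zippers carry a leucine at every seventh residue of the helix, so the Python sketch below scans a one-letter amino acid sequence for such runs and prints them in upper case. The function names, the cutoff of four repeats, and the toy peptide are assumptions for illustration. This is a simple pattern search, not a structure predictor.

```python
def leucine_heptads(sequence, min_repeats=4):
    """Return runs of 0-based positions where leucine (L) recurs every 7th residue."""
    sequence = sequence.upper()
    runs = []
    used = set()
    for start, residue in enumerate(sequence):
        if residue != "L" or start in used:
            continue
        # Follow the chain of leucines spaced exactly seven residues apart.
        run = [start]
        pos = start + 7
        while pos < len(sequence) and sequence[pos] == "L":
            run.append(pos)
            pos += 7
        if len(run) >= min_repeats:
            runs.append(run)
            used.update(run)  # do not report sub-runs of a run already found
    return runs


def highlight(sequence, positions):
    """Upper-case the chosen positions and lower-case everything else."""
    marked = set(positions)
    return "".join(
        c.upper() if i in marked else c.lower() for i, c in enumerate(sequence)
    )


# Toy peptide (made up): leucines at 0, 7, 14, 21 plus a stray one at position 3.
peptide = "LAALAAA" + "LAAAAAA" * 3
runs = leucine_heptads(peptide)
print(runs)
print(highlight(peptide, runs[0]))
```

The stray leucine at position 3 is left in lower case because it does not repeat every seven residues.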