Bioinformatics Exploration
Technology, Engineering and Math-Science Academy for Advanced High School Students

Exercise 1: Molecular Phylogeny of Humans, Primates, and … the Yeti?
(Thanks to Dr. Brian Bettencourt, UML, for the following)

Our research team recently received a hair sample, believed to be from the legendary and mysterious Yeti (“Bigfoot”, “Sasquatch”, etc.), from a group of Nepalese monks living high in the Himalayas.* Although the hair was apparently old and not preserved in any fashion, we successfully extracted mitochondrial DNA. To determine whether and how the purported Bigfoot is related to modern humans and other primates, we amplified the “hypervariable region 1” of the mitochondrial D-loop noncoding sequence. This region has been heavily used in studies of human and primate genetic diversity. If you search GenBank** for “D-loop Hypervariable Region 1,” you will find almost 2000 entries.

The D-loop, and especially its hypervariable regions 1 and 2, is noncoding, so it tends to vary quite a bit between individuals, populations, and species. D-loop sequences thus work well for determining close relationships (and not well at all for deep divergence). Plus, since the mitochondrial genome is small and present in many copies per cell, it can often persist for a very long time in particular tissues: mitochondrial genes have been successfully amplified from a variety of ancient sources, including Neanderthal remains and prehistoric modern humans.

The question to be addressed is where the putative Bigfoot sequence will be placed on a phylogeny (i.e., a “family” tree) of related sequences. A hardworking instructor has done much of the legwork for this task by searching GenBank for D-loop Hypervariable Region 1 (henceforth “HVR”) sequences from several primates, Neanderthals, ancient humans, and modern humans, and assembling the sequences into a multi-FASTA file. So, you’re ready to begin!

* Fictional statement. Can you guess how the “Bigfoot” sequence was generated?
** GenBank link: http://www.ncbi.nlm.nih.gov/Genbank/

Exercise 1 Procedure

1. On your computer’s desktop is a file named “HVR1.txt”. Double-click on the icon to open the text file and examine the sequences. Here’s how to interpret the names of the sequences in the HVR1.txt file. If you visit the “Exercise 1” page on the TEAMS Wiki, you can see photos of each “critter.”

   Non-Homo sapiens sapiens sequences are prefixed as follows:
   a. pongo: Pongo pygmaeus (orangutan)
   b. pan: Pan troglodytes troglodytes (common chimpanzee)
   c. panv: Pan troglodytes verus (West African chimpanzee)
   d. bono: Pan paniscus (bonobo or pygmy chimpanzee)
   e. bigfoot: Bigfoot! (or Sasquatch or the Yeti)
   f. nea: Homo sapiens neanderthalensis (Neanderthal)

   The rest of the DNA sequences come from Homo sapiens sapiens of one type or another. Ancient human sequences are named like “ancientAUS” (an ancient Australian human). Modern human sequences are named by country of origin. Here’s a bit of information about the Homo sapiens DNA sequences in the HVR1.txt file:
   g. luke: A sequence obtained from a body reported to be that of the Christian Saint Luke! (2000 years old?) We’re not making this up! Look in GenBank!
   h. Vietnam: [add brief descriptions + age for the rest of these; you can look up the name in GenBank, see link on wiki]
   i. AncientAUS: 60,000-year-old Mungo Man skeleton unearthed in New South Wales in 1974
   j. Syria: [add brief descriptions]

2. Open the ClustalX program on your desktop, then open the HVR1 sequences (browse for the HVR1.txt file in your My Documents folder).
3. Under Alignment -> Output Format Options, change “Output Order” to “Input”, then press the CLOSE button.
4. Under Alignment, select “Do complete alignment”. This process will take a few moments.
5. Next, under Trees, select “Bootstrap N-J Tree”. Make sure 1000 trials will be run, then click on Run/OK. After a few moments, this will produce a “bootstrapped NJ tree file,” suffixed .phb (we’ll discuss bootstrapping and NJ in lecture).
6.
Open the HVR1.phb bootstrapped tree using TreeView (double-click on the icon on the desktop, then File -> Open and browse in My Documents). Click on the buttons for the various tree styles (Radial, Slanted Cladogram, Phylogram, etc.). What does each graph tell you about the relationships between the DNA sequences?
7. Open the HVR1.phb bootstrapped tree using NJPlot (double-click on the icon on the desktop, then File -> Open and browse in My Documents). Toggle the display of bootstrap values. What do the numbers on the clades mean? How well supported is each clade based on the bootstrap values?
8. Discussion Questions:
   a. How are modern and ancient human sequences related to one another? (Hint: there’s a surprise.)
   b. What is the placement of Bigfoot relative to humans and other primates? Who are our most recent ancestors?
   c. How many diagnostic substitutions differentiate humans from Neanderthals? What about human/Neanderthal shared polymorphisms?
   d. Based on the level of variability in the dataset, do you think adding more sequences would increase or decrease the likelihood of supporting a model whereby humans and Neanderthals interbred?
   e. Finally, can you briefly suggest a model of human evolution based on this dataset?

Exercise 2: 3D Modeling of a Complex Molecular Structure

In this lab, you’ll explore protein structures and how they can be represented graphically. Proteins are not just linear polymers of amino acids; proteins have very specific 3-dimensional structures. Protein structure is divided into four levels:
1. Primary structure = the actual sequence of amino acids in the polypeptide chain.
2. Secondary structure = the local “folded” shape that the polypeptide chain (backbone) assumes, the most common being alpha helices and beta pleated sheets.
3.
Tertiary structure = the overall three-dimensional shape of a polypeptide, made by interactions of different secondary structures within the protein.
4. Quaternary structure = more than one polypeptide chain interacting to form a multi-subunit structure (i.e., dimers, trimers, etc.).
   a. “homo”-multimers contain more than one copy of the same polypeptide
   b. “hetero”-multimers contain two or more different polypeptide chains

Since proteins are large, complex molecules, determining the actual 3-D structure of any given protein can be a very arduous task that can require years of work. Even once the structure of a particular protein is known, it is still difficult for a scientist to visualize these complex molecules. With the advent of computer graphics technology, programs have been developed which use the structural data obtained for a complex protein to create a 3-D image that can be manipulated in silico. In this exercise, we will be using a very new program called Cn3D, available (free) at: http://www.ncbi.nlm.nih.gov/Structure

THE OBJECTIVES:
1. Understand basic molecular structure.
2. Become familiar with computer visualization of molecular structures.
3. Learn how to derive information from molecular models.

Protein Structure Scavenger Hunt

You can use NCBI’s MMDB protein-structure database and the accompanying Cn3D tool to hunt down, explore, and illustrate several types of protein structures. To illustrate, follow along with this first example: hunting for an alpha helix.
1. Go to MMDB at http://www.ncbi.nlm.nih.gov/Structure/
2. In the “Search Entrez Structure/MMDB” box (the empty white box near the top), enter “protein alpha helix” and click Go.
3. The last time we looked (10/22/07) there were 686 listings of different protein structures. You can pick one of the structures to explore by clicking on its accession number (blue, underlined). For example, pick the top entry in the list, 2JUW, “Nmr Solution Structure of Homodimer Protein So_2176 From Shewanella oneidensis”.
4.
Clicking on an accession number will take you to an MMDB Structure Summary page. It will tell you some information about the protein structure, who submitted it, and so forth.
5. Now click “View 3D Structure” or click directly on the image.
6. This should pop up the helper application Cn3D. The structure you selected (2JUW in this example, or whichever one you chose) will be displayed in a black window, and the primary sequence(s) of the polypeptide chains will be aligned in a white window. Note that some structures contain only one sequence and some contain more than one (depending on whether a single protein or a complex had its structure solved).
7. The default rendering style (how the graphics are displayed) should be set to “Worms”, and the default coloring style should be set to “Secondary Structure”. You can check those (and/or try other styles) by using the “Style” menu, “Edit Global Style” option, in the top menu bar of the structure window. This example will use the default settings.
8. In Worms view with Secondary Structure coloring, strands that are parts of alpha helices are green, loops are blue, and beta sheets are gold. Arrows always point in the N-to-C-terminal direction.
9. The task at hand is to illustrate an alpha helix. A good way to do this is by exploiting the sequence window (lower). You’ll notice that the amino acid residues are colored the same way as in the cartoon view. So, for example, residues that are part of alpha helices are green!
10. In the sequence window, use the mouse to draw a box around the first set of aligned green residues. They will be highlighted.
11. Next, pull the slider bar to the right to see the rest of the sequence.
12. Holding down the Shift key, select the other two chunks of alpha-helical sequence.
13. Now, up in the graphic window, select “Show/Hide -> Show Selected Domains”. The cartoon should now change coloring: the residues wrapped around the green cylinders are now yellow!
In addition, all the non-selected features should disappear.
14. NOW your task is to repeat this process for each of the following structures!
   a. parallel beta sheet
   b. antiparallel beta sheet
   c. helix-loop-helix
   d. beta barrel
   e. leucine zipper – HIGHLIGHT THE LEUCINES
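
For the leucine zipper, it can help to know which leucines to look for before you start clicking. In a leucine zipper, a leucine typically appears at every seventh residue along the helix. The Python sketch below is one possible way to spot that pattern in a one-letter amino acid sequence, such as one copied from the Cn3D sequence window. The function names and the toy sequence are invented for illustration; they are not part of Cn3D or MMDB, and the toy sequence is not a real protein.

```python
def leucine_positions(seq):
    """Return the 1-based positions of every leucine (L) in a one-letter sequence."""
    return [i + 1 for i, aa in enumerate(seq.upper()) if aa == "L"]


def heptad_runs(seq, min_repeats=3):
    """Find runs of leucines spaced exactly 7 residues apart.

    Returns a list of runs; each run is a list of 1-based positions.
    min_repeats is an arbitrary cutoff chosen for this sketch.
    """
    leus = set(leucine_positions(seq))
    runs = []
    for start in sorted(leus):
        if start - 7 in leus:
            continue  # this leucine belongs to a run that began earlier
        run = [start]
        while run[-1] + 7 in leus:
            run.append(run[-1] + 7)
        if len(run) >= min_repeats:
            runs.append(run)
    return runs


def highlight(seq, positions):
    """Lower-case every residue except the chosen positions so they stand out."""
    chosen = set(positions)
    return "".join(
        aa.upper() if i + 1 in chosen else aa.lower()
        for i, aa in enumerate(seq)
    )


# Invented toy sequence (assumption, NOT a real protein):
# leucines at positions 5, 12, 19, 26, plus one stray leucine at 14.
toy = "MKQALEDKVEELALKNYHLENEVARLKK"

runs = heptad_runs(toy)
print(runs)  # [[5, 12, 19, 26]]
for run in runs:
    print(highlight(toy, run))
```

The positions it reports are the ones to select in the sequence window (hold down Shift to select more than one). The stray leucine at position 14 is left out because it does not sit on the every-seventh-residue pattern.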