Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Bioinformatics Lab Name __________________________________ Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data and combines computer science, statistics, mathematics, and engineering to analyze and interpret the data. Bioinformatics is both an umbrella term for the body of biological studies that use computer programming as part of their methodology, as well as a reference to specific analysis "pipelines" that are repeatedly used, particularly in the field of genomics. Common uses of bioinformatics include the identification of candidate genes and nucleotides. Often, such identification is made with the aim of better understanding the genetic basis of disease, unique adaptations, desirable properties, or differences between populations. Bioinformatics has become an important part of many areas of biology. In experimental molecular biology, bioinformatics techniques such as image and signal processing allow extraction of useful results from large amounts of raw data. In the field of genetics and genomics, it aids in sequencing and annotating genomes and their observed mutations. It plays a role in the text mining of biological literature and the development of biological and gene ontologies to organize and query biological data. It also plays a role in the analysis of gene and protein expression and regulation. Bioinformatics tools aid in the comparison of genetic and genomic data and more generally in the understanding of evolutionary aspects of molecular biology. At a more integrative level, it helps analyze and catalogue the biological pathways and networks that are an important part of systems biology. The primary goal of bioinformatics is to increase the understanding of biological processes. What sets it apart from other approaches, however, is its focus on developing and applying computationally intensive techniques to achieve this goal. Examples include: pattern recognition, data mining, machine learning algorithms, and visualization. Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein–protein interactions, genome-wide association studies, the modeling of evolution and cell division/mitosis. Below is an abbreviated list of different types of bioinformatics tools: 1. Public biological databases – Collection of biological data such as nucleic acid sequences, amino acid sequences, published literature, biochemical pathways, etc., that can be searched by the public. 2. Sequence alignment tools – A tool that will attempt to identify as many matching residues in series between two sequences (nucleic or amino). This is very useful to determine how closely related two sequences might be. 3. Sequence searching – Tools used to compare a query sequence to millions of sequences in a database in an effort to find highly similar sequences. 4. Gene prediction – Software to predict open reading frames, genes, exon splice sites, promoter binding sites, etc. from long continuous strings of nucleotides 5. Multiple Sequence Alignment – Tool that will align several sequences at one time. Useful for identifying conserved functional domains in a group of related sequences and extracting information about a gene family. 6. Phylogenetic Analysis – Studying the evolutionary relatedness of a group of sequences or organisms. 7. Protein sequence Analysis – Calculate the isoelectric point, molecular weight, peptide mass fingerprints. Predict secondary structure features and posttranslational modification sites. 8. Protein structure prediction – Predicting the 3D structure of the protein to give insights into how the protein may function given the tight relationship between protein structure and function. 9. Whole genome analysis – Navigating through the genome and annotating the genome. HASD AP Biology 1 The learning objectives for this lab are: The student is able to evaluate data-based evidence that describes evolutionary changes in the genetic makeup of a population over time (1A2 & SP 5.3). The student is able to evaluate evidence provided by data from many scientific disciplines that support biological evolution (1A4 & SP 5.3). The student is able to construct and/or justify mathematical models, diagrams, or simulations that represent processes of biological evolution (1A4 & SP 1.1, SP 1.2). The student is able to create sequence alignments and phylogenetic trees that correctly represent evolutionary history and speciation from a provided data set (1B2 & SP 1.1). The student is able to construct scientific explanations that use the structures and mechanisms of DNA and RNA to support the claim that DNA, and in some cases RNA, is the primary source of heritable information (3A1 & SP 6.5). The student is able to analyze biological data with sophisticated bioinformatics online tools (1B2 & SP 1.1). The student is able to convert a data set from a table of numbers that reflect a change in the genetic makeup of a population over time and to apply mathematical methods and conceptual understandings to investigate the cause(s) and effect(s) of this change (1A2 & SP 5.3). The student is able to evaluate evidence provided by data to qualitatively and quantitatively investigate the role of natural selection in evolution (1B2 & SP 1.1). Between 1990 and 2003, scientists working on an international research project known as the Human Genome Project were able to identify and map the 20,000–25,000 genes that define a human being. The project also successfully mapped the genomes of other species, including the fruit fly, mouse, and Escherichia coli. The location and complete sequence of the genes in each of these species are available for anyone in the world to access via the Internet. Why is this information important? Being able to identify the precise location and sequence of human genes will allow us to better understand genetic diseases. In addition, learning about the sequence of genes in other species helps us understand evolutionary relationships among organisms. Many of our genes are identical or similar to those found in other species. Suppose you identify a single gene that is responsible for a particular disease in fruit flies. Is that same gene found in humans? Does it cause a similar disease? It would take nearly 10 years to read through the entire human genome to try to locate the same sequence of bases as that in fruit flies. This definitely isn’t practical, so a sophisticated technological method is required. Bioinformatics is a field that combines statistics, mathematical modeling, and computer science to analyze biological data. Using bioinformatics methods, entire genomes can be quickly compared in order to detect genetic similarities and differences. An extremely powerful bioinformatics tool is BLAST, which stands for Basic Local Alignment Search Tool. Using BLAST, you can input a gene sequence of interest and search entire genomic libraries for identical or similar sequences in a matter of seconds. In this laboratory investigation, students will use BLAST to compare several genes, and then use the information to construct a cladogram. A cladogram (also called a phylogenetic tree) is a visualization of the evolutionary relatedness of species. Figure 1 is a simple cladogram. HASD AP Biology 2 Lycopodium Selaginella Isoetes Figure 1. Simple Cladogram Representing Different Plant Species Note that the cladogram is treelike, with the endpoints of each branch representing a specific species. The closer two species are located to each other, the more recently they share a common ancestor. For example, Selaginella (spikemoss) and Isoetes (quillwort) share a more recent common ancestor than the common ancestor that is shared by all three species of moss. Figure 2 includes additional details, such as the evolution of particular physical structures called shared derived characters. Note that the placement of the derived characters corresponds to when that character evolved; every species above the character label possesses that structure. For example, tigers and gorillas have hair, but lampreys, sharks, salamanders, and lizards do not have hair. Figure 2. Cladogram of several animal species The cladogram above can be used to answer several questions. Which organisms have lungs? What three structures do all lizards possess? According to the cladogram, which structure — dry skin or hair — evolved first? Historically, physical structures were used to create cladograms; however, modern day cladistics relies more heavily on genetic evidence. Chimpanzees and humans share 95%+ of their DNA, which would place them closely together on a cladogram. HASD AP Biology 3 1. Humans and fruit flies share approximately 60% of their DNA, which would place them farther apart on a cladogram. In the space below draw a cladogram that depicts the evolutionary relationship among humans, chimpanzees, fruit flies, and mosses. 2. Use the following data to construct a cladogram of the major plant groups: Characteristics of Major Plant Groups Organisms Vascular Tissue Flowers Seeds Mosses 0 0 0 Pine trees 1 0 1 Flowering plants 1 1 1 Ferns 1 0 0 Total 3 1 2 HASD AP Biology 4 3. GAPDH (glyceraldehyde 3-phosphate dehydrogenase) is an enzyme that catalyzes the sixth step in glycolysis, an important reaction that produces molecules used in cellular respiration. The following data table shows the percentage similarity of this gene and the protein it expresses in humans versus other species. For example, according to the table, the GAPDH gene in chimpanzees is 99.6% identical to the gene found in humans, while the protein is identical. a. Why is the percentage similarity in the gene always lower than the percentage similarity in the protein for each of the species? (Hint: Recall how a gene is expressed to produce a protein.) b. Draw a cladogram in the space below depicting the evolutionary relationships among all five species (including humans) according to their percentage similarity in the GAPDH gene. ■ Percentage Similarity Between the GAPDH Gene and Protein in Humans and Other Species Species HASD AP Biology Gene Percentage Similarity Protein Percentage Similarity Chimpanzee (Pan troglodytes) 99.6% 100% Dog (Canis lupus familiaris) 91.3% 95.2% Fruit fly (Drosophila melanogaster) 72.4% 76.7% Roundworm (Caenorhabditis elegans) 68.2% 74.3% 5 4. The following table contains a multiple alignment of partial sequences from a family of proteins called ETS domains. Each line corresponds to the amino acid sequence from one protein, specified as a sequence of letters each specifying one amino acid. Looking down any column shows the amino acids that appear at that position in each of the proteins in the family. In this way patterns of preference are made visible. a. Using colored highlighters, mark, in each sequence, the amino acid residues in different classes in different colors Based on the chart below). Color code the chart as well. Small residues Medium-sized nonpolar residues Large nonpolar residues Polar residues Positively charged residues Negatively charged residues b. c. GAST CPVIL FYMW HNQ KR DE For each position containing the same amino acid in every sequence, write the letter symbolizing the common residue in upper case below the column. For each position containing the same amino acid in all but one of the sequences, write the letter symbolizing the preferred residue in lower case below the column. What patterns of periodicity of conserved residues suggest themselves? HASD AP Biology 6 5. A typical British breakfast consists of: eggs (from chickens) fried in lard, bacon, kippered herrings, grilled cup mushrooms, fried potatoes, grilled tomatoes, baked beans, toast, and tea with milk. Write the complete taxonomic classification of one animal and one plant from which these ingredients are derived. Use NCBI Taxonomy as your source of information http://www.ncbi.nlm.nih.gov/taxonomy For questions 6-12 you will need to paste screenshots that are resized and cropped and/ or files of your results into a lab report that is to be attached to this packet. Be sure to label, describe, and analyze the results of each procedure in your lab report. This lab form is on my teacher web page, you may want to open it on your computer to expedite access to websites referred to in the directions 6. Use UniProt to find the sequences of myosin from five diverse non-human mammalian species, one non-mammal, and also Human myoglobin. Convert the sequences into FASTA format. Save each of the seven FASTA sequences in the document. UniProt can be found at http://www.uniprot.org/ 7. Making use of Clustal Omega align the seven sequences of myoglobin. Copy a screenshot of the alignment into your Word document. Resize and crop the screenshot as appropriate. Use keys or captions for the alignment as appropriate. Clustal Omega can be found at http://www.ebi.ac.uk/Tools/msa/clustalo/ 8. Use Clustal Omega to produce a phylogenetic tree using your alignment data. Copy a screenshot of the tree into your Word document. Resize and crop the screenshot as appropriate. Use keys or captions for the tree as appropriate. Add a conclusion to your document in which you make a statement as to the evolutionary relationships between the animals in your phylogenetic tree. Paste your word document into your finished lab report. HASD AP Biology 7 9. Primates are mammals, a class we share with marsupials and monotremes. Extant marsupials live primarily in Australia, except for the opossum, found in North and South America. Extant monotremes are limited to two animals from Australia and New Guinea: the platypus and echidna. A file of mannose-6-phosphate proteins from various placental and marsupial animals can be found on my web page. Use ClustalW in MEGA6 to align the mannose 6-phosphate/insulin-like growth factor receptor (file mannose-6-phosphate.txt on my web page). MEGA6 has been downloaded onto your computer. This is a free download if you want use this program on your home computer - http://www.megasoftware.net/ Paste a screenshot of the alignment into your lab report. You will need to take multiple screenshots to cover the entire alignment, crop and resize as appropriate. Use the color printer in the HS library for this page of your lab report only. 10. Construct a neighbor-joining evolutionary tree of the mannose 6-phosphate/insulin-like growth factor receptor for the animals in the previous question in MEGA6. Set the bootstrap value at 500. Also click on the caption tab, this will provide a description of the tree type and results. Paste a screenshot of the tree and caption into your lab report How well do you think this tree reflects the true evolutionary history of placental and marsupials ? Explain. Paste your results into your lab report. Resize and crop the screenshot as appropriate. Use keys or captions for the alignment as appropriate 11. ExPASy Translate is a tool which allows the translation of a nucleotide (DNA/RNA) sequence to a protein sequence. a. Use ExPASy to translate the following mouse actin gene (you can copy this from the lab form on my webpage): >gi|5598460|gb|AI892558.1|AI892558 mr75h02.y1 mRNA, 3' end mRNA sequence ACATTGACTGATGAGAGATGGTGAGGGAGCTTACAGGATGACAATAATCACAGTACAGGCATCCTGTATA TAAGGTAGTCTACGAGAGAGACATCTCAAATGCACTTGCGGTGGACAATGTATGGGCCTGCCTAATCATA CTCTAGCTTGCTGATCCACATTTGCTGGAAGGTGCACAGAGAGGCCATGATGGAGCCGACAATCCATACA TAGTAATTACGCTCACGCTGAGCAATAATCTGGATCTTTATGGTGCTGTGAGCCAGTGCACCGATTACCT TTCGAATACGATCGGCCATACCAGGGTACATGGTTGTGTCTACAGATACGACATTGTAGACATACAGGTA TTTGCGGATATCTATGAAACACTTCATGATGCTGCCGGAAGTTGTTTAATGAA ExPASy can be found at: http://web.expasy.org/translate/ HASD AP Biology 8 b. Check the accuracy of your translation by doing an NCBI blastp search of your translation and reporting your top BLAST return in your lab report. Describe the returns of the BLAST search. http://blast.ncbi.nlm.nih.gov/Blast.cgi c. Paste the translation into your lab report. Resize and crop the screenshot as appropriate. Use keys or captions for the alignment as appropriate 12. Use the NCBI BLAST website to do a blastp of the following protein sequence with the organism limited to green algae. >sp|P00299|PLAS1_POPNI Plastocyanin A, chloroplastic OS=Populus nigra GN=PETE PE=1 SV=2 MATVTSAAVSIPSFTGLKAGSASNAKVSASAKVSASPLPRLSIKASMKDVGAAVVATAAS AMIASNAMAIDVLLGADDGSLAFVPSEFSISPGEKIVFKNNAGFPHNIVFDEDSIPSGVD ASKISMSEEDLLNAKGETFEVALSNKGEYSFYCSPHQGAGMVGKVTVN Include the following in your lab report: a. Paste a screen shot of the five top scoring hits to the BLAST search in your lab report. Resize and crop the screenshot as appropriate. Use keys or captions for the alignment as appropriate. b. What is the common name of the organism that the starting protein came from? c. What type of organism is the top BLAST return? d. What is the function of this protein? Does it make sense that it would be found both in the organism you started with and the organism returned in the BLAST search? Explain. HASD AP Biology 9 References: Bakermans, Corien. Pennsylvania State University, Altoona College College Board. AB Biology. Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST http://media.collegeboard.com/digitalServices/pdf/ap/biomanual/Bio_Lab3-ComparingDNA.pdf [online] European Molecular Biology Laboratory, Heidelberg. http://www.embl.de/ [online] Lehninger A, Nelson D, Cox M (1993) Principles of Biochemistry. Worth Publishers. New York, NY Lesk A (2014) Introduction to Bioinformatics. University of Oxford Press. Oxford, UK National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD, http://www.ncbi.nlm.nih.gov/ [online] Protein Data Bank. http://www.rcsb.org/pdb/home/home.do [online] Reece J, et al., (2005) Campbell Biology. Benjamin Cummings. Boston Swiss Institute of Bioinformatics. http://www.isb-sib.ch/ [online] Tamura K, Stecher G, Peterson D, Filipski A, and Kumar S (2013) MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Molecular Biology and Evolution: 30 2725-2729. UniProt. http://www.uniprot.org/ [online] HASD AP Biology 10