Bioinformatics Lab
Name __________________________________
Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding
biological data and combines computer science, statistics, mathematics, and engineering to analyze and
interpret the data. Bioinformatics is both an umbrella term for the body of biological studies that use
computer programming as part of their methodology and a reference to specific analysis
"pipelines" that are repeatedly used, particularly in the field of genomics. Common uses of
bioinformatics include the identification of candidate genes and single nucleotide polymorphisms (SNPs). Often, such identification is
made with the aim of better understanding the genetic basis of disease, unique adaptations, desirable
properties, or differences between populations. Bioinformatics has become an important part of many
areas of biology. In experimental molecular biology, bioinformatics techniques such as image and signal
processing allow extraction of useful results from large amounts of raw data. In the field of genetics and
genomics, it aids in sequencing and annotating genomes and their observed mutations. It plays a role in
the text mining of biological literature and the development of biological and gene ontologies to
organize and query biological data. It also plays a role in the analysis of gene and protein expression and
regulation. Bioinformatics tools aid in the comparison of genetic and genomic data and more generally
in the understanding of evolutionary aspects of molecular biology. At a more integrative level, it helps
analyze and catalogue the biological pathways and networks that are an important part of systems
biology.
The primary goal of bioinformatics is to increase the understanding of biological processes. What sets it
apart from other approaches, however, is its focus on developing and applying computationally
intensive techniques to achieve this goal. Examples include: pattern recognition, data mining, machine
learning algorithms, and visualization. Major research efforts in the field include sequence alignment,
gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein
structure prediction, prediction of gene expression and protein–protein interactions, genome-wide
association studies, the modeling of evolution and cell division/mitosis.
Below is an abbreviated list of different types of bioinformatics tools:
1. Public biological databases – Collection of biological data such as nucleic acid sequences, amino acid
sequences, published literature, biochemical pathways, etc., that can be searched by the public.
2. Sequence alignment tools – Tools that attempt to line up as many matching residues as possible, in order,
between two sequences (nucleic acid or amino acid). This is very useful for determining how closely related two
sequences might be.
3. Sequence searching – Tools used to compare a query sequence to millions of sequences in a database
in an effort to find highly similar sequences.
4. Gene prediction – Software to predict open reading frames, genes, exon splice sites, promoter
binding sites, etc. from long continuous strings of nucleotides.
5. Multiple Sequence Alignment – Tool that will align several sequences at one time. Useful for
identifying conserved functional domains in a group of related sequences and extracting information
about a gene family.
6. Phylogenetic Analysis – Studying the evolutionary relatedness of a group of sequences or organisms.
7. Protein sequence Analysis – Calculate the isoelectric point, molecular weight, peptide mass
fingerprints. Predict secondary structure features and posttranslational modification sites.
8. Protein structure prediction – Predicting the 3D structure of the protein to give insights into how the
protein may function given the tight relationship between protein structure and function.
9. Whole genome analysis – Navigating through the genome and annotating the genome.
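To make item 2 in the list above more concrete, here is a minimal sketch of global pairwise alignment using the Needleman-Wunsch dynamic programming method. It is an illustration only, not the algorithm used by any particular tool named in this lab. The function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example; real tools use substitution matrices and more refined gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b.

    Returns (aligned_a, aligned_b, score). The scoring values are
    illustrative assumptions, not those of any specific program.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, or left.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]


# Toy sequences (hypothetical, not taken from this lab)
print(needleman_wunsch("ACGT", "AGT"))  # ('ACGT', 'A-GT', 1)
```

The dash marks a gap: the second sequence lacks the C, so the alignment scores three matches and one gap.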
HASD AP Biology
The learning objectives for this lab are:
• The student is able to evaluate data-based evidence that describes evolutionary changes in the
  genetic makeup of a population over time (1A2 & SP 5.3).
• The student is able to evaluate evidence provided by data from many scientific disciplines that
  support biological evolution (1A4 & SP 5.3).
• The student is able to construct and/or justify mathematical models, diagrams, or simulations
  that represent processes of biological evolution (1A4 & SP 1.1, SP 1.2).
• The student is able to create sequence alignments and phylogenetic trees that correctly
  represent evolutionary history and speciation from a provided data set (1B2 & SP 1.1).
• The student is able to construct scientific explanations that use the structures and mechanisms
  of DNA and RNA to support the claim that DNA, and in some cases RNA, is the primary source of
  heritable information (3A1 & SP 6.5).
• The student is able to analyze biological data with sophisticated bioinformatics online tools (1B2
  & SP 1.1).
• The student is able to convert a data set from a table of numbers that reflect a change in the
  genetic makeup of a population over time and to apply mathematical methods and conceptual
  understandings to investigate the cause(s) and effect(s) of this change (1A2 & SP 5.3).
• The student is able to evaluate evidence provided by data to qualitatively and quantitatively
  investigate the role of natural selection in evolution (1B2 & SP 1.1).
Between 1990 and 2003, scientists working on an international research project known as the Human
Genome Project were able to identify and map the 20,000–25,000 genes that define a human being. The
project also successfully mapped the genomes of other species, including the fruit fly, mouse, and
Escherichia coli. The location and complete sequence of the genes in each of these species are available
for anyone in the world to access via the Internet. Why is this information important? Being able to
identify the precise location and sequence of human genes will allow us to better understand genetic
diseases. In addition, learning about the sequence of genes in other species helps us understand
evolutionary relationships among organisms. Many of our genes are identical or similar to those
found in other species. Suppose you identify a single gene that is responsible for a particular disease in
fruit flies. Is that same gene found in humans? Does it cause a similar disease? It would take nearly 10
years to read through the entire human genome to try to locate the same sequence of bases as that in
fruit flies. This definitely isn’t practical, so a sophisticated technological method is required.
Bioinformatics is a field that combines statistics, mathematical modeling, and computer science to
analyze biological data. Using bioinformatics methods, entire genomes can be quickly compared in order
to detect genetic similarities and differences. An extremely powerful bioinformatics tool is BLAST, which
stands for Basic Local Alignment Search Tool. Using BLAST, you can input a gene sequence of interest
and search entire genomic libraries for identical or similar sequences in a matter of seconds. In this
laboratory investigation, students will use BLAST to compare several genes, and then use the
information to construct a cladogram. A cladogram (also called a phylogenetic tree) is a visualization of
the evolutionary relatedness of species. Figure 1 is a simple cladogram.
Figure 1. Simple Cladogram Representing Different Plant Species (branch tips as labeled in the figure: Lycopodium, Selaginella, Isoetes)
Note that the cladogram is treelike, with the endpoints of each branch representing a specific species.
The closer two species are located to each other, the more recently they share a common ancestor. For
example, Selaginella (spikemoss) and Isoetes (quillwort) share a more recent common ancestor than the
common ancestor that is shared by all three species.
Figure 2 includes additional details, such as the evolution of particular physical structures called shared
derived characters. Note that the placement of the derived characters corresponds to when that
character evolved; every species above the character label possesses that structure. For example, tigers
and gorillas have hair, but lampreys, sharks, salamanders, and lizards do not have hair.
Figure 2. Cladogram of several animal species
The cladogram above can be used to answer several questions. Which organisms have lungs? What
three structures do all lizards possess? According to the cladogram, which structure — dry skin or hair —
evolved first? Historically, physical structures were used to create cladograms; however, modern day
cladistics relies more heavily on genetic evidence. Chimpanzees and humans share 95%+ of their DNA,
which would place them closely together on a cladogram.
1. Humans and fruit flies share approximately 60% of their DNA, which would place them farther
apart on a cladogram. In the space below draw a cladogram that depicts the evolutionary
relationship among humans, chimpanzees, fruit flies, and mosses.
2. Use the following data to construct a cladogram of the major plant groups:
Characteristics of Major Plant Groups (1 = present, 0 = absent)

Organisms           Vascular Tissue   Flowers   Seeds
Mosses              0                 0         0
Pine trees          1                 0         1
Flowering plants    1                 1         1
Ferns               1                 0         0
Total               3                 1         2
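One way to check a hand-drawn cladogram against the table above is sketched below. It assumes that 1 means "character present" and that the characters are perfectly nested (every group with flowers also has seeds, and so on), which holds for this table but not for real data in general. The function names are made up for this illustration.

```python
# Character matrix copied from the table above (1 = present, 0 = absent).
characters = ["Vascular Tissue", "Flowers", "Seeds"]
matrix = {
    "Mosses": [0, 0, 0],
    "Pine trees": [1, 0, 1],
    "Flowering plants": [1, 1, 1],
    "Ferns": [1, 0, 0],
}


def branching_order(matrix):
    """Order taxa from fewest to most derived characters."""
    return sorted(matrix, key=lambda taxon: sum(matrix[taxon]))


def character_order(characters, matrix):
    """Order characters from most widely shared to least widely shared."""
    totals = {
        name: sum(row[i] for row in matrix.values())
        for i, name in enumerate(characters)
    }
    return sorted(characters, key=lambda name: -totals[name])


def nested_cladogram(order):
    """Write a ladder-shaped tree as nested parentheses (Newick-like)."""
    tree = order[-1]
    for taxon in reversed(order[:-1]):
        tree = f"({taxon},{tree})"
    return tree


order = branching_order(matrix)
print(nested_cladogram(order))
# (Mosses,(Ferns,(Pine trees,Flowering plants)))
print(character_order(characters, matrix))
# ['Vascular Tissue', 'Seeds', 'Flowers']
```

The character shared by the most groups is placed lowest on the cladogram, and the group with the fewest derived characters branches off first.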
3. GAPDH (glyceraldehyde 3-phosphate dehydrogenase) is an enzyme that catalyzes the sixth step
in glycolysis, an important reaction that produces molecules used in cellular respiration. The
following data table shows the percentage similarity of this gene and the protein it encodes in
humans versus other species. For example, according to the table, the GAPDH gene in
chimpanzees is 99.6% identical to the gene found in humans, while the protein is identical.
a. Why is the percentage similarity in the gene always lower than the percentage similarity in
the protein for each of the species? (Hint: Recall how a gene is expressed to produce a
protein.)
b. Draw a cladogram in the space below depicting the evolutionary relationships among all five
species (including humans) according to their percentage similarity in the GAPDH gene.
Percentage Similarity Between the GAPDH Gene and Protein in Humans and Other Species

Species                                Gene Percentage Similarity   Protein Percentage Similarity
Chimpanzee (Pan troglodytes)           99.6%                        100%
Dog (Canis lupus familiaris)           91.3%                        95.2%
Fruit fly (Drosophila melanogaster)    72.4%                        76.7%
Roundworm (Caenorhabditis elegans)     68.2%                        74.3%
4. The following table contains a multiple alignment of partial sequences from a family of proteins
called ETS domains. Each line corresponds to the amino acid sequence from one protein,
specified as a sequence of letters each specifying one amino acid. Looking down any column
shows the amino acids that appear at that position in each of the proteins in the family. In this
way patterns of preference are made visible.
a. Using colored highlighters, mark the amino acid residues in each sequence, using a different
color for each class (based on the chart below). Color code the chart as well.

Small residues                    G A S T
Medium-sized nonpolar residues    C P V I L
Large nonpolar residues           F Y M W
Polar residues                    H N Q
Positively charged residues       K R
Negatively charged residues       D E

b. For each position containing the same amino acid in every sequence, write the letter
symbolizing the common residue in upper case below the column. For each position containing
the same amino acid in all but one of the sequences, write the letter symbolizing the preferred
residue in lower case below the column.
c. What patterns of periodicity of conserved residues suggest themselves?
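The marking rules in parts a and b can also be expressed as a short program, which may help when checking work done by hand. The sketch below is an illustration under stated assumptions: the function names are made up, the three toy sequences are hypothetical and are not the ETS domain data, and the rule for "all but one" is only meaningful with three or more sequences.

```python
from collections import Counter

# Residue classes copied from the chart in part a.
RESIDUE_CLASSES = {
    "small": "GAST",
    "medium nonpolar": "CPVIL",
    "large nonpolar": "FYMW",
    "polar": "HNQ",
    "positive": "KR",
    "negative": "DE",
}


def residue_class(residue):
    """Return the class name for a one-letter amino acid code."""
    for name, letters in RESIDUE_CLASSES.items():
        if residue.upper() in letters:
            return name
    return "unclassified"


def consensus_line(sequences):
    """Build the line written below an alignment, following part b.

    Upper case: the same residue in every sequence.
    Lower case: the same residue in all but one sequence.
    Space: anything else, or a column where gaps ('-') are most common.
    """
    line = []
    for column in zip(*sequences):
        residue, count = Counter(column).most_common(1)[0]
        if residue == "-":
            line.append(" ")
        elif count == len(column):
            line.append(residue.upper())
        elif count == len(column) - 1:
            line.append(residue.lower())
        else:
            line.append(" ")
    return "".join(line)


# Hypothetical toy alignment, for illustration only.
toy = ["KWRNA", "KWKNG", "KFRNT"]
print(repr(consensus_line(toy)))  # 'KwrN '
print(residue_class("W"))         # large nonpolar
```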
5. A typical British breakfast consists of: eggs (from chickens) fried in lard, bacon, kippered
herrings, grilled cup mushrooms, fried potatoes, grilled tomatoes, baked beans, toast, and tea
with milk. Write the complete taxonomic classification of one animal and one plant from which
these ingredients are derived. Use NCBI Taxonomy as your source of information:
http://www.ncbi.nlm.nih.gov/taxonomy
For questions 6-12 you will need to paste screenshots (resized and cropped) and/or files of your
results into a lab report that is to be attached to this packet. Be sure to label, describe, and analyze the
results of each procedure in your lab report. This lab form is on my teacher web page; you may want to
open it on your computer to expedite access to the websites referred to in the directions.
6. Use UniProt to find the sequences of myoglobin from five diverse non-human mammalian species,
one non-mammal, and also human myoglobin. Convert the sequences into FASTA format. Save
each of the seven FASTA sequences in a Word document. UniProt can be found at
http://www.uniprot.org/
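For reference, a FASTA record is a single header line beginning with ">" followed by one or more lines of sequence. The sketch below shows one way to write and read that layout. The function names, the 60-character line width, and the example header and sequence are assumptions for illustration; they are not UniProt data.

```python
def format_fasta(header, sequence, width=60):
    """Return one FASTA record: a '>' header line, then wrapped sequence lines."""
    sequence = "".join(sequence.split())  # drop spaces and line breaks
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record, for illustration only.
record = format_fasta("example_protein hypothetical", "ACDEFGHIKL" * 7)
print(record)
print(parse_fasta(record))
```

Keeping the header on one line matters: most tools treat every line after the ">" line as sequence.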
7. Use Clustal Omega to align the seven myoglobin sequences. Copy a screenshot of the
alignment into your Word document. Resize and crop the screenshot as appropriate. Use keys
or captions for the alignment as appropriate. Clustal Omega can be found at
http://www.ebi.ac.uk/Tools/msa/clustalo/
8. Use Clustal Omega to produce a phylogenetic tree using your alignment data. Copy a screenshot
of the tree into your Word document. Resize and crop the screenshot as appropriate. Use keys
or captions for the tree as appropriate.
Add a conclusion to your document in which you make a statement as to the evolutionary
relationships between the animals in your phylogenetic tree. Paste your Word document into
your finished lab report.
9. Primates are mammals, a class we share with marsupials and monotremes. Extant marsupials
live primarily in Australia, except for the opossum, found in North and South America. Extant
monotremes are limited to two animals from Australia and New Guinea: the platypus and
echidna. A file of mannose 6-phosphate receptor proteins from various placental and marsupial animals
can be found on my web page.
Use ClustalW in MEGA6 to align the mannose 6-phosphate/insulin-like growth factor receptor
sequences (file mannose-6-phosphate.txt on my web page).
MEGA6 has been downloaded onto your computer. It is a free download if you want to use this
program on your home computer: http://www.megasoftware.net/
Paste a screenshot of the alignment into your lab report. You will need to take multiple
screenshots to cover the entire alignment; crop and resize them as appropriate. Use the color printer
in the HS library for this page of your lab report only.
10. Construct a neighbor-joining evolutionary tree of the mannose 6-phosphate/insulin-like growth
factor receptor for the animals in the previous question in MEGA6. Set the number of bootstrap
replications to 500. Also click on the Caption tab; this will provide a description of the tree type and results.
Paste a screenshot of the tree and caption into your lab report.
How well do you think this tree reflects the true evolutionary history of placental and
marsupial mammals? Explain.
Paste your results into your lab report. Resize and crop the screenshot as appropriate. Use keys
or captions for the tree as appropriate.
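The bootstrap setting in question 10 can be hard to picture, so here is a simplified sketch of the idea. It resamples the columns of an alignment with replacement and counts how often a chosen pair of sequences comes out as the closest pair. This is not MEGA6's procedure and it does not build a neighbor-joining tree; the function names, the p-distance measure, and the toy sequences are all assumptions made for illustration.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


def closest_pair(alignment):
    """Return the pair of names with the smallest p-distance.

    Ties go to the first pair in alphabetical order.
    """
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: p_distance(alignment[pair[0]], alignment[pair[1]]),
    )


def bootstrap_support(alignment, pair, replicates=500, seed=0):
    """Fraction of replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    target = tuple(sorted(pair))
    hits = sum(
        closest_pair(bootstrap_replicate(alignment, rng)) == target
        for _ in range(replicates)
    )
    return hits / replicates


# Hypothetical aligned toy sequences of equal length, for illustration only.
toy = {
    "species_A": "MKTAYIAKQR",
    "species_B": "MKTAYLAKQR",
    "species_C": "MRTSYLGKHR",
}
print(bootstrap_support(toy, ("species_A", "species_B")))
```

A support value near 1 means the grouping appears in nearly every resampled data set; a low value means the grouping depends on only a few alignment columns.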
11. ExPASy Translate is a tool which allows the translation of a nucleotide (DNA/RNA) sequence to a
protein sequence.
a. Use ExPASy to translate the following mouse actin gene (you can copy this from the lab form
on my webpage):
>gi|5598460|gb|AI892558.1|AI892558 mr75h02.y1 mRNA, 3' end mRNA sequence
ACATTGACTGATGAGAGATGGTGAGGGAGCTTACAGGATGACAATAATCACAGTACAGGCATCCTGTATA
TAAGGTAGTCTACGAGAGAGACATCTCAAATGCACTTGCGGTGGACAATGTATGGGCCTGCCTAATCATA
CTCTAGCTTGCTGATCCACATTTGCTGGAAGGTGCACAGAGAGGCCATGATGGAGCCGACAATCCATACA
TAGTAATTACGCTCACGCTGAGCAATAATCTGGATCTTTATGGTGCTGTGAGCCAGTGCACCGATTACCT
TTCGAATACGATCGGCCATACCAGGGTACATGGTTGTGTCTACAGATACGACATTGTAGACATACAGGTA
TTTGCGGATATCTATGAAACACTTCATGATGCTGCCGGAAGTTGTTTAATGAA
ExPASy can be found at: http://web.expasy.org/translate/
b. Check the accuracy of your translation by doing an NCBI blastp search of your translation
and reporting your top BLAST return in your lab report. Describe the returns of the BLAST
search. http://blast.ncbi.nlm.nih.gov/Blast.cgi
c. Paste a screenshot of the translation into your lab report. Resize and crop the screenshot as appropriate.
Use keys or captions for the translation as appropriate.
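A translation tool reports more than one possible protein because a nucleotide sequence can be read in three frames on each strand. The sketch below shows that idea using the standard genetic code. It is an illustration, not ExPASy's implementation; the function names and frame labels are assumptions, and the short test sequence is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons listed in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def clean(seq):
    """Remove whitespace, upper-case, and write RNA 'U' as DNA 'T'."""
    return "".join(seq.split()).upper().replace("U", "T")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return clean(dna).translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0):
    """Translate from offset `frame` (0, 1, or 2); '*' marks a stop codon.

    Codons containing unknown letters are written as 'X'.
    """
    dna = clean(dna)
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Translate all three frames of both strands."""
    forward = clean(dna)
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"5'3' Frame {offset + 1}"] = translate(forward, offset)
        frames[f"3'5' Frame {offset + 1}"] = translate(reverse, offset)
    return frames


# Short hypothetical sequence, for illustration only.
for label, protein in six_frames("ATGGCCTAA").items():
    print(label, protein)
```

When judging which frame is the real one, a long stretch without stop codons ("*") is a useful clue, and the correct frame may lie on the reverse strand.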
12. Use the NCBI BLAST website to do a blastp of the following protein sequence with the organism
limited to green algae.
>sp|P00299|PLAS1_POPNI Plastocyanin A, chloroplastic OS=Populus nigra GN=PETE PE=1 SV=2
MATVTSAAVSIPSFTGLKAGSASNAKVSASAKVSASPLPRLSIKASMKDVGAAVVATAAS
AMIASNAMAIDVLLGADDGSLAFVPSEFSISPGEKIVFKNNAGFPHNIVFDEDSIPSGVD
ASKISMSEEDLLNAKGETFEVALSNKGEYSFYCSPHQGAGMVGKVTVN
Include the following in your lab report:
a. Paste a screenshot of the five top-scoring hits from the BLAST search in your lab report.
Resize and crop the screenshot as appropriate. Use keys or captions for the alignment as
appropriate.
b. What is the common name of the organism that the starting protein came from?
c. What type of organism is the top BLAST return?
d. What is the function of this protein? Does it make sense that it would be found both in
the organism you started with and the organism returned in the BLAST search? Explain.
References:
Bakermans, Corien. Pennsylvania State University, Altoona College.
College Board. AP Biology. Comparing DNA Sequences to Understand Evolutionary
Relationships with BLAST. http://media.collegeboard.com/digitalServices/pdf/ap/biomanual/Bio_Lab3-ComparingDNA.pdf [online]
European Molecular Biology Laboratory, Heidelberg. http://www.embl.de/ [online]
Lehninger A, Nelson D, Cox M (1993) Principles of Biochemistry. Worth Publishers. New York, NY.
Lesk A (2014) Introduction to Bioinformatics. Oxford University Press. Oxford, UK.
National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD.
http://www.ncbi.nlm.nih.gov/ [online]
Protein Data Bank. http://www.rcsb.org/pdb/home/home.do [online]
Reece J, et al. (2005) Campbell Biology. Benjamin Cummings. Boston.
Swiss Institute of Bioinformatics. http://www.isb-sib.ch/ [online]
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: Molecular Evolutionary
Genetics Analysis version 6.0. Molecular Biology and Evolution 30: 2725-2729.
UniProt. http://www.uniprot.org/ [online]