Analyzing DNA Sequence Similarity on the Computer

You are studying biodiversity in the Kalahari Desert of southern Africa by collecting and sequencing DNA samples from the environment. During your studies, you have recovered a fragment of genomic DNA, but you are unsure what organism it is from. First, you need to identify the species from which this DNA fragment is derived. Then, you will find all known genes that are related to that gene’s DNA sequence. You will then compare the sequence of your gene to those of relatives for which the genomic DNA sequence is known. You will learn how to use this information to determine the evolutionary relationships between different species.

Instructions:

Part A: Identify the origin of this DNA fragment.

1. Open a web browser and go to https://blast.ncbi.nlm.nih.gov/Blast.cgi, or search for “NCBI BLAST”.
2. Select nucleotide BLAST, then copy (command+c) the following DNA sequence and paste it (command+v) into the dialogue box titled “enter query sequence”:

>UnknownDNA
GGCCGCATCTGGAGTGCAAATGCTGCCAGTGTCCCCGAACCAAA
TACAAATTTAGCGAGCGGCATCGGGACGGGCAGGGTTGGAGGG
CCGAGTGCTCGAGTCGGCCGGACAAGTCGGGATTCCTTAAACAG
CTGCGGCACCAGCTCAGAACTATCGGCTGCATTAGACCCGGAAT
ACCAAGCTCAAGCCCATGGCCATGGGCAATTGGGTGGACAGTCG
CAACGACATGGTCATGGACATAGTAAGCCACATATTAAGAGGCCC
ATGAATGCGTTCATGGTCTGGGCCAAGGATGAGAGGCGCAAAAT
ATTGAAAGCCTGTCCGGACATGCACAATTCAAATATTTCAAAAATC
TTGGGCGCTCGCTGGAAAGCGATGTCGAACGCTGACAAGCAGCC
CTATTACGAGGAGCAATCGCGTCTATCCAAGCTGCACATGGAACA
ACATCCCGACTATCGCTATCGTCCACGTCCAAAACGCACTTGCAT
CGTCGACGGCAAAAAAATGCGCATATCTGAATATAAAGTTCTGAT
GCGTAATCGACGTGCGGAAATGCGACAGCTTTGGTGTAGAGGTG
GCGGACCTGGTGGTCCGACAGGTTCCAGCTCTAATGGACATAGC
CACGATGCTGGCTCCGCCGTGCAAGCTGCAGCAGCAGCAGCCG
CTGCCGCCTACCACCTGCAGGAGATGAGTCATGCAATTGTCGAC
GATTGTGAAACGCCACCGCCGCCGGCTCAACTACTAGACTCTGG
AGCCATCGCGAGCAGCAGTAGCAATTTTTATTATCCACCGGAAAG
TTTATCGCCATCGGGTTTCTCCTCCGAGGACATGGAAATTTTATCA
TTGCGTGATGACGACTAAGAAACAAGATTATGTCGTTTATATACGT
ATACATATGTATAGAGACATACGCGGATTCAAATAGATTGTACTTA
TAGCTTGTAAATACTAAAATCATTGTATAGGCCTTTAAGTTTTAGTT
CAAGTGCCA

3. Set the Database option to “Others (nr etc.)” and the Program option to “Somewhat similar sequence (blastn)”, then click the BLAST button in the lower left-hand corner.
4. What species does this DNA fragment come from? _______________________________

Part B: Search a genome database for all known closely related genes to your gene of interest.

Your initial search with this DNA sequence has yielded a match to a gene for an uncharacterized protein, with no immediate matches to a well-characterized reference sequence protein. Find out more about this gene by translating the DNA sequence to a protein sequence.

1. Return to the BLAST Home Page and select “blastx.” Under the title “enter query sequence,” paste the same DNA sequence you used in Part A, step 2.
2. In the “Choose Search Set” section, select the option “Non-redundant protein sequences (nr)” from the pull-down menu.
3. Click the checkbox to Exclude “Models (XM/XP).”
4. Click the blue button that says “BLAST.” The database of all known sequences is now being searched for those that resemble your sequence. The page will automatically update until the search is done. This may take a minute or two. Be patient.
5. Scroll down past the box of red lines, and you will see information in a chart that looks like this. This chart is just an example; it is not the one you will actually see.
6. Your matches are already sorted from highest to lowest match score. Using the right-hand column labeled “Accession,” find the highest-scoring match that has an Accession code starting with the letters “NP_”, which indicates that this protein sequence has been characterized and annotated. What species and gene are identified in the description of this match? _________________________________________________________
7. Record the eight different species that contain the closest known matches to your sequence.
   For each species, write its Latin name, the E value that indicates how statistically good the match is, and the identity value that indicates the percentage of your sequence that matches the database sequence.
   ** Do NOT list the same species more than once in your chart.

List 8 different species:

     Species name in Latin (Genus species)    E-value    % Identity
  1.
  2.
  3.
  4.
  5.
  6.
  7.
  8.

Part C: Creating a phylogenetic tree based on the genomic information for this gene of interest

Now that you know this gene is conserved across multiple species, you can compare your DNA sequence to the sequences of other species in order to create a phylogenetic tree that shows how closely related the species are. While BLAST conducts pairwise comparisons, you now need to align and compare multiple sequences. You can do so using a program called Clustal.

1. Click on each of the 8 accession numbers associated with the species you identified above in Part B, step 7. Clicking on an accession number will link you to a new page that gives you more information about that particular sequence.
2. On the left side, just under the title heading, click on the link that says “FASTA”. This will show you the amino acid sequence.
3. Select and copy the amino acid sequence into a new plain text document. The first line should include a right-pointing bracket (>) followed by the name of that species, for example:

   >speciesX
   ZZZZZZZZZZZZZZZZZZZZZZ

4. Do the same for each of the 8 species you identified above.
5. Once you have all 8 sequences in one text file, go to the Clustal webpage, http://www.ebi.ac.uk/Tools/msa/clustalo/, or search for “Clustal Omega”.
6. Make sure the dropdown menu is set to “Protein”, then paste all of the sequences from your text file into the input window labeled “sequences in any supported format”. Finally, scroll down and click “Submit”.
7. Using the alignment information, create a rough sketch of the phylogenetic tree, given which species are most similar based on this one gene.
   a. An asterisk “*” indicates complete alignment across all samples, a colon “:” indicates conservation between amino acids with strongly similar properties, and a period “.” indicates conservation between amino acids with weakly similar properties.
   b. It helps to look for shared patterns. For example:

   Figure 1: Shared sequences between species 2 and 3 (in green), species 5 and 6 (in purple), or species 7 and 8 (in blue) indicate that these pairs of species are likely to be very closely related. Shared residues highlighted in red indicate that species 5-8 may have shared a common ancestor at one point, and that species 1-4 are more distantly related.

8. Compare your phylogenetic tree to the one that Clustal generated by clicking on the “Phylogenetic tree” tab just under the Results title.

Questions

1. For a given gene, is the DNA sequence or the protein sequence going to have a greater degree of conservation between species?
2. If mutation processes randomly affect the genome without any specific targeting, why are some regions of the genome more faithfully conserved than other regions?
3. Is a single gene region sufficient to determine speciation? Why or why not?
4. How does your phylogenetic tree match the table of E-values and % identity scores that you found in your BLAST search?

Bonus: Why is the species you identified an unexpected result?
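
Optional illustration for Part C, step 7: the idea of judging which species are most similar from an alignment can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions. The four sequences and the names speciesA to speciesD are made-up toy data, not real BLAST or Clustal output, and the helper functions percent_identity, identity_table, and fully_conserved are hypothetical names chosen for this example. Percent identity is computed here over columns where neither sequence has a gap, which is one common convention; BLAST and Clustal may report slightly different values.

```python
from itertools import combinations

# Hypothetical, already-aligned toy sequences (NOT real data).
# "-" marks an alignment gap; every row must have the same length.
alignment = {
    "speciesA": "MKVLAGHW",
    "speciesB": "MKVLAGHW",
    "speciesC": "MKILSGHW",
    "speciesD": "MRIL-GYW",
}

def percent_identity(a, b):
    """Percent of gap-free aligned columns in which a and b share a residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def identity_table(aln):
    """Pairwise percent identity for every pair of named sequences."""
    return {(m, n): percent_identity(aln[m], aln[n])
            for m, n in combinations(aln, 2)}

def fully_conserved(aln):
    """Marker line with '*' under columns where all sequences agree."""
    columns = zip(*aln.values())
    return "".join("*" if len(set(c)) == 1 and c[0] != "-" else " "
                   for c in columns)

table = identity_table(alignment)
# Print pairs from most to least similar; the top pairs are the ones
# you would draw as nearest neighbors in a rough tree sketch.
for (m, n), pid in sorted(table.items(), key=lambda kv: -kv[1]):
    print(f"{m} vs {n}: {pid:.1f}%")
print(fully_conserved(alignment))
```

In this toy data, speciesA and speciesB are identical and would sit together on a tree, while speciesD is the most divergent. The marker line covers only the asterisk case from step 7a; the colon and period symbols depend on amino acid property groups that this sketch does not model.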