Assignment 1 (50 points)
The goals of this exercise:
* To assess the significance of the similarity between sequences (alignment, P
and E values…)
* To get experience with some of the basic multiple sequence alignment procedures
* To see how different algorithms produce different alignments
* To try one example of alignment-based database searching
* To determine common motif elements within a given set of amino acid or
nucleotide sequences.
* To practice using BioEdit to edit sequences and alignments
Part 1: Find a pair of DNA sequences that show no significant similarity but are
homologous
1. Pick a protein-coding DNA sequence (save it as FASTA format)
2. Translate it (save it as FASTA format)
3. Search a database with the protein sequence
4. Pick a significant but distant hit
5. Get its original DNA sequence (save it as FASTA format)
6. Compare to initial DNA sequence (Alignments, P and E values…)
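Steps 1 and 2 can also be checked by hand with a short script. The following is a minimal sketch, assuming the standard genetic code (NCBI translation table 1) and a sequence that starts in reading frame 1; the function names translate and to_fasta are illustrative, not part of any tool named in this assignment.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), listed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate coding DNA in frame 1, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    print(to_fasta("toy_protein", translate("ATGGCCAAATGGTAA")), end="")
```

If the chosen gene uses a different genetic code (for example a mitochondrial one), the table above does not apply and a translation tool that supports alternative codes should be used instead.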
* Submit
1. The two DNA sequences, with any relevant information about them
2. Their translations
3. The top list of hits (~20) from the database search (not the alignments!)
4. The alignment of the two DNA sequences
5. The alignment of the two protein sequences
6. The assessment of similarity significance for the DNA and the protein
comparisons
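When a program reports only an E value, the matching P value can be recovered for the significance assessment. This is a small sketch, assuming the usual Poisson model used by BLAST-style statistics, in which P = 1 - exp(-E); the function name p_from_e and the example E values are illustrative.

```python
import math


def p_from_e(e_value: float) -> float:
    """Probability of at least one chance hit, given E expected chance hits."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-e_value)


if __name__ == "__main__":
    # Hypothetical E values, for illustration only.
    for e in (1e-30, 0.01, 1.0, 10.0):
        print(f"E = {e:g}  ->  P = {p_from_e(e):.3g}")
```

For small E the two numbers are nearly equal, so the distinction matters mainly for borderline hits with E close to or above 1.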
Part 2: Find a pair of protein sequences (A and C) that show no significant
similarity but are homologous
1. Pick a protein sequence A (or a coding DNA sequence, and translate it). Use
something that interests you, or pick a random one at
http://www.expasy.ch/sprot/get-random-entry.html. It should be neither too short
nor too long, ideally 150-350 aa
2. Search a database with protein sequence A
3. Pick significant but distant hit B
4. Search a database with protein sequence B
5. Pick significant but distant hit C (C should probably not be in A's search output,
but it might be there with a high E value)
6. Compare A, B and C in pairs (see the lecture notes for the websites)
7. Show that there is no significant similarity between A and C
8. Show the significant similarities
-between A and B
-between B and C
9. Pay attention to the region of overlap between A-B and B-C. If they are not "the
same", repeat from step 4 but use (for searching) only the part of B that is similar
to A
* Submit
1. The three protein sequences with relevant information
2. The three pairwise alignments
3. The three significance estimates and conclusions
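The overlap check in step 9 can be made concrete by comparing the stretch of B covered by the A-B alignment with the stretch of B covered by the B-C alignment. The sketch below assumes both regions are given as 1-based, inclusive residue positions on B, as read from the alignment output; the function name overlap_on_b and the coordinates in the example are hypothetical.

```python
def overlap_on_b(ab_region, bc_region):
    """Compare two aligned regions of B, each a (start, end) pair.

    Positions are 1-based and inclusive. Returns the overlap length in
    residues and the fraction of the shorter region that the overlap covers.
    """
    start = max(ab_region[0], bc_region[0])
    end = min(ab_region[1], bc_region[1])
    length = max(0, end - start + 1)
    shorter = min(ab_region[1] - ab_region[0], bc_region[1] - bc_region[0]) + 1
    return length, length / shorter


if __name__ == "__main__":
    # Hypothetical coordinates: A aligns to B residues 10-120,
    # C aligns to B residues 100-250.
    length, fraction = overlap_on_b((10, 120), (100, 250))
    print(f"overlap: {length} residues, {fraction:.0%} of the shorter region")
```

A small fraction suggests that A and C match different parts of B, which is the situation in which step 9 asks you to repeat the search with only the part of B that is similar to A.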
Part 3: Multi-sequence alignment
1. Pick a protein sequence
* Use your favorite protein sequence, or pick any random sequence, or
use what you used in Part 1
2. Search SwissProt (http://ca.expasy.org/tools/blast/) or NCBI
(http://www.ncbi.nih.gov/BLAST/), using any pairwise-based database search
program
* Keep the output for later on!
3. Pick a few significant but not identical hits
* Pick at least 3
* They should be no more than 80% identical with the query sequence
* For best results, they should also be <80% identical among themselves.
Fill in the table below with the pairwise % identities:
        Hit1    Hit2    Hit3
Hit1    -
Hit2    %       -
Hit3    %       %       -
* The more distant the sequences are, the more interesting the results will be
4. Align the query sequence and the hits using ClustalW or ClustalX. Both require
FASTA-formatted input files.
* Save the output in both MSF and ALN formats
* Submit the MSF file
5. Align the same sequences using the BlockMaker server
(at http://blocks.fhcrc.org/blocks/blockmkr/make_blocks.html)
* Submit the blocks produced, in text format
* Answer the following questions:
Were all the regions aligned by BlockMaker aligned similarly by
Clustal? Conversely, are there regions that were aligned well by Clustal that
BlockMaker didn't report?
If the two results differ, can you explain the differences? Which result looks
more reliable?
If they agree, which program do you prefer, and why?
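The pairwise identity table from step 3 can be filled in from any alignment in which the sequences have been padded to the same length. This is a minimal sketch under one stated assumption: percent identity is counted over columns where neither sequence has a gap. Alignment programs may use a different denominator, so their reported values can differ slightly. The function names and the toy sequences are illustrative.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences of equal length ('-' = gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def identity_table(aligned: dict) -> str:
    """Lower-triangular table of pairwise percent identities, tab-separated."""
    names = list(aligned)
    rows = ["\t".join([""] + names)]
    for i, name in enumerate(names):
        cells = [
            f"{percent_identity(aligned[name], aligned[names[j]]):.0f}%"
            for j in range(i)
        ]
        rows.append("\t".join([name] + cells + ["-"]))
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    print(identity_table({"Hit1": "MKT-AL", "Hit2": "MRTGAL", "Hit3": "MKTGAV"}))
```

Any cell at or above 80% flags a pair that is too similar for the purposes of step 3.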