2016-10-11
Sequence alignment
Exercise 1: Dot matrix
You will use three different tools to generate dot plots for a pair of sequences.
Find sequences for comparison
Sequence 1: Protein sequence for the gene you worked with during the database section.
Sequence 2: Find the same gene in another species and find its protein sequence.
Create dot matrix and observe the alignments
Use the following programs to generate dot plots and identify probable similarities, duplications,
deletions, and any other features. Also compare the plots generated by the three tools.
1. YASS genomic similarity search tool (http://bioinfo.lifl.fr/yass/yass.php)
a. From the alignment table, determine the E-value, bit score and entropy for the
longest alignment.
b. Are there any duplications or deletions (or insertions) in either of the two sequences?
2. Lalign/Palign (http://fasta.bioch.virginia.edu/fasta_www2/fasta_www.cgi?rm=lalign)
a. Can you find the insertions/deletions and the duplication in the dot plot?
3. multi-zPicture (http://zpicture.dcode.org/)
a. View the dot plot and compare it with the ones you generated using the other tools. Do
you find similarities between them?
b. View the BLAST-type alignment from Output files and compare the alignment with the
dot plot.
Are the three dot plots similar? Why?
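To make it easier to interpret the plots, here is a minimal sketch of how a dot matrix can be computed. It is not the method used by YASS, Lalign or zPicture; the function names, the window size, the match threshold and the two toy sequences are all assumptions made for illustration.

```python
def dot_matrix(seq1, seq2, window=3, min_matches=3):
    """Return a grid where grid[i][j] is True when the windows starting at
    seq1[i] and seq2[j] share at least `min_matches` identical positions."""
    rows = len(seq1) - window + 1
    cols = len(seq2) - window + 1
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            row.append(matches >= min_matches)
        grid.append(row)
    return grid


def render(grid):
    """Draw the grid as text: '*' for a dot, '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)


if __name__ == "__main__":
    # Toy sequences (made up for illustration); the second lacks the "GG" insert.
    a = "ACDEFGGHIKL"
    b = "ACDEFHIKL"
    print(render(dot_matrix(a, b)))
```

In the printed grid, a continuous diagonal indicates a shared region, and a diagonal that breaks and resumes on a shifted diagonal indicates an insertion or deletion. A larger window or a stricter threshold removes background noise at the cost of sensitivity, which is one reason different tools can draw different plots for the same pair of sequences.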
Exercise 2: Dynamic Programming
1. Create a substitution matrix for the 4 nucleotides (A, C, G, and T) using the following
scores. (Purine = A, G & Pyrimidine = C, T)
a. Match = +5
b. Mismatch
i. Purine to Purine (or Pyrimidine to Pyrimidine) = +2
ii. Purine to Pyrimidine (or Pyrimidine to Purine) = -2
iii. Gap penalty = -3
     A    C    G    T    -
A
C
G
T
-
Confirm with the tutor that your matrix is correct before you proceed!
2. Perform a pairwise global alignment of the following two sequences using dynamic
programming, the above criteria, and a gap penalty of -1.
a. GCTTCTC
b. GACTCAG
(Note: A gap is inserted in each of the sequences in dynamic programming.
Remember to use the gap penalty.)
3. Use EMBOSS Needle (http://www.ebi.ac.uk/Tools/psa/emboss_needle/) and EMBOSS
Water (http://www.ebi.ac.uk/Tools/psa/emboss_water/) to align the given protein
sequences. Remember that Needle is a global alignment tool and Water is a local alignment
tool.
a. Observe the parameters used to align the two sequences. Did you use the same
parameters in these two tools?
b. Now compare the alignments obtained from the two tools. Did you obtain the
same results? Compare the lengths of the alignments, identity, similarities and
gaps in the alignments.
c. Which tool gives a better score for the alignment between the two sequences? Why?
(Note: If you find no difference between the two alignments, delete the last 10 amino acids
in one sequence and perform the alignment again using both tools. If you still don’t find
any differences, ask me.)
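To check a hand-filled dynamic programming table, the scoring scheme and the global alignment from steps 1 and 2 can be sketched as follows. This is an illustrative Needleman-Wunsch implementation with a linear gap penalty, not the code behind EMBOSS Needle; the function names are assumptions, and the default gap of -1 follows step 2 (pass gap=-3 to use the value listed in step 1 instead). When several alignments share the best score, the traceback returns only one of them.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(x, y):
    """Score one pair of nucleotides under the exercise's scheme."""
    if x == y:
        return 5  # match
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return 2 if same_class else -2  # within-class vs. cross-class mismatch


def needleman_wunsch(s, t, gap=-1):
    """Global alignment; returns (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)
    # F[i][j] holds the best score for aligning s[:i] with t[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # s[:i] aligned against gaps only
    for j in range(1, m + 1):
        F[0][j] = j * gap  # t[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + substitution_score(s[i - 1], t[j - 1]),
                F[i - 1][j] + gap,   # gap in t
                F[i][j - 1] + gap,   # gap in s
            )
    # Trace back from the bottom-right corner, preferring the diagonal move.
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                F[i][j] == F[i - 1][j - 1] + substitution_score(s[i - 1], t[j - 1])):
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_s.append(s[i - 1])
            out_t.append("-")
            i -= 1
        else:
            out_s.append("-")
            out_t.append(t[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_s)), "".join(reversed(out_t))


if __name__ == "__main__":
    score, top, bottom = needleman_wunsch("GCTTCTC", "GACTCAG")
    print(score)
    print(top)
    print(bottom)
```

A local (Smith-Waterman style) variant would differ mainly in clamping each cell at zero and starting the traceback from the highest-scoring cell, which is why Water can report a shorter alignment than Needle for the same pair of sequences.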
Exercise 3: Multiple Sequence Alignment (MSA)
You need more than two sequences to perform MSA. Now find the protein sequences for the
same gene in 5-7 different species. If possible, try to find at least an invertebrate, a bird, and a
fish.
Use Clustal Omega (http://www.ebi.ac.uk/Tools/msa/clustalo/) and
MUSCLE (http://www.ebi.ac.uk/Tools/msa/muscle/) to align the sequences,
and compare the alignments.
1. Is there any difference in the final alignments obtained using these two methods?
2. Now color the alignment so that different groups of amino acids appear in different colors.
Documentation on the colors and consensus symbols is available at
http://www.ebi.ac.uk/Tools/msa/clustalw2/help/faq.html#23. Now, can you locate some
conserved regions in the alignment?
3. View the phylogenetic trees from both alignments and compare them. Can you see any
differences between the two trees?
If you have time, find the protein sequence for the same gene (which you have been
investigating) in a bacterium or yeast. Now perform the alignments with all previous sequences
and the new sequence.
Do you see any differences in the alignments?
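Locating conserved regions can also be done programmatically once the aligned sequences are available as equal-length strings. The sketch below marks fully conserved columns with "*", in the spirit of the Clustal consensus line; it does not reproduce the ":" and "." symbols for similar residues. The function names and the three toy sequences are assumptions made for illustration.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns where every sequence carries the
    same residue and no sequence has a gap."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        residues = {row[col] for row in alignment}
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


def conservation_line(alignment):
    """Build a marker line with '*' under each fully conserved column."""
    marked = set(conserved_columns(alignment))
    return "".join("*" if i in marked else " " for i in range(len(alignment[0])))


if __name__ == "__main__":
    # Toy alignment (made up for illustration), gaps written as '-'.
    toy = ["MKT-AYIA",
           "MKTLAYLA",
           "MRT-AYIA"]
    for row in toy:
        print(row)
    print(conservation_line(toy))
```

Running the same function on the Clustal Omega and MUSCLE outputs gives a quick way to compare how many columns each method reports as fully conserved.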
Exercise 4: Nomenclature and keyword search
This section includes hands-on exercises with NCBI GQuery and EMBL-EBI services. In this
exercise, you will use keyword searches to find information about the gene you have been
investigating.
Discuss in your report what keywords you used and why.
1. Find the gene name in http://www.genenames.org. Find the HGNC symbol, previous
gene names and synonyms used for this gene. Have a look at the links to cross-reference
databases.
2. Now think of appropriate keywords and use them to search in
a. GQuery (http://www.ncbi.nlm.nih.gov/gquery)
b. EMBL-EBI (http://www.ebi.ac.uk/services)
3. Observe the number of entries found in different databases.
a. What keywords did you use?
b. How many protein sequences did you find in each of them?
4. Click on the protein sequences. You can then filter the sequences using different criteria
on the left and/or right. Select human (or Homo sapiens) as filter.
a. Did you see any changes in the number of sequences?
b. Try to use other filters and reduce the number of sequences.
5. Find the number of reviewed human sequences that have 3-D structures.
6. Find the number of human sequences in UniProtKB/Swiss-Prot released after 2007/01/01
using GQuery.
7. Find the number of articles in PubMed relevant for this gene from GQuery.
NOTE: Use Boolean operators “AND” and “OR” to combine multiple keywords and “NOT” to
exclude certain keywords.
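As an illustration of how the three operators combine, the sketch below assembles a query string from lists of keywords. The helper name build_query and the example keywords (including the placeholder gene symbol MYGENE) are hypothetical, and the exact query syntax accepted by each database may differ, so check the help pages of the service you are searching.

```python
def build_query(all_of, any_of=(), none_of=()):
    """Combine keywords into one Boolean query string.

    all_of  -- terms that must all be present (joined with AND)
    any_of  -- alternatives, at least one of which must be present (OR)
    none_of -- terms to exclude (NOT)
    """
    def quote(term):
        # Wrap multi-word terms in double quotes so they stay together.
        return f'"{term}"' if " " in term else term

    if not all_of:
        raise ValueError("at least one required keyword is needed")
    parts = [quote(t) for t in all_of]
    if any_of:
        parts.append("(" + " OR ".join(quote(t) for t in any_of) + ")")
    query = " AND ".join(parts)
    for term in none_of:
        query += " NOT " + quote(term)
    return query


if __name__ == "__main__":
    # MYGENE is a placeholder; substitute the symbol of your own gene.
    print(build_query(["MYGENE"], any_of=["human", "Homo sapiens"], none_of=["fragment"]))
```

Writing the query down in this form also makes it easy to record in the report which keywords were used and why.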
Email your reports to: [email protected]