Exercise 2
Instructions:
Deadline: 17/3/08, 14:30.
- Please attach a minimal version of your outputs in order to support your
answers.
- Submit in pairs, according to the published pairing.
- Please submit to the homework assignment box 234525 (Taub, first floor).
- Don't forget to write your names and ID numbers.
1.
The following fragment belongs to a protein from the Cadherin family, whose members play
important roles in cell-to-cell adhesion within tissues. In mammals, the Cadherins are
divided into three main subgroups with similar functions.
>query
SASVPENAPVGTEVLTVTATDADLGPNGRIFYSILGGG
1- BLAST the sequence above in order to identify as many members of the Cadherin
family as possible in the rat (Rattus norvegicus) proteome. Which BLAST
program is optimal for this run? Explain.
Cadherin proteins are usually anchored to the cellular membrane and are exposed
on the outside of the cell. There they can "stick" to Cadherins from other cells in
order to mediate cell-to-cell adhesion. The interaction between Cadherins is
mediated by various small protein motifs that are at most 12 amino acids long.
2- Using the sequences you identified in question 1, use the appropriate tool in order
to find at least three conserved motifs from the Cadherin family.
3- Is it possible to accurately identify the same motifs by simply applying a multiple
sequence alignment? Explain your answer using an MSA tool you learned in
class.
4- Does the cladogram (evolutionary tree) you get for the Cadherin family represent
the similarity distances between sequences? Do you think this cladogram is
reliable? Explain.
5- Propose an efficient way to classify the Cadherin family into evolutionarily or
functionally related groups (no need to make the tree, just the idea).
Helpful Instructions:
- Limit your BLAST searches to the Swiss-Prot database.
- When running a program which returns results by e-mail, preferably use an e-mail
address other than your tx or t2, since in many cases the program complains about
the firewall.

To convert the BLAST results to FASTA format, do as follows:
- Check the boxes beside the desired sequences (in the alignments section).
- Go to the bottom of the file and push “get selected sequences”.
- Choose from the list “display” the option FASTA.
- If you have a large number of sequences, make sure that you can see them all at once by choosing a suitable number from the “show” list.
- Choose from the list “send to” the option text, then copy-paste your results
into a fresh notepad file and you are done!
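Once the sequences are saved as FASTA text, they usually need to be read back in before further analysis. The following is a minimal sketch of a FASTA reader, assuming plain-text input where header lines start with ">" and a sequence may be wrapped over several lines. The function name `parse_fasta` and the toy records are illustrative only and are not part of the exercise.

```python
def parse_fasta(text):
    """Return {header: sequence}, joining sequence lines wrapped across rows."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Toy input (hypothetical sequences, not BLAST output).
example = """>toy1
MKTAYIAK
QRQISFVK
>toy2
MKTAYLAK
"""
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), seq)
```

To read the notepad file itself, pass its contents, for example `parse_fasta(open("results.txt").read())`, where the file name is whatever you chose when saving.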
2.
You are given the following 5 aligned sequences, taken from different organisms:
> seq1
MVWLMEALKTKENETTKEKLLTKKVEKSEKKEENVREEEIVCPICGSKEVVKDY
ERAEIVCAKCGCVIKE
> seq2
MTWLMEALKTKENETTKEKKLTKKKEKSETTLENVREPIIVCPICGSKEVVKDYE
RAEIVCAKCGCVIKE
> seq3
MVWLMEALKTKENETTKEKKLTTKVEKSEKKEENVREEEIVCPICGSKEVVKDY
ERAEIVCAKCGCVIKE
> seq4
MTKEMEALKTKEQKITKEKKLTTKVEKSEKKEENVREEEIVCPICGSKEVVKDY
VTREIVCAKCGCVIKE
> seq5
MTKEMNKKREKEQKITKEKKLTTKVEKSEKKEENVREEEIVCPICGSKEVVKDY
VTREIVCAKCGCVIKE
Assume the following (per amino acid) scoring system:
ab
0
S ( a, b)  
2 otherwise
A) Compute the Euclidean sequence distance between all pairs in the five given
organisms (producing a 5x5 matrix) using the above scoring system.
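The computation in (A) can be sketched in a few lines. This is only one possible reading of "Euclidean sequence distance": it assumes the distance between two aligned sequences is the square root of the sum of squared per-position scores, with a score of 0 for identical residues and 2 otherwise. If the tutorial defines the distance differently, adjust `euclidean_distance` accordingly. The function names and the toy sequences are illustrative and are not the exercise data.

```python
import math
from itertools import combinations


def score(a, b):
    """Per-residue score: 0 for identical residues, 2 otherwise."""
    return 0 if a == b else 2


def euclidean_distance(s, t):
    """Square root of the summed squared per-position scores (assumed reading)."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return math.sqrt(sum(score(a, b) ** 2 for a, b in zip(s, t)))


def distance_matrix(seqs):
    """Symmetric dict-of-dicts distance matrix with a zero diagonal."""
    names = list(seqs)
    D = {p: {q: 0.0 for q in names} for p in names}
    for p, q in combinations(names, 2):
        D[p][q] = D[q][p] = euclidean_distance(seqs[p], seqs[q])
    return D


# Toy aligned sequences (hypothetical): s1 and s3 differ at two positions.
toy = {"s1": "MKTA", "s2": "MKTA", "s3": "MRSA"}
D = distance_matrix(toy)
for name in toy:
    print(name, [round(D[name][other], 3) for other in toy])
```

Under this reading, two sequences that differ at k aligned positions are at distance sqrt(4k), so the matrix depends only on the mismatch counts.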
B) Construct a phylogenetic dendrogram using the Neighbor Joining algorithm, as shown
in the tutorial, based on the distance matrix computed in the previous section:
B.1) Using the current distance matrix, compute the relative distance matrix.
B.2) Choose closest neighbors according to the relative distance matrix. In case several
choices yield the same value, choose randomly between the pairs.
B.3) Compute the distances Dx,i and Dx,j between each of the two neighbors (i) and (j)
selected in phase [B.2] and their common ancestor (x).
B.4) Compute the distance from the new common ancestor (x) to each of the remaining nodes.
B.5) Using the new distance matrix based on phase [B.4], repeat the process (starting
from [B.1]) until only two organisms remain.
The dendrogram produced in the above process should include all the distances along the
edges, as computed at phase (B.3).
Please detail all of the computations along the process.
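Phases B.1 to B.5 can be checked against a short sketch. It assumes the tutorial's "relative distance matrix" is the standard Saitou-Nei criterion Q(i, j) = (n - 2) D(i, j) - r_i - r_j, where r_i is the sum of row i, together with the usual branch-length and update formulas. If the tutorial uses a different normalization, the formulas in `nj_step` need to change accordingly. The function name `nj_step` and the 5x5 toy matrix are illustrative and are not the exercise data.

```python
from itertools import combinations


def nj_step(D):
    """One neighbor-joining iteration on a symmetric dict-of-dicts matrix.

    Returns (i, j, d_xi, d_xj, new_D), where x = "(i,j)" is the new ancestor.
    The input matrix is not modified.
    """
    nodes = list(D)
    n = len(nodes)
    # Row sums r_i = sum over k of D[i][k].
    r = {i: sum(D[i][k] for k in nodes if k != i) for i in nodes}
    # B.1: relative distances Q(i, j) = (n - 2) * D[i][j] - r_i - r_j.
    q = {(i, j): (n - 2) * D[i][j] - r[i] - r[j]
         for i, j in combinations(nodes, 2)}
    # B.2: pair with the smallest Q. Ties resolve to the first pair found,
    # whereas the exercise asks for a random choice among tied pairs.
    i, j = min(q, key=q.get)
    # B.3: branch lengths from i and j to their common ancestor x.
    d_xi = D[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    d_xj = D[i][j] - d_xi
    # B.4: distances from x to every remaining node k.
    x = f"({i},{j})"
    rest = [k for k in nodes if k not in (i, j)]
    new_D = {k: {m: D[k][m] for m in rest if m != k} for k in rest}
    new_D[x] = {}
    for k in rest:
        d = (D[i][k] + D[j][k] - D[i][j]) / 2
        new_D[x][k] = d
        new_D[k][x] = d
    return i, j, d_xi, d_xj, new_D


# Toy distance matrix (hypothetical values, not computed from the sequences).
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
D0 = {p: {q: matrix[a][b] for b, q in enumerate(names) if a != b}
      for a, p in enumerate(names)}

# B.5: repeat until only two nodes remain, printing each join.
D = D0
while len(D) > 2:
    i, j, d_xi, d_xj, D = nj_step(D)
    print(f"join {i} and {j}: branch lengths {d_xi} and {d_xj}")
last_a, last_b = D
print(f"final edge {last_a} - {last_b}: {D[last_a][last_b]}")
```

Printing every join, as done here, mirrors the requirement to detail all computations. For the hand-in, the intermediate Q matrices would need to be shown as well.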