Exercise 2

Instructions:
Deadline: 17/3/08, 14:30.
- Please attach a minimal version of your outputs in order to support your answers.
- Submit in the published pairs.
- Please submit to homework assignment box 234525 (Taub, first floor).
- Don't forget to write your names and ID numbers.

1. The following fragment belongs to a protein from the Cadherin family, whose members play important roles in cell-to-cell adhesion within tissues. In mammals, the Cadherins are divided into three main subgroups with similar functions.

>query
SASVPENAPVGTEVLTVTATDADLGPNGRIFYSILGGG

1- BLAST the sequence above in order to identify as many members of the Cadherin family as possible in the rat (Rattus norvegicus) proteome. Which BLAST program is optimal for this run? Explain.

Cadherin proteins are usually anchored to the cell membrane and exposed on the outside of the cell. There they can "stick" to Cadherins from other cells in order to mediate cell-to-cell adhesion. The communication between Cadherins is mediated by various small protein motifs that are at most 12 amino acids long.

2- Using the sequences you identified in question 1, use the appropriate tool to find at least three conserved motifs from the Cadherin family.
3- Is it possible to accurately identify the same motifs by simply applying a multiple sequence alignment? Explain your answer using an MSA tool you learned in class.
4- Does the cladogram (evolutionary tree) you get for the Cadherin family represent the similarity distances between sequences? Do you think this cladogram is reliable? Explain.
5- Propose an efficient way to classify the Cadherin family into evolutionarily or functionally related groups (no need to make the tree, just the idea).

Helpful instructions:
Limit your BLAST searches to the Swiss-Prot database.
When running programs that return results by e-mail, preferably use an e-mail address other than your tx or t2 address, since in many cases the program complains about the firewall.
To convert the BLAST results to FASTA format, do as follows:
- Check the boxes beside the desired sequences (in the alignments section).
- Go to the bottom of the file and press "get selected sequences".
- From the "display" list, choose the option FASTA. If you have a large number of sequences, make sure that you can see them all at once by choosing a suitable number from the "show" list.
- From the "send to" list, choose the option text, then copy-paste your results into a fresh Notepad file and you are done!

2. You are given the following 5 aligned sequences, taken from different organisms:

>seq1
MVWLMEALKTKENETTKEKLLTKKVEKSEKKEENVREEEIVCPICGSKEVVKDYERAEIVCAKCGCVIKE
>seq2
MTWLMEALKTKENETTKEKKLTKKKEKSETTLENVREPIIVCPICGSKEVVKDYERAEIVCAKCGCVIKE
>seq3
MVWLMEALKTKENETTKEKKLTTKVEKSEKKEENVREEEIVCPICGSKEVVKDYERAEIVCAKCGCVIKE
>seq4
MTKEMEALKTKEQKITKEKKLTTKVEKSEKKEENVREEEIVCPICGSKEVVKDYVTREIVCAKCGCVIKE
>seq5
MTKEMNKKREKEQKITKEKKLTTKVEKSEKKEENVREEEIVCPICGSKEVVKDYVTREIVCAKCGCVIKE

Assume the following (per amino acid) scoring system:

$$S(a, b) = \begin{cases} 0 & \text{if } a = b \\ 2 & \text{otherwise} \end{cases}$$

A) Compute the Euclidean sequence distance between all pairs of the five given organisms (producing a 5x5 matrix) using the above scoring system.

B) Construct a phylogenetic dendrogram using the Neighbor Joining algorithm, as shown in the tutorial, based on the distance matrix computed in the previous section:
B.1) Using the current distance matrix, compute the relative distance matrix.
B.2) Choose the closest neighbors according to the relative distance matrix. If several pairs yield the same value, choose randomly among them.
B.3) Compute the distances $D_{x,i}$ and $D_{x,j}$ between each of the two neighbors (i) and (j) selected in phase [B.2] and their common ancestor (x).
B.4) Compute the distances between the new common ancestor (x) and the rest of the nodes.
B.5) Using the new distance matrix from phase [B.4], repeat the process (starting from [B.1]) until only two organisms remain.
The dendrogram produced in the above process should include all the distances along the edges, as computed in phase [B.3]. Please detail all of the computations along the process.
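
To make the mechanics of parts A and B concrete, here is a minimal Python sketch that runs on toy data rather than on the five sequences above. It rests on two assumptions that the exercise does not state: that the "Euclidean" distance is the square root of the sum of squared per-residue scores, and that the relative distance matrix, branch lengths, and updated distances follow the standard Neighbor Joining formulas. The tutorial's definitions may differ, in which case the marked lines should be changed accordingly. The function names (`score`, `distance`, `distance_matrix`, `nj_step`) are hypothetical, and ties are broken by taking the first minimal pair instead of a random one so that the run is reproducible.

```python
import math
from itertools import combinations


def score(a, b):
    """Per-residue score from the exercise: 0 for a match, 2 otherwise."""
    return 0 if a == b else 2


def distance(s, t):
    """Distance between two aligned sequences.

    ASSUMPTION: 'Euclidean' is read as sqrt(sum of squared per-residue scores).
    """
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return math.sqrt(sum(score(a, b) ** 2 for a, b in zip(s, t)))


def distance_matrix(seqs):
    """Part A: full symmetric matrix as a dict of dicts, keyed by sequence name."""
    names = list(seqs)
    return {a: {b: distance(seqs[a], seqs[b]) for b in names} for a in names}


def nj_step(d):
    """One Neighbor Joining iteration (B.1 to B.4) on a matrix with n >= 3 nodes.

    ASSUMPTION: standard NJ formulas; the tutorial may define them differently.
    Returns (i, j, D_xi, D_xj, new_matrix), where x = "(i,j)" is the new ancestor.
    """
    nodes = list(d)
    n = len(nodes)
    r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums

    # B.1: relative distances M(i,j) = d(i,j) - (r_i + r_j) / (n - 2)
    rel = {(a, b): d[a][b] - (r[a] + r[b]) / (n - 2)
           for a, b in combinations(nodes, 2)}

    # B.2: closest neighbors (first minimal pair on ties, not random)
    i, j = min(rel, key=rel.get)

    # B.3: branch lengths from i and j to their common ancestor x
    d_xi = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    d_xj = d[i][j] - d_xi

    # B.4: distances from x to every remaining node
    x = f"({i},{j})"
    rest = [k for k in nodes if k not in (i, j)]
    new = {a: {b: d[a][b] for b in rest} for a in rest}
    new[x] = {x: 0.0}
    for k in rest:
        d_xk = (d[i][k] + d[j][k] - d[i][j]) / 2
        new[x][k] = d_xk
        new[k][x] = d_xk
    return i, j, d_xi, d_xj, new


# Toy aligned sequences (NOT the assignment data).
toy = {"t1": "MVWLME", "t2": "MTWLME", "t3": "MTKEME", "t4": "MTKEMN"}
d = distance_matrix(toy)

# B.5: repeat until only two nodes remain, reporting each join.
while len(d) > 2:
    i, j, d_xi, d_xj, d = nj_step(d)
    print(f"join {i} and {j}: D(x,{i}) = {d_xi:.3f}, D(x,{j}) = {d_xj:.3f}")
a, b = d
print(f"final edge between {a} and {b}: {d[a][b]:.3f}")
```

Storing the matrix as a dict of dicts keyed by node name keeps the bookkeeping readable when nodes are merged, at the cost of some speed; for five sequences this is not a concern.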