Multiple alignment
June 26, 2003

Learning objectives
Understand the usefulness of multiple alignment.
Become familiar with ClustalW.

Announcement on seminar tomorrow
Lunch 12-12:45 in PS 612
Seminar 1-2 pm in PS 158
Cookies and punch 2-2:45 pm
Student briefings 3-5 pm

July 7: Local Alignment Project Demo
July 11: Writing assignment/Presentation

Steps to multiple alignment
Create the alignment.
Edit the alignment to ensure that regions of functional or structural similarity are preserved.
Use the edited alignment for:
  Phylogenetic analysis
  Structural analysis
  Finding conserved motifs to deduce function
  Design of PCR primers

Clustal W (Thompson et al., 1994)
CLUSTAL = Cluster alignment
The underlying concept is that groups of sequences are phylogenetically related. If they can be aligned, then one can construct a tree.
Step 1: pairwise alignments
Step 2: create a guide tree
Step 3: progressive alignment

Flowchart of computation steps in Clustal W (Thompson et al., 1994)
Pairwise alignment: calculation of distance matrix
Creation of unrooted Neighbor-Joining tree
Rooted NJ tree (guide tree) and calculation of sequence weights
Progressive alignment following the guide tree

Step 1: Pairwise alignments
Compare each sequence with every other sequence and calculate a distance matrix.

      A     B     C
A     -
B    .87    -
C    .59   .60    -

Each number represents the number of exact matches divided by the sequence length (ignoring gaps). Thus, the higher the number, the more closely related the two sequences are. In this matrix, sequence A is 87% identical to sequence B.

Step 2: Create guide tree
Use the distance matrix to create a guide tree that determines the “order” in which the sequences are aligned.
[Figure: the matrix above beside a guide tree with leaves A, B and C, labeled 0.87 and 0.60]
Branch length is proportional to the estimated divergence between A and B (1 - 0.87 = 0.13).

Step 3: Progressive alignment
[Figure: guide tree with leaves A, B and C]
Align A and B first. Then add sequence C to the previous alignment.
In closely aligned sequences, gaps are given a heavier weight than in more divergent sequences.
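Steps 1 and 2 above can be sketched in a few lines of Python. This is only an illustration, not the Clustal W implementation: `percent_identity` assumes the two sequences have already been aligned to equal length, and `guide_tree` (a hypothetical helper name) uses simple average-linkage clustering as a stand-in for Neighbor-Joining. The input values are the ones from the example matrix.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Exact matches divided by compared positions, skipping gap columns.

    Assumes a and b are already aligned to the same length, with '-' for gaps.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


def guide_tree(names, identity):
    """Build a nested-tuple guide tree by average-linkage clustering.

    identity maps frozenset({x, y}) to the fraction of identical residues.
    Note: this is a simpler stand-in for Neighbor-Joining, not NJ itself.
    """
    dist = {pair: 1.0 - score for pair, score in identity.items()}
    clusters = {name: [name] for name in names}  # subtree -> its leaf names

    def avg_dist(t1, t2):
        pairs = [(x, y) for x in clusters[t1] for y in clusters[t2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two closest subtrees, then repeat until one tree remains.
        t1, t2 = min(combinations(clusters, 2), key=lambda p: avg_dist(*p))
        clusters[(t1, t2)] = clusters.pop(t1) + clusters.pop(t2)
    return next(iter(clusters))


# Identity values from the example matrix (A-B .87, A-C .59, B-C .60).
identity = {
    frozenset("AB"): 0.87,
    frozenset("AC"): 0.59,
    frozenset("BC"): 0.60,
}
tree = guide_tree(["A", "B", "C"], identity)
print(tree)  # ('C', ('A', 'B')): A and B are paired first, then C joins
```

The nested tuple gives the order for Step 3: the innermost pair is aligned first, and each enclosing level adds one more sequence or group.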
Why are gaps in closely aligned sequences given a heavier weight?
Because those gaps suggest separations between functional or structural entities. In more divergent sequences, gaps may be produced as an artifact of aligning dissimilar sequences and may disrupt important entities.

Gap treatment
Short stretches of 5 hydrophilic residues often indicate loop or random coil regions, and therefore gap penalties are reduced when gaps occur in such stretches.
Gap penalties are lowered at positions where the closely related sequences aligned earlier already contain a gap. It is thought that such gaps occur in regions that do not disrupt the structure or function.
Gap penalties are increased when a new gap would fall within 8 residues of an existing gap, because the minimum functional entity is thought to be 8 residues (from structure analysis).
The penalty for a gap after each amino acid is set according to the frequency with which such a gap occurs in nature.

Amino acid weight matrices
As we know, there are many scoring matrices to choose from, depending on the relatedness of the aligned proteins.
In ClustalW, as the alignment proceeds to longer branches, the amino acid scoring matrix is switched to one suited to more divergent sequences. The length of the branch determines which matrix to use and contributes to the alignment score.

Flowchart of computation steps in Clustal W (Thompson et al., 1994)
Pairwise alignment: calculation of distance matrix
Creation of unrooted Neighbor-Joining tree
Rooted NJ tree (guide tree) and calculation of sequence weights
Progressive alignment following the guide tree
From Baxevanis and Ouellette, 2001

Example of sequence alignment using Clustal W
[Figure: example Clustal W alignment]
* (asterisk) represents identity
: represents high similarity
. represents low similarity

Multiple alignment considerations
Quality of the guide tree: it would be good to have a set of closely related sequences in the alignment to set the pattern for more divergent sequences.
If the initial alignments have a problem, the problem is magnified in subsequent steps.
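The matrix-switching idea from the "Amino acid weight matrices" slide can be sketched as a lookup on branch length. The function name `pick_matrix`, the use of the BLOSUM series, and the cut-off values are all assumptions chosen for illustration; they are not taken from the slides, and the actual program's thresholds may differ.

```python
def pick_matrix(branch_length: float) -> str:
    """Map an estimated divergence (0 = identical, 1 = unrelated) to a matrix name.

    The cut-offs below are illustrative assumptions, not Clustal W's own values.
    """
    if not 0.0 <= branch_length <= 1.0:
        raise ValueError("branch_length must be between 0 and 1")
    identity = 1.0 - branch_length
    if identity >= 0.80:
        return "BLOSUM80"  # closely related sequences: strict matrix
    if identity >= 0.60:
        return "BLOSUM62"
    if identity >= 0.30:
        return "BLOSUM45"
    return "BLOSUM30"      # very divergent sequences: permissive matrix


# A and B from the example (divergence 0.13) versus adding C (about 0.41).
print(pick_matrix(0.13))
print(pick_matrix(0.41))
```

With these assumed cut-offs, the A-B step would use a stricter matrix than the step that adds the more divergent sequence C.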
CLUSTAL W is best when aligning sequences that are related to each other over their entire lengths.
Do not use it when there are variable N- and C-terminal regions.
If the protein is enriched for G, P, S, N, Q, E, K, R, then these residues should be removed from the gap penalty list. (What types of residues are these?)

Reference: http://www-igbmc.u-strasbg.fr/BioInfo/ClustalW/
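As a rough way to check the last consideration before running an alignment, the sketch below measures how much of a protein is made up of the listed residues and finds runs of 5 or more of them. The helper names and the example sequence are hypothetical; what counts as "enriched" is left to the reader, since the slides give no threshold.

```python
# Residue list quoted in the consideration above.
HYDROPHILIC = set("GPSNQEKR")


def hydrophilic_fraction(seq: str) -> float:
    """Fraction of residues (gaps removed) that are in the listed set."""
    residues = seq.upper().replace("-", "")
    if not residues:
        return 0.0
    return sum(r in HYDROPHILIC for r in residues) / len(residues)


def hydrophilic_stretches(seq: str, min_run: int = 5):
    """Return (start, end) pairs, end exclusive, for runs of listed residues
    that are at least min_run long."""
    runs, start = [], None
    for i, residue in enumerate(seq.upper()):
        if residue in HYDROPHILIC:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(seq) - start >= min_run:
        runs.append((start, len(seq)))
    return runs


example = "MVLAGSPNQKLLAVGSA"  # made-up sequence for illustration
print(hydrophilic_fraction(example))
print(hydrophilic_stretches(example))  # [(4, 10)]
```

The run found in the example, positions 4 to 9, is the kind of stretch in which gap penalties are reduced; a protein with a high overall fraction of these residues would have such stretches almost everywhere.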