Tutorial_4 (2016) - Protein Alignments
Transcript
Tutorial 4
Substitution matrices and PSI-BLAST

Agenda
• Why study distant homologies?
• Substitution matrices
  – PAM - Point Accepted Mutations
  – BLOSUM - Blocks Substitution Matrix
• PSI-BLAST
Cool story of the day: Why should we care about cellular fusion in worms?

How proteins evolve
• Proteins change throughout evolution.
• Some proteins change more than others, and different regions of the same protein change at different rates.

Why study distant homologies?
• When we study a new organism, we may find many unknown sequences that we would like to characterize, and we may not be able to find any close homologs for them.
• Substitution matrices model different evolutionary distances.
• PSI-BLAST makes it possible to find more distant relationships between proteins.

Amino acids were not born equal
Both substitution matrices and PSI-BLAST are designed to model the process by which amino acids (AAs) mutate.

Substitution Matrix
• Scoring matrix S of size 20x20
• Si,j represents the gain or penalty for substituting AAj with AAi (i: row, j: column)
  – Based on how likely this substitution is to be found in nature
  – Computed differently in PAM and BLOSUM
• Each matrix is tailored to a particular evolutionary distance

Computing probability of mutation (Mi,j)
• PAM - Point Accepted Mutations
  – Based on a small set of closely related proteins
  – Other than PAM1, the matrices are theoretical.
• BLOSUM - Blocks Substitution Matrix
  – Based on a wider database of proteins that includes protein families with conserved regions
  – The matrices are empirical.

PAM
• Based on a small set of closely related proteins
• PAM1 captures mutation rates between closely related proteins (proteins with 1% divergence).
• Problematic when comparing distant proteins: the 1% divergence does not capture more sporadic mutations.

PAM-X
• To apply PAM to more distant proteins, PAM1 is multiplied by itself. This models the accumulation of mutations over the course of evolution.
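The self-multiplication of PAM1 described above can be sketched in a few lines of Python. This is only a toy illustration: the 3-letter alphabet, the matrix values and the names `TOY_PAM1`, `mat_mul` and `mat_pow` are made up for this example, whereas the real PAM1 is a 20x20 matrix estimated from closely related proteins. Following the slide's convention, entry [i][j] is the probability that residue j is replaced by residue i, so each column sums to 1.

```python
# Toy illustration of PAM-N = (PAM1)^N.
# The alphabet size and all probabilities are hypothetical, chosen only so
# that each residue has a 1% chance of changing in one PAM unit.

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, power):
    """Raise a square matrix to a positive integer power by repeated multiplication."""
    result = m
    for _ in range(power - 1):
        result = mat_mul(result, m)
    return result

# Entry [i][j]: probability that residue j is replaced by residue i (columns sum to 1)
TOY_PAM1 = [
    [0.990, 0.006, 0.002],
    [0.007, 0.990, 0.008],
    [0.003, 0.004, 0.990],
]

toy_pam250 = mat_pow(TOY_PAM1, 250)

# After 250 steps the chance of seeing the original residue is much lower
print("PAM1   diagonal:", [TOY_PAM1[i][i] for i in range(3)])
print("PAM250 diagonal:", [round(toy_pam250[i][i], 3) for i in range(3)])
```

The result is still a matrix of mutation probabilities; turning it into a scoring matrix additionally requires comparing each probability with the background frequency of the residue, which this sketch leaves out.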
• The higher the number of the matrix, the more suitable it is for finding distant homologies.
• Other than PAM1, the matrices are theoretical.

BLOSUM
• Scores for each position are derived from the observed frequencies of substitutions in blocks of local alignments of related proteins.
• BLOSUM62 is built from blocks whose members share at most 62% identity with any other member of that block.

BLOSUM-X
[Diagram: groups of proteins with similar functions feed the BLOCKS DB; blocks with up to 50% similarity yield Substitution Matrix A, and blocks with up to 32% similarity yield Substitution Matrix B.]

PAM vs. BLOSUM
PAM
• Based on global alignments of closely related proteins.
• PAM1 is calculated from comparisons of sequences with no more than 1% divergence.
• Other PAM matrices are extrapolated from PAM1.
BLOSUM
• Based on local alignments.
• BLOSUM62 is calculated from comparisons of sequences with no more than 62% identity in the blocks.
• All BLOSUM matrices are based on observed alignments. They are not extrapolated from comparisons of closely related proteins.
BLOSUM matrices are the substitution matrices in use.

Use Recommendations
Approximately equivalent matrices, from closely related (top) to highly divergent (bottom):
PAM100 ~ BLOSUM90
PAM120 ~ BLOSUM80
PAM160 ~ BLOSUM60
PAM200 ~ BLOSUM52
PAM250 ~ BLOSUM45

Query length   Matrix     Gap costs
<35            PAM30      9,1
35-50          PAM70      10,1
50-85          BLOSUM80   10,1
>85            BLOSUM62   11,1
http://www.ncbi.nlm.nih.gov/blast/html/sub_matrix.html

Example
• Query: an uncharacterized (hypothetical) protein
• Database: nr
• BLAST program: BLASTP
• Matrices: PAM30 / PAM250, BLOSUM45 / BLOSUM90

PSI-BLAST
Position-Specific Iterative BLAST
Designed to find more distantly related proteins than BLAST allows

PSI-BLAST Steps
1. Search with a query against a protein database.
2. Construct a specialized multiple sequence alignment based on the top results.
3. Create a position-specific scoring matrix (PSSM) from the alignment.
4. Use the PSSM as a query against the database.
5. Estimate statistical significance (E values).
Repeat steps 3-5 iteratively.
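As a rough sketch of step 3, the Python below builds a simplified PSSM from a small alignment and uses it to score sequences. It is not how PSI-BLAST itself computes the matrix: it assumes a uniform background frequency and a fixed pseudocount, whereas the real program uses sequence weighting and more elaborate pseudocounts. The toy alignment and the names `build_pssm` and `score_sequence` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping amino acid -> log2-odds score."""
    background = 1.0 / len(AMINO_ACIDS)  # simplifying assumption: uniform background
    length = len(alignment[0])
    pssm = []
    for pos in range(length):
        # Collect the residues observed in this column, skipping gaps
        column = [seq[pos] for seq in alignment if seq[pos] != "-"]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def score_sequence(pssm, seq):
    """Sum the position-specific scores of an ungapped sequence of matching length."""
    return sum(pssm[pos][aa] for pos, aa in enumerate(seq))

# Hypothetical toy alignment standing in for the top hits of step 2 (not real data)
toy_alignment = [
    "MKVLA",
    "MKILA",
    "MRVLG",
    "MKVIA",
]

pssm = build_pssm(toy_alignment)
conserved_score = score_sequence(pssm, "MKVLA")
unrelated_score = score_sequence(pssm, "WWWWW")
print("conserved:", round(conserved_score, 2), "unrelated:", round(unrelated_score, 2))
```

Unlike a fixed 20x20 substitution matrix, the score for the same amino acid differs from column to column, which is what lets the profile reward residues that are conserved at a particular position.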
[Diagram: the PSI-BLAST iteration loop, with the labels Query, Search, Protein DB, Results, PSSM and Iterations.]

Example
We will use a sequence of an uncharacterized (hypothetical) protein:

• Threshold for initial BLAST search (default: 10)
• Threshold for inclusion in PSI-BLAST iterations (default: 0.005)

The results are all hypothetical proteins.

Cool story of the day
Why should we care about cellular fusion in worms?

Cellular fusion
In cellular fusion, two cells unite to form one cell:
• Fertilization
• Muscle cells are composed of rows of fused cells.
• The placenta is made up of powerful multinucleated cells that are actually numerous individual cells that have fused.
• The lenses of the eyes are formed of rows of fused cells.
• Cellular fusion also occurs in bones.
• Fusion processes are also involved in cancer, viral infections and stem cells.
http://www1.technion.ac.il/_local/includes/blocks/scinews-items/100513-elegans/news-item-en.htm

Cellular fusion in C. elegans
• The exact way fusion takes place is still not completely clear and is the focus of work in Prof. Podbilewicz's lab.
• The worm suits cell fusion research because intensive cell-cell fusion takes place in its skin and can be followed easily.
• The researchers identified the protein responsible for the worm's fusion activity: the EFF-1 protein.
• They showed that in mutant worms the skin cells do not fuse and instead begin to migrate through the body.
Beni Podbilewicz

“...we identified fusion family (FF) proteins within and beyond nematodes, and divergent members from the human parasitic nematode Trichinella spiralis and the chordate Branchiostoma floridae could also fuse mammalian cells…”