Tutorial 4: Comparing Protein Sequences
Intro to Bioinformatics

Amino acids were not born equally

Comparing Protein Sequences
Substitution matrices
  PAM - Point Accepted Mutations
  BLOSUM - Blocks Substitution Matrix
Advanced comparison tools
  PSI-BLAST
  PHI-BLAST

Substitution Matrix
Scoring matrix S, 20x20, for protein (amino-acid) alignment
Si,j represents the gain or penalty for substituting AAj by AAi (i: row, j: column)
Based on the likelihood that this substitution is found in nature
Computed differently in PAM and BLOSUM

Computing the Probability of Mutation (Mi,j)
PAM - Point Accepted Mutations
  Based on closely related proteins (X% divergence)
  Matrices for comparing divergent proteins are computed by extrapolation
BLOSUM - Blocks Substitution Matrix
  Based on conserved blocks bounded in similarity (at least X% identical)
  Matrices for divergent proteins are derived using the appropriate X%

PAM-1
Captures mutation rates between closely related proteins (1% divergence)
Mi,j = #(A to B) / #A, the number of observed A-to-B substitutions divided by the number of occurrences of A
Problematic when comparing distantly related proteins
  The 1% divergence does not capture more sporadic mutations
  PAM250 is theoretical (extrapolation based)

PAM-1
[Figure]

BLOSUM62
Captures mutation rates between divergent proteins
Why is BLOSUM62 called BLOSUM62? Within each block, all members that shared at least 62% identity with ANY other member of that block were averaged and represented as one sequence.

BLOSUM62
The idea of BLOSUM matrices is to get a better measure of the differences between two proteins, specifically for more distantly related proteins.
Similar amino acids get a high score

PAM & BLOSUM
PAM:    Based on global alignments of closely related proteins.
BLOSUM: Based on local alignments.

PAM:    PAM1 is calculated from comparisons of sequences with no more than 1% divergence.
BLOSUM: BLOSUM62 is calculated from comparisons of sequences with at least 62% identity in the blocks.

PAM:    Other PAM matrices are extrapolated from PAM1.
BLOSUM: All BLOSUM matrices are based on observed alignments. They are not extrapolated from comparisons of closely related proteins.

Use Recommendations
PAM       BLOSUM
PAM100  ~ BLOSUM90   (closely related)
PAM120  ~ BLOSUM80
PAM160  ~ BLOSUM60
PAM200  ~ BLOSUM52
PAM250  ~ BLOSUM45   (highly divergent)

Query length   Matrix     Gap costs
<35            PAM30      9,1
35-50          PAM70      10,1
50-85          BLOSUM80   10,1
>85            BLOSUM62   11,1

Example
Query: >ADRM1_HUMAN (Proteasomal ubiquitin receptor)
Database: nr, human
BLAST program: BLASTP
Matrices: PAM30, BLOSUM45

What difference do we observe?
With BLOSUM45 we found both related and divergent sequences.
With PAM30 we found only related sequences.
[Figure: BLASTP results with PAM30]

BLOSUM45
With BLOSUM45 we can discover interesting relations between proteins
[Figure: BLASTP results with PAM30 and with BLOSUM45]

Mucin-13: a glycosylated membrane protein that protects the cell by binding to pathogens
Using different scoring matrices can produce slightly different alignments:
[Figure: alignment with PAM30; alignment with BLOSUM45]

The same alignment can be resolved in many ways, especially when using a matrix for highly divergent sequences (BLOSUM45):
[Figure]

PSI-BLAST
Position-Specific Iterative BLAST
We will analyze the following uncharacterized archaeal protein:
>gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVI
DEREIKKRDIFSLLLGVAGLNKSVEEFENELKNKLTEEAKNK
MENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIM
GSHGKTNLKEILLGSVTENVIKKSNKPVLVVKRKNS

Threshold for the initial BLAST search (default: 10)
Threshold for inclusion in PSI-BLAST iterations (default: 0.005)

[Figure: PSI-BLAST results]
The query itself
Orthologous sequences in two other archaeal species
Other homologous sequences

Is MJ0577 a filament protein?
Is MJ0577 a cationic amino transporter?
Is MJ0577 a universal stress protein?
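
The query-length recommendations from the Use Recommendations slide can be sketched as a small lookup. This is only an illustration, not part of BLAST: the function name `recommend_matrix` is hypothetical, and the slide's ranges share their endpoints (50 and 85), so the sketch assumes each shared endpoint belongs to the shorter range. The gap costs are read as (existence, extension), following the usual BLAST convention.

```python
def recommend_matrix(query_length):
    """Return (matrix name, gap existence cost, gap extension cost)
    for a protein query of the given length, following the slide's table.

    Assumption: the shared endpoints 50 and 85 are assigned to the
    shorter range (35-50 and 50-85 respectively).
    """
    if query_length < 1:
        raise ValueError("query length must be a positive integer")
    if query_length < 35:
        return ("PAM30", 9, 1)
    if query_length <= 50:
        return ("PAM70", 10, 1)
    if query_length <= 85:
        return ("BLOSUM80", 10, 1)
    return ("BLOSUM62", 11, 1)


# The MJ0577 query used in the PSI-BLAST part of the tutorial.
MJ0577 = (
    "MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVI"
    "DEREIKKRDIFSLLLGVAGLNKSVEEFENELKNKLTEEAKNK"
    "MENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIM"
    "GSHGKTNLKEILLGSVTENVIKKSNKPVLVVKRKNS"
)

if __name__ == "__main__":
    matrix, gap_open, gap_extend = recommend_matrix(len(MJ0577))
    print(f"length {len(MJ0577)}: {matrix}, gap costs {gap_open},{gap_extend}")
```

The table only picks a starting point by length. As the ADRM1 example shows, a matrix for divergent sequences such as BLOSUM45 may still be the better choice when the goal is to find distant relatives.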