Amino Acids and Their Properties

Recap: ss-rRNA and mutations
Ribosomal RNA (rRNA) evolves very slowly
– Much slower than proteins
– ss-rRNA is typically used
So by aligning ss-rRNA of one organism with that of another
– We can estimate relatedness

Amino Acid Substitutions
Recall we can align DNA & RNA sequences…
What does that mean?
We can also align two amino acid sequences
Can 2 nucleotides partially match?
• Are some nucleotide mutations more significant than others?
• Significance of a nucleobase mutation
– Does name matter?
– Does location matter?
Can 2 amino acids partially match?
• Are some amino acid mismatches more significant than others?
• Significance of an amino acid mutation
– Name? Location?

Sequence matching and evolution rate
Proteins tend to evolve more slowly than DNA
– Many DNA changes have no effect on a protein
– A changed codon may map to the same amino acid
– Non-coding DNA changes may have no effect
What does this mean for gauging the relatedness of
– humans and chimpanzees?
– humans and fish?

Sequence matching and evolution rate
Ribosomal RNA (rRNA) evolves very slowly
– Much slower than proteins
What might rRNA matching be good for measuring the relatedness of?
– humans and chimpanzees?
– humans and fish?
– humans and what?

Sequence matching and evolution rate
Ribosomal RNA (rRNA) evolves very slowly
– Much slower than proteins
– ss-rRNA is typically used
• (what's that?)
However, different regions of ss-rRNA mutate at different rates
(Ribosome images next)

The Ribosome
Source: www.buzzle.com/articles/ribosomesfunction.html

Ribosomes: diagrams and images
...check images.google.com for:
– Ribosome diagram
– Ribosome structure
Videos include http://www.youtube.com/watch?v=ID7tDAr39Ow

Recap: ss-rRNA and mutations
Ribosomal RNA (rRNA) evolves very slowly
– Much slower than proteins
– ss-rRNA is typically used
So by aligning ss-rRNA of one organism with that of another
– We can estimate relatedness

Relatedness and Mutations
Much DNA mutates relatively quickly
Much ss-rRNA mutates relatively slowly
Much protein mutates at intermediate rates
– Let's focus on protein mutation next

Amino acid substitutions
Some amino acid substitutions are more likely than others
Why?
• Some are closer to others in terms of nucleobase codons
• Some are closer in terms of resulting protein function

Amino acid substitutions II
Substituting similar ones is likely to
– Retain the protein structure and function
Substituting dissimilar ones is likely to
– Change the protein structure and function
Similarity of amino acids means what?

Amino acid substitutions III
Similarity of amino acids means similar physicochemical properties
Physicochemical:
– Concerning the physical and chemical
– Concerning physical chemistry
Physical chemistry:
– Connecting macroscopic properties of substances with their molecular properties

Amino acid physicochemical properties
Nonpolar (hydrophobic): ACFGILMPVW
Polar (hydrophilic): NQSTY
Aromatic: FHWY (having to do with 6-carbon rings)
Basic: HKR
Acidic: DE
(See http://www.bio.davidson.edu/courses/genomics/jmol/aatable.html)
By way of contrast, can anyone think of a nonphysicochemical property of some amino acids?

Aromatic
Special type of ring-shaped molecule
Characterized by an unusual stabilizing property

Aliphatic
Non-aromatic

Amino acid abbrevs.
G=glycine, P=proline, T=threonine, A=alanine, …, but why the following??
F=phenylalanine
Y=tyrosine
N=asparagine
Q=glutamine
W=tryptophan

Scoring protein sequence alignments
Simple way:
– Two matching (identical) amino acids score 1
– Two mismatching (non-identical) ones score 0
Goal: maximize % of matching amino acids
Works well for very similar sequences
Example:
CADQH
CADPM
Alignment score=___

Scoring protein sequence alignments II
Simple way ignores degree of similarity
– better to account for degree of similarity!
Solution: substitution matrices
PAM (Accepted Point Mutation, but “PAM” easier to say than “APM”) matrix
– Developed in 1970s by Margaret Dayhoff
PAM1 matrix: answers question, “if 1% of the amino acids in a sequence change, at what rates would each amino acid be substituted for each other one?”
– PAM2 matrix:
• Not 2%!
• Rather, 1%, twice
• What is the difference?
– PAM250 matrix:
• Not 250%, obviously
• Why “obviously”?
• It is 1%, repeated 250 times!
– BLOSUM matrix is also a popular type

Scoring protein sequences: PAM250
Here is PAM250
• source: http://bioinfo.cnio.es/docus/courses/SEK2003Filogenias/seq_analysis/PAM250matrix.gif
CADQH
CADPM
Alignment score=?

Scoring protein sequences: BLOSUM62 (default in Blast 2.0)
Source=http://bioinfo.cnio.es/docus/courses/SEK2003Filogenias/seq_analysis/pairwise.html
Why do self “substitutions” have the highest numbers?

Why use PAM, BLOSUM, etc.?
Sequence similarity is related to evolutionary distance
Simple base matching (match/not) may work ok for closely related organisms
– humans and chimps, for example
Amino acid matching works better as evolutionary distance increases (why?)
We’d like to be able to assess relatedness of organisms that diverged long ago
– humans and worms, for example

Relatedness Long Ago
See images.google.com for domains of life
We still are not sure, but the 3-domain system seems likely
But cladistics demands binary splits, so 3 domains require 2 splits, and 2 of the domains are more closely related to each other than either is to the 3rd

Why use PAM, BLOSUM…(II)
Organisms that diverged long ago have divergent analogous amino acid sequences
Since different amino acid substitutions occur at different frequencies…
…we can measure relatedness back farther
…e.g. when the fraction of identical amino acids is surprisingly low
…and the fraction of identical base pairs…
…is even lower

Comparing Sequences with PAMs (+ recap)
What does “PAM” mean?
PAM is considered an acronym for
– Point Accepted Mutation
– Accepted Point Mutation (original)
– Percent Accepted Mutations
A point mutation is a substitution of 1 amino acid for another
An accepted mutation is one that is passed down through the generations
Will a mutation be accepted if it is
– Helpful?
– Harmful?
– Neutral?
– Helpful in some circumstances, harmful in others?

What Does PAM Mean, cont.
PAM has two meanings
– PAM is a unit of evolutionary time
– PAM is a kind of substitution matrix
(The meanings are related)

PAM as a Unit of Time
A PAM is the amount of evolutionary change resulting in:
– 1 amino acid mutation per 100 amino acids
It is an average over >>100 amino acids
– …because mutations have randomness
After 1 PAM, will an organism have exactly 1% of its amino acids different from what they started out as?

PAM, Evolution, and Gaps
PAM ignores
– Insertions
– Deletions
– Silent nucleotide substitutions (which are?)
PAM counts a change from A to B and back to A as 2 accepted point mutations
2 sequences 200 PAMs apart will have about 25% of amino acids the same!

PAM Matrices
They describe substitutability of amino acids, based on empirical evidence
– Empirical = experiential
– The matrices are derived from repositories of actual homologous sequences
A PAM 1 matrix is geared to best compare 2 sequences that are 1 PAM apart
A PAM 250 matrix is good for comparing quite diverged sequences
PAM 250 matrix is standard

Creating a PAM Matrix
Let f_i be the frequency of amino acid i
We express f_i as a fraction of the total:
f_i = (instances of i) / (instances of any amino acid)
Frequencies range from…
– 0.091 (L) down to 0.014 (W)
The most common amino acid occurs about ____ times more commonly than the least

Creating PAM matrix, cont.
Determine mutabilities of the amino acids
– Some amino acids tend to change easily
– Others not
If alanine’s mutability is set to 100
– Serine’s mutability is 117 (highest, 1991 data)
– Tryptophan’s mutability is 25 (lowest, 1991)
Let’s look more closely at m_i . . .

Creating PAM matrix, cont.
Mutability is a number
Given an evolutionary interval of 1 PAM, let
m_i = (# mutations of amino acid i) / (# instances of amino acid i)
Alternatively, m_i = p (an instance of i mutates)
Are the two formulas for m_i identical?

Creating PAM matrix, cont.
Next, we break m_i into constituent m_i,j’s
That is, i mutates, but into j at what rate?
Use actual data from observed mutations
Populate a matrix of probabilities

The Diagonal
Values on the matrix diagonal do not really describe i mutating into itself!
– (In reality, can that happen?)
They basically show p (i does not mutate)
Thus, the columns add up to 1
Is the matrix on the last slide
– Symmetric?
– Are there about 1% changed?

PAM0
What do you think a PAM 0 matrix might look like?

PAMn
Use matrix multiplication
– PAM2 = PAM1 x PAM1
– PAM3 = PAM2 x PAM1
– PAM250? Do it 250 times!

PAM∞
What do you imagine a PAM∞ matrix might look sort of like?

Logarithmicize
Actually, we take logarithms to get the usual matrix from the probability matrices…
First, build another, reference matrix of “expected” probabilities
– Assume all amino acids are equally mutable
– Also assume they mutate into each other in proportion to their frequencies
– (I.e., overall amino acid frequencies are maintained, but otherwise they don’t care what they mutate into)

Logarithmicize
Now we have two matrices
Make a 3rd. Each entry is:
(Observed probability) / (Expected probability)
…we’re comparing reality to “if mutations were truly random”
Take the log of each entry to make a 4th
– An entry of 1 means 10x more mutations of that type than expected
– An entry of -1 means what?

Carrying On
We now use the matrix to measure relative evolutionary distance
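
A small sketch may help tie together the PAMn and Logarithmicize steps above. It is only an illustration, not Dayhoff's actual procedure or data: the 3-letter alphabet, the frequencies in FREQS, the numbers in PAM1, and the helper names mat_mul, pam_n and log_odds are all made up for this example. The "expected" probability is also simplified to the overall frequency of the row letter, which may differ from the reference matrix the slides describe.

```python
import math

# Hypothetical 3-letter alphabet with made-up numbers (not real PAM data).
LETTERS = ["A", "B", "C"]
FREQS = [0.5, 0.3, 0.2]  # assumed overall frequency of each letter

# Toy "PAM1": PAM1[i][j] = p(letter j is replaced by letter i after 1 PAM).
# Each column sums to 1; the diagonal holds p(letter does not mutate).
PAM1 = [
    [0.990, 0.010, 0.005],
    [0.006, 0.985, 0.005],
    [0.004, 0.005, 0.990],
]


def mat_mul(x, y):
    """Plain matrix product of two square matrices (lists of rows)."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pam_n(pam1, n):
    """Start from the identity (the n = 0 case) and multiply by pam1 n times."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, pam1)
    return result


def log_odds(probs, freqs):
    """log10(observed / expected) for each entry.

    Assumption: 'expected' is simply the overall frequency of the row letter.
    """
    n = len(probs)
    return [[math.log10(probs[i][j] / freqs[i]) for j in range(n)]
            for i in range(n)]


pam2 = pam_n(PAM1, 2)        # 1% change, applied twice
pam250 = pam_n(PAM1, 250)    # 1% change, applied 250 times
scores250 = log_odds(pam250, FREQS)

# Column sums stay at 1 (up to rounding) however many times we multiply.
column_sums = [sum(pam250[i][j] for i in range(3)) for j in range(3)]
```

Printing pam_n(PAM1, 0) and a large power such as pam_n(PAM1, 5000) is one way to explore the PAM0 and PAM∞ questions. Do not call log_odds on the PAM0 result, since its zero entries have no logarithm.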