Protein Sequence - University of California, Davis
Transcript
Protein Sequence
  Amino Acid Composition (IEC, RP-HPLC)
  Ancient Sequencing methods
  Modern Sequencing methods
  Sequencing the Gene
  Then what?

Amino Acid Composition
  1952
    Complete acid hydrolysis
    Ion-exchange chromatography with programmed buffer changes (~3 hr)
    Post-column derivatization with ninhydrin or fluorescamine
  1980
    Complete acid hydrolysis
    Precolumn derivatization to phenylthiohydantoins
    Reversed-phase HPLC (~30 min)

Sequencing: Sanger Endgroup Analysis
  Modify the protein (amines) with fluorodinitrobenzene, aka FDNB, Sanger's reagent.
  Alternative reagent: dansyl chloride (fluorescent).
  Hydrolyze the protein
  Separate by TLC
  Identify the N-terminal amino acid by Rf
  Treat the protein with aminopeptidase
  Repeat until the end gets ragged
  Use proteolytic fragments for simplicity

Sequencing
  Generate proteolytic fragments
  Use more than one protease in separate experiments
    Trypsin cleaves after Arg and Lys residues
    Chymotrypsin cleaves after Phe, Tyr, Trp
  Separate fragments (high-voltage paper electrophoresis or HPLC)
  Sequence all peptides independently
  Assemble the sequence using overlap info
  [Figure: overlapping trypsin and chymotrypsin (Chtr) fragments]

Automated Sequencing
  Use proteolytic fragments
  Sequence each peptide using automated Edman degradation
    Each Edman cycle removes one amino acid
    Converts it to a PTH-amino acid for HPLC
  Assemble the sequence using overlap info
  [Figure: overlapping trypsin and chymotrypsin (Chtr) fragments]

N-Terminal Edman Degradation
  [Figure: reaction scheme. Panel labels: Peptide; Attack on phenylisothiocyanate; Rearrangement (H+); Anilinothiazolinone amino acid; Peptide N-1; PTH-amino acid]
  PTH-amino acid: absorbs at 260-275 nm, RP-HPLC compatible

C-Terminal Edman Degradation
  [Figure: reaction scheme. Panel labels: Activation of carboxyl by acetic anhydride; Attack by thiocyanate; Hydrolysis; Peptide N-1; TH-amino acid]

Alternative Sequencing - MS
  Use non-fragmenting ionization
    Electrospray ionization + traditional mass spec
    Matrix-assisted laser desorption-ionization + time-of-flight mass spec (MALDI-TOF)
  Measures the mass of the mature, intact protein and/or complexes

Sequencing the Gene
  DNA synthesis in vitro requires
    Template (the DNA you want to sequence)
    Primer (complementary to the region upstream of where you want to sequence)
    Polymerase
    dXTPs, Mg2+
  The primer pairs with the template, leaving a free 3'-OH group ready for action
  As dXTPs base-pair with the template, the 3'-OH attacks the α-phosphate of the dXTP, displacing PPi, making a phosphodiester, and extending the nascent DNA chain by one base

The Polymerase Reaction
  Elongation of a primer that is base-paired with a template
  Requires a free 3'-OH group
  [Figure: primer base-paired to a template strand, with the incoming dXTP joining at the free 3'-OH end of the primer]

Di-deoxy Terminators
  If 2',3'-dideoxy nucleoside triphosphates were used, the reaction would proceed for only one cycle, because there would be no free 3'-OH group to attack the next dXTP.
  If a fraction of a percent of ONE 2',3'-dideoxy nucleoside triphosphate (say ddTTP) were used, SOME polymer would be terminated EACH time that base was incorporated, i.e., each time dA occurs in the template.
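To make the termination rule concrete, here is a minimal sketch (not part of the original slides) that lists the chain-terminated products of a single ddTTP reaction. The function name, the template string, and the choice to ignore the primer (lengths are counted from the first templated base) are illustrative assumptions.

```python
# Watson-Crick pairing used to build the new strand from the template.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template_3to5, dd_base):
    """Return the products (written 5'->3') that end in the dideoxy base.

    template_3to5 is written 3'->5', so the new strand is simply the
    base-by-base complement read left to right. The primer is ignored
    here (an assumption made to keep the sketch short).
    """
    product = "".join(COMPLEMENT[b] for b in template_3to5)
    return [product[: i + 1] for i, b in enumerate(product) if b == dd_base]


template = "ATGTCACAGGACAGA"  # illustrative template, written 3'->5'
frags = terminated_fragments(template, "T")

# One terminated product for every dA in the template.
print([len(f) for f in frags])
```

Each length in the output marks a position where the template carries a dA, which is exactly the information one lane of a sequencing gel provides.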
  If 1/1000th of the dTTP were ddTTP, then 1/1000th of the polymers would terminate at each dA in the template… the rest would continue.
  You would get many polymers of different sizes, each corresponding to the occurrence of a dA in the template.
  Use four separate reactions: one with ddTTP, one with ddATP, one with ddGTP, and one with ddCTP (and all other components).
  One of the reaction mixtures would contain a polymer that terminated at each base.
  [Figure: gel with four lanes (ddTTP, ddCTP, ddGTP, ddATP), fragments ordered from small to large; labels: Base in polymer, Sequence of template]
  Use a fluorescent or radioactive primer so you can see every polymer
  Separate them by size (gel electrophoresis)
  Read the sequence of the polymers from the gel
  Infer the sequence of the template by Watson-Crick pairing

Dideoxy Terminators
  3' A T G T C A C A G G A C A G A 5'
  5' T A C A G T G T C C T G T C T 3'

A, T, G, and C. What are the Amino Acids?

Standard Genetic Code
  First/Second   U         C         A         G         Third
  U              UUU Phe   UCU Ser   UAU Tyr   UGU Cys   U
                 UUC Phe   UCC Ser   UAC Tyr   UGC Cys   C
                 UUA Leu   UCA Ser   UAA ***   UGA ***   A
                 UUG Leu   UCG Ser   UAG ***   UGG Trp   G
  C              CUU Leu   CCU Pro   CAU His   CGU Arg   U
                 CUC Leu   CCC Pro   CAC His   CGC Arg   C
                 CUA Leu   CCA Pro   CAA Gln   CGA Arg   A
                 CUG Leu   CCG Pro   CAG Gln   CGG Arg   G
  A              AUU Ile   ACU Thr   AAU Asn   AGU Ser   U
                 AUC Ile   ACC Thr   AAC Asn   AGC Ser   C
                 AUA Ile   ACA Thr   AAA Lys   AGA Arg   A
                 AUG Met   ACG Thr   AAG Lys   AGG Arg   G
  G              GUU Val   GCU Ala   GAU Asp   GGU Gly   U
                 GUC Val   GCC Ala   GAC Asp   GGC Gly   C
                 GUA Val   GCA Ala   GAA Glu   GGA Gly   A
                 GUG Val   GCG Ala   GAG Glu   GGG Gly   G
  ORFs - Look for the longest uninterrupted sequence

Protein Sequence from Nucleotide Sequence
  5' GCCCTTTCTAAAATGTCCAAAATGGCGCAAACCAAACTGTATGATGTGA 3'   Coding Strand
  3' CGGGAAAGATTTTACAGGTTTTACCGCGTTTGGTTTGACATACTACACT 5'   Template Strand
  5' GCCCUUUCUAAAAUGUCCAAAAUGGCGCAAACCAAACUGUAUGAUGUGA 3'   Message

  Frame 1: A L S K M S K M A Q T K L Y D V ...
  Frame 2: P F L K C P K W R K P N C M M * ...
  Frame 3: P F * N V Q N G A N Q T V * C E ...

  How do you know which strand is the coding strand? You don't... There are six possible frames for translation.

So, you've got the sequence… So what?
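The six-frame translation described above can be sketched in a few lines. This is an illustrative sketch, not code from the lecture: the function names are assumptions, and the codon table is built from the standard genetic code in T/C/A/G order (DNA alphabet, `*` for stop).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the opposite strand, written 5'->3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0):
    """Translate one reading frame (0, 1 or 2); a trailing partial codon is dropped."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def six_frames(dna):
    """Three frames on the given strand, then three on its reverse complement."""
    strands = (dna, reverse_complement(dna))
    return [translate(s, f) for s in strands for f in range(3)]


coding = "GCCCTTTCTAAAATGTCCAAAATGGCGCAAACCAAACTGTATGATGTGA"
frames = six_frames(coding)
for number, protein in enumerate(frames, start=1):
    # The longest stretch between stops is a crude ORF candidate.
    longest = max(protein.split("*"), key=len)
    print(number, protein, longest)
```

The first three frames reproduce the translations listed on the slide for the part of the sequence shown; the last three come from the other strand, which is why the coding strand cannot be assumed in advance.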
Next topic: Bioinformatics
  Inferences based on homology

Questions
  1. Has the gene been sequenced before? (Will I be able to publish?)
  2. What is the sequence of the protein encoded by the gene?
  3. Has the protein been sequenced before?
  4. Is the gene similar to one that has been sequenced before?
     1. Did I sequence the right gene?
     2. Will I be able to find structural or functional relatives?
  5. Is the protein similar to one that has been sequenced before?
     1. How similar?
     2. What does the similarity mean?
  6. Can I predict the function of the gene product, or is the predicted function consistent with what I know about the protein?
  7. Can I get information about structural features of the gene product?
     1. Secondary structure
     2. Folding domains or other common patterns
     3. Hydropathy profiles
        1. How might predicted helices and/or sheets pack?
        2. Is it likely to be a membrane protein, a transmembrane protein?

Answers: Sequence Similarities and Similarity Searches
  1. Search sequence databases for homologous proteins.
  2. Find families of proteins that are similar to your protein.
  3. Use information about the structure and properties of the similar protein(s) to establish inferences about your protein. If the exact sequence is in the database, the similarity search routines will find that, too.
  4. Determine whether two sequences are related (or identical) by aligning them so that homologous regions are adjacent.
For two identical sequences:
  MGKARSMVLKHSTKARS
  MGKARSMVLKHSTKARS
But what about:
  Imperfect homology
    MGKARSMLLKHSTKARS
    MGKARTMVLKHSTRARS
  Gaps/insertions
    MGKARSMLLKHSLKARS
    MGRA----LKHSLRART
And how homologous is homologous?

Need
  Similarity scores for pairs of amino acids
  A method for dealing with gaps
  Algorithms for comparing a sequence with a database
  Ways to assess the degree of homology
  Ways to link structural info with sequence info

Dynamic Programming: Needleman-Wunsch Algorithm
  Compares the similarity of two proteins a and b at positions i and j:
  $NW_{i,j} = \max\left( NW_{i-1,j-1} + s(a_i, b_j),\; NW_{i-1,j} + g,\; NW_{i,j-1} + g \right)$
  $NW_{i-1,j-1}$ = running total
  $s(a_i, b_j)$ = similarity between residue i of protein a and residue j of protein b
  $g$ = gap penalty
  http://www.avatar.se/molbioinfo2001/dynprog/dynamic.html
  Fill a matrix with all possibilities
  Simple example: s = 1 (match) or 0 (mismatch), and g = 0

Smith-Waterman
  Always compare the NW terms to zero, so that the running score never drops below zero.
  $NW_{i,j} = \max\left( NW_{i-1,j-1} + s(a_i, b_j),\; NW_{i-1,j} + g,\; NW_{i,j-1} + g,\; 0 \right)$

BLAST & FASTA
  FASTA: great, but we won't talk about it much. Faster and more selective than SW, but less sensitive.
  BLAST (Basic Local Alignment Search Tool): less selective and more sensitive than FASTA, i.e., you may get more hits, but some of them may be wrong.

BLAST
  Divide the sequence into "words" of length W (e.g., BLASTp, initial W = 3)
  Compare all W-length words
  Retain only pairs with similarity above a threshold, T
  Call them High-Scoring Pairs (HSPs)
  Increase W, repeat with the HSPs
  Keep going while remaining above a minimum similarity, and compare to the random probability (E)

Scoring Matrices: Making similarity quantitative
  Compare the actual frequency to the frequency expected by chance alone.
  Probability that alanine appears at position x in a protein = fraction of Ala in all proteins = $p_{Ala}$
  Probability that one protein has Ala at position x, and another protein has Gly? = $p_{Ala} p_{Gly}$
  This is the frequency due to chance alone.
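The Needleman-Wunsch and Smith-Waterman recurrences given above can be sketched as one scoring function. This is a minimal illustration under stated assumptions: the function name and its parameters are hypothetical, the similarity s is a simple match/mismatch value rather than a real scoring matrix, and only the score (not the traceback) is computed.

```python
def align_score(a, b, match=1, mismatch=0, gap=0, local=False):
    """Fill the dynamic-programming matrix and return the alignment score.

    local=False follows the Needleman-Wunsch recurrence (global score,
    read from the last cell); local=True adds the comparison to zero used
    by Smith-Waterman and returns the best cell anywhere in the matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    nw = [[0] * cols for _ in range(rows)]

    if not local:
        # Leading gaps are charged in a global alignment.
        for i in range(1, rows):
            nw[i][0] = i * gap
        for j in range(1, cols):
            nw[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(nw[i - 1][j - 1] + s,   # extend the diagonal
                       nw[i - 1][j] + gap,     # gap in b
                       nw[i][j - 1] + gap)     # gap in a
            if local:
                cell = max(cell, 0)
            nw[i][j] = cell
            best = max(best, cell)

    return best if local else nw[-1][-1]


# The slide's simple scheme: s = 1 or 0, g = 0.
print(align_score("MGKARSMVLKHSTKARS", "MGKARSMVLKHSTKARS"))
print(align_score("MGKARSMLLKHSLKARS", "MGRALKHSLRART"))
```

With s = 1/0 and g = 0 the global score is just the number of residues that can be matched in order, which is why real searches use a scoring matrix and a negative gap penalty instead.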
Similarity
  $q_{Ala,Gly}$ = ACTUAL frequency that Ala and Gly are at position x in two proteins (in your database)
  $R_{i,j} = q_{i,j} / (p_i p_j)$
  Score: $S_{i,j} = \log_2 R_{i,j} = \log_2 \frac{q_{i,j}}{p_i p_j}$
  "Log-Odds Scores"
  Remember Chou & Fasman?

PAM Matrices
  Margaret Dayhoff assembled the Atlas of Protein Sequence and Structure
  Evolutionarily accepted mutations
  Calculated $q_{i,j}$ for all amino acids in closely related proteins
  These were accepted by Nature as similar/close enough
  Generate half matrices: Point Accepted Mutation / Percent Accepted Mutations
  Scaled so that PAM1 reflects 1 accepted mutation per 100 residues, and PAM50 reflects 50 accepted mutations per 100 residues

BLOSUM
  Henikoff and Henikoff
  BLOcks of Amino Acid SUbstitution Matrix
  BLOCKS is a database of related proteins

BLAST Search
  Go to the BLAST website
  Enter a nucleotide or amino acid sequence
  Choose the BLAST type
    Nucleotide-nucleotide: BLASTn
    Protein-protein: BLASTp
    6-frame-translated nucleotide vs. protein: BLASTx
    others

Then?
  Does it make sense?
  Multisequence alignment
  Secondary structure prediction
  Domains
  Families

Caveat
  It ain't what you don't know that'll kill you, it's what you know that ain't so.
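As a small worked example of the log-odds score defined in the scoring-matrix slides, the sketch below evaluates S = log2(q / (p_i p_j)). The function name is an assumption, and the frequencies are made-up numbers chosen for easy arithmetic, not values from any real database.

```python
import math


def log_odds(q_ij, p_i, p_j):
    """Log-odds score: observed pair frequency vs. the chance expectation p_i * p_j."""
    return math.log2(q_ij / (p_i * p_j))


# Hypothetical frequencies: each residue makes up 1/4 of all positions,
# so chance alone predicts the pair at 1/16 of aligned positions.
p_a, p_b = 0.25, 0.25

print(log_odds(0.125, p_a, p_b))    # pair seen twice as often as chance
print(log_odds(0.0625, p_a, p_b))   # pair seen exactly as often as chance
print(log_odds(0.03125, p_a, p_b))  # pair seen half as often as chance
```

A positive score marks a substitution seen more often than chance predicts, zero marks a chance-level pairing, and a negative score marks a substitution that is disfavored.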