SDPpred: a method for identification of amino acid residues that determine differences in functional specificity of homologous proteins, and its application to the MIP family of membrane transporters

Olga V. Kalinina, Pavel S. Novichkov, Andrey A. Mironov, Mikhail S. Gelfand, Aleksandra B. Rakhmaninova

Large families of proteins: generally similar biochemical function but many different specificities

Example: ~800 transcription factors of the LacI family. Average sequence identity is 30%. They bind different effectors and operators. Some effectors:
• lactose (LacI)
• D-fructose-6-phosphate (FruR)
• guanine, hypoxanthine (PurR)
• cytidine, adenosine (CytR)
• trehalose-6-phosphate (TreR)
• D-gluconate (GntR)
• D-galactose (GalR)
• D-ribose (RbsR)
• maltose (MalR)
• raffinose (RafR)
• …

Multiple alignment (fragment):

    Q9KDW9      ----------MSPFLGEVIGTMILIILGGGVVAGVVLKGTK
    Q8Y6Z1      ----MIDTSLATQFLGEVIGTAILIILGAGVVAGVSLKRSK
    Q97JG6      ----------MTIFFAELVGTLLLILLGDGVVANVVLKNSK
    GLPF_ECOLI  MSQT---STLKGQCIAEFLGTGLLIFFGVGCVA--ALKVAG
    Q8ZJK5      MSQTA-SSTLKGQCIAEFLGTGLLIFFGAGCVA--ALKLAG
    GLPF_HAEIN  MDKS-----LKANCIGEFLGTALLIFFGVGCVA--ALKVAG
    GLPF_PSEAE  MTTAAPTPSLFGQCLAEFLGTALLIFFGTGCVA--ALKVAG
    AQPZ_BRUME  ---------MLNKLSAEFFGTFWLVFGGCGSAILAA--AFP
    Q92NM3      ---------MFRKLSVEFLGTFWLVLGGCGSAVLAA--AFP
    Q8UJW4      ---------MGRKLLAEFFGTFWLVFGGCGSAVFAA--AFP
    AQPZ_ECOLI  ---------MFRKLAAECFGTFWLVFGGCGSAVLAA--GFP

Description of specificity groups:
Group A: No. 1-10, 13…
Group B: No. 12, 14-16…
Group C: No. 17-45…
…

SDPpred
• Testing on families that include proteins with resolved 3D structure
• Positions that account for specificity
• Assignment of specificity to new proteins (?)
• Experiment

What are SDPs? (SDP = Specificity Determining Position)
• Specificity group = group of proteins that have the same specificity (experimental data, genome analysis, etc.)
• SDP = alignment position that is conserved within specificity groups but differs between them

An SDP is not equivalent to a functionally important position!
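The SDP definition above can be illustrated with a deliberately strict toy check: a column counts only if every group is perfectly conserved and at least two groups differ. This is a minimal sketch and not the SDPpred scoring, which is statistical. The function name `strict_sdp_columns` and the 11-column window cut from four rows of the alignment fragment are assumptions made for illustration.

```python
from collections import defaultdict


def strict_sdp_columns(alignment, groups):
    """Return 0-based columns that are identical within every group
    but not identical across all groups (a strict reading of the SDP definition)."""
    by_group = defaultdict(list)
    for name, seq in alignment.items():
        by_group[groups[name]].append(seq)

    length = len(next(iter(alignment.values())))
    hits = []
    for p in range(length):
        group_residues = []
        for seqs in by_group.values():
            column = {s[p] for s in seqs}
            if len(column) != 1:      # not conserved inside this group
                break
            group_residues.append(column.pop())
        else:
            # every group is conserved; keep the column only if groups differ
            if len(set(group_residues)) > 1:
                hits.append(p)
    return hits


# Illustrative 11-column window taken from four rows of the alignment fragment.
alignment = {
    "GLPF_ECOLI": "EFLGTGLLIFF",
    "GLPF_HAEIN": "EFLGTALLIFF",
    "AQPZ_BRUME": "EFFGTFWLVFG",
    "AQPZ_ECOLI": "ECFGTFWLVFG",
}
groups = {
    "GLPF_ECOLI": "GLP",
    "GLPF_HAEIN": "GLP",
    "AQPZ_BRUME": "AQP",
    "AQPZ_ECOLI": "AQP",
}

print(strict_sdp_columns(alignment, groups))  # [2, 6, 8, 10]
```

Columns 1 and 5 are rejected because one group is not conserved there, which shows why a graded score is more useful than this all-or-nothing rule on real data.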
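Algorithm

• Mutual information $I_p$ reflects the extent to which alignment position $p$ tends to be an SDP:

\[
I_p = \sum_{i=1}^{N} \sum_{\alpha=1}^{20} f_p(\alpha, i)\, \log \frac{f_p(\alpha, i)}{f_p(\alpha)\, f(i)}
\]

where $N$ is the number of groups, $f(i)$ is the fraction of proteins in group $i$, $f_p(\alpha, i)$ is the ratio of the number of occurrences of amino acid $\alpha$ in group $i$ at position $p$ to the length of the whole alignment column, and $f_p(\alpha)$ is the frequency of amino acid $\alpha$ in the whole alignment column at position $p$.

• Statistical significance of $I_p$: the expected mutual information $I_p^{\mathrm{exp}}$ of an alignment column gives a Z-score (Mirny & Gelfand, 2002, J Mol Biol, 321(1)):

\[
Z_p = \frac{I_p - \langle I_p^{\mathrm{exp}} \rangle}{\sigma\left(I_p^{\mathrm{exp}}\right)}
\]

• Smoothed amino acid frequencies: a leucine is more a methionine than a valine, and any arginine has a dash of lysine… The observed frequency

\[
f(\alpha, i) = \frac{n(\alpha, i)}{n(i)}
\]

is replaced by the smoothed frequency

\[
\tilde{f}(\alpha, i) = \frac{n(\alpha, i) + \dfrac{\lambda}{\sqrt{n(i)}} \sum_{\beta=1}^{20} n(\beta, i)\, m(\beta \to \alpha)}{n(i) + \lambda \sqrt{n(i)}}
\]

where $n(\alpha, i)$ is the number of occurrences of amino acid $\alpha$ in group $i$, $n(i)$ is the size of group $i$, $m(\beta \to \alpha)$ is the probability of substituting $\beta$ with $\alpha$, and $\lambda$ is a smoothing parameter.

• Are 5 SDPs with Z-score > 10.5 better than 10 SDPs with Z-score > 9.0? A Bernoulli estimator selects the proper number of SDPs. With the observed Z-scores ordered as $Z_1 \ge Z_2 \ge \dots \ge Z_n$:

\[
k^{*} = \arg\min_{k} P(\text{there are at least } k \text{ observed Z-scores } Z \ge Z_k)
      = \arg\min_{k} \left[ 1 - \sum_{i=n-k+1}^{n} C_n^{i}\, q^{i} p^{\,n-i} \right]
\]

\[
p = P(Z \ge Z_k) = \frac{1}{\sqrt{2\pi}} \int_{Z_k}^{\infty} \exp\left(-\frac{Z^2}{2}\right) dZ, \qquad q = 1 - p
\]

• Kalinina OV, Mironov AA, Gelfand MS, Rakhmaninova AB. (2004) Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families. Protein Sci 13(2): 443-56.
• Kalinina OV, Novichkov PS, Mironov AA, Gelfand MS, Rakhmaninova AB. (2004) SDPpred: a tool for prediction of amino acid residues that determine differences in functional specificity of homologous proteins. Nucl Acids Res 32(Web Server issue): W424-8.
• http://math.belozersky.msu.ru/~psn/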
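As a rough illustration of two steps from the Algorithm slide, the sketch below computes the mutual information of one alignment column and applies the Bernoulli estimator to a list of Z-scores. It makes several assumptions not taken from the slides: frequencies are unsmoothed, natural logarithms are used, the Z-scores are supplied by the caller (their computation from the expected mutual information is not shown), and the names `mutual_information` and `bernoulli_cutoff` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(column, labels):
    """I_p for one column: sum over (amino acid, group) pairs of
    f_p(a, i) * log(f_p(a, i) / (f_p(a) * f(i))), with unsmoothed frequencies."""
    n = len(column)
    joint = Counter(zip(column, labels))
    aa_counts = Counter(column)
    group_counts = Counter(labels)
    total = 0.0
    for (aa, group), count in joint.items():
        f_joint = count / n
        f_aa = aa_counts[aa] / n
        f_group = group_counts[group] / n
        total += f_joint * math.log(f_joint / (f_aa * f_group))
    return total


def bernoulli_cutoff(z_scores):
    """Return k* minimising P(at least k of n Z-scores are >= Z_k) under a
    standard normal null. The tail sum over j = k..n used here equals
    1 - sum_{i=n-k+1}^{n} C(n, i) q^i p^(n-i) from the slide."""
    z = sorted(z_scores, reverse=True)
    n = len(z)
    best_k, best_prob = 0, math.inf
    for k in range(1, n + 1):
        p = 0.5 * math.erfc(z[k - 1] / math.sqrt(2.0))  # P(Z >= Z_k)
        q = 1.0 - p
        prob = sum(math.comb(n, j) * p**j * q**(n - j) for j in range(k, n + 1))
        if prob < best_prob:
            best_k, best_prob = k, prob
    return best_k


# Toy column: two groups, each conserved, with different residues.
labels = ["GLP", "GLP", "AQP", "AQP"]
print(mutual_information("LLFF", labels))   # log(2), the maximum for two equal groups
print(mutual_information("GGGG", labels))   # 0.0, a fully conserved column

# Invented Z-scores: three clear outliers followed by background values.
toy_z = [6.0, 5.5, 5.0, 1.0, 0.5, 0.0, -0.3, -1.0, -1.2, -2.0]
print(bernoulli_cutoff(toy_z))              # 3
```

For long alignments with very large Z-scores the direct products can underflow to zero, so a practical version would probably work with logarithms of the binomial terms.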
Web interface: Input

Input: multiple alignment of proteins divided into specificity groups

    === AQP ===
    %sp|Q9L772|AQPZ_BRUME
    -------------------------------------mlnklsaeffgtfwlvfggcgsa
    ilaa--afp-------elgigflgvalafgltvltmayavggisg--ghfnpavslgltv
    iiilgsts------------------------------slap-----------------qlwlfwvaplvgavigaiiwkgllgrd-------------------------------------
    %sp|P48838|AQPZ_ECOLI
    -------------------------------------mfrklaaecfgtfwlvfggcgsa
    vlaa--gfp-------elgigfagvalafgltvltmafavghisg--ghfnpavtiglwa
    lvihgatd------------------------------kfap-----------------qlwffwvvpivggiiggliyrtllekrd------------------------------------
    %tr|Q92ZW9
    -------------------------------------mfkklcaeflgtcwlvlggcgsa
    vlas--afp-------qvgigllgvsfafgltvltmaytvggisg--ghfnpavslglav
    iiilgsth------------------------------rrvp-----------------qlwlfwiaplfgaaiagivwksvgeefrpvd---------------------------------
    === GLP ===
    %sp|P11244|GLPF_ECOLI
    ----------------------------msqt---stlkgqciaeflgtglliffgvgcv
    aalkvag---------a-sfgqweisviwglgvamaiyltagvsg--ahlnpavtialwl
    glilaltd------------------------------dgn--------------g-vpr
    -flvplfgpivgaivgafayrkligrhlpcdicvveek--etttpseqkasl------------
    %sp|P44826|GLPF_HAEIN
    ----------------------------mdks-----lkancigeflgtalliffgvgcv
    …

Web interface: Output
• Alignment of the family with the SDPs highlighted (Alignment view)
• Detailed description of each SDP (List of SDPs)
• Plot of probabilities used by the Bernoulli estimator to set the cutoff (Probability plot view)

Examples: the LacI family of bacterial transcription factors
• Training set: 459 sequences, average length 338 amino acids, 85 specificity groups
– 44 SDPs:
  10 residues contact NPF (analog of the effector)
  7 residues in the effector contact zone (5 Å < dmin < 10 Å)
  6 residues make up intersubunit contacts
  5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å)
  7 residues contact the operator sequence
  6 residues in the operator contact zone (5 Å < dmin < 10 Å)
[Figure: LacI from E. coli]

Examples: bacterial membrane channels of the MIP family
• Training set: 17 sequences, average length 280 amino acids, 2 specificity groups: aquaporins and glyceroaquaporins
– 21 SDPs:
  8 residues contact glycerol (substrate) (dmin < 5 Å)
  8 residues oriented to the channel
  5 residues make up contacts with other subunits
[Figure: GlpF from E. coli]

Why does the prediction make sense? LacI from E. coli
[Figure: distribution of residue classes among all 348 amino acids compared with the 44 SDPs]
• Contacting residues (distance to the DNA, effector, or the other subunit < 5 Å)
• Contact zone (may be functional)
• Non-contacting residues (distance to the DNA, effector, or the other subunit > 10 Å)

Why does the prediction make sense? GlpF from E. coli
[Figure: distribution of residue classes among all 281 amino acids compared with the 21 SDPs]
• Contacting residues (distance to the substrate or another subunit < 5 Å)
• Contact zone (may be functional)
• Non-contacting residues (distance to the substrate or another subunit > 10 Å)

GlpF from E. coli, a membrane channel from the MIP family: SDPs either interact with the substrate or are located on the outer surface of the monomer
[Figure: structure of the GlpF monomer with the predicted SDPs and glycerol marked]

SDPs located on the outer surface of the GlpF monomer form subunit contacts
• 20Leu, 24Phe, 108Tyr of one subunit, 193Ser from another subunit
• Glu43 from all four subunits

SDPs located on the outer surface of the GlpF monomer (continued)

Contacts of Glu43:

    Subunit I         Subunit II / IV    Distance (Å)
    Residue  Atom     Residue  Atom
    Glu43    OE1      Ser38    O         4.8
    Glu43    OE2      Glu43    OE2       4.1
    Glu43    CG       Trp42    CD1       3.7
    Glu43    OE2      Glu43    OE2       4.1

Contacts of Leu20, Phe24, Gly27, Cys28 and Tyr108:

    Subunit I         Subunit II         Distance (Å)
    Residue  Atom     Residue  Atom
    Leu20    CD2      Ile158   CD1       4.3
    Leu20    CD1      Leu162   CD2       4.5
    Phe24    CZ       Ile158   CG2       3.9
    Phe24    CZ       Leu186   CD1       3.9
    Phe24    CE2      Val189   CG2       3.8
    Phe24    CE2      Ile190   CG1       3.7
    Phe24    CA       Ser193   CB        3.9
    Phe24    O        Ser193   OG        4.2
    Phe24    O        Ser193   CB        3.3
    Gly27    O        Ser193   O         3.2
    Cys28    CA       Ser193   CA        3.8
    Tyr108   OH       Ser193   O         2.6
    Tyr108   CE1      Met194   CE        3.7
    Tyr108   CE1      Leu197   CD1       3.9

SDPs located on the outer surface of the GlpF monomer (continued)
[Figures: structure of contacts in the type A cluster; structure of contacts in the type B cluster]

Conclusions I. SDPpred: the SDP prediction method
• A method for identification of amino acid residues that account for differences in protein functional specificity
– Does not rely on the protein 3D structure
– Automatically determines the number of significant positions
– Considers substitutions according to the chemical properties of the substituted amino acids
• Results agree with available structural and experimental data
• Applicable to any protein family in a standard way

Conclusions II. SDPs for GlpF from E. coli
• In protein families whose members function as oligomers, predicted SDPs are often localized on the contact surface between subunits
• 5 "surface" SDPs in GlpF: 20Leu, 24Phe, 43Glu, 108Tyr, 193Ser. All of them participate in forming the quaternary structure
• Evolutionary pressure on amino acids that establish intersubunit contacts correlates with evolutionary pressure on amino acids that account for the correct recognition of the substrate
• These residues form compact spatial clusters, "structural clasps", for recognition of proper subunits

Authors
• Olga V. Kalinina
• Pavel S. Novichkov
• Andrey A. Mironov
• Mikhail S. Gelfand
• Aleksandra B. Rakhmaninova
– Department of Bioengineering and Bioinformatics, Moscow State University, Moscow, Russia
– Institute for Information Transmission Problems RAS, Moscow, Russia
– State Scientific Center GosNIIGenetika, Moscow, Russia

Acknowledgements
– Leonid A. Mirny
– Olga Laikova
– Vsevolod Makeev
– Roman Sutormin
– Shamil Sunyaev
– Aleksey Finkelstein
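For readers who want to prepare input like the example in the Web interface slide, here is a minimal sketch of a reader for that layout. The format rules are inferred only from the example shown (a `=== NAME ===` line opens a specificity group, a `%` line opens a sequence, and the following lines carry the aligned residues), so they are assumptions and not a specification of the server. The function name `parse_grouped_alignment` is hypothetical, and the sequences in the usage example are shortened toy fragments, not real alignment rows.

```python
def parse_grouped_alignment(text):
    """Parse '=== GROUP ===' / '%name' / sequence-line input into
    {group: {sequence_name: aligned_sequence}}."""
    groups = {}
    group = None
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("===") and line.endswith("==="):
            group = line.strip("= ")          # drop the '=' markers and spaces
            groups[group] = {}
            name = None
        elif line.startswith("%"):
            if group is None:
                raise ValueError("sequence header before any group header")
            name = line[1:].strip()
            groups[group][name] = ""
        else:
            if name is None:
                raise ValueError("sequence data before any sequence header")
            groups[group][name] += line       # wrapped lines are concatenated
    return groups


# Shortened toy input in the same layout (sequences truncated for illustration).
example = """\
=== AQP ===
%sp|P48838|AQPZ_ECOLI
--mfrkla
aecfgtfw
=== GLP ===
%sp|P11244|GLPF_ECOLI
msqtstlk
gqciaefl
"""

parsed = parse_grouped_alignment(example)
for group_name, members in parsed.items():
    for seq_name, seq in members.items():
        print(group_name, seq_name, seq)
```

Keeping the group as the outer key makes it straightforward to build the per-sequence group labels that a column-scoring step would need.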