Secondary structure prediction from amino acid sequence

Homology: Paralogs and orthologs
[Figure: gene a is duplicated to give a and b in the same species (paralogs = gene families); speciation then gives a and b in species 1 and a and b in species 2 (orthologs).]

[Flowchart: DNA sequence; automatic translation; physico-chemical properties; amino acid primary sequence]
1. Search for sequence homologue(s) and construct an alignment
2. Homologue(s) with known 3D structure?
3. Motif recognition: search secondary databases
[Flowchart labels: secondary structure prediction; fold assignment (e.g., using EMBOSS suite); primary db searches (FASTA, BLAST); homology modelling available]

Chou-Fasman Parameters
• Amino acid propensities

Accuracy of prediction
• Q3 score

$$Q_3 = \frac{q_\alpha + q_\beta + q_{\mathrm{coil}}}{\text{total no. of residues}} \times 100\%$$

where q_α, q_β and q_coil are the numbers of residues correctly predicted as helix, strand and coil.

Recent improvements
• The availability of large families of homologous sequences has greatly enhanced secondary structure prediction.
• The combination of sequence data in multiple alignments with sophisticated computing techniques such as neural networks has led to accuracies well in excess of 70%.
• The limit of 70-80% may be a function of secondary structure variation within homologous proteins.

Stereochemical analysis
Patterns of residue conservation are indicative of particular secondary structure types. Alpha helices have a periodicity of 3.6 residues per turn. Many alpha helices in proteins are amphipathic, meaning that one face points towards the hydrophobic core and the other towards the solvent. Patterns of hydrophobic residue conservation at positions i, i+3, i+4, i+7 are highly indicative of an alpha helix.
Amphipathic helix pattern: XOOXXOOX (X = hydrophobic, O = polar)

Stereochemical analysis
The geometry of beta strands means that adjacent residues have their side chains pointing in opposite directions. Beta strands that are half buried in the protein core will tend to have hydrophobic residues at positions i, i+2, i+4, i+6, etc., and polar residues at positions i+1, i+3, i+5, etc.
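The two periodicity rules above (hydrophobic residues at i, i+3, i+4, i+7 for an amphipathic helix, and at alternating positions for a half-buried strand) can be illustrated with a minimal Python sketch. The hydrophobic residue set and the function names are assumptions made for illustration; the slides do not specify them. The sketch also scans a single sequence, whereas the slides describe conservation patterns in a multiple alignment.

```python
# Assumed hydrophobic residue set; the slides do not define one.
HYDROPHOBIC = set("AVLIMFWC")

# Offsets from position i at which hydrophobic residues are expected.
HELIX_OFFSETS = (0, 3, 4, 7)   # amphipathic alpha helix: i, i+3, i+4, i+7
STRAND_OFFSETS = (0, 2, 4, 6)  # half-buried beta strand: alternating positions


def to_pattern(seq):
    """Map a sequence to X (hydrophobic) / O (anything else) symbols."""
    return "".join("X" if aa in HYDROPHOBIC else "O" for aa in seq.upper())


def find_matches(seq, offsets):
    """Return start indices i where exactly the positions i+offset are
    hydrophobic within the window spanned by the offsets."""
    pattern = to_pattern(seq)
    span = max(offsets) + 1
    expected = "".join("X" if k in offsets else "O" for k in range(span))
    return [i for i in range(len(pattern) - span + 1)
            if pattern[i:i + span] == expected]


if __name__ == "__main__":
    print(to_pattern("LKKLLKKL"))                       # XOOXXOOX
    print(find_matches("LKKLLKKL", HELIX_OFFSETS))      # [0]
    print(find_matches("VKVKVKVKVK", STRAND_OFFSETS))   # [0, 2]
```

Exact matching is deliberately strict here; a practical method would score partial matches and use conservation across aligned homologues rather than one sequence.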
Half-buried beta strand pattern: XOXOXOXOXO (X = hydrophobic, O = polar)

Stereochemical analysis
Beta strands that are completely buried (as is often the case in proteins containing both alpha helices and beta strands) usually contain a run of hydrophobic residues.
Buried beta strand pattern: XXXXXXXXXXXX

Helical transmembrane proteins
• Strong hydrophobicity signal from membrane-spanning regions, each ~25 residues in length
• Predominance of positively charged amino acid residues on the cytoplasmic side
• Prediction accuracy with multiple alignment = 95%

Helical transmembrane proteins
• ~30% of top 100 drugs bind to membrane proteins
• Structures are difficult to determine experimentally
• But much easier to predict than globular proteins!
• TMpred: based on statistical analysis of transmembrane proteins
• TMHMM: based on a hidden Markov model

Protein Structure Classification
Class (C): secondary structure content: mainly alpha, mainly beta, alpha/beta, few secondary structures (type)
Architecture (A): gross arrangement of secondary structure elements (type and number of SS elements)
Topology (T): shape and connectivity of SS (type, number and order of SS elements)
Homologous superfamily (H)
http://www.cathdb.info/latest/index.html

[Figure: CATH hierarchy: Class, Architecture, Topology (fold families), H-level (homologous domains, share common ancestor)]
In CATH, the assignments of structures to fold groups and homologous superfamilies are made by sequence and structure comparisons.
[Figure: Architecture: 'Barrel'; 9 topologies (type of SS, number and order); homologous domain family?]

Secondary structure prediction methods
• PSI-pred (PSI-BLAST profiles used for prediction; David Jones, Warwick)
• JPRED consensus prediction (includes many of the methods given below; Cuff & Barton, EBI)
• DSC: King & Sternberg
• PREDATOR: Frishman & Argos (EMBL)
• PHD home page: Rost & Sander, EMBL, Germany
• ZPRED server: Zvelebil et al., Ludwig, U.K.
• nnPredict: Cohen et al., UCSF, USA
• BMERC PSA Server: Boston University, USA
• SSP (nearest-neighbor): Solovyev and Salamov, Baylor College, USA
http://speedy.embl-heidelberg.de/gtsp/secstrucpred.html

Consensus prediction method
[Figure: alignment annotated with hydrophobic and highly conserved positions; b = buried, e = exposed]

Consensus prediction method - JPRED
[Figure: alignment annotated with hydrophobic, highly conserved and amphipathic positions; b = buried, e = exposed]

Neural network prediction - PHD
[Figure: multiple alignment of protein family; SS profile for window of adjacent residues]

Hidden Markov Models - HMMSTR
[Figure: Markov state with amino acid, secondary structure element and structural context]
• Recurrent local features of protein sequences
• Accuracy of 74% (Bystroff et al., 2000)
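To make the consensus idea and the Q3 score concrete, here is a minimal Python sketch. It combines several three-state predictions by a per-residue majority vote and scores the result against a reference assignment. This is an illustrative assumption, not the actual algorithm used by JPRED or any server listed above; the function names, the H/E/C state letters and the tie-breaking rule are all hypothetical choices.

```python
from collections import Counter


def consensus(predictions):
    """Per-residue majority vote over equal-length prediction strings.

    States are H (helix), E (strand), C (coil). Ties fall back to coil,
    which is an arbitrary choice made for this sketch.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        top_state, top_count = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_count
        result.append("C" if tied else top_state)
    return "".join(result)


def q3(predicted, observed):
    """Percentage of residues whose three-state label is predicted correctly."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    methods = ["HHHECC", "HHEECC", "HCHEEC"]  # made-up example predictions
    combined = consensus(methods)
    print(combined)                 # HHHECC
    print(q3("HHHH", "HHCC"))       # 50.0
```

Summing correct residues over all three states is equivalent to the q_α + q_β + q_coil numerator in the Q3 definition.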