Introduction to Bioinformatics for Medical Research
Gideon Greenspan
Lecture 9: Protein Secondary Structure

Protein Secondary Structure
• Protein Structure
• Protein Folding
  – Alpha helices, beta sheets, loops
• Secondary Structure Prediction
  – Artificial Neural Networks
    • PHDsec
    • PSIpred

Structure Levels

Structure Prediction: Motivation
• Understand protein function
  – Locate binding sites
• Broaden homology
  – Detect similar function where sequence differs
• Explain disease
  – See effect of amino acid changes
  – Design suitable compensatory drugs

Prediction Methods
• Primary (sequence) to secondary structure
  – Sequence characteristics
• Secondary to tertiary structure
  – Fold recognition
  – Threading against known structures
• Primary to tertiary structure
  – Ab initio modelling

Protein Folding
• Proteins fold in a watery environment
  – Hydrophobic residues want to be on the interior
  – But the main chain is hydrophilic
• Hydrogen bonds neutralize the main chain
  – Alpha helices and beta sheets form the core
• Many other chemical interactions
  – Including with external chaperones

Alpha Helices
• Right-handed spiral
  – 5 to 40 amino acids (10 average)
  – 3.6 amino acids per turn

Beta Sheets
• Parallel or anti-parallel strands
  – Each strand has 5-10 amino acids (6 average)
  – Up to 6 strands

Loop Regions
• All other protein regions
  – Irregular shape and size
  – Generally at the protein surface

Secondary Structure Prediction
• Chou-Fasman / GOR Method
  – Based on amino acid frequencies
  – No more than 60% accurate
• Neural Network methods
  – PHDsec and PSIpred
• Use multiple sequences
  – Secondary structure based on family
• Best accuracy now ~78%

Brain Neurons
• Outgoing signal determined by incoming signals
• Connected together in networks
• Learn from experience

Artificial Neurons
• Each input’s weight (+ve or -ve) is learned
• Weighted inputs are summed
• Output simulates a threshold

Neural Networks

PHDsec and PSIpred
• PHDsec
  – Rost & Sander, 1993
  – Based on sequence family alignments
• PSIpred
  – Jones, 1999
  – Based on PSI-BLAST profiles
• Both consider long-range interactions

PHDsec Neural Net
• Inputs for one position, one per possible amino acid at that position: A C D E F G H I K L M N P Q R S T V W Y .
• Hidden layer
• Outputs for alpha helix, beta strand, loop

PSIpred Input
• Input sequence
• Type of Analysis
• Include PSIBLAST results?
• Email address

PSIpred Output
• Conf: Confidence level (0=low, 9=high)
• Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
• AA: Target sequence

Conf: 988766667637889999877999871289878877049963202468899999997887
Pred: CCCCCCCCCCHHHHHHHHHHHHHHHHHCCCCCCHHHCCCCCHHHCHHHHHHHHHHHHHHH
  AA: MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRE
              10        20        30        40        50        60

Conf: 742888731467888768899999999999999987557888998875227887303678
Pred: HHCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCHHHH
  AA: LASKKNPKLINALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIA
              70        80        90       100       110       120

PHDsec Input (1)
• Email address
• Type of prediction
• Additional output
• Output format
• Reduce processing

PHDsec Input (2)
• Type (number) of input sequences
• Upload file
• Enter sequence
• Wait for results?

PHDsec Output (1)
• Protein classification
• Structure proportions
• Amino acid proportions

PHDsec Output (2)
• Estimated structure
• Confidence level
• Structure with high confidence

Future Directions
• Finer distinction between features
  – E.g. different types of coil/loop
• Multiple neural networks
  – Combine different approaches
• More global approaches
  – Take information from entire protein
• Accuracy still no higher than 80%
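
Worked sketch: artificial neurons on a sequence window

The Artificial Neurons and PHDsec Neural Net slides describe the mechanics only in outline, so the following is a minimal Python sketch of that idea, not the actual PHDsec or PSIpred implementation. It encodes each position as 21 inputs (the 20 amino acids plus "." for positions past the sequence ends), sums weighted inputs in each neuron, squashes the sum with a sigmoid as a soft threshold, and reports whichever of the three outputs (H, E, C) is largest. The window size, hidden layer size, function names, and the random untrained weights are all assumptions made for illustration; a real predictor learns its weights from known structures and uses alignment profiles rather than a single sequence.

```python
import math
import random

# 20 amino acids plus "." for positions beyond the sequence ends,
# matching the 21 input symbols listed on the PHDsec Neural Net slide.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY."
STATES = "HEC"  # helix, strand, coil/loop


def encode_window(sequence, center, half_width):
    """One-hot encode the residues around `center` into one flat list."""
    inputs = []
    for pos in range(center - half_width, center + half_width + 1):
        symbol = sequence[pos] if 0 <= pos < len(sequence) else "."
        one_hot = [0.0] * len(ALPHABET)
        one_hot[ALPHABET.index(symbol)] = 1.0  # ValueError on unknown symbols
        inputs.extend(one_hot)
    return inputs


def neuron(inputs, weights, bias):
    """Sum the weighted inputs, then apply a sigmoid as a soft threshold."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs, weight_rows, biases):
    """Evaluate one neuron per row of weights on the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def random_layer(n_in, n_out, rng):
    """Hypothetical stand-in for training: small random weights and biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def predict(sequence, half_width=6, n_hidden=5, seed=0):
    """Label every residue H, E or C using an UNTRAINED two-layer network."""
    rng = random.Random(seed)
    n_in = (2 * half_width + 1) * len(ALPHABET)
    hidden_w, hidden_b = random_layer(n_in, n_hidden, rng)
    out_w, out_b = random_layer(n_hidden, len(STATES), rng)
    states = []
    for center in range(len(sequence)):
        window = encode_window(sequence, center, half_width)
        hidden = layer(window, hidden_w, hidden_b)
        outputs = layer(hidden, out_w, out_b)
        states.append(STATES[outputs.index(max(outputs))])
    return "".join(states)


if __name__ == "__main__":
    # First 20 residues of the target sequence shown on the PSIpred Output slide.
    print(predict("MQRSPLEKASVVSKLFFSWT"))
```

Because the weights are random, the printed string has the right form (one of H, E or C per residue) but carries no biological meaning; the point is only to show how inputs, weights, summation and the threshold fit together.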