Introduction to Bioinformatics
Tutorial no. 8: Predicting protein structure
• PSI-BLAST
• PHDsec and PSIpred

PHDsec and PSIpred
PHDsec (Rost & Sander, 1993): based on alignments of sequence families.
PSIpred (Jones, 1999): based on PSI-BLAST profiles.
Both consider long-range interactions.

PSIpred Input
Fields: input sequence; type of analysis.

PSIpred Input (2)
Fields: filtering options; email address; GO! (submit).

PSIpred Output
Conf: confidence (0 = low, 9 = high)
Pred: predicted secondary structure (H = helix, E = strand, C = coil)
AA: target sequence

Conf: 988766667637889999877999871289878877049963202468899999997887
Pred: CCCCCCCCCCHHHHHHHHHHHHHHHHHCCCCCCHHHCCCCCHHHCHHHHHHHHHHHHHHH
  AA: MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRE
              10        20        30        40        50        60

Conf: 742888731467888768899999999999999987557888998875227887303678
Pred: HHCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCHHHH
  AA: LASKKNPKLINALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIA
              70        80        90       100       110       120

PHDsec Input (1)
Fields: email address; type of prediction; additional output; output format; reduce processing.

PHDsec Input (2)
Fields: type (number) of input sequences; upload a file or enter the sequence; wait for results?

PHDsec Output (1)
Protein classification; structure proportions; amino acid proportions.

PHDsec Output (2)
Estimated structure; confidence level; structure predicted with high confidence.

PSI-BLAST
Position-Specific Iterated BLAST is an extension of BLASTP that finds more distantly related sequences, which an ordinary BLASTP search reports only with insignificant E-values. Even in distantly related sequences, important domains can be highly conserved, and PSI-BLAST gives more weight to those conserved positions.

PSI-BLAST Profile
When closely related sequences are aligned, areas of conservation emerge, and the scoring matrix becomes position specific:
• Each column has its own set of amino acid frequencies.
• The score is column specific and based on the amino acid frequencies in that column: a more frequent amino acid gets a higher score.
• A new sequence is scored with this new, position-specific scoring matrix.
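The column-specific scoring idea can be illustrated with a small Python sketch. It is a toy model under loudly simplifying assumptions, not PSI-BLAST's actual procedure. It uses a short ungapped alignment, the same three sequences as the tutorial's Position-Specific Scoring Matrix example. Each score is a log2-odds value log2(column frequency / background frequency), with a uniform background of 1/20 and a flat pseudocount of 1. Real PSI-BLAST also weights sequences, derives pseudocounts from a substitution matrix, and handles gapped alignments. The names `build_pssm` and `score` are hypothetical.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a toy position-specific scoring matrix from an ungapped alignment.

    Returns one dict per column, mapping each amino acid to a log2-odds score
    log2(column frequency / background frequency). Simplifying assumptions:
    uniform background (1/20), a flat pseudocount, and no sequence weighting.
    """
    length = len(alignment[0])
    for seq in alignment:
        if len(seq) != length:
            raise ValueError("all aligned sequences must have the same length")
        if any(aa not in AMINO_ACIDS for aa in seq):
            raise ValueError("only ungapped, standard amino acids are supported")

    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Count amino acids in this column; each column gets its own frequencies.
        counts = Counter(seq[col] for seq in alignment)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(sequence, pssm):
    """Sum the column-specific scores of an ungapped sequence against the profile."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must match the profile length")
    return sum(column[aa] for column, aa in zip(pssm, sequence))


alignment = ["AMTYQR", "CTTYQS", "SMTYQA"]
pssm = build_pssm(alignment)

# Column 3 is fully conserved (T in every sequence), so T outscores A there.
print(round(pssm[2]["T"], 2), round(pssm[2]["A"], 2))  # 1.8 -0.2
# A sequence resembling the aligned family scores higher than an unrelated one.
print(score("AMTYQR", pssm) > score("WWWWWW", pssm))  # True
```

The same amino acid gets different scores in different columns. T is strongly rewarded in the conserved column 3 but only mildly in column 2, where it appears once. This is what makes the matrix position specific rather than a single substitution table such as BLOSUM62.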
Position-Specific Scoring Matrix
Example alignment (columns numbered 1-6):
123456
AMTYQR
CTTYQS
SMTYQA
[Figure: position-specific scoring matrix built from this alignment]

A PSI-BLAST Iteration
1. Collect all database sequence segments that were aligned with the query sequence with an E-value below the set threshold (default 0.01).
2. Construct a position-specific scoring matrix for the collected sequences. Rough idea:
• Align all sequences to the query sequence, using it as the template.
• Assign weights to the sequences.
• Construct the position-specific scoring matrix.
3. Find sequences that match the profile.

Using PSI-BLAST (1)
PSI-BLAST is available from the main BLAST page, or it can be switched on in BLASTP. Set the E-value threshold for initial inclusion of hits in the multiple alignment used to build the profile.

Using PSI-BLAST (2)
PSI-BLAST aligns the selected sequences, generates a profile, and searches again. Page elements: number of results to show; next iteration; new results (marked); a checkbox to select whether each hit is included in the next iteration.

Exercise 1
1. There is a protein with an unknown structure:
>some protein
MEAFLGTWKMEKSEGFDKIMERLGVDFVTRKMGNLVKPNLIVTDLGGGKYK
MRSESTFKTTECSFKLGEKFKEVTRFTRGHFFMITVENGVMKHEQDDKTKV
TYIERVVEGNELKATVKVDEVVCVRTYSKVA
Can BLAST help us predict its secondary structure (SS)?
2. Use any secondary structure prediction method to predict the secondary structure of 1O8V and compare it to the solved structure.
NOTICE! The secondary structure definition in the PDB is given in a 7-letter code instead of the 3-letter code (H, E, C). For comparison purposes, treat G, H and I as H; E as E; and all the rest, including spaces, as C.
3. What can you conclude about the secondary structure prediction in this case?
4. Are the results consistent with the confidence values of the prediction?
5. Can you explain the prediction results based on the real structure?

Exercise 2
The prion protein is responsible for mad cow disease. Normally, the amino acids in a specific region are arranged in an α-helix (H1). In abnormal situations, this region changes into a β-strand conformation.
• This conformational change is thought to be the origin of the disease, which leads to rapid degeneration of the nervous system and usually causes death.
• It is assumed that prion molecules that have changed conformation accelerate the conformational change of additional molecules.
1. Check what conformation is predicted for this protein.
2. The PDB code of the prion protein is 1ag2. The helix is located at positions 21-30 of the sequence in this file. Does the predicted SS correlate with the real one in the region of interest?
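The comparisons asked for in the exercises can be scripted once the strings are copied out of the PDB file and the predictor's output. The Python sketch below shows one way to do it, under stated assumptions. The observed PDB string, the Pred line and the Conf line are taken to be plain strings of equal length covering the same residues in the same order. Agreement is measured as Q3, the fraction of residues whose 3-state label matches, which is one common choice among several. The helper names (`to_three_state`, `q3`, `q3_by_confidence`, `region`) are hypothetical. The strings in the usage example are made-up toy data, not the real 1O8V or 1ag2 structures.

```python
def to_three_state(pdb_ss):
    """Reduce the 7-letter PDB secondary-structure code to H/E/C.

    Uses the rule given in the exercise: G, H and I -> H; E -> E;
    everything else, including spaces, -> C.
    """
    mapping = {"G": "H", "H": "H", "I": "H", "E": "E"}
    return "".join(mapping.get(ch, "C") for ch in pdb_ss)


def q3(predicted, observed):
    """Fraction of positions where the predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(predicted)


def q3_by_confidence(predicted, observed, conf, min_conf):
    """Q3 restricted to positions whose confidence digit (0-9) is at least min_conf.

    Returns None if no position reaches the threshold.
    """
    if not len(predicted) == len(observed) == len(conf):
        raise ValueError("predicted, observed and conf must have equal length")
    kept = [(p, o) for p, o, c in zip(predicted, observed, conf) if int(c) >= min_conf]
    if not kept:
        return None
    return sum(p == o for p, o in kept) / len(kept)


def region(ss, start, end):
    """Return positions start..end (1-based, inclusive) of a secondary-structure string."""
    return ss[start - 1:end]


# Hypothetical toy data (NOT 1O8V or 1ag2), 16 residues long.
observed_pdb = "  HHHHGGT EEEE S"  # 7-letter PDB code, spaces included
predicted = "CCCHHHHHHCEEECCC"     # a predictor's Pred line
conf = "9937899639998499"          # the matching Conf line

observed = to_three_state(observed_pdb)
print(observed)                                        # CCHHHHHHCCEEEECC
print(q3(predicted, observed))                         # 0.8125
print(q3_by_confidence(predicted, observed, conf, 7))  # 1.0
print(region(predicted, 4, 9))                         # HHHHHH
```

In this toy case, all three mismatches fall at low-confidence positions, so restricting to confidence 7 or higher raises agreement to 1.0. This is the kind of check that Exercise 1, question 4 asks about. For Exercise 2, `region(pred, 21, 30)` would give the predicted states at positions 21-30. This assumes the prediction is numbered from the first residue of the sequence in the 1ag2 file, not by PDB residue numbers.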