Introduction to Bioinformatics
Tutorial no. 7
Predicting protein structure: PSI-BLAST, PHDsec and PSIpred

PHDsec and PSIpred
PHDsec (Rost & Sander, 1993): based on alignments of sequence families.
PSIpred (Jones, 1999): based on PSI-BLAST profiles.
Both consider long-range interactions.

PSIpred Input
Input sequence
Type of analysis

PSIpred Input (2)
Filtering options
Email address
GO!

PSIpred Output
Conf: confidence (0 = low, 9 = high)
Pred: predicted secondary structure (H = helix, E = strand, C = coil)
AA: target sequence

Example output (the Conf row gives the confidence level, the Pred row the predicted structure):
Conf: 988766667637889999877999871289878877049963202468899999997887
Pred: CCCCCCCCCCHHHHHHHHHHHHHHHHHCCCCCCHHHCCCCCHHHCHHHHHHHHHHHHHHH
  AA: MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRE
              10        20        30        40        50        60

Conf: 742888731467888768899999999999999987557888998875227887303678
Pred: HHCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCHHHH
  AA: LASKKNPKLINALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIA
              70        80        90       100       110       120

PHDsec Input (1)
Email address
Type of prediction
Additional output
Output format
Reduce processing

PHDsec Input (2)
Type (number) of input sequences
Upload file
Enter sequence
Wait for results?

PHDsec Output (1)
Protein classification
Structure proportions
Amino acid proportions

PHDsec Output (2)
Estimated structure
Confidence level
Structure with high confidence

PSI-BLAST
Position-Specific Iterated BLAST, an extension of BLASTP.
Finds more distantly related sequences, including distant homologs whose E-values in an ordinary BLASTP search would be insignificant.
Even in distantly related sequences, important domains can be highly conserved; PSI-BLAST gives more weight to these conserved positions.

PSI-BLAST Profile
Aligning closely related sequences reveals areas of conservation.
The scoring matrix becomes position specific: each column of the alignment has its own set of amino acid frequencies, so the score is column specific and based on those frequencies.
The more frequent an amino acid is in a column, the higher its score at that position.
A new sequence is then scored with this new, position-specific scoring matrix.
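The PSIpred output described earlier comes in blocks of Conf, Pred and AA rows. A minimal sketch, assuming exactly that horizontal layout (rows labelled "Conf:", "Pred:" and "AA:", one set per block), joins the blocks and lists the helix and strand segments predicted with high confidence. The function names and the thresholds (confidence of at least 7, at least 3 residues) are hypothetical choices for illustration, not part of PSIpred.

```python
from typing import List, Tuple


def parse_psipred_horiz(text: str) -> Tuple[str, str, str]:
    """Join the Conf, Pred and AA rows of every block into three strings."""
    rows = {"Conf:": [], "Pred:": [], "AA:": []}
    for line in text.splitlines():
        stripped = line.strip()
        for tag, parts in rows.items():
            if stripped.startswith(tag):
                parts.append(stripped[len(tag):].strip())
    # ruler lines (10, 20, ...) match no tag and are ignored
    return "".join(rows["Conf:"]), "".join(rows["Pred:"]), "".join(rows["AA:"])


def confident_segments(conf: str, pred: str, min_conf: int = 7,
                       min_len: int = 3) -> List[Tuple[int, int, str]]:
    """Runs of H or E in which every residue has confidence >= min_conf.

    Returns (start, end, state) tuples with 1-based, inclusive positions.
    """
    if len(conf) != len(pred):
        raise ValueError("Conf and Pred rows must have the same length")
    segments = []
    i = 0
    while i < len(pred):
        state = pred[i]
        j = i
        # extend the run while the state is unchanged and confidence stays high
        while j < len(pred) and pred[j] == state and int(conf[j]) >= min_conf:
            j += 1
        if state in "HE" and j - i >= min_len:
            segments.append((i + 1, j, state))
        i = max(j, i + 1)  # skip past the run, or past one low-confidence residue
    return segments
```

For a hypothetical 10-residue block with Conf 9988776543 and Pred CHHHHHCCEE, the default thresholds give a single confident helix at positions 2 to 6; the short, low-confidence strand at positions 9 to 10 is dropped.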
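The profile idea above (one column of scores per alignment position, higher scores for more frequent amino acids) can be sketched as a log-odds matrix. This is a simplified illustration, not PSI-BLAST's actual procedure: it assumes gapless, equal-length segments, a uniform background frequency of 1/20, and a simple pseudocount instead of the substitution-matrix-based pseudocounts and sequence weights PSI-BLAST uses. `build_pssm` and `score_segment` are hypothetical names.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequency


def build_pssm(alignment: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """One column of log-odds scores (bits) per alignment position."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned segments must all have the same length")
    n = len(alignment)
    pssm = []
    for col in range(length):
        counts = {aa: 0 for aa in AMINO_ACIDS}
        for seq in alignment:
            counts[seq[col]] += 1  # only the 20 standard residues are accepted
        # pseudocounts keep unseen residues at a finite (negative) score
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount * BACKGROUND) / (n + pseudocount) / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_segment(pssm: List[Dict[str, float]], segment: str) -> float:
    """Score a new segment column by column against the profile."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must match the profile length")
    return sum(column[aa] for column, aa in zip(pssm, segment))
```

Built from the toy alignment AMTYQR, CTTYQS, SMTYQA, column 2 scores M (seen twice) above T (seen once), and W (never seen) gets -2 bits; the fully conserved Y in column 4 gets the highest score in its column.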
Position-Specific Scoring Matrix
Example alignment of three sequences (positions 1 to 6); the matrix has one column of scores per alignment position:
123456
AMTYQR
CTTYQS
SMTYQA

A PSI-BLAST Iteration
Collect all database sequence segments that align to the query sequence with an E-value below a set threshold (default 0.01).
Construct a position-specific scoring matrix from the collected sequences.
Rough idea:
Align all sequences to the query sequence, using the query as the template.
Assign weights to the sequences.
Construct the position-specific scoring matrix.
Find sequences that match the profile.

Using PSI-BLAST (1)
Available from the main BLAST page, or switch it on in BLASTP.
E-value threshold for initial inclusion in the multiple alignment used to build the profile.

Using PSI-BLAST (2)
Align the selected sequences, generate a profile, and search again.
Number of results to show
Next iteration
New result
Select whether to include each sequence in the next iteration
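The iteration steps above (collect hits, align them to the query, weight them, build a profile, search again) can be sketched as a small loop. This is a toy under stated assumptions, not NCBI's implementation: a raw score threshold stands in for the E-value cutoff (no Karlin-Altschul statistics), alignments are ungapped windows, the background is uniform, the first round scores with a profile built from the query alone instead of BLOSUM62, and sequences are weighted with Henikoff-style position-based weights, one common weighting scheme. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequency
Profile = List[Dict[str, float]]


def henikoff_weights(segments: Sequence[str]) -> List[float]:
    """Position-based weights: sequences carrying rare residues count for more."""
    weights = [0.0] * len(segments)
    for col in range(len(segments[0])):
        column = [seg[col] for seg in segments]
        distinct = len(set(column))
        for i, aa in enumerate(column):
            weights[i] += 1.0 / (distinct * column.count(aa))
    total = sum(weights)
    return [w / total for w in weights]


def build_pssm(segments: Sequence[str], pseudocount: float = 1.0) -> Profile:
    """Weighted log-odds scores (bits), one column per query position."""
    weights = henikoff_weights(segments)
    pssm = []
    for col in range(len(segments[0])):
        freqs = {aa: 0.0 for aa in AMINO_ACIDS}
        for weight, seg in zip(weights, segments):
            freqs[seg[col]] += weight  # weights sum to 1, so these are frequencies
        pssm.append({
            aa: math.log2((freqs[aa] + pseudocount * BACKGROUND) / (1.0 + pseudocount) / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm: Profile, seq: str) -> Tuple[float, int]:
    """Best ungapped placement of the profile on seq, as (score, start)."""
    width = len(pssm)
    best = (-math.inf, -1)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        best = max(best, (sum(col[aa] for col, aa in zip(pssm, window)), start))
    return best


def psi_search(query: str, database: Dict[str, str], threshold: float,
               max_iter: int = 5) -> Tuple[Set[str], Profile]:
    """Search, include hits scoring >= threshold, rebuild the profile, repeat."""
    included: Set[str] = set()
    pssm = build_pssm([query])  # round 1: the profile is just the query
    for _ in range(max_iter):
        hits = {}
        for name, seq in database.items():
            score, start = best_window(pssm, seq)
            if score >= threshold:
                hits[name] = seq[start:start + len(query)]  # aligned to the query
        if set(hits) == included:
            break  # converged: no new sequences were found
        included = set(hits)
        pssm = build_pssm([query] + [hits[name] for name in sorted(hits)])
    return included, pssm
```

With the hypothetical query AMTYQR, threshold 10 and database {self: GGAMTYQRGG, close: AMTYQS, distant: AMTKKS, far: WWWWWWWW}, the first round includes only self and close. Once the profile is rebuilt, the S now observed at position 6 lifts distant above the threshold in the second round, and the third round finds no new sequences, so the search converges. This mirrors how PSI-BLAST reaches sequences a single BLASTP search would miss.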