Bioinformatics and Protein Sequence Analysis

With the sequencing of a large number of proteins and the subsequent storage of the data, it has become easier for researchers to study proteins. These studies provide preliminary insights into the structural and functional aspects of proteins without conducting experiments.

Surabhi Agarwal

Master Layout (Part 1)

This animation consists of 2 parts:
Part 1: Protein Sequence Alignment
Part 2: Alignment analysis and interpretations

1. Extract the newly determined amino acid sequence for your query peptide.
2. Select the relevant algorithm and its associated parameters: pair-wise sequence alignment or multiple sequence alignment (Seq 1, Seq 2, Seq 3).
3. Assess the significance of the result with its alignment score.

Definitions of the components
Part 1 – Protein sequence alignment

1. Query Peptide: The unknown protein or peptide that is provided as input to the sequence analysis server. Its sequence is determined before it is analyzed for similarity matches with other proteins.
2. Relevant Algorithm: An algorithm is the sequence of logical steps used for comparing the query peptide with other given protein sequences. The nature of the query, such as “Local” or “Global” and “Pair-wise alignment” or “Multiple Sequence Alignment”, determines the algorithm that is used.
3. Local Alignment: “Local” alignment matches individual blocks of the protein sequences instead of aligning them end to end; a block is broken off where the similarity is lost. The aim of such alignment studies is to find the longest possible blocks of similarity in the aligned protein sequences.
4. Global Alignment: “Global” alignment is an end-to-end alignment of two or more sequences, in which gaps are introduced at positions where residues of one sequence have no counterpart in the other.
5. Pair-wise sequence alignment: This procedure compares and aligns two given sequences. The comparison can be either global or local, with the quality of the alignment judged by the alignment score.
6. Multiple Sequence Alignment: The end-to-end alignment of several given sequences that are provided to the search engine. Multiple alignment tends to introduce a minimum number of gaps and finds regions of similarity within all the given sequences.
7. Word length: The minimum length of an amino acid sequence that needs to match exactly in order to initiate an alignment process in either direction. The sensitivity and speed of the alignment depend on the word length provided by the user.
8. Scoring Matrix: The matrix of values referred to when assigning a score to the alignment of pairs of residues. The matrix used for a BLAST search is selected depending on the type of sequences being searched with. The choices are the PAM series and the BLOSUM series.
a) PAM: PAM stands for Point Accepted Mutation. It is a log-odds scoring matrix constructed from the amino acid replacements observed in a set of closely related proteins. The PAM value defines the percentage of mutations that are accepted in a given set of proteins: 1 PAM corresponds to a change at an average of 1% of amino acid positions.
b) BLOSUM: BLOSUM stands for Blocks Substitution Matrix and is constructed from a set of distantly related proteins. BLOSUM provides a comprehensive biological insight into proteins when the evolutionary distance is not known beforehand. It is based on the relative frequency of amino acid residues and the probabilities of their substitution in a set of highly conserved blocks of residues in proteins that are evolutionarily distant.
9. Threshold: The threshold is a measure of the statistical significance of the results of an alignment study and represents the expected number of matches occurring by chance.
10. Gap Penalty and Gap Extension: In an alignment of two or more given protein sequences, a gap is introduced wherever residues in one sequence have no matching residues in the other. The “Gap penalty” is the deduction from the overall alignment score when a gap is introduced, while the “Gap Extension” is the deduction for extending an already existing gap.
11. Alignment Score: The alignment score is reported in normalized form as the Bit Score and provides a comparative quantification of the quality of the alignment. The score increases with a higher number of residue matches and a lower number of mismatches. The alignment with the higher bit score is the better match.
12. Percentage Identity: The percentage of amino acid residues that are identical to each other when two sequences are compared.
13. E-value: The E-value estimates the number of alignments of this quality expected to occur by chance, as opposed to a biologically significant match. For a similarity search against a database, this value depends on the size of the database against which the sequence is compared. The closer the E-value is to zero, the higher the biological significance of the match.
14. Hit: Each result of a search is called a “Hit”, and the “best Hit” is the best result for that particular query.

Step 1: Pair-wise sequence alignment for two given sequences - INPUT

SEQUENCE DATABASE

Enter sequence 1
>gi|268576797|ref|XP_002643378.1| C. briggsae CBR-COL-186 protein [Caenorhabditis briggsae]
MKSTEKKSTELDLELEAQSLRRIAFFGVAMSTVATFVCIITVPLAYNKMQQMQSNMIDQYMASARGIRVA …

Enter sequence 2
>gi|6682|emb|CAA35955.1| collagen [Caenorhabditis elegans]
MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQHRSNGLWDEYK …

Word Size: 3 (length of the initial set of amino acids that needs to be matched before alignment begins)
Threshold: 10 (expected number of matches that are allowed to occur by chance)
Gap penalty: Existence 11, Extension 1 (values deducted from the overall alignment score on introduction and extension of gaps)
Scoring Matrix: BLOSUM62, chosen from PAM30 and BLOSUM62 (the reference matrix used to assign scores to matches of residues)

ALIGNMENT ALGORITHM (BLAST)

Action: Schematic of the process of pair-wise alignment.

Description of the action: Follow the animation steps. Re-draw all figures. Show all definitions first by highlighting the parameter. Follow this with the input of the 2 sequences and the parameter values one by one. The drop-down arrow after the scoring matrix should look like the drop-downs seen on web pages. Click on the drop-down and show the BLOSUM62 matrix getting selected. Click on the BLAST tool.

Audio Narration: Alignment algorithms are computer algorithms which take the 2 protein sequences and align them residue by residue. Here we depict an alignment between 2 given sequences. To align two sequences, enter them in the input boxes. We took the example of the CBR-COL-186 protein of Caenorhabditis briggsae and collagen of Caenorhabditis elegans. The sequences are abridged for the purpose of the animation. To carry out the exact study, users can download the sequences corresponding to the Gene ID. Enter the parameters as per the nature of the query and the purpose of the search, and finally click on the BLAST tool.

Step 2: Pair-wise sequence alignment for two given sequences - OUTPUT

Bit score: Bit scores are the normalized scores obtained by normalizing the raw scores according to the scoring matrix used in the algorithm.
Dot-plot: The dot-plot is the graphical visualization of the two given sequences (Sequence 1 against Sequence 2), used to identify approximate overlaps and regions of close similarity.
Percentage identity: The percentage of residues which were identical in the two sequences.
E-value: The statistical measure used to assess biological significance. The closer the E-value is to 0, the higher the biological significance.

ALIGNMENT (bit score 77.4 bits, percentage identity 34%, E-value 6e-19):
Sequence 1  LELEAQSLRRIAFFGVAMSTVATFVCIITVPLAYNKMQQMQSNMIDQYMASARGIRVARR
Match line (column spacing lost in extraction): + E +SLR++AFFG+A+ST+AT II VP+ YN MQ +QS++ +
Sequence 2  IAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSE----------VEF

The match line shows the match or mismatch between each of the residues. The dashes are gaps introduced in Sequence 2 where it lacks residues similar to those in Sequence 1.

Action: Shows the various output formats for pair-wise alignment.

Description of the action: Show the smaller image of the server, with every output and definition coming out of it one at a time as shown in the PowerPoint animation.

http://blast.ncbi.nlm.nih.gov/Blast.cgi

Audio Narration: Pair-wise alignment with the help of the BLOSUM62 matrix gives various kinds of results. These are the alignment, alignment score, dot-plot, percentage identity and E-value. The raw score from the BLOSUM62 matrix is 189 and from the PAM30 matrix is 178. The bit score for the exact same study is 77.4 using BLOSUM62 and 78.7 using PAM30. The bit scores therefore give a uniform and normalized measure of the overall quality of the alignment irrespective of the scoring system. The biological significance of this result is very high, as the E-value is very near to 0.
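The normalization described in the narration can be sketched with the standard Karlin-Altschul formula for bit scores, S' = (lambda * S - ln K) / ln 2. The lambda and K values below are not given in the slides; they are assumed here to be the gapped parameters commonly reported by NCBI BLAST for these matrices, and the function names are illustrative only.

```python
import math

# Assumed Karlin-Altschul parameters (lambda, K); not stated in the slides.
# They are the gapped values commonly reported by NCBI BLAST per matrix.
ASSUMED_PARAMS = {
    "BLOSUM62": (0.267, 0.041),  # assumed gap existence 11, extension 1
    "PAM30": (0.294, 0.11),      # assumed gap existence 9, extension 1
}


def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance hits: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # Raw scores quoted in the narration: 189 (BLOSUM62) and 178 (PAM30).
    for matrix, raw in (("BLOSUM62", 189), ("PAM30", 178)):
        lam, k = ASSUMED_PARAMS[matrix]
        print(matrix, round(bit_score(raw, lam, k), 1))
```

Under these assumed parameters the two raw scores from the narration come out close to the quoted 77.4 and 78.7 bits, which illustrates why bit scores from different matrices can be compared directly while raw scores cannot.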
For a more detailed study of the types of BLAST tools available, visit http://blast.ncbi.nlm.nih.gov/Blast.cgi

Step 3: Pair-wise alignment of sequences against database - INPUT

SEQUENCE DATABASE

Enter sequence 1
MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYM
QHVQSSLQSEVEFCQHRSNGLWDEYKRFQGVSGVEGRIKRDAYH
RSLGVSGASRKARRQSYGNDAAVGGFGGSSGGSCCSCGSGAAGP
AGSPGQDGAPGNDGAPGAPGNPGQDASEDQTAGPDSFCFDCPAG
PPGPSGAPGQKGPSGAPGAPGQSGGAALPGPPGP

SELECT DATABASE: PROTEIN, NUCLEOTIDE, GENE, PROTEOME, GEO, EST, SNP

Word Size: 3
Threshold: 10
Gap penalty: Existence 11, Extension 1
Scoring Matrix: PAM30, chosen from PAM30 and BLOSUM62

ALIGNMENT ALGORITHM (BLAST)

Action: Schematic of the process of pair-wise alignment.

Description of the action: Follow the animation steps. Re-draw all figures. Show all definitions first by highlighting the parameter. Follow this with the input of 1 sequence. The drop-down arrows after “Select Database” and “Scoring Matrix” should look like the drop-downs seen on web pages. Select “Protein” in the “Select Database” options box as shown in the animation. Follow this by inputting the parameter values one by one. Click on the drop-down against “Scoring Matrix” and show the PAM30 matrix. Click on the BLAST tool.

Audio Narration: Alignment can also be done by matching a sequence against a related database of sequences in order to identify it. Input the unknown sequence, and then select the database against which the sequence is to be matched. Fill in the parameter values as per the purpose of the search and the nature of the query sequence. In this case we study the hits using the PAM30 scoring matrix. Click on the BLAST tool once all parameters have been entered.
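The idea of matching one query against every sequence in a database and ranking the hits can be sketched as follows. This is a simplified illustration, not how BLAST itself works: it scores each pair with a plain Smith-Waterman local alignment using assumed match/mismatch/gap values in place of a PAM or BLOSUM matrix and affine gap penalties, and the database entries are hypothetical.

```python
def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below 0, which is what makes it "local".
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database):
    """Score the query against every entry; the best hit comes first."""
    hits = [(local_score(query, seq), name) for name, seq in database.items()]
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    toy_db = {  # hypothetical entries, not real database records
        "entry_A": "MSEDLKQIAQETESLRKVAFFGIAV",
        "entry_B": "MKSTEKKSTELDLELEAQSLRRIAF",
        "entry_C": "GGGGGGGGGG",
    }
    for score, name in search("QETESLRKVAFF", toy_db):
        print(name, score)
```

The first element of the sorted list corresponds to what the definitions call the "best Hit"; a real search would additionally convert each score into a bit score and an E-value.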
Step 4: Pair-wise alignment of sequences against database - OUTPUT

SEQUENCE DATABASE

Enter sequence 1
MPSSVSWGILLLAGLCCLVPVSLAEDPQGDAAQKTDTSHH
DQDHPTFNKITPNLAEFAFSLYRQLAHQSNSTNIFFSPVSIA
TAFAML

SELECT DATABASE: PROTEIN, NUCLEOTIDE, GENE, PROTEOME, GEO, EST, SNP

Word Size: 3
Threshold: 10
Gap penalty: Existence 11, Extension 1
Scoring Matrix: BLOSUM, PAM

ALIGNMENT ALGORITHM (BLAST)

Percentage Identity: Percentage of residues exactly matching.
Identification: Identifies the sequence and the source organism for the unknown protein.
E-Value: In the case of database searches, the E-value is found by multiplying the pair-wise E-value by the number of sequences in the database.
Total Score: Measure of the quality of the alignment when compared to the bit scores of other hits of the search.
Domain Identified (if any): The query is scanned to find domains in the query sequence from the Pfam database. In case such a domain is identified, it is shown as part of the result.

Domain identified: Pfam ID: pfam01484; Domain Name: Col_cuticle_N; Description: Nematode cuticle collagen N-terminal domain

ALIGNMENT (the alignment shows 100% matching with the identified sequence):
Percentage Identity: 100%
Total Score: 624 bits
E-Value: 1e-176
Identification: GENE ID: 179452 col-13 | Collagen [Caenorhabditis elegans]

Query     MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQH
          MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQH
Database  MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQH

Action: Shows the various output formats for pair-wise alignment.

Description of the action: Show the smaller image of the server, with every output and definition coming out of it one at a time as shown in the PowerPoint animation.

Audio Narration: Pair-wise alignment gives various kinds of results. These are alignment views, alignment score, dot-plot, E-value and percentage identity, amongst many others.
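Percentage identity, as reported in the outputs above, can be computed directly from two aligned rows. The sketch below assumes the denominator is the full alignment length including gap columns; tools differ on this choice, so treat the convention and the function name as assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical columns as a percentage of the alignment length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    # A column counts only when both rows carry the same residue (not a gap).
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * identical / len(aligned_a)


if __name__ == "__main__":
    query = "MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQH"
    print(percent_identity(query, query))
```

Aligning a sequence against itself, as in the database hit shown in this step, gives every column identical.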
When compared to the bit scores of other hits in the result, the bit score turns out to be the highest for collagen proteins in Caenorhabditis elegans.

http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html; http://pfam.sanger.ac.uk/

Step 5: Multiple Sequence Alignment - INPUT

SEQUENCE DATABASE

Enter sequence 1
>gi|268574584|ref|XP_002642271.1| Hypothetical protein CBG18259 [Caenorhabditis briggsae]
MDEKQRLQAYRFVAYSAVTFSTVAVFSLCITLPLVYNYVDGIKTQINHEIKFCKHSARDIFAEVNHIRANPKNASRFARQAGYGTDEAVSGGS

Enter sequence 2
>gi|32565788|ref|NP_871711.1| COLlagen family member (col-96) [Caenorhabditis elegans]
MDEITRRNAYRFVAYSAVTFSVVAVFSLCITLPMVYNYVHGIKSQINHQISFCKHSARDIFSEVNHIRASPNNATLREKRQAGDCSGCCL

Enter sequence 3
>gi|17559060|ref|NP_505677.1| COLlagen family member (col-13) [Caenorhabditis elegans]
MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQHRSNGLWDEYKRFQGVSGVEGRIKRDAYH

ADD MORE SEQUENCES

Word Size: 3 (the length of the initial seed set of amino acids that needs to match exactly for the alignment to be extended in both directions)
Window length: 10 (the length of the residues on either side of the initial matched sequence up to which the alignment will be extended)
Gap penalty: Existence 11, Extension 1
Score type: ABSOLUTE, chosen from ABSOLUTE and PERCENTAGE (users can choose to see absolute scores for comparison or the percentage value of the scores)

MULTIPLE SEQUENCE ALIGNMENT (CLUSTAL-W)

Action: Schematic of the process of multiple sequence alignment.

Description of the action: Follow the animation steps. Enter the first 2 sequences. Click on “Add more sequences”. Open the 3rd input box for entering the 3rd sequence. Show the input of the 3rd sequence. Show the input of the parameters. Select “Absolute” in the “Score Type” drop-down. The drop-down arrow should look like the drop-downs seen on web pages.

Audio Narration: Multiple Sequence Alignment tools are used to compare the amino acid sequences of more than two proteins.
The word size is the length of the seed set of amino acids that needs to match exactly for the alignment to be extended in both directions. The window length is the length of the residues on either side up to which the alignment will be extended. The gap penalty and extension hold the same meaning as in pair-wise alignment. For the scores, users can choose to see absolute scores for comparison or the percentage value of the scores.

Step 6: Multiple Sequence Alignment - OUTPUT

SEQUENCE DATABASE

Enter sequence 1
MPSSVSWGILLLAGLCCLVPVSLAEDPQGDAAQKTDTSHHDQDHPTFNKITP

Enter sequence 2
MKLLKLTGFIFFLFFLTESLTLPTQPRDIENFNSTQKFIEDNIEYITIIAFAQYVQEA

Enter sequence 3
MKLLKLTGFIFFLFFLTESLTLPTQPRDIENFNSTQKFIEDNIEYITIIAFAQYVQEA

Word Size: 3
Threshold: 10
Gap penalty: Existence 11, Extension 1
Scoring Matrix: BLOSUM

MULTIPLE SEQUENCE ALIGNMENT (CLUSTAL-W)

MULTIPLE SEQUENCE ALIGNMENT: Text alignment of the query sequences.
COLOR CODED ALIGNMENT: Color coded alignment of the query sequences, with a mapping of colors to amino acid groups.
ALIGNMENT SCORE: 5269. The alignment score can be compared with other scores to measure the quality of the alignment.

Sequence 1  MDE-----KQRLQAYRFVAYSAVTFSTVAVFSLCITLPLVYNYVDGIKTQ
Sequence 2  MDE-----ITRRNAYRFVAYSAVTFSVVAVFSLCITLPMVYNYVHGIKSQ
Sequence 3  MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSS

Action: Shows the various output formats for multiple sequence alignment.

Description of the action: Show the smaller image of the server, with every output coming out of it one at a time.

http://www.ebi.ac.uk/Tools/es/cgi-bin/clustalw2/

Audio Narration: Multiple sequence alignment gives various kinds of results. The alignment view in text format displays the residue-wise matching for the input sequences. The color coded alignment gives a better graphical picture, as the amino acid residues are assigned colors based on their physico-chemical properties. Here we depict one of the many color coding schemes available.
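The residue-wise matching shown in the text alignment can be summarized with a conservation line that marks the columns where all sequences agree. This is a minimal sketch using the three aligned rows from this step; the function name is hypothetical, and only fully identical columns are marked (real tools also flag groups of similar residues).

```python
def conservation_line(rows, gap="-"):
    """Return '*' under columns where every row has the same residue."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have equal length")
    marks = []
    for column in zip(*rows):
        # Conserved means one distinct symbol in the column, and not a gap.
        same = len(set(column)) == 1 and column[0] != gap
        marks.append("*" if same else " ")
    return "".join(marks)


if __name__ == "__main__":
    rows = [
        "MDE-----KQRLQAYRFVAYSAVTFSTVAVFSLCITLPLVYNYVDGIKTQ",
        "MDE-----ITRRNAYRFVAYSAVTFSVVAVFSLCITLPMVYNYVHGIKSQ",
        "MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSS",
    ]
    for row in rows:
        print(row)
    print(conservation_line(rows))
```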
The alignment score is an absolute value, as selected previously. It can be compared with other scores to measure the quality of the alignment. Users obtain an .output file with the summary of the result, .aln files which contain the text alignment, and .dnd files which contain the distance-based information. For a detailed understanding of these outputs, kindly visit http://www.ebi.ac.uk/Tools/clustalw2/index.html

Master Layout (Part 2)

This animation consists of 2 parts:
Part 1: Protein Sequence Alignment
Part 2: Alignment analysis and interpretations

Phylogram representing evolutionary relationships
Structural features that decide function
Protein secondary structures

Definitions of the components
Part 2 – Alignment analysis and interpretations

1. Computational Phylogenetic Predictions: Sequence alignment studies of proteins can reveal the conserved and variable residues between the sequences. Protein sequences derived from different organisms but having a high degree of similarity are assumed to come from the same ancestor. Such predictions, which can now be carried out computationally with the help of various algorithms, provide an insight into evolutionary processes.
2. Phylogram: A phylogram is a pictorial representation that provides a visualization of evolutionary relationships or phylogeny. In it, the lengths of the branches in the tree are considered to be proportional to the evolutionary distance.
3. Cladogram: A cladogram is another form of pictorial representation that gives a visual insight into evolutionary relationships or phylogeny. Unlike in the phylogram, the branches of a cladogram are of equal length irrespective of the evolutionary distance.
4. Maximum Parsimony: A method used for alignments which show very strong sequence similarity. It is usually applied to fewer than twelve sequences.
5. Distance methods: These predict the evolutionary distance when there is any sequence variation present and can be used on a large number of sequences. As the distance between two sequences increases, the uncertainty of the alignment also increases.
6. Maximum likelihood: This method is useful for predicting the evolutionary distance when sequence variability is high. It can be used for alignments with any amount of variability.
7. Protein structure prediction: The three-dimensional structure of a protein is largely specified by its amino acid sequence. Protein structures can be predicted with an accuracy of 70-75% when provided with the sequence.
8. Functional annotation: The function(s) of proteins can be predicted for those proteins having a well-described homology. Gene Ontology terms (GO terms) provide a unique identification of the function that the gene is involved in. These functions are categorized at different levels of a functional hierarchy.
9. Protein motif: A common pattern of residues in a set of protein sequences is known as a motif.

Step 1: Phylogenetic analysis from alignment - Input

SEQUENCE DATABASE

Enter a sequence alignment for 2 or more sequences
Seq1 -------------- LLFLFSSAYSRGVFRRDTHK
Seq2 MKWVTFISLLFLFSSAYSRGVFRRDAH
Seq3 MKWVTFLLLLFVSGSAFSRGVFRREA

Select a method: MAXIMUM PARSIMONY, chosen from:
MAXIMUM PARSIMONY: used for sequences with highly conserved residues
DISTANCE METHODS: used for sequences with moderately conserved residues
MAXIMUM LIKELIHOOD: used for sequences with highly variable residues

PHYLOGENETIC ANALYSIS (PHYLIP)

Action: Schematic of the process of analysis of alignment.

Description of the action: Follow the animation steps. Show the description of each of the methods as the mouse hovers over them. Finally select the “Maximum Parsimony” method. The drop-down arrow should look like the drop-downs seen on web pages.
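To make the idea behind distance methods concrete, the sketch below computes an uncorrected pairwise distance (the fraction of differing residues, often called the p-distance) from aligned rows. This is an assumption chosen for illustration; the slides do not say which distance measure PHYLIP applies, and real programs typically use model-based corrections. The alignment used here is a toy example.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Fraction of differing residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Map each unordered pair of sequence names to its p-distance."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


if __name__ == "__main__":
    toy_alignment = {  # hypothetical aligned rows of equal length
        "seq1": "ACDEFG",
        "seq2": "ACDEFH",
        "seq3": "AC-QFH",
    }
    for pair, dist in distance_matrix(toy_alignment).items():
        print(pair, round(dist, 3))
```

A tree-building step, such as neighbor joining, would then take such a matrix as its input.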
Audio Narration: Multiple sequence alignment produces alignment files (.aln), which can be used to determine the evolutionary distances of a set of given protein sequences. This can be achieved by many server-based and stand-alone programs. The user needs to select the method for calculating the distance. Here we depict the usage of alignment files for phylogenetic analysis.

Step 2: Phylogenetic analysis from alignment - Output

SEQUENCE DATABASE

Enter a sequence alignment for 2 or more sequences
PGFPPLVAPEPDALCAAFQDN
PNLPRLVRPEVDVMCTAFHDN
PKLK-PDPNTLCDEFKADEKKF

Select a method: MAXIMUM PARSIMONY

PHYLOGENETIC ANALYSIS (PHYLIP)

PHYLOGRAM: A phylogram is a branching diagram depicting evolutionary relationships or phylogeny. In it, the lengths of the branches in the tree are considered to be proportional to the evolutionary distance.
CLADOGRAM: A branching diagram depicting evolutionary relationships or phylogeny.
DND FILES: DND files give the distance measure of the aligned sequences from their common ancestral node.

Action: Schematic of the process of analysis of alignment.

Description of the action: Follow the animation steps. The server on the previous slide gives the following output:
( seq 1:0.13525, Seq 2:0.09868, seq 3:0.09868);

Audio Narration: The outputs from the analysis are the distance file, known as the DND file, and the cladogram and phylogram, which are evolutionary trees. In the DND file there is a common node, and the value against each sequence is its distance from that common node. DND files give the distance measure of the aligned sequences from their common ancestral node. Cladograms are graphical representations of the branching during the evolution of the proteins that were aligned. Cladograms do not represent the evolutionary distances or the common ancestral node. Phylograms also represent the evolutionary distance tree in a graphical format. In a phylogram, the branch lengths correspond to the evolutionary distance between the two proteins.
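The DND output quoted above is a parenthesized tree string of the kind usually called Newick format, where each leaf is written as name:distance. A minimal sketch for reading the leaf distances from such a string follows; the regular expression and function name are illustrative assumptions, and a full parser would also be needed to recover the tree topology.

```python
import re

# A leaf is "name:length"; names may not contain Newick punctuation.
LEAF = re.compile(r"([^(),:;]+):([0-9.eE+-]+)")


def leaf_distances(newick):
    """Return {leaf name: branch length} for the named leaves of a tree string."""
    distances = {}
    for name, length in LEAF.findall(newick):
        name = name.strip()
        if name:  # skip unnamed internal branches such as ") :0.3"
            distances[name] = float(length)
    return distances


if __name__ == "__main__":
    dnd = "( seq 1:0.13525, Seq 2:0.09868, seq 3:0.09868);"
    for name, dist in leaf_distances(dnd).items():
        print(name, dist)
```

Each value is the distance of that sequence from the common node, as the narration describes.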
All branches converge to a common ancestral root.

Step 3: Structural and Functional prediction from alignment - Input

SEQUENCE DATABASE

Enter a sequence alignment for 2 or more sequences
Seq 1 PGFPPLVAPEPDALCAAFQDN
Seq 2 PNLPRLVRPEVDVMCTAFHDN
Seq 3 PKLK-PDPNTLCDEFKADEKKF

Range for width of the motifs to be found: 6-50
Maximum number of motifs to be found: 3

Structural and Functional prediction (MEME server)

Action: Schematic of the process for structural and functional analysis.

Description of the action: Follow the animation steps. Input the alignment. Input the parameters. Click on the server tool.

http://meme.sdsc.edu/meme4_4_0/intro.html

Audio Narration: Alignment files can also be used for a variety of structural and functional analyses. Here we represent the functioning of such programs and servers with a simple example of protein motif prediction. The range of the width and the maximum number of motifs to be found are defined by the user.

Step 4: Structural and Functional prediction from alignment - Output

SEQUENCE DATABASE

Enter a sequence alignment for 2 or more sequences
PGFPPLVAPEPDALCAAFQDN
PNLPRLVRPEVDVMCTAFHDN
PKLK-PDPNTLCDEFKADEKKF

Range for width of the motifs to be found: 6-50
Maximum number of motifs to be found: 3

Structural and Functional prediction (MEME server)

Residue-wise sites for motifs: The color coded diagram shows the positions of the motifs in the text alignment of the compared sequences.
Color coded block diagram for motifs: The block diagram of motif prediction is the schematic used to visualize the positions and kinds of motifs in the alignment of two or more sequences.

Action: Schematic of the process for structural and functional analysis.

Description of the action: Follow the animation steps. The server on the previous slide gives the following outputs.

http://meme.sdsc.edu/meme4_4_0/intro.html

Audio Narration: The outputs obtained are:
1. Block diagram of protein motifs, which is the schematic used to visualize the positions and kinds of motifs in the alignment of two or more sequences. The color coding varies from server to server.
2. Sites of the blocks on a residue-by-residue basis.

Step 5: Structural and Functional prediction from alignment - Further Analysis

Functions that can be predicted from sequence data (pie chart centered on Protein Motif):
Finding enzyme active sites (Subtilisin)
Epitope prediction in antigens
Finding transmembrane domains
Identifying DNA binding residues

Description of the action: The animator needs to re-draw all the images shown, as they have been retrieved from web resources. Show the pie chart. Highlight one quarter of it at a time and depict the diagram next to it while narrating it.

Audio Narration: Once the protein motifs are detected, they can be used for further analysis, such as:
1. Epitope prediction
2. Active site determination
3. Determination of trans-membrane domains
4. Identification of DNA binding residues

http://qwickstep.com/search/the-active-site-of-an-enzyme.html, http://www.science.uva.nl/research/its/molsim/research/TMsignalling_lizhe/index.html
https://www.uzh.ch/oci/ssl-dir/group/files/14_roverview.jpg, http://medgadget.com/archives/2008/03/3d_imaging_of_bleomycindna_binding.html

Interactivity option 1: Find the evolutionary distance between insulin chain A of human and mouse

Tabs (listed here in the correct order):
1. Input the term “insulin chain A” in the protein database of your choice.
2. Choose the protein sequences corresponding to insulin A.
3. Check the source organism for the protein sequence.
4. Store the FASTA sequences listed against human and mouse in separate locations.
5. Input the two sequences in a multiple alignment server.
6. Run the server to obtain the output.
7. Check for the .aln file and input it into programs for finding phylogenetic distances, such as PHYLIP.
8. Check the .dnd file to find the evolutionary distance.

Interactivity Type: Arrange the steps in the order to be performed.
Options: Remove the step number from the bottom of the tab. Show all the steps in mixed order. The user must click on the tabs in order. If the user clicks on a tab which is not in the right order, flash a message saying “Try again”.
Results: All the tabs must be arranged in the right order.

Interactivity option 2.a: Match the following

Left column: PAM MATRIX; DOMAIN IDENTIFICATION; PHYLOGRAM; BIT SCORE; E-VALUE; BLOSUM MATRIX
Right column: SIMILARITY BASED SCORING MATRIX; EVOLUTIONARY TREE; MEASURE OF BIOLOGICAL SIGNIFICANCE; DISTANCE BASED SCORING MATRIX; MEASURE OF QUALITY OF ALIGNMENT, NORMALIZED ACCORDING TO SCORING MATRIX; BLAST RESULT LINKED TO PFAM

Interactivity Type: Match the left column to the right.
Options: Match the meaning of the parameter on the right to the name of the parameter on the left. If the matching is correct, turn the tab green, else flash “Try Again”.
Results: Results on the next slide.

Interactivity option 2.b: Match the following

PAM MATRIX: SIMILARITY BASED SCORING MATRIX
DOMAIN IDENTIFICATION: BLAST RESULT LINKED TO PFAM
PHYLOGRAM: EVOLUTIONARY TREE
BIT SCORE: MEASURE OF QUALITY OF ALIGNMENT, NORMALIZED ACCORDING TO SCORING MATRIX
E-VALUE: MEASURE OF BIOLOGICAL SIGNIFICANCE
BLOSUM MATRIX: DISTANCE BASED SCORING MATRIX

Interactivity Type: Match the left column to the right.
Options: Match the meaning of the parameter on the right to the name of the parameter on the left. If the matching is correct, turn the tab green, else flash “Try Again”.
Results: Correct matching.

Questionnaire

1. Which is a scoring matrix based on distantly related proteins?
Answers: a) PAM b) BLOSUM
2. Which parameter signifies whether the match between two sequences is a chance alignment?
Answers: a) word-length b) e-value c) dot-plot d) none
3. Which evolutionary tree has branch lengths corresponding to the evolutionary distances?
Answers: a) Phylogram b) Cladogram c) Both d) None
4. Which is NOT a ClustalW output file extension?
Answers: a) .dnd b) .txt c) .aln d) .output
5. The phylogenetic method for the most variable sequences is
Answers: a) Distance method b) Maximum Distance c) Maximum Parsimony d) Maximum Likelihood

Links for further reading

Reference websites:
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://www.ebi.ac.uk/Tools/clustalw2/index.html
http://www.pdb.org/pdb/home/home.do
http://expasy.org/sprot/
http://expasy.org/prosite/
http://pfam.sanger.ac.uk/
http://www.psc.edu/general/software/packages/phylip/

The following URLs are used for animations:
http://www.ncbi.nlm.nih.gov/
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
http://pfam.sanger.ac.uk/
http://www.ebi.ac.uk/Tools/es/cgi-bin/clustalw2/
http://meme.sdsc.edu/meme4_4_0/intro.html
http://www.ebi.ac.uk/Tools/clustalw2/index.html
http://qwickstep.com/search/the-active-site-of-an-enzyme.html
http://www.science.uva.nl/research/its/molsim/research/TMsignalling_lizhe/index.html
https://www.uzh.ch/oci/ssl-dir/group/files/14_roverview.jpg
http://medgadget.com/archives/2008/03/3d_imaging_of_bleomycindna_binding.html

Books:
Bioinformatics: Sequence and Genome Analysis by David Mount