Summary of “Detecting Protein Function and Protein-Protein Interactions from Genome Sequences,” a Paper by E. Marcotte et al. [1]
By Ben Boral and Uriel Brener

Goal of Paper
• Determine whether protein function and protein-protein interactions can be identified computationally from genome sequences.

How?
• For the proteins of species A, search another species’ genome (species B) and identify cases where two nonhomologous proteins of species A are each homologous to a different region of a single protein of species B.

Why the method works
• Proteins A and B fused on a single polypeptide have a greater affinity for each other than when they are unfused.
• At some point the proteins broke off the same polypeptide, but because of their previous affinity for one another, they still interact.
• The interfaces between two linked protein domains have been shown to be very similar to those between two separate, interacting proteins.

Confirmation Test
• How to confirm that these two proteins actually interact:
  ◦ Test 1: Domain Fusion Analysis
    • Search annotations from the SWISS-PROT protein database for common keywords between proteins.
  ◦ Test 2: Database of Interacting Proteins
    • Search the database to see if proposed pairs already exist in the literature (from lab experiments).
  ◦ Test 3: Phylogenetic Analysis
    • Based on analyzing the evolution of proteins, identify possible interacting pairs computationally.

Test 1: Domain Fusion Analysis
Each protein in the SWISS-PROT protein database has a set of annotations describing it.
• Search the annotations of each protein in a proposed pair.
• If the same keywords are used in the annotations for each protein, the proteins probably share a function.

Test 2: Database of Interacting Proteins
The database consists of pairs of proteins that have been determined to interact by a laboratory experiment.
• Search the database to see if the proposed interacting proteins have already been experimentally shown to interact.

Test 3: Phylogenetic Analysis
Phylogenetics is the study of evolutionary relatedness.
• Phylogenetic analysis can predict sets of interacting proteins.
• Compare the phylogenetic analysis predictions to the proposed protein pairs.

Actual Results: Finding Possible Pairs
• 4290 proteins in E. coli were compared to proteins of other species.
• 6809 possible pairs of interacting proteins were found (out of roughly 9 × 10^6 possible pairs).

Actual Results: Domain Fusion Analysis
• Only 3950 of the possible pairs had both proteins in the database with known function.
• Of these 3950 pairs, 2682 (68%) share keywords in their annotations.
• By comparison, only 15% of pairs share keywords when two E. coli proteins are selected at random.

Actual Results: Database of Interacting Proteins
• Of 724 pairs in the database, 46 (6.4% of the database) are proposed protein pairs.

Actual Results: Phylogenetic Analysis
• Phylogenetic analysis was performed on the 6809 proposed pairs.
• Of these, 321 (5%) were also predicted by phylogenetic analysis to interact.
• This is 8 times the percentage of predicted pairs from a set of random proteins.

Identifying Protein Pathways
• Determine pathways by ordering pairs of interacting proteins.
• If A interacts with B, and B interacts with C, then the pathway A-B-C can be proposed.
• Examples (figure from [1]): shikimate synthesis (left and center top) and purine synthesis (center bottom and right).
• Not all pathways are obvious from the pairings.
  ◦ For example, the first protein could be paired with the fourth protein.
• Explanation:
  ◦ Large groupings of interconnected proteins could be part of a multienzyme complex.

Error Detection Part 1
• Reasons for getting false negatives (two proteins that physically interact are not found):
  ◦ Interactions that developed through a mechanism other than fusion. Example: gradual mutations lead to the evolution of a binding site.
  ◦ Loss of the ancestral protein over the course of evolution.

Error Detection Part 2
• Reasons for getting false positives (two paired proteins do not interact):
  ◦ The two domains were found in the fusion protein, but they do not physically interact.
  ◦ Some domains interact in some instances but not in others. Example: SH2 and SH3 domains interact in some proteins but not in others.

Minimizing Error
• Identify “promiscuous” domains that are present in many proteins and interact with many other domains.
• Removing the 5% most promiscuous domains drastically reduces the rate of false positives.

Citation
[1] E. Marcotte et al., Science 285, 751-753 (1999)
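
Worked sketch (not from the paper)
The search, pathway, and error-minimizing steps summarized above can be illustrated with a small Python sketch. It is not the authors' implementation. The function names, the `hits` data layout, and the toy protein names are all hypothetical. Two simplifications are assumed: "nonhomologous" is approximated by requiring that the two species-A proteins match non-overlapping regions of the species-B protein, and promiscuity is counted per protein rather than per domain.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def propose_pairs(hits):
    """Propose interacting pairs of species-A proteins.

    `hits` (assumed layout) maps each species-B protein to a dict of
    species-A protein -> (start, end) region of the B protein it matches.
    Two A proteins are paired when they match non-overlapping regions
    of the same B protein.
    """
    pairs = set()
    for regions in hits.values():
        for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(
            sorted(regions.items()), 2
        ):
            if a_end < b_start or b_end < a_start:  # regions do not overlap
                pairs.add((a, b))
    return pairs


def propose_chains(pairs):
    """If A pairs with B and B pairs with C, propose the chain (A, B, C)."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    chains = set()
    for middle, linked in neighbours.items():
        for first, last in combinations(sorted(linked), 2):
            chains.add((first, middle, last))
    return chains


def drop_promiscuous(pairs, fraction=0.05):
    """Remove pairs that involve the most promiscuous proteins.

    The top `fraction` of proteins (rounded up), ranked by number of
    proposed partners, is treated as promiscuous.
    """
    partners = Counter()
    for a, b in pairs:
        partners[a] += 1
        partners[b] += 1
    n_drop = math.ceil(len(partners) * fraction)
    promiscuous = {protein for protein, _ in partners.most_common(n_drop)}
    return {pair for pair in pairs if not promiscuous.intersection(pair)}


# Toy data: B3 is matched by two A proteins in overlapping regions,
# so it does not yield a proposed pair.
example_hits = {
    "B1": {"A1": (1, 100), "A2": (150, 300)},
    "B2": {"A2": (1, 120), "A3": (130, 250)},
    "B3": {"A4": (1, 200), "A5": (50, 180)},
}
example_pairs = propose_pairs(example_hits)
example_chains = propose_chains(example_pairs)
```

On real data the `hits` table would come from a sequence-similarity search of species A against species B, and the filtering fraction would need to be tuned against a set of known interactions.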
