Summary of “Detecting Protein
Function and Protein-Protein
Interactions from Genome Sequences,”
a Paper by E. Marcotte et al. [1]
By
Ben Boral and Uriel Brener
Goal of Paper

Determine whether protein function and
protein-protein interactions can be identified
computationally from genome sequences.
How?

For every protein of species A, search
another species’ genome (species B) and
identify cases where two nonhomologous
proteins of species A are each homologous
to a different region of a single protein of
species B.
Why the method works



▪ Proteins A and B fused on a single
polypeptide have a greater affinity for each
other than when they are unfused.
▪ At some point the proteins separated into
distinct polypeptides, but because of their
previous affinity for one another, they still
interact.
▪ The interface between two linked protein
domains has been shown to be very similar
to that between two separate, interacting
proteins.
Confirmation Test

How to confirm that these two proteins
actually interact:
◦ Test 1: Domain Fusion Analysis
▪ Search annotations from the SWISS-PROT protein
database for keywords shared by both proteins.
◦ Test 2: Database of Interacting Proteins
▪ Search the database to see if the proposed pairs
already appear in the literature (from lab experiments).
◦ Test 3: Phylogenetic Analysis
▪ Based on analyzing the evolution of proteins, identify
possible interacting pairs computationally.
Test 1: Domain Fusion Analysis
▪ Each protein in the SWISS-PROT protein
database has a set of annotations
describing it.
▪ Search the annotations of each protein in a
proposed pair. If the same keywords appear
in the annotations of both proteins, the
proteins probably share a function.
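
The keyword comparison can be sketched as follows. This is a minimal illustration, assuming the annotations have already been extracted into a dictionary that maps each protein to a set of keywords; the function name keyword_agreement, the data layout and the example keywords are hypothetical and do not come from [1] or from SWISS-PROT itself.

```python
def keyword_agreement(pairs, annotations):
    """Count proposed pairs whose two proteins share at least one keyword.

    pairs: iterable of (protein_a, protein_b) tuples.
    annotations: dict mapping a protein name to a set of keywords
                 (hypothetical layout; proteins without an entry are
                 treated as having no known function).
    Returns (sharing, scored): the number of pairs with a shared keyword and
    the number of pairs in which both proteins are annotated.
    """
    sharing = 0
    scored = 0
    for a, b in pairs:
        if a not in annotations or b not in annotations:
            continue  # cannot be scored without annotations for both proteins
        scored += 1
        if annotations[a] & annotations[b]:
            sharing += 1
    return sharing, scored


# Hypothetical annotations and proposed pairs.
example_annotations = {
    "P1": {"dna repair", "atp-binding"},
    "P2": {"atp-binding"},
    "P3": {"transport"},
}
example_pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P9")]
sharing, scored = keyword_agreement(example_pairs, example_annotations)
print(f"{sharing} of {scored} scored pairs share a keyword")
```

Pairs in which either protein lacks annotations are left out of the denominator, which mirrors the way only pairs with known function are counted in the results.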

Test 2: Database of Interacting Proteins
▪ The database consists of pairs of proteins
that have been shown to interact by
laboratory experiment.
▪ Search the database to see if the
proposed interacting proteins have
already been experimentally shown to
interact.
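
A lookup of this kind amounts to a set intersection over unordered pairs. The sketch below assumes both the proposed pairs and the database entries are available as plain lists of name pairs; the function name confirmed_pairs and the example identifiers are hypothetical.

```python
def confirmed_pairs(predicted, database):
    """Return the proposed pairs that also appear in an interaction database.

    Both arguments are iterables of (protein_a, protein_b) tuples. Pairs are
    compared as unordered sets, so (a, b) matches (b, a).
    Returns (confirmed, fraction_of_database).
    """
    predicted_set = {frozenset(pair) for pair in predicted}
    database_set = {frozenset(pair) for pair in database}
    confirmed = predicted_set & database_set
    fraction = len(confirmed) / len(database_set) if database_set else 0.0
    return confirmed, fraction


# Hypothetical proposed pairs and database entries.
example_predicted = [("P1", "P2"), ("P3", "P4")]
example_database = [("P2", "P1"), ("P5", "P6")]
confirmed, fraction = confirmed_pairs(example_predicted, example_database)
print(confirmed, fraction)
```

Storing each pair as a frozenset makes the comparison independent of the order in which the two proteins are listed.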

Test 3: Phylogenetic Analysis
▪ Phylogenetics is the study of evolutionary
relatedness.
▪ Phylogenetic analysis can predict sets of
interacting proteins.
▪ Compare the phylogenetic analysis
predictions to the proposed protein pairs.

Actual Results: Finding Possible Pairs
▪ The 4290 proteins of E. coli were compared
to those of other species.
▪ 6809 possible pairs of interacting proteins
were found (out of about 9 × 10^6 possible pairs).
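
The size of the search space can be checked with a short calculation, assuming that "possible pairs" means all unordered pairs of distinct E. coli proteins. The variable names are chosen for this example only.

```python
from math import comb

n_proteins = 4290        # E. coli proteins compared
candidate_pairs = 6809   # pairs proposed by the method

# Number of unordered pairs of distinct proteins: n * (n - 1) / 2.
all_pairs = comb(n_proteins, 2)
fraction = candidate_pairs / all_pairs

print(all_pairs)          # 9199905, i.e. about 9 x 10^6
print(f"{fraction:.4%}")  # share of all possible pairs that were proposed
```

The proposed pairs are therefore well under 0.1% of all possible pairs.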

Actual Results: Domain Fusion Analysis
▪ Only 3950 of the possible pairs had both
proteins in the database with known
function.
▪ Of these 3950 pairs, 2682 (68%) share
keywords in their annotations.
▪ By comparison, only 15% of pairs share
keywords when two E. coli proteins are
selected at random.
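
The figures above can be verified directly. The enrichment ratio computed at the end is not stated in the summary; it is simply the quotient of the two reported rates.

```python
annotated_pairs = 3950   # proposed pairs with both proteins of known function
sharing_pairs = 2682     # of those, pairs that share keywords
random_rate = 0.15       # keyword sharing among randomly chosen pairs

shared_rate = sharing_pairs / annotated_pairs
enrichment = shared_rate / random_rate

print(f"{shared_rate:.1%}")   # rounds to the reported 68%
print(f"{enrichment:.1f}x")   # proposed pairs relative to random pairs
```

Proposed pairs share keywords roughly four and a half times as often as random pairs.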

Actual Results: Database of Interacting
Proteins

▪ Of the 724 pairs in the database, 46 (6.4%)
are among the proposed protein pairs.
Actual Results: Phylogenetic Analysis
▪ Phylogenetic analysis was performed on the
6809 proposed pairs. Of these, 321 (5%)
were also predicted by phylogenetic analysis
to interact.
▪ This is 8 times the percentage found for
pairs drawn from a set of random
proteins.

Identifying Protein Pathways
▪ Determine pathways by ordering pairs of
interacting proteins.
▪ If A interacts with B, and B interacts with
C, then the pathway A-B-C can be
proposed.
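
Chaining pairs in this way can be sketched with a small adjacency map. This is an illustration only: the function name propose_chains is hypothetical, and because the pairings carry no direction, each chain is just an undirected ordering whose middle protein interacts with both ends.

```python
from collections import defaultdict
from itertools import combinations


def propose_chains(pairs):
    """Propose three-protein chains from pairwise interactions.

    pairs: iterable of (protein_a, protein_b) tuples.
    Returns a list of (left, middle, right) tuples in which the middle protein
    is paired with both of the others.
    """
    partners = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue  # a self-pair cannot extend a chain
        partners[a].add(b)
        partners[b].add(a)

    chains = []
    for middle in sorted(partners):
        # Every two distinct partners of the middle protein form one chain.
        for left, right in combinations(sorted(partners[middle]), 2):
            chains.append((left, middle, right))
    return chains


print(propose_chains([("A", "B"), ("B", "C"), ("C", "D")]))
```

Longer pathways can be assembled by joining chains that overlap in two proteins, for example A-B-C and B-C-D into A-B-C-D.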

Identifying Protein Pathways

[Figure not reproduced] Examples: shikimate
synthesis (left and center top) and purine
synthesis (center bottom and right).
Source: [1]
Identifying Protein Pathways

▪ Not all pathways are obvious from the
pairings.
◦ For example, the first protein in a pathway
could also be paired with the fourth protein.

▪ Explanation:
◦ Large groups of interconnected proteins
could be part of a multienzyme complex.
Error Detection Part 1

▪ Reasons for false negatives (two
proteins that physically interact are not
paired by the method):
◦ Interactions that developed through a mechanism
other than fusion. Example: gradual mutations
lead to the evolution of a binding site.
◦ Loss of the ancestral fused protein over the
course of evolution.
Error Detection Part 2

▪ Reasons for false positives (two
paired proteins do not interact):
◦ The two domains are found in the same fusion
protein, but they do not physically interact.
◦ Some domains interact in some instances but
not in others. Example: SH2 and SH3 domains
interact in some proteins but not in others.
Minimizing Error
▪ Identify “promiscuous” domains that are
present in many proteins and interact
with many other domains.
▪ Removing the 5% most promiscuous
domains drastically reduces the rate of
false positives.
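
One way to sketch this filter is to rank domains by how many distinct partners they have and drop the top fraction. The ranking criterion, the function name filter_promiscuous and the rounding-down of the cutoff are assumptions made for this example; the summary does not say how promiscuity was measured in [1].

```python
from collections import defaultdict


def filter_promiscuous(pairs, fraction=0.05):
    """Remove pairs that involve the most promiscuous domains.

    pairs: list of (domain_a, domain_b) tuples.
    fraction: share of domains to remove, ranked by number of distinct
              partners (assumed measure of promiscuity).
    Returns (kept_pairs, removed_domains).
    """
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)

    # Most partners first; ties broken by name so the result is deterministic.
    ranked = sorted(partners, key=lambda d: (-len(partners[d]), d))
    n_remove = int(len(ranked) * fraction)  # rounds down (assumption)
    removed = set(ranked[:n_remove])

    kept = [(a, b) for a, b in pairs if a not in removed and b not in removed]
    return kept, removed


# Hypothetical data: hub domain H pairs with 19 others, plus one ordinary pair.
example_pairs = [("H", f"D{i}") for i in range(1, 20)] + [("D1", "D2")]
kept, removed = filter_promiscuous(example_pairs)
print(removed, kept)
```

In this made-up set of 20 domains, removing the single most connected domain eliminates 19 of the 20 pairs, which shows how a few promiscuous domains can account for most of the proposed links.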

Citation

[1] E. Marcotte et al., Science 285, 751-753
(1999)