Protein Interaction Networks
Feb. 21, 2013
Aalt-Jan van Dijk
Applied Bioinformatics, PRI, Wageningen UR
& Mathematical and Statistical Methods,
Biometris, Wageningen University
My research
• Protein complex structures
– Protein-protein docking
– Correlated mutations
• Interaction site prediction/analysis
– Protein-protein interactions
– Enzyme active sites
– Protein-DNA interactions
• Network modelling
– Gene regulatory networks
– Flowering related
Overview
• Introduction: protein interaction networks
• Sequences & networks: predicting interaction sites
• Predicting protein interactions
• Sequence and network evolution
• Interaction network alignment
Protein Interaction Networks
• Obligatory: hemoglobin
• Transient: mitochondrial Cu transporters
Experimental approaches (1)
Yeast two-hybrid (Y2H)
Experimental approaches (2)
Affinity Purification + mass spectrometry (AP-MS)
Interaction Databases
• STRING http://string.embl.de/
• HPRD http://www.hprd.org/
• MINT http://mint.bio.uniroma2.it/mint/
• INTACT http://www.ebi.ac.uk/intact/
• BIOGRID http://thebiogrid.org/
Some numbers
Organism           Number of known interactions
H. sapiens         113,217
S. cerevisiae      75,529
D. melanogaster    35,028
A. thaliana        13,842
M. musculus        11,616
BioGRID (physical interactions)
Overview
• Introduction: protein interaction networks
• Sequences & networks: predicting interaction sites
• Predicting protein interactions
• Sequence and network evolution
• Interaction network alignment
Binding site prediction
Applications:
• Understanding network evolution
• Understanding changes in protein function
• Predicting protein interactions
• Manipulating protein interactions
Input data:
• Interaction network
• Sequences (possibly structures)
Sequence-based predictions
Sequences and networks
• Goal: predict interaction sites and/or motifs
• Data: interaction networks, sequences
• Validation: structure data, “motif databases”
Motif search in groups of proteins
• Group proteins that have the same interaction partner
• Use motif search within each group, e.g. to find position weight matrices (PWMs)
Neduva Plos Biol 2005
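As a minimal sketch of the PWM step, assuming the sites in a group have already been found and aligned to equal length (a real motif search has to discover them itself), a log-odds PWM can be built with a pseudocount against a uniform 1/20 background. The subroutine names `build_pwm` and `score_window` and the peptides are hypothetical.

```perl
use strict;
use warnings;

my @AA = split //, 'ACDEFGHIKLMNPQRSTVWY';    # the 20 standard amino acids

# Build a log-odds PWM from aligned, equal-length sites (hypothetical helper).
sub build_pwm {
    my ($sites, $pseudo) = @_;
    my $len   = length $sites->[0];
    my $total = @$sites + $pseudo * @AA;      # observations plus pseudocounts
    my @pwm;
    for my $pos (0 .. $len - 1) {
        my %count = map { ($_ => $pseudo) } @AA;
        $count{ substr($_, $pos, 1) }++ for @$sites;
        # log-odds against an assumed uniform background of 1/20
        my %col = map { ($_ => log(($count{$_} / $total) / (1 / @AA))) } @AA;
        $pwm[$pos] = \%col;
    }
    return \@pwm;
}

# Sum the per-position log-odds for a peptide of the PWM's length.
sub score_window {
    my ($pwm, $peptide) = @_;
    my $score = 0;
    $score += $pwm->[$_]{ substr($peptide, $_, 1) } for 0 .. $#$pwm;
    return $score;
}

my $pwm = build_pwm([qw(PPLP PPVP PPLP APLP)], 1);
printf "PPLP: %.3f\n", score_window($pwm, 'PPLP');
printf "GGGG: %.3f\n", score_window($pwm, 'GGGG');
```

A window resembling the group's sites scores positive; an unrelated window scores below zero.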
Correlated Motifs
• Motif model
• Search
• Scoring
Predefined motifs
Correlated Motif Mining
Find motifs in one set of proteins that interact with (almost) all proteins carrying another motif
Motif models:
• PWM (so far not applied)
• (l,d) with l = length, d = number of wildcards
Score: overrepresentation, e.g. χ²
Search:
• Interaction driven
• Motif driven
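One way to make the χ² overrepresentation score concrete, sketched under our own assumptions rather than as the exact published formula: classify all unordered protein pairs in a 2×2 table (carries the motif pair or not, interacts or not) and compute Pearson's χ². (l,d) motifs are written as Perl regexes with `.` as the wildcard; the proteins, sequences and interactions are invented.

```perl
use strict;
use warnings;

# (l,d) motif as a regex: '.' is a wildcard matching any residue.
sub has_motif {
    my ($seq, $motif) = @_;
    return $seq =~ /$motif/ ? 1 : 0;
}

# Canonical key for an undirected protein pair.
sub pair_key { return join "\t", sort @_ }

# Pearson chi-square statistic for the 2x2 table [[n11, n12], [n21, n22]].
sub chi2_2x2 {
    my ($n11, $n12, $n21, $n22) = @_;
    my $n   = $n11 + $n12 + $n21 + $n22;
    my $den = ($n11 + $n12) * ($n21 + $n22) * ($n11 + $n21) * ($n12 + $n22);
    return 0 if $den == 0;
    return $n * ($n11 * $n22 - $n12 * $n21)**2 / $den;
}

# Rows: pair carries the motif pair (m1 in one protein, m2 in the other) or not.
# Columns: pair interacts or not.
sub motif_pair_chi2 {
    my ($seqs, $int, $m1, $m2) = @_;
    my @prot = sort keys %$seqs;
    my ($n11, $n12, $n21, $n22) = (0, 0, 0, 0);
    for my $i (0 .. $#prot) {
        for my $j ($i + 1 .. $#prot) {
            my ($s, $t) = ($seqs->{ $prot[$i] }, $seqs->{ $prot[$j] });
            my $carries = (has_motif($s, $m1) && has_motif($t, $m2))
                       || (has_motif($s, $m2) && has_motif($t, $m1));
            my $interacts = $int->{ pair_key($prot[$i], $prot[$j]) };
            if    ($carries && $interacts) { $n11++ }
            elsif ($carries)               { $n12++ }
            elsif ($interacts)             { $n21++ }
            else                           { $n22++ }
        }
    }
    return chi2_2x2($n11, $n12, $n21, $n22);
}

my %seqs = (
    A1 => 'MKPPLPRE', A2 => 'GSPPVPKL',
    B1 => 'TWYDGSW',  B2 => 'QWFDGRW',
    C1 => 'MKLLEEKA', C2 => 'RRSTDEAL',
);
my %int = map { (pair_key(@$_) => 1) }
    [qw(A1 B1)], [qw(A1 B2)], [qw(A2 B1)], [qw(A2 B2)], [qw(C1 C2)];

printf "PP.P / W.DG: %.3f\n", motif_pair_chi2(\%seqs, \%int, 'PP.P', 'W.DG');
printf "PP.P / PP.P: %.3f\n", motif_pair_chi2(\%seqs, \%int, 'PP.P', 'PP.P');
```

The motif pair that explains all A–B interactions scores far higher than the pair PP.P/PP.P, which only marks the non-interacting pair A1–A2.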
Interaction driven approaches
Mine for (quasi-)bicliques, i.e. most-versus-most interaction
Then derive a motif pair from the sequences
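The quasi-biclique condition behind the interaction-driven approach can be checked as below. This is a minimal sketch, not a mining algorithm: it assumes one common definition (every protein on one side interacts with at least a fraction `$min_frac` of the other side), and the helper names and network are hypothetical.

```perl
use strict;
use warnings;

# Build a symmetric adjacency hash from a list of interacting pairs.
sub build_adj {
    my %adj;
    for my $pair (@_) {
        my ($x, $y) = @$pair;
        $adj{$x}{$y} = 1;
        $adj{$y}{$x} = 1;
    }
    return \%adj;
}

# Fraction of the (non-empty) set that protein $p interacts with.
sub frac_partners {
    my ($adj, $p, $set) = @_;
    my $hits = grep { $adj->{$p}{$_} } @$set;
    return $hits / @$set;
}

# Most-versus-most: every protein in A covers >= $min_frac of B, and vice versa.
sub is_quasi_biclique {
    my ($adj, $A, $B, $min_frac) = @_;
    for my $p (@$A) { return 0 if frac_partners($adj, $p, $B) < $min_frac }
    for my $q (@$B) { return 0 if frac_partners($adj, $q, $A) < $min_frac }
    return 1;
}

my $adj = build_adj([qw(A1 B1)], [qw(A1 B2)], [qw(A2 B1)], [qw(A2 B2)], [qw(A3 B1)]);
my @A = qw(A1 A2 A3);
my @B = qw(B1 B2);
print is_quasi_biclique($adj, \@A, \@B, 1.0) ? "biclique\n" : "not a biclique\n";
print is_quasi_biclique($adj, \@A, \@B, 0.5) ? "quasi-biclique at 0.5\n" : "not quasi\n";
```

With `$min_frac` = 1 the condition reduces to an exact biclique; lowering it tolerates missing interactions such as A3–B2.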
Motif driven approaches
Starting from candidate motif pairs, evaluate their
support in the network (and improve them)
D-MOTIF
Tan BMC Bioinformatics 2006
IMSS: application of D-MOTIF
[Figure: motif pairs linking protein X and protein Y; test error vs. number of selected motif pairs]
Van Dijk et al., Bioinformatics 2008
Van Dijk et al., PLoS Comp Biol 2010
Experimental validation
SLIDER
Boyen et al. Trans Comp Biol Bioinf 2011
SLIDER
• Faster approach, enabling genome-wide search
• Scoring: χ²
• Search: steepest ascent
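A minimal steepest-ascent sketch. Assumptions not taken from SLIDER itself: the neighbourhood of an (l,d) motif is every motif differing at one non-wildcard position, and a toy score (number of sequences containing the motif) stands in for the χ² score of a motif pair. Sequences and names are hypothetical.

```perl
use strict;
use warnings;

my @AA = split //, 'ACDEFGHIKLMNPQRSTVWY';

# Neighbours: substitute one non-wildcard position with another residue.
sub neighbours {
    my ($motif) = @_;
    my @out;
    for my $i (0 .. length($motif) - 1) {
        my $old = substr($motif, $i, 1);
        next if $old eq '.';                  # wildcards stay fixed
        for my $aa (@AA) {
            next if $aa eq $old;
            my $m = $motif;
            substr($m, $i, 1) = $aa;
            push @out, $m;
        }
    }
    return @out;
}

# Always move to the best-scoring neighbour; stop at a local optimum.
sub steepest_ascent {
    my ($seed, $score) = @_;
    my ($best, $best_s) = ($seed, $score->($seed));
    while (1) {
        my ($cand, $cand_s) = ($best, $best_s);
        for my $m (neighbours($best)) {
            my $s = $score->($m);
            ($cand, $cand_s) = ($m, $s) if $s > $cand_s;
        }
        last if $cand eq $best;               # no improving neighbour
        ($best, $best_s) = ($cand, $cand_s);
    }
    return ($best, $best_s);
}

# Toy score (hypothetical): number of sequences that contain the motif.
my @seqs  = qw(MKPPLPRE GSPPVPKL AAPPIPGG TTKLLEEK);
my $count = sub { my $m = shift; scalar grep { /$m/ } @seqs };

my ($motif, $s) = steepest_ascent('AP.P', $count);
print "$motif $s\n";
```

From the seed AP.P the search climbs to PP.P, which occurs in three of the four toy sequences, and stops there because no single substitution does better.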
Validation
• Performance assessment on simulated data
• Performance assessment using protein structures
Extensions of SLIDER
Extension I: better coverage of network
Boyen et al. Trans Comp Biol Bioinf 2013
Extension II: use of more biological information
bioSLIDER
[Figure: example sequence DGIFELELYLPDDYPMEAPKVRFLTKI annotated per residue with conservation and accessibility]
• Thresholds for conservation and accessibility
• Extension of motif model: amino acid similarity (BLOSUM)
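The threshold idea can be sketched as a filter on motif occurrences. Assumptions: per-residue conservation and accessibility are given as arrays (the values below are made up), only the non-wildcard positions of an occurrence must pass both thresholds, and the BLOSUM-based similarity extension is not included. `filtered_hits` is a hypothetical name.

```perl
use strict;
use warnings;

# Start positions (0-based) of motif occurrences whose non-wildcard residues
# all pass the conservation and accessibility thresholds.
sub filtered_hits {
    my ($seq, $motif, $cons, $acc, $min_cons, $min_acc) = @_;
    my @hits;
    # zero-width lookahead so that overlapping occurrences are all found
    while ($seq =~ /(?=$motif)/g) {
        my $start = $-[0];
        my $ok    = 1;
        for my $k (0 .. length($motif) - 1) {
            next if substr($motif, $k, 1) eq '.';    # wildcard: not filtered
            my $i = $start + $k;
            $ok = 0 if $cons->[$i] < $min_cons || $acc->[$i] < $min_acc;
        }
        push @hits, $start if $ok;
    }
    return @hits;
}

my $seq = 'DGIFELELYLPDDYPMEAPKVRFLTKI';
# Made-up scores: high conservation everywhere except position 9.
my @cons = (0.8) x length($seq);
$cons[9] = 0.2;
my @acc = (0.5) x length($seq);

print join(' ', filtered_hits($seq, 'L.L', \@cons, \@acc, 0,   0)),    "\n";
print join(' ', filtered_hits($seq, 'L.L', \@cons, \@acc, 0.5, 0.25)), "\n";
```

L.L occurs at positions 5 and 7. With the made-up low conservation at position 9, only the occurrence at position 5 survives the thresholds.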
bioSLIDER
Using human and yeast data for training and optimizing parameters
[Figure: interaction coverage vs. motif accuracy, without conservation and accessibility vs. with conservation and accessibility]
Leal Valentim et al., PLoS ONE 2012
Application to Arabidopsis
Input data: 6200 interactions, 2700 proteins
Interface predictions for 985 proteins (on average 20 residues)
Arabidopsis Interactome Mapping Consortium, Science 2011
Ecotype sequence data (SNPs)
SNPs tend to ‘avoid’ predicted binding sites
In 263 proteins there is a SNP in a binding site → these proteins are much more connected to each other than would be expected at random
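"More connected than randomly expected" can be tested with a permutation test. This sketch assumes uniformly random protein sets of the same size as the null model (degree-preserving randomization would be stricter); the toy network and subroutine names are hypothetical, and the procedure actually used for the 263 proteins is not specified here.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Number of interactions with both partners inside the protein set.
sub internal_edges {
    my ($edges, $set) = @_;
    my %in = map { ($_ => 1) } @$set;
    return scalar grep { $in{ $_->[0] } && $in{ $_->[1] } } @$edges;
}

# Empirical p-value: how often does a random set of the same size have at
# least as many internal interactions as the observed set?
sub connectivity_pvalue {
    my ($edges, $proteins, $set, $n_perm) = @_;
    my $obs = internal_edges($edges, $set);
    my $k   = @$set;
    my $ge  = 0;
    for (1 .. $n_perm) {
        my @rand_set = (shuffle @$proteins)[0 .. $k - 1];
        $ge++ if internal_edges($edges, \@rand_set) >= $obs;
    }
    # add-one correction so the estimate is never exactly zero
    return ($obs, ($ge + 1) / ($n_perm + 1));
}

srand(1);
my @proteins = map { "P$_" } 1 .. 20;
my @edges;
for my $i (1 .. 4) {                          # P1..P4 form a clique
    push @edges, ["P$i", "P$_"] for $i + 1 .. 4;
}
push @edges, ["P$_", 'P' . ($_ + 1)] for 4 .. 19;    # a chain P4-P5-...-P20

my ($obs, $p) = connectivity_pvalue(\@edges, \@proteins, [qw(P1 P2 P3 P4)], 1000);
printf "observed internal interactions: %d, p = %.4f\n", $obs, $p;
```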
Summary
• Prediction of interaction sites using protein
interaction networks and protein sequences
• Correlated motif approaches
Overview
• Introduction: protein interaction networks
• Sequences & networks: predicting interaction sites
• Predicting protein interactions
• Sequence and network evolution
• Interaction network alignment
Protein Interaction Prediction
Lots of genomes are being sequenced… (www.genomesonline.org)
              ARCHAEA   BACTERIA   EUKARYA   TOTAL
Complete      182       3767       183       4132
Incomplete    264       14393      2897      17514
But how do we know how the proteins in there work together?!
Protein Interaction Prediction
• Interactions of orthologs: interologs
• Phylogenetic profiles
A 1 0 1 1 0 0 1
B 1 0 1 1 0 0 1
• Domain-based predictions
Orthology based prediction
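A minimal sketch of interolog transfer: each source interaction A–B yields a predicted interaction for every pair of orthologs of A and B. The identifiers and the one-to-many ortholog hash are hypothetical; real orthology predictions (as in Exercise 3) come with their own file format.

```perl
use strict;
use warnings;

# $ortho maps a source-species protein to its list of target-species orthologs.
sub predict_interologs {
    my ($interactions, $ortho) = @_;
    my %pred;
    for my $pair (@$interactions) {
        my ($x, $y) = @$pair;
        for my $xo (@{ $ortho->{$x} || [] }) {
            for my $yo (@{ $ortho->{$y} || [] }) {
                next if $xo eq $yo;
                # store each undirected pair once, in a canonical order
                my $key = $xo lt $yo ? "$xo\t$yo" : "$yo\t$xo";
                $pred{$key} = 1;
            }
        }
    }
    return sort keys %pred;
}

my @interactions = ([qw(a1 a2)], [qw(a2 a3)]);
my %ortho = (a1 => ['b1'], a2 => ['b2', 'b2x'], a3 => []);    # a3: no ortholog
my @pred  = predict_interologs(\@interactions, \%ortho);
print "$_\n" for @pred;
```

Interactions whose partners lack an ortholog (here a2–a3) simply produce no prediction.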
Phylogenetic profiles
A 1 0 1 1 0 0 1
C 1 0 1 1 1 0 1
B 1 0 1 1 1 0 1
D 0 1 0 1 0 0 1
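The profiles above can be compared with a Hamming distance: proteins whose presence/absence patterns differ in at most `$max_dist` genomes are predicted to be functionally linked. The profiles are the ones on the slide; the Hamming distance and the threshold are our assumptions, one simple choice among several (correlation or mutual information are common alternatives).

```perl
use strict;
use warnings;

my %profile = (
    A => [1, 0, 1, 1, 0, 0, 1],
    B => [1, 0, 1, 1, 1, 0, 1],
    C => [1, 0, 1, 1, 1, 0, 1],
    D => [0, 1, 0, 1, 0, 0, 1],
);

# Number of genomes in which two presence/absence profiles differ.
sub hamming {
    my ($p, $q) = @_;
    my $d = 0;
    for my $i (0 .. $#$p) { $d++ if $p->[$i] != $q->[$i] }
    return $d;
}

# All protein pairs whose profiles differ in at most $max_dist positions.
sub linked_pairs {
    my ($prof, $max_dist) = @_;
    my @names = sort keys %$prof;
    my @pairs;
    for my $i (0 .. $#names) {
        for my $j ($i + 1 .. $#names) {
            my $d = hamming($prof->{ $names[$i] }, $prof->{ $names[$j] });
            push @pairs, "$names[$i]-$names[$j]" if $d <= $max_dist;
        }
    }
    return @pairs;
}

print join(' ', linked_pairs(\%profile, 1)), "\n";
```

With a threshold of 1, A, B and C are linked to each other and D is linked to none of them.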
Domain Based Predictions
Overview
• Introduction: protein interaction networks
• Sequences & networks: predicting interaction sites
• Predicting protein interactions
• Sequence and network evolution
• Interaction network alignment
Duplications
Duplications and interactions
• Gene duplication: 0.001 per Myear
• Interaction loss: 0.1 per Myear
Duplications and interaction loss
Duplicate pairs share interaction partners
Interaction network evolution
Science 2011
Overview
• Introduction: protein interaction networks
• Sequences & networks: predicting interaction sites
• Predicting protein interactions
• Sequence and network evolution
• Interaction network alignment
Network alignment
Local Network Alignment: find multiple, unrelated regions of isomorphism
Global Network Alignment: find the best overall alignment
PATHBLAST
Kelley, PNAS 2003
PATHBLAST: scoring
homology
interaction
Kelley, PNAS 2003
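A hedged sketch of a PathBLAST-style score. We assume the form S = Σ log10(p(v)/p_random) + Σ log10(q(e)/q_random): a homology term for each aligned protein pair and an interaction term for each edge, each relative to a random background. The probabilities and backgrounds are hypothetical, and the published scoring also handles gaps and mismatches, which this sketch ignores.

```perl
use strict;
use warnings;

# Log-likelihood-ratio score of an aligned pair of paths (assumed form).
sub path_score {
    my (%arg) = @_;
    my $s = 0;
    # homology: one term per aligned protein pair
    $s += log($_ / $arg{p_random}) / log(10) for @{ $arg{homology} };
    # interaction: one term per edge in either path
    $s += log($_ / $arg{q_random}) / log(10) for @{ $arg{interaction} };
    return $s;
}

my $score = path_score(
    homology    => [0.9, 0.8, 0.9, 0.7],             # 4 aligned protein pairs (L = 4)
    interaction => [0.6, 0.5, 0.6, 0.7, 0.5, 0.6],   # 3 edges in each of the 2 paths
    p_random    => 0.1,
    q_random    => 0.1,
);
printf "path score: %.3f\n", $score;
```

Working in log space turns the product of likelihood ratios into a sum, so each protein pair and each interaction contributes additively.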
PATHBLAST: results
For yeast vs. H. pylori, with L = 4, all resulting paths with p ≤ 0.05 can
be merged into just five network regions
Kelley, PNAS 2003
Multiple alignment
Scoring: Probabilistic model for interaction subnetworks
Sub-networks: bottom-up search, starting with an exhaustive
search for L = 4, followed by local search
Sharan PNAS 2005
Multiple alignment: results
Applications include protein function prediction
and interaction prediction
Sharan PNAS 2005
Global alignment
Alignment: greedy selection of matches
Singh PNAS 2008
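The greedy selection step can be sketched as follows: given similarity scores between proteins of two networks (in IsoRank these combine sequence similarity and network topology; here they are made-up numbers), repeatedly accept the best-scoring pair whose proteins are both still unmatched. `greedy_match` is a hypothetical name.

```perl
use strict;
use warnings;

# $score->{u}{v}: similarity between protein u (network 1) and v (network 2).
sub greedy_match {
    my ($score) = @_;
    my @cand;
    for my $u (keys %$score) {
        push @cand, [$u, $_, $score->{$u}{$_}] for keys %{ $score->{$u} };
    }
    # highest score first; ties broken by names for a deterministic result
    @cand = sort { $b->[2] <=> $a->[2] or $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] } @cand;
    my (%used1, %used2, @matches);
    for my $c (@cand) {
        my ($u, $v) = @$c;
        next if $used1{$u} || $used2{$v};    # each protein is matched at most once
        $used1{$u} = $used2{$v} = 1;
        push @matches, "$u=$v";
    }
    return @matches;
}

my %score = (
    y1 => { h1 => 0.9,  h2 => 0.8 },
    y2 => { h1 => 0.85, h2 => 0.3 },
    y3 => { h3 => 0.2 },
);
print join(' ', greedy_match(\%score)), "\n";
```

Greedy selection is fast but does not guarantee the highest total score: here it returns y1=h1 y2=h2 y3=h3 (total 1.4), whereas y1=h2, y2=h1, y3=h3 would total 1.85.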
Network alignment: the future?
Sharan & Ideker Nature Biotech 2006
Summary
• Interaction network evolution: mostly
“comparative”, not much mechanistic analysis
• Approaches exist to integrate and model network
analysis within the context of phylogeny (not
discussed)
• Outlook: combine interaction site prediction with
network evolution analysis
Exercises
The datafiles “arabidopsis_proteins.lis” and
“interactions_arabidopsis.data” contain Arabidopsis MADS
proteins (which regulate various developmental
processes including flowering), and their mutual
interactions, respectively.
[Figure: MADS protein interactions, e.g. SOC1 and AGL24]
Exercise 1
• Start by getting familiar with the basic Cytoscape
features described in section 1 of the tutorial
http://opentutorials.cgl.ucsf.edu/index.php/Tutorial:Introduction_to_Cytoscape
• Load the data into Cytoscape
• Visualize the network and analyze the number of
interactions per protein: which proteins have
many interactions?
Exercise 2
Write a script that reads interaction data and
implements a data structure which enables further
analysis of the data (see the script skeleton below).
Use the datafiles “arabidopsis_proteins.lis” and
“interactions_arabidopsis.data” and let the script print a
table in the following format:
PROTEIN Number_of_interactions
Make a plot of these data.
use strict;
use warnings;

# two subroutines

# input: filename
# output: list with the content of the file
sub read_list {
    my $infile = $_[0];
    my @newlist;
    # YOUR CODE
    return @newlist;
}

# input: references to the protein list and the interaction list
# output: hash mapping each protein to a list of its partners
sub combine_prot_int {
    my ($plist, $intlist) = @_;
    my %inthash;
    # YOUR CODE
    return %inthash;
}

# reading input data
my @plist   = read_list($ARGV[0]);
my @intlist = read_list($ARGV[1]);

# obtaining hash with interactions
my %inthash = combine_prot_int(\@plist, \@intlist);

# YOUR CODE
# loop over all proteins and print their name and their
# number of interactions
Exercise 3
In “orthology_relations.data” we have a set of predicted
orthologs for the Arabidopsis proteins from
exercise 1. “protein_information.data” describes, among other
things, which species these proteins are from. Finally,
“interactions.data” contains interactions between those
proteins.
Use the Arabidopsis interaction data from exercise 1
to “predict” interactions in other species using the
orthology information. Compare your predictions
with the real interaction data and make a plot that
visualizes how good your predictions are.