How to use computational tools to maximize the coverage of protein sequence/structure/function space
PSI Bottlenecks
1) Not enough connection between modeling and biology/experiment
2) “Modelability” not used in defining families or a dynamic target selection strategy
3) Incomplete use of functional information in model building
Murray Lab: Nebojsa Mirkovic, Tonya Silkov, Hunjoong Lee, Frank Indiviglio, Janey Li
Honig Lab: Markus Fischer and Donald Petrey
Phosphoinositide signaling processes
(Figure legend: the marked symbol denotes a phosphoinositide headgroup)
Biophysical properties of cellular protein/membrane interactions
Intracellular membranes contain distinct lipid compositions and carry different charge densities
Binding behavior of a +8e peptide to membranes carrying different negative charge densities
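To make the link between lipid composition and charge density concrete, here is a minimal sketch that estimates the surface potential of a membrane from its fraction of acidic lipid. It uses Gouy-Chapman theory as a simplified stand-in; the slides do not say which theory was applied. The area per lipid (70 Å^2), the salt concentration (100 mM monovalent), the temperature, and the dielectric constant are all assumed values, and the function name is illustrative.

```python
import math

# Physical constants (SI units)
E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
EPS_0 = 8.8541878128e-12     # vacuum permittivity, F/m
N_A = 6.02214076e23          # Avogadro constant, 1/mol


def surface_potential_mv(acidic_fraction, salt_molar=0.1,
                         area_per_lipid_A2=70.0, temp_k=298.15, eps_r=78.5):
    """Gouy-Chapman surface potential (mV) of a membrane in 1:1 salt.

    All default parameter values are assumptions, not taken from the slides.
    """
    # One negative charge per acidic lipid, spread over the lipid area.
    sigma = -acidic_fraction * E_CHARGE / (area_per_lipid_A2 * 1e-20)  # C/m^2
    n0 = salt_molar * 1000.0 * N_A                                     # ions/m^3
    kT = K_B * temp_k
    # Grahame equation: sigma = sqrt(8*eps*kT*n0) * sinh(e*psi0 / (2*kT))
    x = math.asinh(sigma / math.sqrt(8.0 * eps_r * EPS_0 * kT * n0))
    return 2.0 * kT / E_CHARGE * x * 1000.0


for fraction in (0.05, 0.20, 0.33):
    print(f"{fraction:.0%} acidic lipid: {surface_potential_mv(fraction):.1f} mV")
```

A more negative surface potential means stronger attraction for a basic peptide, which is the qualitative trend the slide describes.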
Proteins that function in phosphoinositide pathways contain multiple membrane binding motifs
Motif 1      Motif 2      Protein
C1/DAG       C2/Ca2+      Protein kinase C
PH/PIP2      C2/Ca2+      Phospholipase C
PH/PIP2      PX/PI3P      Phospholipase D
FYVE/PI3P    PH/PI        FGD1 (a Rho/Rac GEF)
Basic/PS     PH/PIP2      GPCR kinase
C2/Ca2+      Nonpolar     Cytosolic phospholipase A2
ENTH/PIP2    Prot/prot    Epsin1, AP180
Myristate    Basic/PS     Src, MARCKS, (HIV-1 Gag)
Multiple inputs: Temporal and spatial control of subcellular targeting through coincidence counting
Many peripheral proteins, especially those involved in subcellular targeting, are either highly basic or charge polarized.
(Figure legend: +25 mV and -25 mV)
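A rough first screen for "highly basic" can be made from sequence alone. The sketch below counts charged residues at neutral pH, ignoring histidine and the chain termini; that is a simplifying assumption. The example sequence is hypothetical. Charge polarization is a property of the 3D structure and is not captured here.

```python
def net_charge(sequence):
    """Approximate net charge at neutral pH: +1 per K/R, -1 per D/E.

    Histidine and the chain termini are ignored (simplifying assumption).
    """
    seq = sequence.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


# Hypothetical basic stretch, for illustration only.
example = "KKKKKRFSFKKSFKLSGFSFKKNKK"
print(f"net charge: {net_charge(example):+d}")
```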
Quantitative physical theory for the interaction of proteins with membrane surfaces
Connection among biophysical properties, membrane binding behavior, and subcellular localization
No calcium
Calcium
Phospholipase C C2 domains
Homology models of all isoforms
5-lipoxygenase C2 domain
Homology model
Structural genomics and proteomics-level studies of lipid-interacting domains: Northeast Structural Genomics and Arabidopsis 2010
Apply what we have learned to whole families
BAR domains
C1 domains
C2 domains
ENTH domains
FERM domains
FYVE domains
GRAM domains
PDZ domains
PH domains
PHD domains
PX domains
Sec14 domains
START domains
VHS domains
High-throughput comparative modeling: Leverage structure information
All lipid-binding domains in all model genomes
Use what we have learned computationally and experimentally to develop:
1. More complete lists of peripheral proteins of known structure from the PDB;
2. Detect and model all instances of peripheral proteins in sequence databases;
3. Discover new instances, novel functionalities, new families;
4. Create databases to house this information;
5. Use this information to annotate protein sequences of unknown function.
SkyLine: High-throughput comparative modeling
“Modelability”: Create “reliable” models using known structures as templates
Nebojsa Mirkovic
Proteins 66:766
SkyLine pipeline flowchart (box labels):
Starting data: PDB structure; sequence; secondary structure (DSSP)
Homologue search: PSI-BLAST; homologues
Alignments: multiple alignments (ClustalW); modeling alignments
Model building: Modeller or Nest; models
Model quality: PROSA, pG score (pG > 0.7)
Results: non-redundant and unsolved; leverage (unique models); web-accessible models database
Data on homologues (species, IDs, coverage, length, e-value, seq. id.); specialized databases
Analysis: homologous structures; family analysis; MarkUs (function annotation); target reprioritization
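The quality filter and the leverage count in the flowchart can be sketched as follows. This is an illustration under stated assumptions, not the SkyLine implementation. The `Model` record, its field names, and the example entries are hypothetical. Only the pG > 0.7 cutoff comes from the slide.

```python
from dataclasses import dataclass

PG_THRESHOLD = 0.7  # cutoff quoted on the slide (pG > 0.7)


@dataclass(frozen=True)
class Model:
    target_id: str     # hypothetical ID of the modeled sequence
    template_id: str   # hypothetical ID of the template structure
    pg_score: float    # model quality score in [0, 1]


def reliable_models(models, threshold=PG_THRESHOLD):
    """Keep only models whose pG score exceeds the threshold."""
    return [m for m in models if m.pg_score > threshold]


def leverage(models, threshold=PG_THRESHOLD):
    """Count distinct target sequences with at least one reliable model."""
    return len({m.target_id for m in reliable_models(models, threshold)})


# Made-up records for illustration only.
example = [
    Model("seqA", "tmpl1", 0.95),
    Model("seqA", "tmpl2", 0.81),   # alternative model, same target
    Model("seqB", "tmpl1", 0.55),   # below cutoff, discarded
    Model("seqC", "tmpl2", 0.72),
]
print(f"{len(reliable_models(example))} reliable models, leverage = {leverage(example)}")
```

Counting distinct targets rather than models keeps alternative models built on different templates from inflating the leverage figure.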
NESG Models Database (Frank Indiviglio, Hunjoong Lee)
Models Database: http://156.145.102.40/nesg3/nesg.php
“Leverage”: Number and quality of 3D models produced from a set of structures as templates
PSI1 and PSI2: NESG leverage ~220 sequence unique models
Alternative models based on different PDB templates, reliability measures and sequence coverage
Additional search mechanisms:
Expand methodology to the entire PDB, create specialized family and genome databases
C2 domains from phospholipase C isoforms: Comparative functionality
Kd = 2.3 × 10^-9 M
Kd = 2.6 × 10^-9 M
C2 domains from phospholipase C isoforms: Comparative functionality
Kd = 8.9 × 10^-8 M → 6.2 × 10^-9 M
4.0 × 10^-8 M
Differences between δ1 and δ4:
Detection of specificity determinants leads to hypotheses for differential regulation
Kd = 2.3 × 10^-9 M
Kd = 8.9 × 10^-8 M → 6.2 × 10^-9 M
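To compare the dissociation constants quoted above on an energy scale, they can be converted to standard binding free energies with ΔG = RT ln(Kd), using a 1 M reference state. The sketch below assumes a temperature of 298.15 K, which the slides do not state, and does not assume what the arrow on the slide represents.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol) from Kd (M), 1 M reference."""
    return R_KCAL * temp_k * math.log(kd_molar)


def fold_change(kd_before, kd_after):
    """How many times tighter the binding is after the change."""
    return kd_before / kd_after


# Kd values as quoted on the slides.
for kd in (2.3e-9, 2.6e-9, 4.0e-8, 8.9e-8, 6.2e-9):
    print(f"Kd = {kd:.1e} M -> dG = {binding_free_energy(kd):.2f} kcal/mol")

ratio = fold_change(8.9e-8, 6.2e-9)
print(f"8.9e-8 M -> 6.2e-9 M: {ratio:.1f}-fold tighter binding")
```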
Whole family modeling: FYVE domains
FYVE domain family: Electrostatic properties of models correlate with in vitro binding measurements and subcellular localization: Comparison of different members
FYVE domain family: Electrostatic properties of models correlate with in vitro binding measurements and subcellular localization: Residue substitution of a single family member
Dynamic target re-prioritization is an important strategy
Model/Computation
Experiment
Structure
There is no straightforward prescription: Each family has to be dealt with individually
“Modelability”: Create “reliable” models using known structures as templates
START domain leverage
Modelability (7378) versus 30% sequence identity (2767)
Figure values: 409, 395, 83, 410, 36, 35, 171, 341, 86, 29, 16, 54, 78, 356, 134, 71, 63
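A short check of the numbers on this slide: the seventeen values listed above add up to 2767, the total quoted for 30% sequence identity. The sketch below also computes the gain from the modelability criterion, assuming that the two figures in parentheses count family members detected by each criterion. The categories behind the individual values are not recoverable from the transcript.

```python
# Totals and values as transcribed from the slide.
MODELABILITY_TOTAL = 7378
IDENTITY_30_TOTAL = 2767
FIGURE_VALUES = [409, 395, 83, 410, 36, 35, 171, 341, 86,
                 29, 16, 54, 78, 356, 134, 71, 63]


def coverage_gain(modelable, identity_based):
    """Ratio of members found by modelability to members found by identity."""
    return modelable / identity_based


def additional_members(modelable, identity_based):
    """Members found by modelability beyond the identity-based set."""
    return modelable - identity_based


print("values sum to identity total:", sum(FIGURE_VALUES) == IDENTITY_30_TOTAL)
print(f"gain: {coverage_gain(MODELABILITY_TOTAL, IDENTITY_30_TOTAL):.2f}x")
print("additional members:", additional_members(MODELABILITY_TOTAL, IDENTITY_30_TOTAL))
```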
Collaborations with Experimental Groups
Characterize different START domains based on structural information
Discriminate whether START domains bind cholesterol or PC (PI) or other ligands
Provide leads for chemical library studies for function-interfering compounds
Detailed computational analysis and function annotation
Fine-grain structure analysis in the absence and presence of potential ligand
Experimental characterization: Protein production, SPR analysis, cellular studies
Cho Lab: High-throughput analysis of Human and Arabidopsis START domains
Clark Lab: Docking studies of ubiquinone into nematode START domain, electron transport
START domains in the Arabidopsis thaliana genome
SkyLine produces quality models for 58 non-redundant sequences versus 35 Arabidopsis START domains detected by sequence searches (Genome Biology 5:R41)
Key Findings (Tonya Silkov)
1. 45 sequences are of the Birch antigen class
2. Two sequences correspond to AHA1 domains (Activator of Hsp90 ATPase); SCOP classifies AHA domains as belonging to the Birch antigen superfamily
3. Two sequences are predicted in databases as integral membrane proteins of unknown function
4. Five sequences for related models apparently represent a group of uncharacterized plant START domains
Fig. 1
Cross-genomic studies
Structure similarity among lipid-binding domains
(Figure panels: ENTH domain, ANTH domain, VHS domain; PIP2 is labeled twice)
Tonya Silkov
ENTH and ANTH: similar topology, different membrane binding mechanism
(Figure labels: ENTH, ANTH, Helix 0)
J Biol Chem. 278:28993, with Cho Lab
Arabidopsis domain with novel dual ENTH and ANTH functionality
Tonya Silkov
(Figure labels: ENTH, ANTH, Helix 0; one view from above)
Cho Lab: First 25 amino acids are required for both PIP2 binding and membrane penetration.
Produce enough protein to obtain crystals.
Fig. 1
A novel functional subclass of VHS domains
(Figure panels: ENTH domain, ANTH domain, VHS domain)
Tonya Silkov
A new VHS-related family, “VR domains”, found in other genomes
KIAA1530 (Homo sapiens)
XP_420852 (Gallus gallus)
CAB71110 (Arabidopsis thaliana)
XP_747424 (Strongylocentrotus purpuratus)
Tonya Silkov
Among this subset of VHS domains, the basic surface patch is conserved
Hypothesis: It constitutes a phosphoinositide-specific binding site
VR domain family of membrane-binding VHS domains
Human and Arabidopsis constructs are being examined in the Cho lab
Tonya Silkov
The ability to construct a quality model of a sequence is a more strategic definition of a protein family member
Allows for the discovery of distantly related members
With function annotation, allows for the discovery of new sub-groups
Structures + Sequences -> Models + Function annotation (Markus)
More comprehensive coverage of protein sequence/structure/function space
By constantly updating resources as new information becomes available, we produce a more relevant (dynamic) target selection strategy
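One way to picture dynamic target selection is a ranking that is recomputed whenever a new structure makes more sequences modelable. The sketch below is a hypothetical illustration and not the procedure used by the authors. The family names, counts, and function names are made up, and ranking by the number of sequences still lacking a reliable model is an assumed criterion.

```python
def unmodeled(family):
    """Number of sequences in a family without a reliable model."""
    return family["total"] - family["modeled"]


def prioritize(families):
    """Rank family names so the one with most unmodeled sequences is first."""
    return sorted(families, key=lambda name: unmodeled(families[name]), reverse=True)


def record_new_structure(families, name, newly_modelable):
    """Update a family after a new structure makes more sequences modelable."""
    family = families[name]
    family["modeled"] = min(family["total"], family["modeled"] + newly_modelable)


# Hypothetical families and counts, for illustration only.
families = {
    "FamilyA": {"total": 100, "modeled": 20},
    "FamilyB": {"total": 80, "modeled": 30},
    "FamilyC": {"total": 40, "modeled": 35},
}
print("before:", prioritize(families))
record_new_structure(families, "FamilyA", 60)
print("after: ", prioritize(families))
```

Re-running the ranking after each update is what makes the selection dynamic: a family drops in priority as soon as a new template covers most of its members.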