Modelling Proteomes
Ram Samudrala
University of Washington
Rationale for understanding protein structure and function
Protein sequence
- large numbers of sequences, including whole genomes
?
Protein function
- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution
structure determination
structure prediction
Protein structure
- three dimensional
- complicated
- mediates function
homology
rational mutagenesis
biochemical analysis
model studies
Protein folding
…-CUA-AAA-GAA-GGU-GUU-AGC-AAG-GAU-…
mRNA
protein sequence
…-L-K-E-G-V-S-K-D-…
one amino acid
unfolded protein: not unique, mobile, inactive, expanded, irregular
spontaneous self-organisation (~1 second)
native state: unique shape, precisely ordered, stable/functional, globular/compact, helices and sheets
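As a small illustration of the sequence step on this slide, the sketch below translates a codon string into one-letter amino acids. The codon table is deliberately partial (only the codons needed here), and the codon string is chosen so that it encodes the fragment L-K-E-G-V-S-K-D; the function name `translate` is an assumption made for this example.

```python
# Partial standard genetic code: only the codons used in this example.
CODON_TABLE = {
    "CUA": "L", "AAA": "K", "GAA": "E", "GGU": "G",
    "GUU": "V", "AGC": "S", "AAG": "K", "GAU": "D",
}

def translate(codons: str) -> str:
    """Translate a dash-separated codon string into dash-separated amino acids."""
    return "-".join(CODON_TABLE[codon] for codon in codons.split("-"))

rna = "CUA-AAA-GAA-GGU-GUU-AGC-AAG-GAU"
peptide = translate(rna)
print(peptide)
```

Each codon (three nucleotides) maps to one amino acid, which is the "one amino acid" unit labelled on the slide.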
Ab initio prediction of protein structure
sample conformational space such that native-like conformations are found
select
hard to design functions that are not fooled by non-native conformations (“decoys”)
astronomically large number of conformations
5 states per residue, 100 residues: 5^100 ≈ 10^70
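The size estimate above can be checked with a few lines of arithmetic. This is only a worked check of the slide's numbers (5 states per residue, 100 residues); the variable names are chosen for this example.

```python
import math

states_per_residue = 5
n_residues = 100

# Exact count as a Python big integer, and its order of magnitude.
n_conformations = states_per_residue ** n_residues
exponent = n_residues * math.log10(states_per_residue)

print(f"number of digits in 5^100: {len(str(n_conformations))}")
print(f"log10(5^100) = {exponent:.1f}")
```

The logarithm comes out just under 70, so 10^70 is a fair round figure for the number of conformations.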
Semi-exhaustive segment-based folding
EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK
generate: fragments from database, 14-state φ,ψ model
minimise: Monte Carlo with simulated annealing, conformational space annealing, GA
filter: all-atom pairwise interactions, bad contacts, compactness, secondary structure
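The "minimise" step can be sketched as Metropolis Monte Carlo with simulated annealing over discrete per-residue states. This is a toy under stated assumptions, not the method used in the talk: the energy function here is a hypothetical stand-in (mismatches against a hidden target), whereas the slide refers to all-atom pairwise scoring; the chain length, cooling schedule, and all names are illustrative.

```python
import math
import random

N_STATES = 14     # discrete phi/psi states per residue, as on the slide
N_RESIDUES = 20   # hypothetical toy chain length

def toy_energy(conf, target):
    """Hypothetical score: number of residues whose state differs from the target."""
    return sum(1 for a, b in zip(conf, target) if a != b)

def anneal(target, steps=20000, t_start=2.0, t_end=0.05, seed=0):
    """Metropolis Monte Carlo with geometric cooling; returns the best state seen."""
    rng = random.Random(seed)
    conf = [rng.randrange(N_STATES) for _ in target]
    energy = toy_energy(conf, target)
    best, best_energy = list(conf), energy
    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        i = rng.randrange(len(conf))
        old_state = conf[i]
        conf[i] = rng.randrange(N_STATES)  # propose a new state for one residue
        delta = toy_energy(conf, target) - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            energy += delta
            if energy < best_energy:
                best, best_energy = list(conf), energy
        else:
            conf[i] = old_state  # reject the move
    return best, best_energy

target = [i % N_STATES for i in range(N_RESIDUES)]
best, best_energy = anneal(target)
print("best toy energy:", best_energy)
```

Accepting occasional uphill moves at high temperature lets the search escape local minima; as the temperature falls the walk becomes increasingly greedy.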
Ab initio prediction at CASP
Before CASP (BC): “solved” (biased results)
CASP1: worse than random
CASP2: worse than random with one exception
CASP3: consistently predicted correct topology - ~ 6.0 Å for 60+ residues
CASP4: consistently predicted correct topology - ~ 4-6.0 Å for 60-80+ residues
**T97/er29 – 6.0 Å (80 residues; 18-97)
*T98/sp0a – 6.0 Å (60 residues; 37-105)
**T102/as48 – 5.3 Å (70 residues; 1-70)
**T106/sfrp3 – 6.2 Å (70 residues; 6-75)
**T110/rbfa – 4.0 Å (80 residues; 1-80)
*T114/afp1 – 6.5 Å (45 residues; 36-80)
Comparative modelling of protein structure
scan
align
de novo simulation
…
KDHPFGFAVPTKNPDGTMNLMNWECAIP
KDPPAGIGAPQDN----QNIMLWNAVIP
** * *   *  *     * * *   **
build initial model
minimum perturbation
refine
physical functions
…
construct non-conserved side chains and main chains
graph theory, semfold
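The alignment on this slide can be summarised numerically. The sketch below counts identical positions in the two aligned sequences and reports percent identity over the columns where neither sequence has a gap; that denominator is an assumption (other conventions divide by the full alignment length), and the function names are illustrative.

```python
def identity_markers(a: str, b: str) -> str:
    """Return a marker line with '*' under identical, non-gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return "".join("*" if x == y and x != "-" else " " for x, y in zip(a, b))

def percent_identity(a: str, b: str):
    """Return (matches, aligned columns, percent identity over non-gap columns)."""
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    matches = sum(1 for x, y in aligned if x == y)
    return matches, len(aligned), 100.0 * matches / len(aligned)

seq_a = "KDHPFGFAVPTKNPDGTMNLMNWECAIP"
seq_b = "KDPPAGIGAPQDN----QNIMLWNAVIP"

markers = identity_markers(seq_a, seq_b)
matches, n_aligned, pct = percent_identity(seq_a, seq_b)
print(seq_a)
print(seq_b)
print(markers)
print(f"{matches} identities over {n_aligned} aligned columns ({pct:.1f}%)")
```

Identical columns anchor the initial model, while the gapped region and non-conserved positions are the parts that have to be constructed.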
A graph theoretic representation of protein structure
Steps: represent residues as nodes, weigh nodes, construct graph, find cliques
Node weights: -0.6 (V1), -0.5 (I), -0.9 (V2), -0.7 (K), -1.0 (F)
Edge weights shown: -0.1, -0.3, -0.1, -0.2, -0.2
W = -4.5
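The clique-finding step can be sketched by brute force on a graph this small. The node weights below are the ones listed on the slide, but the figure's connectivity is not recoverable from the transcript, so the edge set here is hypothetical, as is the scoring convention (sum of node and edge weights, lower is better). The result is therefore illustrative and is not expected to reproduce the slide's W value.

```python
from itertools import combinations

# Node weights as listed on the slide.
node_w = {"V1": -0.6, "I": -0.5, "V2": -0.9, "K": -0.7, "F": -1.0}

# Hypothetical edges and weights: which nodes are connected is an assumption.
edge_w = {
    frozenset(("I", "F")): -0.1,
    frozenset(("I", "V2")): -0.3,
    frozenset(("F", "V2")): -0.1,
    frozenset(("F", "K")): -0.2,
    frozenset(("V2", "K")): -0.2,
}

def is_clique(nodes):
    """A clique has an edge between every pair of its nodes."""
    return all(frozenset(pair) in edge_w for pair in combinations(nodes, 2))

def clique_weight(nodes):
    """Sum of node weights plus the weights of all edges inside the clique."""
    total = sum(node_w[n] for n in nodes)
    total += sum(edge_w[frozenset(pair)] for pair in combinations(nodes, 2))
    return total

def best_clique():
    """Enumerate every node subset and keep the lowest-weight clique."""
    candidates = (
        subset
        for size in range(1, len(node_w) + 1)
        for subset in combinations(sorted(node_w), size)
        if is_clique(subset)
    )
    return min(candidates, key=clique_weight)

best = best_clique()
print(best, round(clique_weight(best), 2))
```

Exhaustive enumeration is exponential in the number of nodes, so it only serves to illustrate the idea; realistic graphs need a dedicated clique-finding algorithm.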
Comparative modelling at CASP
         alignment    side chain    short loops    longer loops
BC       excellent    ~ 80%         1.0 Å          2.0 Å
CASP1    poor         ~ 50%         ~ 3.0 Å        > 5.0 Å
CASP2    fair         ~ 75%         ~ 1.0 Å        ~ 3.0 Å
CASP3    fair         ~ 75%         ~ 1.0 Å        ~ 2.5 Å
CASP4    fair         ~ 75%         ~ 1.0 Å        ~ 2.0 Å
CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50-10% sequence identity
**T128/sodm – 1.0 Å (198 residues; 50%)
**T111/eno – 1.7 Å (430 residues; 51%)
**T122/trpa – 2.9 Å (241 residues; 33%)
**T125/sp18 – 4.4 Å (137 residues; 24%)
**T112/dhso – 4.9 Å (348 residues; 24%)
**T92/yeco – 5.6 Å (104 residues; 12%)
Prediction for Invb using de novo fold recognition
Computational aspects of structural genomics
A. sequence space
B. comparative modelling
C. fold recognition
D. ab initio prediction
E. target selection
F. analysis
targets
(Figure idea by Steve Brenner.)
Computational aspects of functional genomics
G. assign function
structure based methods: microenvironment analysis, structure comparison, zinc binding site?, homology
+
sequence based methods: sequence comparison, motif searches, phylogenetic profiles, domain fusion analyses
+
experimental data
function?
assign function to entire protein space
Bioverse – explore relationships among molecules and systems
http://bioverse.compbio.washington.edu
Jason Mcdermott
Bioverse – human protein-protein interaction network
Jason Mcdermott/Zach Frazier
Bioverse – mapping pathways on networks
Inositol phosphate metabolism
Benzoate degradation
Sphingoglycolipid metabolism
Starch/sucrose metabolism
Nicotinate and nicotinamide metabolism
Jason Mcdermott
Take home message
Prediction of protein structure and function can
be used to model whole genomes to understand
organismal function and evolution
Acknowledgements
Group members