Download Where can we find disordered proteins?

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Gene expression wikipedia , lookup

Ribosomally synthesized and post-translationally modified peptides wikipedia , lookup

Transcriptional regulation wikipedia , lookup

Clinical neurochemistry wikipedia , lookup

Silencer (genetics) wikipedia , lookup

Magnesium transporter wikipedia , lookup

G protein–coupled receptor wikipedia , lookup

SR protein wikipedia , lookup

Signal transduction wikipedia , lookup

Drug design wikipedia , lookup

Ligand binding assay wikipedia , lookup

Metabolism wikipedia , lookup

Point mutation wikipedia , lookup

Ancestral sequence reconstruction wikipedia , lookup

Amino acid synthesis wikipedia , lookup

Metalloprotein wikipedia , lookup

Biosynthesis wikipedia , lookup

Structural alignment wikipedia , lookup

Genetic code wikipedia , lookup

Western blot wikipedia , lookup

Nuclear magnetic resonance spectroscopy of proteins wikipedia , lookup

Protein wikipedia , lookup

Interactome wikipedia , lookup

Biochemistry wikipedia , lookup

Proteolysis wikipedia , lookup

Protein–protein interaction wikipedia , lookup

Two-hybrid screening wikipedia , lookup

Anthrax toxin wikipedia , lookup

Transcript
Intrinsically disordered proteins
Zsuzsanna Dosztányi
EMBO course
Budapest, 3 June 2016
IDPs
Intrinsically disordered proteins/regions
(IDPs/IDRs)
 Do not adopt a well-defined structure in
isolation under native-like conditions
 Highly flexible ensembles
 Functional proteins
 Involved in various diseases

Ordered structures from the PDB
Over 100000 PDB structures
Not everything in the PDB is ordered
Cofactors, complex, DNA-RNA, crystal contacts
Where can we find disordered proteins?
In the literature
Failed attempts to crystallize
Lack of NMR signals
Heat stability
Protease sensitivity
Increased molecular volume
“Freaky” sequences …
Disprot database:
www.disprot.org
Where can we find disordered proteins?
In the PDB
tegniddsliggnasaegpegegtestv
Missing electron density regions from
the PDB
NMR structures with large structural
variations
p53 tumor suppressor
TAD
disordered
DBD
ordered
TD
RD
disordered
Wells et al. PNAS 2008; 105: 5762
Funnels
Flock et al Curr Opin Struct Biol. 2014; 26:62
Sequence properties of IDPs



Amino acid compositional bias
High proportion of polar and charged amino acids (Gln,
Ser, Pro, Glu, Lys)
Low proportion of bulky, hydrophobhic amino acids (Val,
Leu, Ile, Met, Phe, Trp, Tyr)

Low sequence complexity

Signature sequences identifying disordered proteins
Protein disorder is encoded in the amino acid sequence
TDVEAAVNSLVNLYLQAS
YLS
How can we discriminate ordered and disordered regions ?
Prediction: classification problem
Input
1.
2.
3.
4.
sequence
propensity vector
alignment (profile)
interaction energies
Output (property)
1. binary
2. score
Method
1. statistical methods
2. machine learning
3. structural approach
Training/Assessment
1. DisProt
2. PDB
DISOPRED2
…..AMDDLMLSPDDIEQWFTED…..
Trained in missing residues
from X-ray structures
SVM with linear kernel
Assign label: D or O
F(inp)
D
Ward et al. J.Mol. Biol. 2004; 337:653
O
IUPred
Globular
proteins form many favorable interactions to
ensure the stability of the structure
Disordered
protein cannot form enough favourable
interactions
Energy estimation method
Based on globular proteins
No training on disordered proteins
Dosztanyi (2005) JMB 347, 827
GlobPlot
Globular proteins form regular secondary structures, and different amino acids
have different tendencies to be in them
Compare the tendency of
amino acids:


to be in coil (irregular)
structure.
to be in regular secondary
structure elements
Linding (2003) NAR 31, 3701
Typical output
GlobPlot
downhill regions correspond to
putative domains (GlobDom)
up-hill regions correspond
to predicted protein
disorder
Different flavors of disorder
Short and long disordered regions have different compositional biases
PONDR VSL2
Differences in short and long disorder

amino acid composition

Short disorder is often at the termini

methods trained on one type of dataset tested on other dataset
resulted in lower efficiencies
- Short version – Long version
- PONDR VSL2:
separate predictors for short and long disorder
combined
length independent predictions
Peng (2006) BMC Bioinformatics 7, 208
Evaluation
ROC curve
Prediction of protein disorder
 Disordered is encoded in the amino acid sequence
 Can be predicted from the sequence
 ~80% accuracy
 Large-scale studies
 Evolution
 Function
 Binary classification
Genome level annotations

Bridging over the large number of sequences and the small
number of experimentally verified cases

Combining experiments and predictions

MobiDB: http://mobidb.bio.unipd.it
D2P2: http://d2p2.pro

IDEAL: http://www.ideal.force.cs.is.nagoya-u.ac.jp/IDEAL/


Multiple predictors

How to resolve contradicting experiments/ predictions?

Majority rules
Functions of IDPs
I
II
III
IV
V
Entropic chains
Linkers
Molecular recognition
Protein modifications (e.g. phosphorylation)
Assembly of large multiprotein complexes
Protein interactions of IDPs
Complex between p53 and MDM2
Coupled folding and binding
 Entropic penalty
 Functional advantages
Weak transient, yet specific interactions
 Post-translational modifications
 Flexible binding regions that can overlap
 Evolutionary plasticity

Signaling
Regulation
Binding regions within IDPs



Complexes of IDPs in the PDB:
~ 200
Known instances:
~ 2 000
Estimated number of such interactions in the human
proteome:
~ 1 000 000
 Experimental characterization is very difficult
 Computational methods
Binding regions within IDPs

SLIMs: Short linear motifs
3-11 residues long, average size 6-7 residues
although enriched in IDRs, around 20% are in located within
IDRs

Disordered binding regions, Morfs
undergo disorder to order transition upon binding
usually less then 30 residues, can be up to 70

Intrinsically disordered domains
evolutionary conserved disordered segments
p27
Inhibitor of CDK2-CyclinA complex.
Bioinformatical approaches
(~10, as opposed to the more than 50 disorder prediction methods)
Biophysical properties (ANCHOR)
Machine Learning methods
(MorfPred, Morfchibi, DISOPRED3)
Linear motifs
(Regular Expression, PSSMs)
Conservations patterns (SlimPrints, PhyloHMM )
Prediction of binding sites
located within IDPs




Interaction sites are usually linear (consist of only
1 part)
enrichment of interaction prone amino acids
can be predicted from sequence without
predicting the structure
Heterogeneity



adopted secondary structure
elements
size of the binding regions
flexibility in the bound form
Prediction of disordered
binding regions – ANCHOR
What discriminates disordered binding regions?
• A cannot form enough favorable interactions with
their sequential environment
• It is favorable for them to interact with a globular
protein
Based on simplified physical model
• Based on an energy estimation method using
statistical potentials
• Captures sequential context
ANCHOR
C-terminal region of human p53
Dosztanyi et al. Bioinformatics. 2009;25:2745
DISOPRED3

Uses three SVMs




Simple sequence profile
PSI-Blast profiles (very slow)
PSI-Blast profiles with global features
trained on short chains in complex
Jones DT, Cozzetto D. Bioinformatics. 2015; 31:857
DISOPRED3
Prediction of binding regions
within IDPs
Combined predictions provide more biologically meaningful
predictions
 Lot of rooms for improvements...
 What is the binding partner?